Message ID | 20230411143702.64495-1-jlayton@kernel.org |
---|---|
Headers | From: Jeff Layton <jlayton@kernel.org> To: Alexander Viro <viro@zeniv.linux.org.uk>, Christian Brauner <brauner@kernel.org>, "Darrick J. Wong" <djwong@kernel.org>, Hugh Dickins <hughd@google.com>, Andrew Morton <akpm@linux-foundation.org>, Dave Chinner <david@fromorbit.com>, Chuck Lever <chuck.lever@oracle.com> Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-xfs@vger.kernel.org, linux-mm@kvack.org, linux-nfs@vger.kernel.org Subject: [RFC PATCH 0/3][RESEND] fs: opportunistic high-res file timestamps Date: Tue, 11 Apr 2023 10:36:59 -0400 Message-Id: <20230411143702.64495-1-jlayton@kernel.org> |
Series | fs: opportunistic high-res file timestamps |
Message
Jeff Layton
April 11, 2023, 2:36 p.m. UTC
(Apologies for the resend, but I didn't send this with a wide enough distribution list originally.)

A few weeks ago, during one of the discussions around i_version, Dave Chinner wrote this:

"You've missed the part where I suggested lifting the "nfsd sampled i_version" state into an inode state flag rather than hiding it in the i_version field. At that point, we could optimise away the secondary ctime updates just like you are proposing we do with the i_version updates. Further, we could also use that state it to decide whether we need to use high resolution timestamps when recording ctime updates - if the nfsd has not sampled the ctime/i_version, we don't need high res timestamps to be recorded for ctime...."

While I don't think we can practically optimize away ctime updates like we do with i_version, I do like the idea of using this scheme to indicate when we need to use a high-res timestamp.

This patchset is a first stab at a scheme to do this. It declares a new i_state flag for this purpose and adds two new vfs-layer functions to implement conditional high-res timestamp fetching. It then converts both tmpfs and xfs to use it.

This seems to behave fine under xfstests, but I haven't yet done any performance testing with it. I wouldn't expect it to create huge regressions, though, since we only grab a high-res timestamp for the first update after each query.

I like this scheme because we can potentially convert any filesystem to use it. There are no special storage requirements, as there are with the i_version field. I think it'd potentially improve NFS cache coherency with a whole swath of exportable filesystems, and it would help out NFSv3 too.

This is really just a proof-of-concept. There are a number of things we could change:

1/ We could use the top bit in the tv_sec field as the flag. That'd give us different flags for ctime and mtime. We also wouldn't need to use a spinlock.

2/ We could probably optimize away the high-res timestamp fetch in more cases. Basically, always do a coarse-grained ts fetch and only fetch the high-res ts when the QUERIED flag is set and the existing time hasn't changed.

If this approach looks reasonable, I'll plan to start working on converting more filesystems.

One thing I'm not clear on is how widely available high res timestamps are. Is this something we need to gate on particular CONFIG_* options?

Thoughts?

Jeff Layton (3):
  fs: add infrastructure for opportunistic high-res ctime/mtime updates
  shmem: mark for high-res timestamps on next update after getattr
  xfs: mark the inode for high-res timestamp update in getattr

 fs/inode.c | 40 +++++++++++++++++++++++++++++++--
 fs/stat.c | 10 +++++++++
 fs/xfs/libxfs/xfs_trans_inode.c | 2 +-
 fs/xfs/xfs_acl.c | 2 +-
 fs/xfs/xfs_inode.c | 2 +-
 fs/xfs/xfs_iops.c | 15 ++++++++++---
 include/linux/fs.h | 5 ++++-
 mm/shmem.c | 23 ++++++++++---------
 8 files changed, 80 insertions(+), 19 deletions(-)
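The scheme in the cover letter can be modelled in a few lines of userspace C. This is a minimal sketch under stated assumptions, not the code from the patchset: `struct model_inode`, `model_getattr()`, `model_update_ctime()` and the fake clock are all hypothetical, a pthread mutex stands in for the inode spinlock, a boolean stands in for the new i_state flag, and a 4 ms tick is assumed for the coarse clock. In the kernel, the two clock reads would presumably be something like `ktime_get_real_ts64()` and `ktime_get_coarse_real_ts64()`.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical userspace model of the QUERIED-flag scheme. None of these
 * names come from the patchset; timestamps are plain nanosecond counts.
 */
#define MODEL_TICK_NS 4000000LL /* assumed 250 Hz scheduler tick (4 ms) */

static int64_t model_clock_ns; /* fake hardware clock, advanced by the caller */

/* Fine-grained read: the clock's current value. */
static int64_t model_now_fine(void)
{
	return model_clock_ns;
}

/* Coarse read: the clock's value as of the last tick. */
static int64_t model_now_coarse(void)
{
	return model_clock_ns - model_clock_ns % MODEL_TICK_NS;
}

struct model_inode {
	pthread_mutex_t lock; /* stands in for the inode spinlock */
	bool queried;         /* stands in for the new i_state flag */
	int64_t ctime_ns;
};

/* getattr: return ctime and record that this value has now been seen. */
static int64_t model_getattr(struct model_inode *inode)
{
	pthread_mutex_lock(&inode->lock);
	int64_t ct = inode->ctime_ns;
	inode->queried = true;
	pthread_mutex_unlock(&inode->lock);
	return ct;
}

/*
 * ctime update: only pay for a fine-grained clock read when the current
 * value has been observed since the last update.
 */
static void model_update_ctime(struct model_inode *inode)
{
	pthread_mutex_lock(&inode->lock);
	int64_t now = inode->queried ? model_now_fine() : model_now_coarse();
	/*
	 * Safeguard added for this sketch: an earlier fine-grained stamp can be
	 * later than the current coarse time, so never move ctime backwards.
	 */
	if (now > inode->ctime_ns)
		inode->ctime_ns = now;
	inode->queried = false;
	pthread_mutex_unlock(&inode->lock);
}
```

The never-go-backwards check is this sketch's own addition: without it, an unqueried update in the same tick as an earlier fine-grained one would roll ctime back to the tick boundary, possibly to a value a client had already seen. Option 2 from the cover letter would additionally skip the fine-grained read whenever the coarse time already differs from the stored ctime.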
Comments
On Tue, Apr 11, 2023 at 10:36:59AM -0400, Jeff Layton wrote:
> If this approach looks reasonable, I'll plan to start working on
> converting more filesystems.

Seems reasonable to me.

In terms of testing, I suspect the main impact is going to be the additional overhead of taking a spinlock in normal stat calls. In which case, testing common tools like git would be useful. For example, `git status` runs about 170k stat calls on a typical kernel tree. If anything is going to be noticed by users that actually care, it'll be workloads like this...

If we manage to elide the spinlock altogether, then I don't think we're going to be able to measure any sort of perf difference on modern hardware short of high-end NFS benchmarks that drive servers to their CPU usage limits....

> One thing I'm not clear on is how widely available high res timestamps
> are. Is this something we need to gate on particular CONFIG_* options?

Don't think so - the kernel should always provide the highest resolution it can through the get_time interfaces - the _coarse variants simply return what was read from the high res timer at the last scheduler tick, hence avoiding the hardware timer overhead when high res timer resolution is not needed.....

Cheers,

Dave.
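Option 1 from the cover letter, together with Dave's point about eliding the spinlock, suggests a lockless form of the same idea. The sketch below is hypothetical: it keeps the QUERIED flag in the top bit of a single 64-bit nanosecond timestamp (the proposal mentions the top bit of tv_sec; one word is a simplification that keeps the update a single atomic operation) and uses C11 atomics in place of kernel primitives. getattr becomes one atomic OR, and the update path becomes a compare-and-swap loop.

```c
#include <stdatomic.h>
#include <stdint.h>

/*
 * Hypothetical lockless variant: the QUERIED flag lives in the top bit of
 * the stored timestamp. For brevity the timestamp is one 64-bit nanosecond
 * count rather than a tv_sec/tv_nsec pair.
 */
#define LL_QUERIED ((uint64_t)1 << 63)
#define LL_TICK_NS 4000000ULL /* assumed 250 Hz scheduler tick (4 ms) */

static uint64_t ll_clock_ns; /* fake hardware clock, advanced by the caller */

static uint64_t ll_now_fine(void)
{
	return ll_clock_ns;
}

static uint64_t ll_now_coarse(void)
{
	return ll_clock_ns - ll_clock_ns % LL_TICK_NS;
}

/* getattr: set the flag atomically and return the time without it. */
static uint64_t ll_getattr(_Atomic uint64_t *ts)
{
	return atomic_fetch_or(ts, LL_QUERIED) & ~LL_QUERIED;
}

/*
 * Update: choose the clock from the flag, then publish the new time with the
 * flag cleared. If a getattr sets the flag after the load, the CAS fails,
 * 'old' is refreshed, and the retry takes a fine-grained reading.
 */
static void ll_update_ctime(_Atomic uint64_t *ts)
{
	uint64_t old = atomic_load(ts);
	uint64_t next;

	do {
		uint64_t cur = old & ~LL_QUERIED;
		uint64_t now = (old & LL_QUERIED) ? ll_now_fine() : ll_now_coarse();

		next = now > cur ? now : cur; /* never move backwards (sketch's choice) */
	} while (!atomic_compare_exchange_weak(ts, &old, next));
}
```

One cost remains: `atomic_fetch_or()` always writes, so every stat() dirties the cache line even when the flag is already set. Checking the flag with a plain load first and only doing the OR when it is clear would avoid that, which matters for stat-heavy workloads like the `git status` case Dave mentions.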
On Tue, Apr 11, 2023 at 5:38 PM Jeff Layton <jlayton@kernel.org> wrote:
> Thoughts?

Jeff,

Considering that this proposal is pretty uncontroversial, do you still want to discuss/lead a session on i_version changes in LSF/MM?

I noticed that Chuck listed "timestamp resolution and i_version" as part of his NFSD BoF topic proposal [1], but I do not think all of these topics can fit in one 30-minute session.

Dave,

I would like to use this opportunity to invite you, and any developers who are involved in fs development and are not going to attend LSF/MM in person, to join LSF/MM virtually for some sessions that you may be interested in. See this lore query [2] for TOPICs proposed this year.

You can let me know privately which sessions you are interested in attending and your time zone, and I will do my best to schedule those sessions in time slots that would be more convenient for your time zone.

Obviously, I am referring to FS track sessions. Cross-track sessions are going to be harder to accommodate, but I can try.

Thanks,
Amir.

[1] https://lore.kernel.org/linux-fsdevel/FF0202C3-7500-4BB3-914B-DBAA3E0EA3D7@oracle.com/
[2] https://lore.kernel.org/linux-fsdevel/?q=LSF+TOPIC+-re+d%3A4.months.ago..
On Sat, 2023-04-15 at 14:35 +0300, Amir Goldstein wrote:
> Jeff,
>
> Considering that this proposal is pretty uncontroversial,
> do you still want to discuss/lead a session on i_version changes in LSF/MM?
>
> I noticed that Chuck listed "timestamp resolution and i_version" as part
> of his NFSD BoF topic proposal [1], but I do not think all of these topics
> can fit in one 30 minute session.

Agreed. I think we'll need an hour for the nfsd BoF.

I probably don't need a full 30-minute slot for this topic, between the nfsd BoF and the hallway track. I've started a TOPIC email for this about 5 times now and keep deleting it. I think most of the more controversial bits are pretty much settled at this point, and the rest (crash resilience) is still too embryonic for discussion.

I might want a lightning talk at some point about what I'd _really_ like to do long-term with the i_version counter (basically: I want to be able to do a write that is gated on the i_version not having changed).
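Jeff's long-term idea, a write that is gated on the i_version not having changed, can be read as a compare-and-swap on the file's change counter. A hypothetical sketch under that reading: the caller samples the counter, and the write only goes ahead if the counter still matches. `struct vfile`, `vfile_sample_version()`, `gated_write()` and the choice of `-EAGAIN` for the conflict case are illustrative only and do not correspond to any existing kernel or NFS interface.

```c
#include <errno.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical file model: a change counter plus a small data buffer. */
struct vfile {
	pthread_mutex_t lock;
	uint64_t i_version; /* bumped on every modification */
	char data[64];
};

/* Sample the change counter, as a client would before deciding to write. */
static uint64_t vfile_sample_version(struct vfile *f)
{
	pthread_mutex_lock(&f->lock);
	uint64_t v = f->i_version;
	pthread_mutex_unlock(&f->lock);
	return v;
}

/*
 * Write 'len' bytes at 'off' only if the counter still equals
 * 'expected_version'. Returns 0 and stores the new counter value in
 * '*new_version' on success, -EAGAIN if the file changed in the meantime,
 * or -EINVAL if the range does not fit in the buffer.
 */
static int gated_write(struct vfile *f, uint64_t expected_version,
		       size_t off, const char *buf, size_t len,
		       uint64_t *new_version)
{
	if (off > sizeof(f->data) || len > sizeof(f->data) - off)
		return -EINVAL;

	pthread_mutex_lock(&f->lock);
	if (f->i_version != expected_version) {
		pthread_mutex_unlock(&f->lock);
		return -EAGAIN; /* someone else modified the file first */
	}
	memcpy(f->data + off, buf, len);
	*new_version = ++f->i_version;
	pthread_mutex_unlock(&f->lock);
	return 0;
}
```

A caller that gets `-EAGAIN` would re-read the file, sample the counter again, and retry, much like a compare-and-swap loop.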
> On Apr 15, 2023, at 7:35 AM, Amir Goldstein <amir73il@gmail.com> wrote:
>
> Jeff,
>
> Considering that this proposal is pretty uncontroversial,
> do you still want to discuss/lead a session on i_version changes in LSF/MM?
>
> I noticed that Chuck listed "timestamp resolution and i_version" as part
> of his NFSD BoF topic proposal [1], but I do not think all of these topics
> can fit in one 30 minute session.

That's fair. If lumping these topics together doesn't seem sensible, I'm happy to consider splitting off the major topics and then including the remaining ones in a generic network filesystem session or relegating them to the hallway track. I can suggest something more specific in the LSF TOPIC thread.

-- 
Chuck Lever