[v3,00/11] Performance fixes for 9p filesystem

Message ID 20230124023834.106339-1-ericvh@kernel.org

Message

Eric Van Hensbergen Jan. 24, 2023, 2:38 a.m. UTC
  This is the third version of a patch series which adds a number
of features to improve read/write performance in the 9p filesystem.
Mostly it focuses on fixing caching to help utilize the recently
increased MSIZE limits and also fixes some problematic behavior
within the writeback code.

All together, these show roughly 10x speed increases on simple
file transfers.  Future patch sets will improve cache consistency
and directory caching.

These patches are also available on github:
https://github.com/v9fs/linux/tree/ericvh/for-next
and on kernel.org:
https://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs.git

Tested against qemu, cpu, and diod with fsx, dbench, and some
simple benchmarks.
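
As a rough illustration of what the header accounting in the first patch
below is about (a sketch of the general idea only, not the patch itself;
p9_max_payload is just an illustrative name): the payload a single
Twrite/Rread can carry is bounded by the negotiated msize minus the fixed
9p I/O header -- the same clamp net/9p/client.c already applies to rsize --
so the maximum msize has to leave room for that header:

  /* illustration only */
  #include <net/9p/9p.h>	/* P9_IOHDRSZ: size/type/tag/fid/offset/count */

  static inline size_t p9_max_payload(size_t msize)
  {
  	/* what a single read/write RPC can actually move */
  	return msize - P9_IOHDRSZ;
  }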

Eric Van Hensbergen (11):
  Adjust maximum MSIZE to account for p9 header
  Expand setup of writeback cache to all levels
  Consolidate file operations and add readahead and writeback
  Remove unnecessary superblock flags
  allow disable of xattr support on mount
  fix bug in client create for .L
  Add additional debug flags and open modes
  Add new mount modes
  fix error reporting in v9fs_dir_release
  writeback mode fixes
  Fix revalidate

 Documentation/filesystems/9p.rst |  26 +++--
 fs/9p/fid.c                      |  52 ++++-----
 fs/9p/fid.h                      |  33 +++++-
 fs/9p/v9fs.c                     |  49 +++++---
 fs/9p/v9fs.h                     |   9 +-
 fs/9p/v9fs_vfs.h                 |   4 -
 fs/9p/vfs_addr.c                 |  24 ++--
 fs/9p/vfs_dentry.c               |   3 +-
 fs/9p/vfs_dir.c                  |  16 ++-
 fs/9p/vfs_file.c                 | 194 +++++++------------------------
 fs/9p/vfs_inode.c                |  71 ++++-------
 fs/9p/vfs_inode_dotl.c           |  62 +++++-----
 fs/9p/vfs_super.c                |  28 +++--
 include/net/9p/9p.h              |   5 +
 net/9p/client.c                  |   8 +-
 15 files changed, 256 insertions(+), 328 deletions(-)
  

Comments

Christian Schoenebeck Feb. 2, 2023, 11:27 a.m. UTC | #1
On Tuesday, January 24, 2023 3:38:23 AM CET Eric Van Hensbergen wrote:
> This is the third version of a patch series which adds a number
> of features to improve read/write performance in the 9p filesystem.
> Mostly it focuses on fixing caching to help utilize the recently
> increased MSIZE limits and also fixes some problematic behavior
> within the writeback code.
> 
> All together, these show roughly 10x speed increases on simple
> file transfers.  Future patch sets will improve cache consistency
> and directory caching.
> 
> These patches are also available on github:
> https://github.com/v9fs/linux/tree/ericvh/for-next
> and on kernel.org:
> https://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs.git
> 
> Tested against qemu, cpu, and diod with fsx, dbench, and some
> simple benchmarks.

Looks like this needs more work.

I only had a glimpse on your patches yet, but made some tests by doing
compilations on guest on top of a 9p root fs [1], msize=500k. Under that
scenario:

* loose: this is surprisingly the only mode where I can see some performance
increase, over "loose" on master it compiled ~5% faster, but I also got some
misbehaviours on guest.

* writeahead: no performance results, as compilation already aborts when
trying to detect a compiler. I had to restore a previous snapshot to repair
things after this test.

* readahead: significant performance drop. In comparison to "loose" on master
compilation takes 6 times longer with "readahead". There are some severe
misbehaviours on guest as well, and after boot I get this warning:

[    5.782846] folio expected an open fid inode->i_private=0000000000000000
[    5.786641] WARNING: CPU: 0 PID: 321 at fs/9p/vfs_addr.c:174 v9fs_vfs_write_folio_locked (fs/9p/vfs_addr.c:174 (discriminator 3)) 9p
[    5.792496] Modules linked in: ppdev(E) bochs(E) sg(E) drm_vram_helper(E) joydev(E) evdev(E) drm_kms_helper(E) serio_raw(E) drm_ttm_helper(E) pcsp)
[    5.816806] CPU: 0 PID: 321 Comm: chown Tainted: G            E      6.2.0-rc6+ #61
[    5.821694] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.1-0-g3208b098f51a-prebuilt.qemu.org 04/01/2014
[    5.827362] RIP: 0010:v9fs_vfs_write_folio_locked (fs/9p/vfs_addr.c:174 (discriminator 3)) 9p

Code starting with the faulting instruction
===========================================
        ...
[    5.835360] RSP: 0018:ffffc900007d3a38 EFLAGS: 00010282
[    5.836982] RAX: 0000000000000000 RBX: ffff888106c86680 RCX: 0000000000000000
[    5.838877] RDX: 0000000000000001 RSI: ffffffff821eb1e6 RDI: 00000000ffffffff
[    5.841179] RBP: ffffea0004279300 R08: 0000000000000000 R09: 00000000ffffefff
[    5.843039] R10: ffffc900007d38e8 R11: ffffffff824bede8 R12: 0000000000000000
[    5.844850] R13: 00000000ffffffea R14: 0000000000000014 R15: 0000000000000014
[    5.846366] FS:  00007fd0fc4a0580(0000) GS:ffff88842fc00000(0000) knlGS:0000000000000000
[    5.848250] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    5.849386] CR2: 00007fd0fc38f4f0 CR3: 0000000100302000 CR4: 00000000000006f0
[    5.850824] Call Trace:
[    5.851622]  <TASK>
[    5.852052] v9fs_vfs_writepage (fs/9p/vfs_addr.c:207) 9p
[    5.852841] __writepage (mm/page-writeback.c:2537) 
[    5.853438] write_cache_pages (mm/page-writeback.c:2473) 
[    5.854205] ? __pfx___writepage (mm/page-writeback.c:2535) 
[    5.855309] ? delete_node (lib/radix-tree.c:575) 
[    5.856122] ? radix_tree_delete_item (lib/radix-tree.c:1432) 
[    5.857101] do_writepages (mm/page-writeback.c:2564 mm/page-writeback.c:2552 mm/page-writeback.c:2583) 
[    5.857954] ? radix_tree_delete_item (lib/radix-tree.c:1432) 
[    5.859103] filemap_fdatawrite_wbc (mm/filemap.c:389 mm/filemap.c:378) 
[    5.860043] __filemap_fdatawrite_range (mm/filemap.c:422) 
[    5.861050] filemap_write_and_wait_range (mm/filemap.c:682 mm/filemap.c:665) 
[    5.862132] v9fs_vfs_setattr_dotl (./include/linux/pagemap.h:60 fs/9p/vfs_inode_dotl.c:583) 9p
[    5.863312] notify_change (fs/attr.c:486) 
[    5.864043] ? chown_common (fs/open.c:736) 
[    5.864793] chown_common (fs/open.c:736) 
[    5.865538] ? preempt_count_add (./include/linux/ftrace.h:977 kernel/sched/core.c:5737 kernel/sched/core.c:5734 kernel/sched/core.c:5762) 
[    5.866420] do_fchownat (fs/open.c:768) 
[    5.867419] __x64_sys_fchownat (fs/open.c:782 fs/open.c:779 fs/open.c:779) 
[    5.868319] do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80) 
[    5.869116] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120) 
[    5.871008] RIP: 0033:0x7fd0fc3b7b9a

Best regards,
Christian Schoenebeck

[1] https://wiki.qemu.org/Documentation/9p_root_fs
  
Eric Van Hensbergen Feb. 3, 2023, 7:12 p.m. UTC | #2
Hi Christian, thanks for the feedback -- will dig in and see if I can
find what's gone south here.  Clearly my approach to writeback without
writeback_fid didn't cover all the corner cases and that's the cause of
the fault.  Can I get a better idea of how to reproduce - you booted
with a root 9p file system, and then tried to build...what?

Performance degradation is interesting; it runs counter to the
unit-testing and benchmarking I did, but I didn't do something as
logical as a build to check -- need to tease apart whether this is a
read problem, a write problem...or both.  My intuition is that it's on
the write side, but as part of going through the code I made the cache
code a lot more pessimistic so it's possible I inadvertently killed an
optimistic optimization.

Finally, just to clarify, the panic you had at the end happened with
readahead?  Seems interesting because clearly it thought it was
writing back something that it shouldn't have been writing back (since
writeback caches weren't enabled).   I'm thinking something was marked
as dirty even though the underlying system just wrote-through the
change and so the writeback isn't actually required.  This may also be
an indicator of the performance issue if we are actually writing
through the data in addition to an unnecessary write-back (which I
also worry is writing back bad data in the second case).

Can you give me an idea of what the other misbehaviors were?

      -eric

On Thu, Feb 2, 2023 at 5:27 AM Christian Schoenebeck
<linux_oss@crudebyte.com> wrote:
>
> On Tuesday, January 24, 2023 3:38:23 AM CET Eric Van Hensbergen wrote:
> > This is the third version of a patch series which adds a number
> > of features to improve read/write performance in the 9p filesystem.
> > Mostly it focuses on fixing caching to help utilize the recently
> > increased MSIZE limits and also fixes some problematic behavior
> > within the writeback code.
> >
> > All together, these show roughly 10x speed increases on simple
> > file transfers.  Future patch sets will improve cache consistency
> > and directory caching.
> >
> > These patches are also available on github:
> > https://github.com/v9fs/linux/tree/ericvh/for-next
> > and on kernel.org:
> > https://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs.git
> >
> > Tested against qemu, cpu, and diod with fsx, dbench, and some
> > simple benchmarks.
>
> Looks like this needs more work.
>
> I only had a glimpse on your patches yet, but made some tests by doing
> compilations on guest on top of a 9p root fs [1], msize=500k. Under that
> scenario:
>
> * loose: this is surprisingly the only mode where I can see some performance
> increase, over "loose" on master it compiled ~5% faster, but I also got some
> misbehaviours on guest.
>
> * writeahead: no performance results, as compilation already aborts when
> trying to detect a compiler. I had to restore a previous snapshot to repair
> things after this test.
>
> * readahead: significant performance drop. In comparison to "loose" on master
> compilation takes 6 times longer with "readahead". There are some severe
> misbehaviours on guest as well, and after boot I get this warning:
>
> [    5.782846] folio expected an open fid inode->i_private=0000000000000000
> [    5.786641] WARNING: CPU: 0 PID: 321 at fs/9p/vfs_addr.c:174 v9fs_vfs_write_folio_locked (fs/9p/vfs_addr.c:174 (discriminator 3)) 9p
> [    5.792496] Modules linked in: ppdev(E) bochs(E) sg(E) drm_vram_helper(E) joydev(E) evdev(E) drm_kms_helper(E) serio_raw(E) drm_ttm_helper(E) pcsp)
> [    5.816806] CPU: 0 PID: 321 Comm: chown Tainted: G            E      6.2.0-rc6+ #61
> [    5.821694] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.1-0-g3208b098f51a-prebuilt.qemu.org 04/01/2014
> [    5.827362] RIP: 0010:v9fs_vfs_write_folio_locked (fs/9p/vfs_addr.c:174 (discriminator 3)) 9p
>
> Code starting with the faulting instruction
> ===========================================
>         ...
> [    5.835360] RSP: 0018:ffffc900007d3a38 EFLAGS: 00010282
> [    5.836982] RAX: 0000000000000000 RBX: ffff888106c86680 RCX: 0000000000000000
> [    5.838877] RDX: 0000000000000001 RSI: ffffffff821eb1e6 RDI: 00000000ffffffff
> [    5.841179] RBP: ffffea0004279300 R08: 0000000000000000 R09: 00000000ffffefff
> [    5.843039] R10: ffffc900007d38e8 R11: ffffffff824bede8 R12: 0000000000000000
> [    5.844850] R13: 00000000ffffffea R14: 0000000000000014 R15: 0000000000000014
> [    5.846366] FS:  00007fd0fc4a0580(0000) GS:ffff88842fc00000(0000) knlGS:0000000000000000
> [    5.848250] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [    5.849386] CR2: 00007fd0fc38f4f0 CR3: 0000000100302000 CR4: 00000000000006f0
> [    5.850824] Call Trace:
> [    5.851622]  <TASK>
> [    5.852052] v9fs_vfs_writepage (fs/9p/vfs_addr.c:207) 9p
> [    5.852841] __writepage (mm/page-writeback.c:2537)
> [    5.853438] write_cache_pages (mm/page-writeback.c:2473)
> [    5.854205] ? __pfx___writepage (mm/page-writeback.c:2535)
> [    5.855309] ? delete_node (lib/radix-tree.c:575)
> [    5.856122] ? radix_tree_delete_item (lib/radix-tree.c:1432)
> [    5.857101] do_writepages (mm/page-writeback.c:2564 mm/page-writeback.c:2552 mm/page-writeback.c:2583)
> [    5.857954] ? radix_tree_delete_item (lib/radix-tree.c:1432)
> [    5.859103] filemap_fdatawrite_wbc (mm/filemap.c:389 mm/filemap.c:378)
> [    5.860043] __filemap_fdatawrite_range (mm/filemap.c:422)
> [    5.861050] filemap_write_and_wait_range (mm/filemap.c:682 mm/filemap.c:665)
> [    5.862132] v9fs_vfs_setattr_dotl (./include/linux/pagemap.h:60 fs/9p/vfs_inode_dotl.c:583) 9p
> [    5.863312] notify_change (fs/attr.c:486)
> [    5.864043] ? chown_common (fs/open.c:736)
> [    5.864793] chown_common (fs/open.c:736)
> [    5.865538] ? preempt_count_add (./include/linux/ftrace.h:977 kernel/sched/core.c:5737 kernel/sched/core.c:5734 kernel/sched/core.c:5762)
> [    5.866420] do_fchownat (fs/open.c:768)
> [    5.867419] __x64_sys_fchownat (fs/open.c:782 fs/open.c:779 fs/open.c:779)
> [    5.868319] do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80)
> [    5.869116] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120)
> [    5.871008] RIP: 0033:0x7fd0fc3b7b9a
>
> Best regards,
> Christian Schoenebeck
>
> [1] https://wiki.qemu.org/Documentation/9p_root_fs
>
>
>
  
Christian Schoenebeck Feb. 4, 2023, 1:40 p.m. UTC | #3
On Friday, February 3, 2023 8:12:14 PM CET Eric Van Hensbergen wrote:
> Hi Christian, thanks for the feedback -- will dig in and see if I can
> find what's gone south here.  Clearly my approach to writeback without
> writeback_fid didn't cover all the corner cases and that's the cause of
> the fault.  Can I get a better idea of how to reproduce - you booted
> with a root 9p file system, and then tried to build...what?

KDE, which builds numerous packages, multi-threaded by default. In the past we
had 9p issues which triggered only after hours of compiling; in this case,
however, I don't think you need to build anything fancy, because it already
fails at the very beginning of any build process, just when detecting a
compiler.

May I ask what kind of scenario you have tested so far? It was not a multi-
threaded context, right? Large chunk or small chunk I/O?

> Performance degradation is interesting; it runs counter to the
> unit-testing and benchmarking I did, but I didn't do something as
> logical as a build to check -- need to tease apart whether this is a
> read problem, a write problem...or both.  My intuition is that it's on
> the write side, but as part of going through the code I made the cache
> code a lot more pessimistic so it's possible I inadvertently killed an
> optimistic optimization.

I have not walked down the road to investigate individual I/O errors or even
their cause yet, but from my feeling it could also be related to fid vs.
writeback_fid. I saw you dropped a fix we made there last year, but haven't
checked yet whether your changes would handle it correctly in another way.

> Finally, just to clarify, the panic you had at the end happened with
> readahead?  Seems interesting because clearly it thought it was
> writing back something that it shouldn't have been writing back (since
> writeback caches weren't enabled).   I'm thinking something was marked
> as dirty even though the underlying system just wrote-through the
> change and so the writeback isn't actually required.  This may also be
> an indicator of the performance issue if we are actually writing
> through the data in addition to an unnecessary write-back (which I
> also worry is writing back bad data in the second case).

It was not a kernel panic. It's a warning that appears right after boot, but
the system continues to run. So that warning is printed before starting the
actual build process. And yes, the warning is printed with "readahead".

> Can you give me an idea of what the other misbehaviors were?

There were really all sorts of misbehaviour at the application level, e.g. no
command history being available from the shell (arrow up/down), things hanging on
the shell for a long time, error messages. And after the writeahead test the
build directory was screwed, i.e. even after rebooting with a regular kernel
things no longer built correctly, so I had to restore a snapshot.

Best regards,
Christian Schoenebeck
  
Eric Van Hensbergen Feb. 4, 2023, 9:38 p.m. UTC | #4
Okay, thanks for the additional detail - I have an idea of what the
problem might be.

As far as my tests go - I've predominantly tested with dbench, fsx (with
and without mmap tests), postmark, and bonnie -- running single and
multithreaded.  All seemed to work fine and didn't report errors.  I
thought the dbench trace was based on a build, but perhaps that's
inaccurate, or perhaps there's something peculiar with it being the
root file system (I have always just mounted it after boot, not tried
booting with it as root).

(thinking out loud)
In any case, I think the fact that we see that error when in readahead
mode is the key hint, because it means it thinks something is
writeback cached when it shouldn't be.  The writeback is triggered by
the setattr, which always calls filemap_write_and_wait -- this is all
old code, not something added by the change.  My assumption was that
if the inode had no dirty data (writebacks) then it would just not do
anything -- this should be the case in readahead mode.  So we gotta
figure out why it thinks it has dirty data.  Looking at the code where
the warning is printed, it's a WARN_ONCE so it's quite possible we are
hitting this left and right -- we can probably switch that to always
print to get an idea of this being the root cause.  Need to add some
more debug code to understand what we think we are writing back as
anything there should have been flushed when the file was closed.
To your multithreaded concern, I suppose there could be a race between
flushing mmap writes and the setattr also calling writeback, but the
folio is supposed to be locked at this point so you'd think there would
only be one writeback.  This will be easier to understand once I
reproduce and have a full trace and we know what file we are talking
about and what other operations might have been in flight.
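
For reference, the check that fires is roughly of this shape (a
paraphrased sketch, not the literal code in the series, and the
fid-lookup helper below is a stand-in for whatever the real lookup is):

  /* sketch of the check around fs/9p/vfs_addr.c:174 */
  static int v9fs_write_folio_sketch(struct folio *folio)
  {
  	struct inode *inode = folio_inode(folio);
  	struct p9_fid *fid;

  	/* with the writeback_fid gone, writeback has to find an
  	 * already-open, writeable fid attached to the inode */
  	fid = v9fs_find_writeable_fid(inode);	/* stand-in name */
  	if (!fid) {
  		/* the warning from the trace above; WARN_ONCE means
  		 * later hits are silent, hence switching it to always
  		 * print to gauge how often we land here */
  		WARN_ONCE(1, "folio expected an open fid inode->i_private=%p\n",
  			  inode->i_private);
  		return -EINVAL;
  	}
  	/* ...otherwise build the iov_iter and issue the write RPC... */
  	return 0;
  }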

There is a case in mmap that I was worried always required writeback,
but I did enough unit testing to convince myself that wasn't the case
-- so could be something down that path but will reproduce your
environment first and see if I can get the same types of error (I'm
most of the way there at this point; it's just that we are digging out
from an ice storm here in Texas, so there's been more chainsawing than
coding...).

        -eric

On Sat, Feb 4, 2023 at 7:40 AM Christian Schoenebeck
<linux_oss@crudebyte.com> wrote:
>
> On Friday, February 3, 2023 8:12:14 PM CET Eric Van Hensbergen wrote:
> > Hi Christian, thanks for the feedback -- will dig in and see if I can
> > find what's gone south here.  Clearly my approach to writeback without
> > writeback_fid didn't cover all the corner cases and that's the cause of
> > the fault.  Can I get a better idea of how to reproduce - you booted
> > with a root 9p file system, and then tried to build...what?
>
> KDE, which builds numerous packages, multi-threaded by default. In the past we
> had 9p issues which triggered only after hours of compiling; in this case,
> however, I don't think you need to build anything fancy, because it already
> fails at the very beginning of any build process, just when detecting a
> compiler.
>
> May I ask what kind of scenario you have tested so far? It was not a multi-
> threaded context, right? Large chunk or small chunk I/O?
>
> > Performance degradation is interesting; it runs counter to the
> > unit-testing and benchmarking I did, but I didn't do something as
> > logical as a build to check -- need to tease apart whether this is a
> > read problem, a write problem...or both.  My intuition is that it's on
> > the write side, but as part of going through the code I made the cache
> > code a lot more pessimistic so it's possible I inadvertently killed an
> > optimistic optimization.
>
> I have not walked down the road to investigate individual I/O errors or even
> their cause yet, but from my feeling it could also be related to fid vs.
> writeback_fid. I saw you dropped a fix we made there last year, but haven't
> checked yet whether your changes would handle it correctly in another way.
>
> > Finally, just to clarify, the panic you had at the end happened with
> > readahead?  Seems interesting because clearly it thought it was
> > writing back something that it shouldn't have been writing back (since
> > writeback caches weren't enabled).   I'm thinking something was marked
> > as dirty even though the underlying system just wrote-through the
> > change and so the writeback isn't actually required.  This may also be
> > an indicator of the performance issue if we are actually writing
> > through the data in addition to an unnecessary write-back (which I
> > also worry is writing back bad data in the second case).
>
> It was not a kernel panic. It's a warning that appears right after boot, but
> the system continues to run. So that warning is printed before starting the
> actual build process. And yes, the warning is printed with "readahead".
>
> > Can you give me an idea of what the other misbehaviors were?
>
> There were really all sorts of misbehaviour at the application level, e.g. no
> command history being available from the shell (arrow up/down), things hanging on
> the shell for a long time, error messages. And after the writeahead test the
> build directory was screwed, i.e. even after rebooting with a regular kernel
> things no longer built correctly, so I had to restore a snapshot.
>
> Best regards,
> Christian Schoenebeck
>
>
  
Eric Van Hensbergen Feb. 5, 2023, 4:37 p.m. UTC | #5
On Thu, Feb 2, 2023 at 5:27 AM Christian Schoenebeck
<linux_oss@crudebyte.com> wrote:
>
> Looks like this needs more work.
>
> I only had a glimpse on your patches yet, but made some tests by doing
> compilations on guest on top of a 9p root fs [1], msize=500k. Under that
> scenario:
>
> * loose: this is surprisingly the only mode where I can see some performance
> increase, over "loose" on master it compiled ~5% faster, but I also got some
> misbehaviours on guest.
>

I was so focused on the bugs that I forgot to respond to the
performance concerns -- just to be clear, readahead and writeback
aren't meant to be more performant than loose, they are meant to have
stronger guarantees of consistency with the server file system.  Loose
is inclusive of readahead and writeback, and it keeps the caches
around for longer, and it does some dir caching as well -- so it's
always going to win, but it does so with risk of being more
inconsistent with the server file system and should only be done when
the guest/client has exclusive access or the filesystem itself is
read-only.  I've a design for a "tight" cache, which will also not be
as performant as loose but will add consistent dir-caching on top of
readahead and writeback -- once we've properly vetted that it should
likely be the default cache option and any fscache should be built on
top of it.  I was also thinking of augmenting "tight" and "loose" with
a "temporal" cache that works more like NFS and bounds consistency to
a particular time quanta.  Loose was always a bit of a "hack" for some
particular use cases and has always been a bit problematic in my mind.
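
To put that hierarchy another way, it is effectively an ordered level
check (an illustrative sketch with made-up names, not the literal option
parsing in the patches):

  /* each level is meant to include everything below it */
  enum cache_level {
  	CACHE_NONE = 0,
  	CACHE_READAHEAD,	/* read cache + readahead */
  	CACHE_WRITEBACK,	/* readahead + delayed writes */
  	CACHE_LOOSE,		/* all of the above, longer-lived, plus some dir caching */
  };

  static inline bool cache_writes_back(enum cache_level level)
  {
  	return level >= CACHE_WRITEBACK;	/* writeback and loose both qualify */
  }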

So, to make sure we are on the same page, were your performance
uplifts/penalties versus cache=none or versus legacy cache=loose?  The
10x perf improvement in the patch series was in streaming reads over
cache=none.  I'll add the cache=loose datapoints to my performance
notebook (on github) for the future as points of reference, but I'd
always expect cache=loose to be the upper bound (although I have seen
some things in the code to do with directory reads/etc. that could be
improved there and should benefit from some of the changes I have
planned once I get to the dir caching).

          -eric
  
Christian Schoenebeck Feb. 6, 2023, 1:20 p.m. UTC | #6
On Sunday, February 5, 2023 5:37:53 PM CET Eric Van Hensbergen wrote:
> I was so focused on the bugs that I forgot to respond to the
> performance concerns -- just to be clear, readahead and writeback
> aren't meant to be more performant than loose; they are meant to have
> stronger guarantees of consistency with the server file system.  Loose
> is inclusive of readahead and writeback, and it keeps the caches
> around for longer, and it does some dir caching as well -- so it's
> always going to win, but it does so with risk of being more
> inconsistent with the server file system and should only be done when
> the guest/client has exclusive access or the filesystem itself is
> read-only.

Okay, that's surprising to me indeed. My expectation was that "loose" would
still retain its previous behaviour, i.e. a loose consistency cache but without
any readahead or writeback. I already wondered about the transitivity you used
in the code for cache selection with a direct `<=` comparison of the user's
cache option.

Having said that, I wonder whether it would make sense to handle these as 
options independent of each other (e.g. cache=loose,readahead), but I'm not
sure; maybe it would overcomplicate things unnecessarily.

> I've a design for a "tight" cache, which will also not be
> as performant as loose but will add consistent dir-caching on top of
> readahead and writeback -- once we've properly vetted that it should
> likely be the default cache option and any fscache should be built on
> top of it.  I was also thinking of augmenting "tight" and "loose" with
> a "temporal" cache that works more like NFS and bounds consistency to
> a particular time quanta.  Loose was always a bit of a "hack" for some
> particular use cases and has always been a bit problematic in my mind.

Or we could add notifications on file changes from the server side, because that's
what this is actually about, right?

> So, to make sure we are on the same page, were your performance
> uplifts/penalties versus cache=none or versus legacy cache=loose?

I have not tested cache=none at all, because in the scenario of 9p being a 
root fs, you need at least cache=mmap, otherwise you won't even be able to 
boot a minimum system.

I compared:

  * master(cache=loose) vs. this(cache=loose)

  * master(cache=loose) vs. this(cache=readahead)

  * master(cache=loose) vs. this(cache=writeback)

> The 10x perf improvement in the patch series was in streaming reads over
> cache=none.

OK, that's important information to mention in the first place, because when
you say you measured a performance gain of x times, I would assume you
compared it to at least a somewhat similar setup. I mean cache=loose was
always much faster than cache=none before.

> I'll add the cache=loose datapoints to my performance
> notebook (on github) for the future as points of reference, but I'd
> always expect cache=loose to be the upper bound (although I have seen
> some things in the code to do with directory reads/etc. that could be
> improved there and should benefit from some of the changes I have
> planned once I get to the dir caching).
> 
>           -eric
  
Eric Van Hensbergen Feb. 6, 2023, 1:37 p.m. UTC | #7
On Mon, Feb 6, 2023 at 7:20 AM Christian Schoenebeck
<linux_oss@crudebyte.com> wrote:
>
> Okay, that's surprising to me indeed. My expectation was that "loose" would
> still retain its previous behaviour, i.e. a loose consistency cache but without
> any readahead or writeback. I already wondered about the transitivity you used
> in the code for cache selection with a direct `<=` comparison of the user's
> cache option.
>
> Having said that, I wonder whether it would make sense to handle these as
> options independent of each other (e.g. cache=loose,readahead), but I'm not
> sure; maybe it would overcomplicate things unnecessarily.
>

That's fair and I've considered it, but I was waiting until I get to the
dir cache changes to figure out which way I wanted to go.  I imagine
the way that would play out is there are three types of caching
(readahead, writeback, dir) with writeback still inclusive of readahead.
Then there would be three cache policies (tight, temporal, loose) and
finally there'd be a separate option for fscache (open question as to
whether or not fscache with < dir makes sense... I think probably not).
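
If we split things up that way, I'd picture the mount option turning into
a set of cache-type flags plus a separate policy field -- purely a sketch
of the shape, nothing that exists in this series:

  /* hypothetical layout for orthogonal cache options */
  #define CACHE_F_READAHEAD	0x01
  #define CACHE_F_WRITEBACK	0x02	/* implies READAHEAD */
  #define CACHE_F_DIR		0x04
  #define CACHE_F_FSCACHE	0x08

  enum cache_policy { POLICY_TIGHT, POLICY_TEMPORAL, POLICY_LOOSE };

  /* "cache=loose,readahead" style parsing would OR the flags together
   * and record the policy separately */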

> > I've a design for a "tight" cache, which will also not be
> > as performant as loose but will add consistent dir-caching on top of
> > readahead and writeback -- once we've properly vetted that it should
> > likely be the default cache option and any fscache should be built on
> > top of it.  I was also thinking of augmenting "tight" and "loose" with
> > a "temporal" cache that works more like NFS and bounds consistency to
> > a particular time quanta.  Loose was always a bit of a "hack" for some
> > particular use cases and has always been a bit problematic in my mind.
>
> Or we could add notifications on file changes from the server side, because that's
> what this is actually about, right?
>

Yeah, that's always an option, but it would be tricky to work out the 9p
model for this as the model is explicitly RPC, so we'd have to post a read
for file changes.  We had the same discussion for locks and decided to
keep it simple for now.  I'm not opposed to exploring this, but we'd
want to keep it as an invalidate log with a single open posted read --
could use a synthetic or something similar to the Tauth messages to
have that.  That's gonna go on the end-of-the-backlog for
consideration, but happy to review if someone else wants to go after
it.

> > So, to make sure we are on the same page, were your performance
> > uplifts/penalties versus cache=none or versus legacy cache=loose?
>
> I have not tested cache=none at all, because in the scenario of 9p being a
> root fs, you need at least cache=mmap, otherwise you won't even be able to
> boot a minimum system.
>

Yeah, understood -- mmap ~= writeback so the writeback issues would
persist there.  FWIW, I continue to see no problems with cache=none,
but that makes sense as all the changes are in the cache code.  Will
keep crunching on getting this fixed.

> I compared:
>
>   * master(cache=loose) vs. this(cache=loose)
>
>   * master(cache=loose) vs. this(cache=readahead)
>
>   * master(cache=loose) vs. this(cache=writeback)
>
> > The 10x perf improvement in the patch series was in streaming reads over
> > cache=none.
>
> OK, that's important information to mention in the first place, because when
> you say you measured a performance gain of x times, I would assume you
> compared it to at least a somewhat similar setup. I mean cache=loose was
> always much faster than cache=none before.
>

Sorry that I didn't make that more clear.  The original motivation for
the patch series was the cpu project that Ron and I have been
collaborating on; cache=loose was problematic for that use case, so
we wanted something that approached the performance of cache=loose
but with tighter consistency (in particular the ability to actually do
read-ahead with open-to-close consistency).  As you pointed out
though, there was a 5% improvement in loose (probably due to a reduction
in the messages associated with management of the writeback_fid).  In any
case, the hope is to make cache=mmap (and eventually cache=tight) the
default cache mode versus cache=none -- but we have to get this stable
first.

As I said, the dir-cache changes in the WIP patch series are expected
to benefit loose a bit more (particularly around the dir-read pain
points) and I spotted several cases where loose appears to be
re-requesting files it already has in cache -- so there may be more to
it.  But that being said, I don't expected to get 10x out of those
changes (although depends on the types of operations being performed).
Will know better when I get further along.

         -eric