Message ID | 20230124023834.106339-1-ericvh@kernel.org |
---|---|
Headers |
From: Eric Van Hensbergen <ericvh@kernel.org>
To: v9fs-developer@lists.sourceforge.net, asmadeus@codewreck.org, rminnich@gmail.com, lucho@ionkov.net
Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux_oss@crudebyte.com, Eric Van Hensbergen <ericvh@kernel.org>
Subject: [PATCH v3 00/11] Performance fixes for 9p filesystem
Date: Tue, 24 Jan 2023 02:38:23 +0000
Message-Id: <20230124023834.106339-1-ericvh@kernel.org>
In-Reply-To: <20221218232217.1713283-1-evanhensbergen@icloud.com>
References: <20221218232217.1713283-1-evanhensbergen@icloud.com>
X-Mailer: git-send-email 2.37.2 |
Series | Performance fixes for 9p filesystem |
Message
Eric Van Hensbergen
Jan. 24, 2023, 2:38 a.m. UTC
This is the third version of a patch series which adds a number of features to improve read/write performance in the 9p filesystem. It mostly focuses on fixing caching to help utilize the recently increased MSIZE limits, and it also fixes some problematic behavior within the writeback code. Altogether, these changes show roughly 10x speed increases on simple file transfers. Future patch sets will improve cache consistency and directory caching.

These patches are also available on github:
https://github.com/v9fs/linux/tree/ericvh/for-next
and on kernel.org:
https://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs.git

Tested against qemu, cpu, and diod with fsx, dbench, and some simple benchmarks.

Eric Van Hensbergen (11):
  Adjust maximum MSIZE to account for p9 header
  Expand setup of writeback cache to all levels
  Consolidate file operations and add readahead and writeback
  Remove unnecessary superblock flags
  allow disable of xattr support on mount
  fix bug in client create for .L
  Add additional debug flags and open modes
  Add new mount modes
  fix error reporting in v9fs_dir_release
  writeback mode fixes
  Fix revalidate

 Documentation/filesystems/9p.rst |  26 +++--
 fs/9p/fid.c                      |  52 ++++-----
 fs/9p/fid.h                      |  33 +++++-
 fs/9p/v9fs.c                     |  49 +++++---
 fs/9p/v9fs.h                     |   9 +-
 fs/9p/v9fs_vfs.h                 |   4 -
 fs/9p/vfs_addr.c                 |  24 ++--
 fs/9p/vfs_dentry.c               |   3 +-
 fs/9p/vfs_dir.c                  |  16 ++-
 fs/9p/vfs_file.c                 | 194 +++++++------------------------
 fs/9p/vfs_inode.c                |  71 ++++-------
 fs/9p/vfs_inode_dotl.c           |  62 +++++-----
 fs/9p/vfs_super.c                |  28 +++--
 include/net/9p/9p.h              |   5 +
 net/9p/client.c                  |   8 +-
 15 files changed, 256 insertions(+), 328 deletions(-)
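A rough way to picture what the first patch in the list above ("Adjust maximum MSIZE to account for p9 header") is concerned with: the data payload that fits in a single 9p read/write message is the negotiated msize minus the fixed per-message I/O header. The sketch below is a minimal userspace illustration under assumptions, with a locally defined constant that mirrors the kernel's P9_IOHDRSZ and an example msize; it is not code from the series.

        /* Illustration only: usable per-message payload given a negotiated
         * msize.  P9_IOHDRSZ mirrors the kernel constant of the same name
         * (24 bytes); the 512k msize is just an example value.
         */
        #include <stdio.h>

        #define P9_IOHDRSZ 24          /* size of a 9p I/O message header */

        static unsigned int usable_payload(unsigned int msize)
        {
                return msize - P9_IOHDRSZ;   /* bytes of file data per message */
        }

        int main(void)
        {
                unsigned int msize = 512 * 1024;   /* example negotiated msize */

                printf("msize=%u -> per-message payload=%u\n",
                       msize, usable_payload(msize));
                return 0;
        }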
Comments
On Tuesday, January 24, 2023 3:38:23 AM CET Eric Van Hensbergen wrote:
> This is the third version of a patch series which adds a number
> of features to improve read/write performance in the 9p filesystem.
> It mostly focuses on fixing caching to help utilize the recently
> increased MSIZE limits, and it also fixes some problematic behavior
> within the writeback code.
>
> Altogether, these changes show roughly 10x speed increases on simple
> file transfers. Future patch sets will improve cache consistency
> and directory caching.
>
> These patches are also available on github:
> https://github.com/v9fs/linux/tree/ericvh/for-next
> and on kernel.org:
> https://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs.git
>
> Tested against qemu, cpu, and diod with fsx, dbench, and some
> simple benchmarks.

Looks like this needs more work.

I have only had a glimpse at your patches so far, but I made some tests by doing compilations on a guest on top of a 9p root fs [1], msize=500k. Under that scenario:

* loose: this is surprisingly the only mode where I can see some performance increase; over "loose" on master it compiled ~5% faster, but I also got some misbehaviours on guest.

* writeback: no performance results, as compilation already aborts when trying to detect a compiler. I had to restore a previous snapshot to repair things after this test.

* readahead: significant performance drop. In comparison to "loose" on master, compilation takes 6 times longer with "readahead". There are some severe misbehaviours on guest as well, and after boot I get this warning:

[ 5.782846] folio expected an open fid inode->i_private=0000000000000000
[ 5.786641] WARNING: CPU: 0 PID: 321 at fs/9p/vfs_addr.c:174 v9fs_vfs_write_folio_locked (fs/9p/vfs_addr.c:174 (discriminator 3)) 9p
[ 5.792496] Modules linked in: ppdev(E) bochs(E) sg(E) drm_vram_helper(E) joydev(E) evdev(E) drm_kms_helper(E) serio_raw(E) drm_ttm_helper(E) pcsp)
[ 5.816806] CPU: 0 PID: 321 Comm: chown Tainted: G E 6.2.0-rc6+ #61
[ 5.821694] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.1-0-g3208b098f51a-prebuilt.qemu.org 04/01/2014
[ 5.827362] RIP: 0010:v9fs_vfs_write_folio_locked (fs/9p/vfs_addr.c:174 (discriminator 3)) 9p

Code starting with the faulting instruction
===========================================
...
[ 5.835360] RSP: 0018:ffffc900007d3a38 EFLAGS: 00010282
[ 5.836982] RAX: 0000000000000000 RBX: ffff888106c86680 RCX: 0000000000000000
[ 5.838877] RDX: 0000000000000001 RSI: ffffffff821eb1e6 RDI: 00000000ffffffff
[ 5.841179] RBP: ffffea0004279300 R08: 0000000000000000 R09: 00000000ffffefff
[ 5.843039] R10: ffffc900007d38e8 R11: ffffffff824bede8 R12: 0000000000000000
[ 5.844850] R13: 00000000ffffffea R14: 0000000000000014 R15: 0000000000000014
[ 5.846366] FS: 00007fd0fc4a0580(0000) GS:ffff88842fc00000(0000) knlGS:0000000000000000
[ 5.848250] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 5.849386] CR2: 00007fd0fc38f4f0 CR3: 0000000100302000 CR4: 00000000000006f0
[ 5.850824] Call Trace:
[ 5.851622] <TASK>
[ 5.852052] v9fs_vfs_writepage (fs/9p/vfs_addr.c:207) 9p
[ 5.852841] __writepage (mm/page-writeback.c:2537)
[ 5.853438] write_cache_pages (mm/page-writeback.c:2473)
[ 5.854205] ? __pfx___writepage (mm/page-writeback.c:2535)
[ 5.855309] ? delete_node (lib/radix-tree.c:575)
[ 5.856122] ? radix_tree_delete_item (lib/radix-tree.c:1432)
[ 5.857101] do_writepages (mm/page-writeback.c:2564 mm/page-writeback.c:2552 mm/page-writeback.c:2583)
[ 5.857954] ? radix_tree_delete_item (lib/radix-tree.c:1432)
[ 5.859103] filemap_fdatawrite_wbc (mm/filemap.c:389 mm/filemap.c:378)
[ 5.860043] __filemap_fdatawrite_range (mm/filemap.c:422)
[ 5.861050] filemap_write_and_wait_range (mm/filemap.c:682 mm/filemap.c:665)
[ 5.862132] v9fs_vfs_setattr_dotl (./include/linux/pagemap.h:60 fs/9p/vfs_inode_dotl.c:583) 9p
[ 5.863312] notify_change (fs/attr.c:486)
[ 5.864043] ? chown_common (fs/open.c:736)
[ 5.864793] chown_common (fs/open.c:736)
[ 5.865538] ? preempt_count_add (./include/linux/ftrace.h:977 kernel/sched/core.c:5737 kernel/sched/core.c:5734 kernel/sched/core.c:5762)
[ 5.866420] do_fchownat (fs/open.c:768)
[ 5.867419] __x64_sys_fchownat (fs/open.c:782 fs/open.c:779 fs/open.c:779)
[ 5.868319] do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80)
[ 5.869116] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120)
[ 5.871008] RIP: 0033:0x7fd0fc3b7b9a

Best regards,
Christian Schoenebeck

[1] https://wiki.qemu.org/Documentation/9p_root_fs
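For context, a test mount along the lines described above could be expressed roughly as in the sketch below (userspace C calling mount(2)). The tag name, mount point, and exact option string are assumptions for illustration, and cache=readahead / cache=writeback are the modes proposed by this series rather than options in current mainline.

        /* Hypothetical sketch of mounting a virtio-9p share the way the
         * test scenario above does: 9p2000.L with a large msize (~500k)
         * and one of the proposed cache modes.  Tag, mount point and
         * option values are placeholders.
         */
        #include <stdio.h>
        #include <sys/mount.h>

        int main(void)
        {
                const char *opts =
                        "trans=virtio,version=9p2000.L,msize=512000,cache=readahead";

                /* equivalent to: mount -t 9p -o <opts> hostshare /mnt/9p */
                if (mount("hostshare", "/mnt/9p", "9p", 0, opts) != 0) {
                        perror("mount");
                        return 1;
                }
                return 0;
        }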
Hi Christian, thanks for the feedback -- I will dig in and see if I can find what's gone south here. Clearly my approach to writeback without a writeback_fid didn't cover all the corner cases, and that's the cause of the fault. Can I get a better idea of how to reproduce -- you booted with a root 9p file system, and then tried to build...what?

The performance degradation is interesting; it runs counter to the unit testing and benchmarking I did, but I didn't do something as logical as a build to check -- I need to tease apart whether this is a read problem, a write problem...or both. My intuition is that it's on the write side, but as part of going through the code I made the cache code a lot more pessimistic, so it's possible I inadvertently killed an optimistic optimization.

Finally, just to clarify, the panic you had at the end happened with readahead? Seems interesting, because clearly it thought it was writing back something that it shouldn't have been writing back (since writeback caches weren't enabled). I'm thinking something was marked as dirty even though the underlying system just wrote through the change, so the writeback isn't actually required. This may also be an indicator of the performance issue if we are actually writing through the data in addition to an unnecessary writeback (which I also worry is writing back bad data in the second case).

Can you give me an idea of what the other misbehaviors were?

     -eric

On Thu, Feb 2, 2023 at 5:27 AM Christian Schoenebeck <linux_oss@crudebyte.com> wrote:
>
> Looks like this needs more work.
>
> * loose: this is surprisingly the only mode where I can see some performance
>   increase; over "loose" on master it compiled ~5% faster, but I also got
>   some misbehaviours on guest.
>
> * writeback: no performance results, as compilation already aborts when
>   trying to detect a compiler. I had to restore a previous snapshot to
>   repair things after this test.
>
> * readahead: significant performance drop. In comparison to "loose" on
>   master, compilation takes 6 times longer with "readahead". There are some
>   severe misbehaviours on guest as well, and after boot I get this warning:
>
> [ 5.786641] WARNING: CPU: 0 PID: 321 at fs/9p/vfs_addr.c:174 v9fs_vfs_write_folio_locked (fs/9p/vfs_addr.c:174 (discriminator 3)) 9p
>
> Best regards,
> Christian Schoenebeck
>
> [1] https://wiki.qemu.org/Documentation/9p_root_fs
On Friday, February 3, 2023 8:12:14 PM CET Eric Van Hensbergen wrote:
> Hi Christian, thanks for the feedback -- I will dig in and see if I can
> find what's gone south here. Clearly my approach to writeback without a
> writeback_fid didn't cover all the corner cases, and that's the cause of
> the fault. Can I get a better idea of how to reproduce -- you booted
> with a root 9p file system, and then tried to build...what?

KDE, which builds numerous packages, multi-threaded by default. In the past we had 9p issues which triggered only after hours of compiling, but in this case I don't think you need to build anything fancy, because it already fails at the very beginning of any build process, just when detecting a compiler.

May I ask what kind of scenario you have tested so far? It was not a multi-threaded context, right? Large-chunk or small-chunk I/O?

> The performance degradation is interesting; it runs counter to the
> unit testing and benchmarking I did, but I didn't do something as
> logical as a build to check -- I need to tease apart whether this is a
> read problem, a write problem...or both. My intuition is that it's on
> the write side, but as part of going through the code I made the cache
> code a lot more pessimistic, so it's possible I inadvertently killed an
> optimistic optimization.

I have not walked down the road of investigating individual I/O errors or their cause yet, but my feeling is that it could also be related to fid vs. writeback_fid. I saw you dropped a fix we made there last year, but I haven't checked yet whether your changes handle that correctly in another way.

> Finally, just to clarify, the panic you had at the end happened with
> readahead? Seems interesting, because clearly it thought it was
> writing back something that it shouldn't have been writing back (since
> writeback caches weren't enabled). I'm thinking something was marked
> as dirty even though the underlying system just wrote through the
> change, so the writeback isn't actually required. This may also be
> an indicator of the performance issue if we are actually writing
> through the data in addition to an unnecessary writeback (which I
> also worry is writing back bad data in the second case).

It was not a kernel panic. It's a warning that appears right after boot, but the system continues to run. So that warning is printed before starting the actual build process. And yes, the warning is printed with "readahead".

> Can you give me an idea of what the other misbehaviors were?

There were really all sorts of misbehaviours at the application level, e.g. no command history being available in the shell (arrow up/down), things hanging in the shell for a long time, error messages. And after the writeback test the build directory was screwed up, i.e. even after rebooting with a regular kernel things no longer built correctly, so I had to restore a snapshot.

Best regards,
Christian Schoenebeck
Okay, thanks for the additional detail - I have an idea of what the problem might be.

As far as my tests go - I've predominantly tested with dbench, fsx (with and without mmap tests), postmark, and bonnie -- running single- and multithreaded. All seem to work fine and didn't report errors. I thought the dbench trace was based on a build, but perhaps that's inaccurate, or perhaps there's something peculiar with it being the root file system (I have always just mounted it after boot, not tried booting with it as root).

(thinking out loud) In any case, I think the fact that we see that error in readahead mode is the key hint, because it means it thinks something is writeback cached when it shouldn't be. The writeback is triggered by the setattr, which always calls filemap_write_and_wait -- this is all old code, not something added by the change. My assumption was that if the inode had no dirty data (writebacks) then it would just not do anything -- this should be the case in readahead mode. So we've got to figure out why it thinks it has dirty data. Looking at the code where the warning is printed, it's a WARN_ONCE, so it's quite possible we are hitting this left and right -- we can probably switch that to always print to get an idea of whether this is the root cause. I need to add some more debug code to understand what we think we are writing back, as anything there should have been flushed when the file was closed.

To your multithreaded concern, I suppose there could be a race between flushing mmap writes and the setattr also calling writeback, but the folio is supposed to be locked at that point, so you would think there would only be one writeback. This will be easier to understand once I reproduce and have a full trace, so we know what file we are talking about and what other operations might have been in flight. There is a case in mmap that I was worried always required writeback, but I did enough unit testing to convince myself that wasn't the case -- so it could be something down that path, but I will reproduce your environment first and see if I can get the same types of error (I'm most of the way there at this point, it's just that we are digging out from an ice storm here in Texas, so there's been more chainsawing than coding....).

     -eric

On Sat, Feb 4, 2023 at 7:40 AM Christian Schoenebeck <linux_oss@crudebyte.com> wrote:
>
> KDE, which builds numerous packages, multi-threaded by default. In the past
> we had 9p issues which triggered only after hours of compiling, but in this
> case I don't think you need to build anything fancy, because it already
> fails at the very beginning of any build process, just when detecting a
> compiler.
>
> May I ask what kind of scenario you have tested so far? It was not a
> multi-threaded context, right? Large-chunk or small-chunk I/O?
>
> It was not a kernel panic. It's a warning that appears right after boot, but
> the system continues to run. So that warning is printed before starting the
> actual build process. And yes, the warning is printed with "readahead".
>
> Best regards,
> Christian Schoenebeck
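To make the call chain from the earlier trace (chown -> notify_change -> v9fs setattr -> filemap_write_and_wait_range) concrete, here is a hypothetical minimal reproducer sketch. The path is made up, and whether this actually trips the warning depends on the cache mode and timing; it only illustrates the syscall sequence being discussed, not a confirmed trigger.

        /* Hypothetical reproducer sketch: dirty a file through the page
         * cache, then chown it, which performs a setattr and flushes any
         * dirty pages first.  The path is a placeholder for a file on a
         * 9p mount.
         */
        #include <fcntl.h>
        #include <stdio.h>
        #include <string.h>
        #include <unistd.h>

        int main(void)
        {
                const char *path = "/mnt/9p/repro.txt";   /* assumed 9p mount */
                const char buf[] = "dirty some page cache\n";

                int fd = open(path, O_CREAT | O_WRONLY | O_TRUNC, 0644);
                if (fd < 0) {
                        perror("open");
                        return 1;
                }
                if (write(fd, buf, strlen(buf)) < 0)
                        perror("write");
                close(fd);

                /* chown triggers a setattr, which writes back dirty pages */
                if (chown(path, 1000, 1000) != 0)
                        perror("chown");
                return 0;
        }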
On Thu, Feb 2, 2023 at 5:27 AM Christian Schoenebeck <linux_oss@crudebyte.com> wrote:
>
> Looks like this needs more work.
>
> I have only had a glimpse at your patches so far, but I made some tests by
> doing compilations on a guest on top of a 9p root fs [1], msize=500k. Under
> that scenario:
>
> * loose: this is surprisingly the only mode where I can see some performance
>   increase; over "loose" on master it compiled ~5% faster, but I also got
>   some misbehaviours on guest.
>

I was so focused on the bugs that I forgot to respond to the performance concerns -- just to be clear, readahead and writeback aren't meant to be more performant than loose; they are meant to have stronger guarantees of consistency with the server file system. Loose is inclusive of readahead and writeback, it keeps the caches around for longer, and it does some dir caching as well -- so it's always going to win, but it does so at the risk of being more inconsistent with the server file system, and it should only be used when the guest/client has exclusive access or the filesystem itself is read-only.

I have a design for a "tight" cache, which will also not be as performant as loose but will add consistent dir caching on top of readahead and writeback -- once we've properly vetted that, it should likely become the default cache option, and any fscache should be built on top of it. I was also thinking of augmenting "tight" and "loose" with a "temporal" cache that works more like NFS and bounds consistency to a particular time quantum. Loose was always a bit of a "hack" for some particular use cases and has always been a bit problematic in my mind.

So, to make sure we are on the same page, were your performance uplifts/penalties versus cache=none or versus legacy cache=loose? The 10x perf improvement in the patch series was in streaming reads over cache=none. I'll add the cache=loose datapoints to my performance notebook (on github) for future reference, but I'd always expect cache=loose to be the upper bound (although I have seen some things in the code to do with directory reads, etc. that could be improved there and should benefit from some of the changes I have planned once I get to the dir caching).

     -eric
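For reference, the kind of streaming-read measurement mentioned above can be approximated with a small sequential-read loop like the sketch below. The file path and 1 MiB chunk size are assumptions for the example; this is not the actual benchmark behind the 10x number.

        /* Rough streaming-read throughput sketch: read a large file on the
         * 9p mount sequentially in fixed-size chunks and report MB/s.
         */
        #include <fcntl.h>
        #include <stdio.h>
        #include <stdlib.h>
        #include <time.h>
        #include <unistd.h>

        int main(int argc, char **argv)
        {
                const char *path = argc > 1 ? argv[1] : "/mnt/9p/bigfile";
                const size_t chunk = 1 << 20;              /* 1 MiB reads */
                char *buf = malloc(chunk);
                struct timespec t0, t1;
                long long total = 0;
                ssize_t n;

                int fd = open(path, O_RDONLY);
                if (fd < 0 || !buf) {
                        perror("open/malloc");
                        return 1;
                }

                clock_gettime(CLOCK_MONOTONIC, &t0);
                while ((n = read(fd, buf, chunk)) > 0)
                        total += n;
                clock_gettime(CLOCK_MONOTONIC, &t1);

                double secs = (t1.tv_sec - t0.tv_sec) +
                              (t1.tv_nsec - t0.tv_nsec) / 1e9;
                if (secs <= 0)
                        secs = 1e-9;   /* avoid divide-by-zero on tiny files */
                printf("%lld bytes in %.2f s (%.1f MB/s)\n",
                       total, secs, total / (secs * 1e6));

                close(fd);
                free(buf);
                return 0;
        }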
On Sunday, February 5, 2023 5:37:53 PM CET Eric Van Hensbergen wrote:
> I was so focused on the bugs that I forgot to respond to the
> performance concerns -- just to be clear, readahead and writeback
> aren't meant to be more performant than loose; they are meant to have
> stronger guarantees of consistency with the server file system. Loose
> is inclusive of readahead and writeback, it keeps the caches around
> for longer, and it does some dir caching as well -- so it's always
> going to win, but it does so at the risk of being more inconsistent
> with the server file system, and it should only be used when the
> guest/client has exclusive access or the filesystem itself is
> read-only.

Okay, that's surprising to me indeed. My expectation was that "loose" would still retain its previous behaviour, i.e. a loose-consistency cache but without any readahead or writeback. I had already wondered about the transitivity you used in the code for cache selection, with a direct `<=` comparison of the user's cache option.

Having said that, I wonder whether it would make sense to handle these as options independent of each other (e.g. cache=loose,readahead), but I'm not sure; maybe it would overcomplicate things unnecessarily.

> I have a design for a "tight" cache, which will also not be
> as performant as loose but will add consistent dir caching on top of
> readahead and writeback -- once we've properly vetted that, it should
> likely become the default cache option, and any fscache should be built
> on top of it. I was also thinking of augmenting "tight" and "loose" with
> a "temporal" cache that works more like NFS and bounds consistency to
> a particular time quantum. Loose was always a bit of a "hack" for some
> particular use cases and has always been a bit problematic in my mind.

Or we could add notifications on file changes from the server side, because that's what this is actually about, right?

> So, to make sure we are on the same page, were your performance
> uplifts/penalties versus cache=none or versus legacy cache=loose?

I have not tested cache=none at all, because in the scenario of 9p being a root fs you need at least cache=mmap, otherwise you won't even be able to boot a minimal system.

I compared:

* master(cache=loose) vs. this(cache=loose)
* master(cache=loose) vs. this(cache=readahead)
* master(cache=loose) vs. this(cache=writeback)

> The 10x perf improvement in the patch series was in streaming reads over
> cache=none.

OK, that's important information to mention in the first place, because when you say you measured a performance gain of x times, I would assume you compared it to at least a somewhat similar setup. I mean, cache=loose was always much faster than cache=none before.

> I'll add the cache=loose datapoints to my performance
> notebook (on github) for future reference, but I'd always expect
> cache=loose to be the upper bound (although I have seen some things in
> the code to do with directory reads, etc. that could be improved there
> and should benefit from some of the changes I have planned once I get
> to the dir caching).
>
>      -eric
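To illustrate the two designs being weighed here -- an ordered cache level selected with a `<=` comparison, so that loose implicitly includes readahead and writeback, versus independent flags such as cache=loose,readahead -- here is a small hypothetical sketch. The enum and flag names are made up for illustration and are not the definitions used in fs/9p.

        /* Ordered level vs. independent flags, as discussed above. */
        #include <stdbool.h>
        #include <stdio.h>

        enum cache_level {          /* ordered: each level includes the ones below */
                CACHE_NONE = 0,
                CACHE_READAHEAD,
                CACHE_WRITEBACK,
                CACHE_LOOSE,
        };

        static bool wants_readahead(enum cache_level l)
        {
                return CACHE_READAHEAD <= l;   /* true for readahead, writeback, loose */
        }

        /* Alternative: independent feature flags ("cache=loose,readahead") */
        #define F_READAHEAD  (1 << 0)
        #define F_WRITEBACK  (1 << 1)
        #define F_LOOSE      (1 << 2)

        int main(void)
        {
                printf("loose implies readahead (ordered): %d\n",
                       wants_readahead(CACHE_LOOSE));

                unsigned flags = F_LOOSE;      /* loose alone, no readahead */
                printf("loose implies readahead (flags):   %d\n",
                       !!(flags & F_READAHEAD));
                return 0;
        }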
On Mon, Feb 6, 2023 at 7:20 AM Christian Schoenebeck <linux_oss@crudebyte.com> wrote:
>
> Okay, that's surprising to me indeed. My expectation was that "loose" would
> still retain its previous behaviour, i.e. a loose-consistency cache but
> without any readahead or writeback. I had already wondered about the
> transitivity you used in the code for cache selection, with a direct `<=`
> comparison of the user's cache option.
>
> Having said that, I wonder whether it would make sense to handle these as
> options independent of each other (e.g. cache=loose,readahead), but I'm not
> sure; maybe it would overcomplicate things unnecessarily.
>

That's fair, and I've considered it, but I was waiting until I get to the dir cache changes to figure out which way I wanted to go. I imagine the way that would play out is that there are three types of caching (readahead, writeback, dir), with writeback still inclusive of readahead. Then there would be three cache policies (tight, temporal, loose), and finally there'd be a separate option for fscache (open question as to whether or not fscache with less than dir caching makes sense...I think probably not).

> > I have a design for a "tight" cache, which will also not be
> > as performant as loose but will add consistent dir caching on top of
> > readahead and writeback -- once we've properly vetted that, it should
> > likely become the default cache option, and any fscache should be built
> > on top of it. I was also thinking of augmenting "tight" and "loose" with
> > a "temporal" cache that works more like NFS and bounds consistency to
> > a particular time quantum. Loose was always a bit of a "hack" for some
> > particular use cases and has always been a bit problematic in my mind.
>
> Or we could add notifications on file changes from the server side, because
> that's what this is actually about, right?
>

Yeah, that's always an option, but it would be tricky to work out the 9p model for this, as the model is explicitly RPC, so we'd have to post a read for file changes. We had the same discussion for locks and decided to keep it simple for now. I'm not opposed to exploring this, but we'd want to keep it as an invalidate log with a single open posted read -- we could use a synthetic or something similar to the Tauth messages to have that. That's going to go on the end of the backlog for consideration, but I'm happy to review if someone else wants to go after it.

> > So, to make sure we are on the same page, were your performance
> > uplifts/penalties versus cache=none or versus legacy cache=loose?
>
> I have not tested cache=none at all, because in the scenario of 9p being a
> root fs you need at least cache=mmap, otherwise you won't even be able to
> boot a minimal system.
>

Yeah, understood -- mmap ~= writeback, so the writeback issues would persist there. FWIW, I continue to see no problems with cache=none, but that makes sense as all the changes are in the cache code. Will keep crunching on getting this fixed.

> I compared:
>
> * master(cache=loose) vs. this(cache=loose)
> * master(cache=loose) vs. this(cache=readahead)
> * master(cache=loose) vs. this(cache=writeback)
>
> > The 10x perf improvement in the patch series was in streaming reads over
> > cache=none.
>
> OK, that's important information to mention in the first place, because
> when you say you measured a performance gain of x times, I would assume you
> compared it to at least a somewhat similar setup. I mean, cache=loose was
> always much faster than cache=none before.
>

Sorry that I didn't make that more clear. The original motivation for the patch series was the cpu project that Ron and I have been collaborating on; cache=loose was problematic for that use case, so we wanted something that approached the performance of cache=loose but with tighter consistency (in particular the ability to actually do read-ahead with open-to-close consistency). As you pointed out, though, there was a 5% improvement in loose (probably due to the reduction in messages associated with management of the writeback_fid).

In any case, the hope is to make cache=mmap (and eventually cache=tight) the default cache mode instead of cache=none -- but we have to get this stable first. As I said, the dir-cache changes in the WIP patch series are expected to benefit loose a bit more (particularly around the dir-read pain points), and I spotted several cases where loose appears to be re-requesting files it already has in cache -- so there may be more to it. That being said, I don't expect to get 10x out of those changes (although it depends on the types of operations being performed). Will know better when I get further along.

     -eric