Message ID | e770188fd86595c6f39d4da86d906a824f8abca3.1687898895.git.nabijaczleweli@nabijaczleweli.xyz |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:994d:0:b0:3d9:f83d:47d9 with SMTP id k13csp8464350vqr; Tue, 27 Jun 2023 13:54:13 -0700 (PDT) X-Google-Smtp-Source: ACHHUZ5hdztkp1btpE9K5dhXQjH9h8Gkp/1/gbEiIcr6T29Kep/uWtDYej6NerWFn+AVtdTlwQDJ X-Received: by 2002:a05:6a20:7d96:b0:12b:f44f:bc98 with SMTP id v22-20020a056a207d9600b0012bf44fbc98mr1196855pzj.21.1687899253398; Tue, 27 Jun 2023 13:54:13 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1687899253; cv=none; d=google.com; s=arc-20160816; b=FGAoeN7EUQF43Bl9aPlgzqMMARzIvoglqtl+mB9hysJIXRRdn3YyPy+0Ium90xaYIf im3X3hh9AgdmtflwJs5fEuxn62Csk6ndxGFydJXNoy/fifmUtqlrNVaLvQBFHlYRchfu Ef59VxDOe7Mgr+Ajf/Qx91ES38PHudoFeuQ+EYLoRkhzijPc7yb1+OcmXM2aKO9L0vG5 HAPBzSgIBbNOxhalqugqx10rJCkPZ88WtteeXdcuFEpzzKnUINmF79cCtYtQOdAmNO2J MZ1LVpJSmm4CwKinq3SixB9R7985vQldDO6E1IPFnX9QYcvFTrsFJIjNifCXWhjOgvLW 3M5w== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:user-agent:in-reply-to:content-disposition :mime-version:references:message-id:subject:cc:to:from:date :dkim-signature; bh=P8ekdkgBH92tA9Ly0VzKRksCfLz4wsaYbX9MRQxlGKk=; fh=EwHFucLs4bwDakTQKl/ndDudi5N9htFCanErUntz+qw=; b=CDyiRGW8tCm/G7/TG/LHwDPDTxz0wTspTumbTawvB61e7VmzCjjbOrRGnj9+SN8Tm4 AXgH+jGpdgmQ3US2E+l0mASlwoCAhWNwOJCMErJjBuRem4k9VujXpL/c9Z4n4kvVERH7 +alqL6ntOEcy8IUIoHNZ0/tdM3Q4fCDNpI03VLleMpqiwhzKNpCkAnvEE8OoBw1/ea/p RQunHsV0AnBSGjZCIDEGPoYIV0suUXnqbRnyzgJ9jpPBBvBZvAQ35FpdTcpw2BvSe4B9 dWt+M5R7JukrfnLPDr8QU3GmIQm5GNevSyslm4CLhrfhmzQHe6+tcJ8gI5MpE5ep2+g4 BVAw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@nabijaczleweli.xyz header.s=202305 header.b=c0yE1oTL; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=nabijaczleweli.xyz Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id 201-20020a6301d2000000b005347d133470si8101233pgb.385.2023.06.27.13.54.01; Tue, 27 Jun 2023 13:54:13 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@nabijaczleweli.xyz header.s=202305 header.b=c0yE1oTL; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=nabijaczleweli.xyz Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230420AbjF0UvB (ORCPT <rfc822;nicolai.engesland@gmail.com> + 99 others); Tue, 27 Jun 2023 16:51:01 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53938 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231504AbjF0Uuz (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Tue, 27 Jun 2023 16:50:55 -0400 Received: from tarta.nabijaczleweli.xyz (unknown [139.28.40.42]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 92EA9ED; Tue, 27 Jun 2023 13:50:54 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=nabijaczleweli.xyz; s=202305; t=1687899053; bh=qCdQpza0vGDqNSdxaU8DpurpP2cGaeH8ooUq+QibGG8=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=c0yE1oTLMDmySbdvOBrMfwjecFrriPS22m25V4FrQ9pdR/RLc0J93Q3wNpU+YwTMx 5eegks9sgfjtmCD4RzXALcaMHBXL6L+zmZ7YgPOofSKJYravKbeg/2UnKxmzuTfxAK K16wd4fd9eIAoXfm+ZW35dh6NI3E14NjI0yz3y/WS7sb2wVDUIUaHgjMMNGhfveWw1 3vNqt0CseE0IcRJWEdHxiY0IQGuIAe0KahxXq+csSiAvUr9OTWq1TCckCCdxRCqyRA 8v5XvojpVaDT2nn9XR9eybkbzh2+MqZ5sEYH+wOjf4mrVCx31aFDxTGbh51yaZwS+F Rdl6dG4WDtWSw== Received: from tarta.nabijaczleweli.xyz (unknown [192.168.1.250]) by tarta.nabijaczleweli.xyz (Postfix) with ESMTPSA id E0BCA12C6; Tue, 27 Jun 2023 22:50:53 +0200 (CEST) Date: Tue, 27 Jun 2023 22:50:52 +0200 From: Ahelenia =?utf-8?q?Ziemia=C5=84ska?= <nabijaczleweli@nabijaczleweli.xyz> To: Amir Goldstein <amir73il@gmail.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk>, Christian Brauner <brauner@kernel.org>, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, Jan Kara <jack@suse.cz>, Chung-Chiang Cheng <cccheng@synology.com>, ltp@lists.linux.it Subject: [PATCH v4 1/3] splice: always fsnotify_access(in), fsnotify_modify(out) on success Message-ID: <e770188fd86595c6f39d4da86d906a824f8abca3.1687898895.git.nabijaczleweli@nabijaczleweli.xyz> References: <t5az5bvpfqd3rrwla43437r5vplmkujdytixcxgm7sc4hojspg@jcc63stk66hz> <cover.1687898895.git.nabijaczleweli@nabijaczleweli.xyz> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha512; protocol="application/pgp-signature"; boundary="okjxcfyce7hzio6h" Content-Disposition: inline In-Reply-To: <cover.1687898895.git.nabijaczleweli@nabijaczleweli.xyz> User-Agent: NeoMutt/20230517 X-Spam-Status: No, score=-1.1 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,PDS_RDNS_DYNAMIC_FP, RDNS_DYNAMIC,SPF_HELO_PASS,SPF_PASS,T_SCC_BODY_TEXT_LINE,URIBL_BLOCKED autolearn=no autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1769890647510803660?= X-GMAIL-MSGID: =?utf-8?q?1769890647510803660?= |
Series |
[v4,1/3] splice: always fsnotify_access(in), fsnotify_modify(out) on success
|
|
Commit Message
Ahelenia Ziemiańska
June 27, 2023, 8:50 p.m. UTC
The current behaviour caused an asymmetry where some write APIs (write, sendfile) would notify the written-to/read-from objects, but splice wouldn't. This affected userspace which uses inotify, most notably coreutils tail -f, to monitor pipes. If the pipe buffer had been filled by a splice-family function: * tail wouldn't know and thus wouldn't service the pipe, and * all writes to the pipe would block because it's full, thus service was denied. (For the particular case of tail -f this could be worked around with ---disable-inotify.) Generate modify out before access in to let inotify merge the modify out events in thr ipipe case. Fixes: 983652c69199 ("splice: report related fsnotify events") Link: https://lore.kernel.org/linux-fsdevel/jbyihkyk5dtaohdwjyivambb2gffyjs3dodpofafnkkunxq7bu@jngkdxx65pux/t/#u Link: https://bugs.debian.org/1039488 Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> Reviewed-by: Amir Goldstein <amir73il@gmail.com> --- fs/splice.c | 26 +++++++++----------------- 1 file changed, 9 insertions(+), 17 deletions(-)
Comments
On Tue, Jun 27, 2023 at 11:50 PM Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> wrote: > > The current behaviour caused an asymmetry where some write APIs > (write, sendfile) would notify the written-to/read-from objects, > but splice wouldn't. > > This affected userspace which uses inotify, most notably coreutils > tail -f, to monitor pipes. > If the pipe buffer had been filled by a splice-family function: > * tail wouldn't know and thus wouldn't service the pipe, and > * all writes to the pipe would block because it's full, > thus service was denied. > (For the particular case of tail -f this could be worked around > with ---disable-inotify.) > Is my understanding of the tail code wrong? My understanding was that tail_forever_inotify() is not called for pipes, or is it being called when tailing a mixed collection of pipes and regular files? If there are subtleties like those you need to mention them , otherwise people will not be able to reproduce the problem that you are describing. I need to warn you about something regarding this patch - often there are colliding interests among different kernel users - fsnotify use cases quite often collide with the interest of users tracking performance regressions and IN_ACCESS/IN_MODIFY on anonymous pipes specifically have been the source of several performance regression reports in the past and have driven optimizations like: 71d734103edf ("fsnotify: Rearrange fast path to minimise overhead when there is no watcher") e43de7f0862b ("fsnotify: optimize the case of no marks of any type") The moral of this story is: even if your patches are accepted by fsnotify reviewers, once they are staged for merging they will be subject to performance regression tests and I can tell you with certainty that performance regression will not be tolerated for the tail -f use case. I will push your v4 patches to a branch in my github, to let the kernel test bots run the performance regressions on it whenever they get to it. Moreover, if coreutils will change tail -f to start setting inotify watches on anonymous pipes (my understanding is that currently does not?), then any tail -f on anonymous pipe can cripple the "no marks on sb" performance optimization for all anonymous pipes and that would be a *very* unfortunate outcome. I think we need to add a rule to fanotify_events_supported() to ban sb/mount marks on SB_KERNMOUNT and backport this fix to LTS kernels (I will look into it) and then we can fine tune the s_fsnotify_connectors optimization in fsnotify_parent() for the SB_KERNMOUNT special case. This may be able to save your patch for the faith of NACKed for performance regression. > Generate modify out before access in to let inotify merge the > modify out events in thr ipipe case. This comment is not clear and does not belong in this context, but it very much belongs near the code in question. Please wait to collect more feedback and specifically to hear what Jan has to say about this hack before posting v5! FYI, we are now in the beginning of the 6.5 "merge window", which means that maintainers may be less responsive for the next two weeks to non-critical patches as this one, which are not targeted for the 6.5 kernel release. Thanks, Amir.
On Wed 28-06-23 09:33:43, Amir Goldstein wrote: > On Tue, Jun 27, 2023 at 11:50 PM Ahelenia Ziemiańska > <nabijaczleweli@nabijaczleweli.xyz> wrote: > > > > The current behaviour caused an asymmetry where some write APIs > > (write, sendfile) would notify the written-to/read-from objects, > > but splice wouldn't. > > > > This affected userspace which uses inotify, most notably coreutils > > tail -f, to monitor pipes. > > If the pipe buffer had been filled by a splice-family function: > > * tail wouldn't know and thus wouldn't service the pipe, and > > * all writes to the pipe would block because it's full, > > thus service was denied. > > (For the particular case of tail -f this could be worked around > > with ---disable-inotify.) > > > > Is my understanding of the tail code wrong? > My understanding was that tail_forever_inotify() is not called for > pipes, or is it being called when tailing a mixed collection of pipes > and regular files? If there are subtleties like those you need to > mention them , otherwise people will not be able to reproduce the > problem that you are describing. Well, on my openSUSE 15.4 at least, tail -f does use inotify on FIFOs and indeed when data is spliced to the FIFO, tail doesn't notice. > I need to warn you about something regarding this patch - > often there are colliding interests among different kernel users - > fsnotify use cases quite often collide with the interest of users tracking > performance regressions and IN_ACCESS/IN_MODIFY on anonymous pipes > specifically have been the source of several performance regression reports > in the past and have driven optimizations like: > > 71d734103edf ("fsnotify: Rearrange fast path to minimise overhead > when there is no watcher") > e43de7f0862b ("fsnotify: optimize the case of no marks of any type") > > The moral of this story is: even if your patches are accepted by fsnotify > reviewers, once they are staged for merging they will be subject to > performance regression tests and I can tell you with certainty that > performance regression will not be tolerated for the tail -f use case. > I will push your v4 patches to a branch in my github, to let the kernel > test bots run the performance regressions on it whenever they get to it. > > Moreover, if coreutils will change tail -f to start setting inotify watches > on anonymous pipes (my understanding is that currently does not?), > then any tail -f on anonymous pipe can cripple the "no marks on sb" > performance optimization for all anonymous pipes and that would be > a *very* unfortunate outcome. Do you mean the "s_fsnotify_connectors" check? Yeah, a fsnotify watch on any pipe inode is going to somewhat slow down the fsnotify calls for any pipe. OTOH I don't expect inotify watches on pipe inodes to be common and it is not like the overhead is huge. Also nobody really prevents you from placing watch on pipe inode now with similar consequences, this patch only makes it actually working with splice. So I'm not worried about the performance impact. At least until somebody comes with a realistic complaint ;-). > I think we need to add a rule to fanotify_events_supported() to ban > sb/mount marks on SB_KERNMOUNT and backport this > fix to LTS kernels (I will look into it) and then we can fine tune > the s_fsnotify_connectors optimization in fsnotify_parent() for > the SB_KERNMOUNT special case. Yeah, probably makes sense. Honza
On Wed, Jun 28, 2023 at 09:33:43AM +0300, Amir Goldstein wrote: > On Tue, Jun 27, 2023 at 11:50 PM Ahelenia Ziemiańska > <nabijaczleweli@nabijaczleweli.xyz> wrote: > > The current behaviour caused an asymmetry where some write APIs > > (write, sendfile) would notify the written-to/read-from objects, > > but splice wouldn't. > > > > This affected userspace which uses inotify, most notably coreutils > > tail -f, to monitor pipes. > > If the pipe buffer had been filled by a splice-family function: > > * tail wouldn't know and thus wouldn't service the pipe, and > > * all writes to the pipe would block because it's full, > > thus service was denied. > > (For the particular case of tail -f this could be worked around > > with ---disable-inotify.) > Is my understanding of the tail code wrong? > My understanding was that tail_forever_inotify() is not called for > pipes, or is it being called when tailing a mixed collection of pipes > and regular files? If there are subtleties like those you need to > mention them , otherwise people will not be able to reproduce the > problem that you are describing. I can't squeak to the code itself, but it's trivial to check: $ tail -f fifo & [1] 3213996 $ echo zupa > fifo zupa $ echo zupa > fifo zupa $ echo zupa > fifo zupa $ cat /bin/tail > fifo # ... $ cat /bin/tail > fifo hangs: the fifo is being watched with inotify. This happens regardless of other files being specified. tail -f doesn't follow FIFOs or pipes if they're fd 0 (guaranteed by POSIX, coreutils conforms). OTOH, you could theoretically do $ cat | tail -f /dev/fd/3 3<&0 which first reads from the pipe until completion (⇔ hangup, cat died), then hangs, because it's waiting for more data on the pipe. This can never happen under a normal scenario, but doing $ echo zupa > /proc/3238590/fd/3 a few times reveals it's using classic 1/s polling (and splicing to /proc/3238590/fd/3 actually yields that data being output from tail). > I need to warn you about something regarding this patch - > often there are colliding interests among different kernel users - > fsnotify use cases quite often collide with the interest of users tracking > performance regressions and IN_ACCESS/IN_MODIFY on anonymous pipes > specifically have been the source of several performance regression reports > in the past and have driven optimizations like: > > 71d734103edf ("fsnotify: Rearrange fast path to minimise overhead > when there is no watcher") > e43de7f0862b ("fsnotify: optimize the case of no marks of any type") > > The moral of this story is: even if your patches are accepted by fsnotify > reviewers, once they are staged for merging they will be subject to > performance regression tests and I can tell you with certainty that > performance regression will not be tolerated for the tail -f use case. > I will push your v4 patches to a branch in my github, to let the kernel > test bots run the performance regressions on it whenever they get to it. > > Moreover, if coreutils will change tail -f to start setting inotify watches > on anonymous pipes (my understanding is that currently does not?), > then any tail -f on anonymous pipe can cripple the "no marks on sb" > performance optimization for all anonymous pipes and that would be > a *very* unfortunate outcome. As seen above, it doesn't set inotify watches on anon pipes, and (since it manages to distinguish "| /dev/fd/3 3<&0" from "fifo", so it must be going further than S_ISFIFO(fstat())) this is an explicit design decision. If you refuse setting inotifies on anon pipes then that likely won't impact any userspace program (it's pathological, and for tail-like cases it'd only be meaningful for magic /proc/$pid/fd/* symlinks), and if it's in the name of performance then no-one'll likely complain, or even notice. > I think we need to add a rule to fanotify_events_supported() to ban > sb/mount marks on SB_KERNMOUNT and backport this > fix to LTS kernels (I will look into it) and then we can fine tune > the s_fsnotify_connectors optimization in fsnotify_parent() for > the SB_KERNMOUNT special case. > This may be able to save your patch for the faith of NACKed > for performance regression. This goes over my head, but if Jan says it makes sense then it must do. > > Generate modify out before access in to let inotify merge the > > modify out events in thr ipipe case. > This comment is not clear and does not belong in this context, > but it very much belongs near the code in question. Turned it into /* * Generate modify out before access in: * do_splice_from() may've already sent modify out, * and this ensures the events get merged. */ for v5.
On Wed, Jun 28, 2023 at 8:09 PM Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> wrote: > > On Wed, Jun 28, 2023 at 09:33:43AM +0300, Amir Goldstein wrote: > > On Tue, Jun 27, 2023 at 11:50 PM Ahelenia Ziemiańska > > <nabijaczleweli@nabijaczleweli.xyz> wrote: > > > The current behaviour caused an asymmetry where some write APIs > > > (write, sendfile) would notify the written-to/read-from objects, > > > but splice wouldn't. > > > > > > This affected userspace which uses inotify, most notably coreutils > > > tail -f, to monitor pipes. > > > If the pipe buffer had been filled by a splice-family function: > > > * tail wouldn't know and thus wouldn't service the pipe, and > > > * all writes to the pipe would block because it's full, > > > thus service was denied. > > > (For the particular case of tail -f this could be worked around > > > with ---disable-inotify.) > > Is my understanding of the tail code wrong? > > My understanding was that tail_forever_inotify() is not called for > > pipes, or is it being called when tailing a mixed collection of pipes > > and regular files? If there are subtleties like those you need to > > mention them , otherwise people will not be able to reproduce the > > problem that you are describing. > I can't squeak to the code itself, but it's trivial to check: > $ tail -f fifo & > [1] 3213996 > $ echo zupa > fifo > zupa > $ echo zupa > fifo > zupa > $ echo zupa > fifo > zupa > $ cat /bin/tail > fifo > # ... > $ cat /bin/tail > fifo > hangs: the fifo is being watched with inotify. > > This happens regardless of other files being specified. > > tail -f doesn't follow FIFOs or pipes if they're fd 0 > (guaranteed by POSIX, coreutils conforms). > > OTOH, you could theoretically do > $ cat | tail -f /dev/fd/3 3<&0 > which first reads from the pipe until completion (⇔ hangup, cat died), > then hangs, because it's waiting for more data on the pipe. > > This can never happen under a normal scenario, but doing > $ echo zupa > /proc/3238590/fd/3 > a few times reveals it's using classic 1/s polling > (and splicing to /proc/3238590/fd/3 actually yields that data being > output from tail). > > > I need to warn you about something regarding this patch - > > often there are colliding interests among different kernel users - > > fsnotify use cases quite often collide with the interest of users tracking > > performance regressions and IN_ACCESS/IN_MODIFY on anonymous pipes > > specifically have been the source of several performance regression reports > > in the past and have driven optimizations like: > > > > 71d734103edf ("fsnotify: Rearrange fast path to minimise overhead > > when there is no watcher") > > e43de7f0862b ("fsnotify: optimize the case of no marks of any type") > > > > The moral of this story is: even if your patches are accepted by fsnotify > > reviewers, once they are staged for merging they will be subject to > > performance regression tests and I can tell you with certainty that > > performance regression will not be tolerated for the tail -f use case. > > I will push your v4 patches to a branch in my github, to let the kernel > > test bots run the performance regressions on it whenever they get to it. > > > > Moreover, if coreutils will change tail -f to start setting inotify watches > > on anonymous pipes (my understanding is that currently does not?), > > then any tail -f on anonymous pipe can cripple the "no marks on sb" > > performance optimization for all anonymous pipes and that would be > > a *very* unfortunate outcome. > As seen above, it doesn't set inotify watches on anon pipes, and > (since it manages to distinguish "| /dev/fd/3 3<&0" from "fifo", > so it must be going further than S_ISFIFO(fstat())) > this is an explicit design decision. > > If you refuse setting inotifies on anon pipes then that likely won't > impact any userspace program (it's pathological, and for tail-like cases > it'd only be meaningful for magic /proc/$pid/fd/* symlinks), > and if it's in the name of performance then no-one'll likely complain, > or even notice. > Unfortunately, it doesn't work this way - most of the time we are not supposed to break existing applications and I have no way of knowing if those applications exist... > > I think we need to add a rule to fanotify_events_supported() to ban > > sb/mount marks on SB_KERNMOUNT and backport this > > fix to LTS kernels (I will look into it) and then we can fine tune > > the s_fsnotify_connectors optimization in fsnotify_parent() for > > the SB_KERNMOUNT special case. > > This may be able to save your patch for the faith of NACKed > > for performance regression. > This goes over my head, but if Jan says it makes sense > then it must do. > Here you go: https://github.com/amir73il/linux/commits/fsnotify_pipe I ended up using SB_NOUSER which is narrower than SB_KERNMOUNT. Care to test? 1) Functionally - that I did not break your tests. 2) Optimization - that when one anon pipe has an inotify watch write to another anon pipe stops at fsnotify_inode_has_watchers() and does not get to fsnotify(). > > > Generate modify out before access in to let inotify merge the > > > modify out events in thr ipipe case. > > This comment is not clear and does not belong in this context, > > but it very much belongs near the code in question. > Turned it into > /* > * Generate modify out before access in: > * do_splice_from() may've already sent modify out, > * and this ensures the events get merged. > */ > for v5. OK. Thanks, Amir.
On Wed, Jun 28, 2023 at 09:38:03PM +0300, Amir Goldstein wrote: > On Wed, Jun 28, 2023 at 8:09 PM Ahelenia Ziemiańska > <nabijaczleweli@nabijaczleweli.xyz> wrote: > > On Wed, Jun 28, 2023 at 09:33:43AM +0300, Amir Goldstein wrote: > > > I think we need to add a rule to fanotify_events_supported() to ban > > > sb/mount marks on SB_KERNMOUNT and backport this > > > fix to LTS kernels (I will look into it) and then we can fine tune > > > the s_fsnotify_connectors optimization in fsnotify_parent() for > > > the SB_KERNMOUNT special case. > > > This may be able to save your patch for the faith of NACKed > > > for performance regression. > > This goes over my head, but if Jan says it makes sense > > then it must do. > Here you go: > https://github.com/amir73il/linux/commits/fsnotify_pipe > > I ended up using SB_NOUSER which is narrower than > SB_KERNMOUNT. > > Care to test? > 1) Functionally - that I did not break your tests. ) | gzip -d > inotify13; chmod +x inotify13; exec ./inotify13 tst_test.c:1560: TINFO: Timeout per run is 0h 00m 30s inotify13.c:260: TINFO: file_to_pipe inotify13.c:269: TPASS: ок inotify13.c:260: TINFO: file_to_pipe inotify13.c:269: TPASS: ок inotify13.c:260: TINFO: splice_pipe_to_file inotify13.c:269: TPASS: ок inotify13.c:260: TINFO: pipe_to_pipe inotify13.c:269: TPASS: ок inotify13.c:260: TINFO: pipe_to_pipe inotify13.c:269: TPASS: ок inotify13.c:260: TINFO: vmsplice_pipe_to_mem inotify13.c:269: TPASS: ок inotify13.c:260: TINFO: vmsplice_mem_to_pipe inotify13.c:269: TPASS: ок Summary: passed 7 failed 0 broken 0 skipped 0 warnings 0 The discrete tests from before also work as expected, both to a fifo and an anon pipe. > 2) Optimization - that when one anon pipe has an inotify watch > write to another anon pipe stops at fsnotify_inode_has_watchers() > and does not get to fsnotify(). Yes, I can confirm this as well: fsnotify_parent() only continues to fsnotify() for the watched pipe; writes to other pipes early-exit. To validate the counterfactual, I reverted "fsnotify: optimize the case of anonymous pipe with no watches" and fsnotify() was being called for each anon pipe write, so long as any anon pipe watches were registered.
On Wed, Jun 28, 2023 at 11:18 PM Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> wrote: > > On Wed, Jun 28, 2023 at 09:38:03PM +0300, Amir Goldstein wrote: > > On Wed, Jun 28, 2023 at 8:09 PM Ahelenia Ziemiańska > > <nabijaczleweli@nabijaczleweli.xyz> wrote: > > > On Wed, Jun 28, 2023 at 09:33:43AM +0300, Amir Goldstein wrote: > > > > I think we need to add a rule to fanotify_events_supported() to ban > > > > sb/mount marks on SB_KERNMOUNT and backport this > > > > fix to LTS kernels (I will look into it) and then we can fine tune > > > > the s_fsnotify_connectors optimization in fsnotify_parent() for > > > > the SB_KERNMOUNT special case. > > > > This may be able to save your patch for the faith of NACKed > > > > for performance regression. > > > This goes over my head, but if Jan says it makes sense > > > then it must do. > > Here you go: > > https://github.com/amir73il/linux/commits/fsnotify_pipe > > > > I ended up using SB_NOUSER which is narrower than > > SB_KERNMOUNT. > > > > Care to test? > > 1) Functionally - that I did not break your tests. > ) | gzip -d > inotify13; chmod +x inotify13; exec ./inotify13 > tst_test.c:1560: TINFO: Timeout per run is 0h 00m 30s > inotify13.c:260: TINFO: file_to_pipe > inotify13.c:269: TPASS: ок > inotify13.c:260: TINFO: file_to_pipe > inotify13.c:269: TPASS: ок > inotify13.c:260: TINFO: splice_pipe_to_file > inotify13.c:269: TPASS: ок > inotify13.c:260: TINFO: pipe_to_pipe > inotify13.c:269: TPASS: ок > inotify13.c:260: TINFO: pipe_to_pipe > inotify13.c:269: TPASS: ок > inotify13.c:260: TINFO: vmsplice_pipe_to_mem > inotify13.c:269: TPASS: ок > inotify13.c:260: TINFO: vmsplice_mem_to_pipe > inotify13.c:269: TPASS: ок > > Summary: > passed 7 > failed 0 > broken 0 > skipped 0 > warnings 0 > > The discrete tests from before also work as expected, > both to a fifo and an anon pipe. > > > 2) Optimization - that when one anon pipe has an inotify watch > > write to another anon pipe stops at fsnotify_inode_has_watchers() > > and does not get to fsnotify(). > Yes, I can confirm this as well: fsnotify_parent() only continues to > fsnotify() for the watched pipe; writes to other pipes early-exit. > > To validate the counterfactual, I reverted "fsnotify: optimize the case > of anonymous pipe with no watches" and fsnotify() was being called > for each anon pipe write, so long as any anon pipe watches were registered. Thank you for testing! As Jan suggested, when you post v5, with my Reviewed-by and Jan's Acked-by, please ask Christian to review and consider taking these patches through the vfs tree for the 6.6 release. Please include a link to your LTP test in the cover letter and a link to my performance optimization patches. Unless the kernel test bots detect a performance regression due to your patches, I am not sure whether or not or when we will apply the optimization patches. Thanks, Amir.
diff --git a/fs/splice.c b/fs/splice.c index 3e06611d19ae..b5c7a5ae0e94 100644 --- a/fs/splice.c +++ b/fs/splice.c @@ -1154,10 +1154,8 @@ long do_splice(struct file *in, loff_t *off_in, struct file *out, if ((in->f_flags | out->f_flags) & O_NONBLOCK) flags |= SPLICE_F_NONBLOCK; - return splice_pipe_to_pipe(ipipe, opipe, len, flags); - } - - if (ipipe) { + ret = splice_pipe_to_pipe(ipipe, opipe, len, flags); + } else if (ipipe) { if (off_in) return -ESPIPE; if (off_out) { @@ -1182,18 +1180,11 @@ long do_splice(struct file *in, loff_t *off_in, struct file *out, ret = do_splice_from(ipipe, out, &offset, len, flags); file_end_write(out); - if (ret > 0) - fsnotify_modify(out); - if (!off_out) out->f_pos = offset; else *off_out = offset; - - return ret; - } - - if (opipe) { + } else if (opipe) { if (off_out) return -ESPIPE; if (off_in) { @@ -1209,18 +1200,19 @@ long do_splice(struct file *in, loff_t *off_in, struct file *out, ret = splice_file_to_pipe(in, opipe, &offset, len, flags); - if (ret > 0) - fsnotify_access(in); - if (!off_in) in->f_pos = offset; else *off_in = offset; + } else + return -EINVAL; - return ret; + if (ret > 0) { + fsnotify_modify(out); + fsnotify_access(in); } - return -EINVAL; + return ret; } static long __do_splice(struct file *in, loff_t __user *off_in,