Message ID | 2730511.1687559668@warthog.procyon.org.uk |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:994d:0:b0:3d9:f83d:47d9 with SMTP id k13csp6085717vqr; Fri, 23 Jun 2023 15:57:21 -0700 (PDT) X-Google-Smtp-Source: ACHHUZ6NGqKGDL4giN5Jy8ECvO7dvrlMiUjLcMD5pn6B0xebOElrX4QWtD/gs+ZfP4drKRgIneY9 X-Received: by 2002:a17:902:d4d1:b0:1b7:c362:6cdb with SMTP id o17-20020a170902d4d100b001b7c3626cdbmr361544plg.60.1687561041347; Fri, 23 Jun 2023 15:57:21 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1687561041; cv=none; d=google.com; s=arc-20160816; b=Wgfw6UBh1i3TVdLgKOnBBCw1xzufltTGVAsbrGoNXU7lonsaXGxCe/Ux2ETo7+VxXs eQu0y8gUF9FjX7OHSQ9MhQvP9dmrFAlM1b0d41jJZuAIJ4cwylfc3xKX/uA5i3tBBkhl lQNkJS2eznwFqWyPjVFBmTnleL5osZzSjbAsc7QacCBnG2e4A6iMQHIn1H+QLRJne9M4 VV4KYdxb/6ONpWquav/+qWKRTefm1VsWijNeVev0lolpm+pQxaNuFMfW7KwJ5oV91B/i mHN8Q32sKsBb9X9PKhtzs1KPYK2iX/bq39d/bOjTMaftxjL0N3P1TiZeK2tPKc4vDQPd 5y2Q== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:message-id:date:content-id:mime-version:subject :cc:to:from:organization:dkim-signature; bh=OXHK801MHw6sMHY68DL31vt5Z/u/q42/rUnJs7PRou0=; fh=QAtWHkaW9gJ0moNccuuIfDXcdgxDsLcouCF/DkdZ48Y=; b=QpJzZDX1yP+PaitefoTk5Olj67JBtJWGzM+M66e7QSFg3m3vP5HyPlWylxJdQNMTrV SI2x6KglmldVY84Vp6p7SsnIFBRPJhihHl8BapvDYo8ltvD1VfQeyPLrcCR0dwbsd0cI RPYJATkbpKN/rbX1o50ln8XEiS9cRaRtEH4+9+1NeLoZHBu5N2K+dGg+Xtyz3B4MWDdz d2RgFvVSwtY1cuuQ3pbyPXzTBlKBVnVz8e0nrkYRQNu0Quxl0PkScO1ykkQPRbZjWWBI cjScXeWVQgGrXbIYAMniFWS9iKIJnPFMXX5b5lz0H1apzjdbKLqlZcbWRP699HhTGTaY FCKw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=grZNkrOg; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id i1-20020a17090332c100b001b53dfb85c4si149573plr.606.2023.06.23.15.57.08; Fri, 23 Jun 2023 15:57:21 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=grZNkrOg; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231949AbjFWWfV (ORCPT <rfc822;maxin.john@gmail.com> + 99 others); Fri, 23 Jun 2023 18:35:21 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:59760 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231667AbjFWWfT (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Fri, 23 Jun 2023 18:35:19 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 2AB5C193 for <linux-kernel@vger.kernel.org>; Fri, 23 Jun 2023 15:34:32 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1687559671; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type; bh=OXHK801MHw6sMHY68DL31vt5Z/u/q42/rUnJs7PRou0=; b=grZNkrOg0sNQKjMjqGmdYlBXtp90aedIYYWw/zglNZA13O4uXIajxgcMvocGaGR6dsGNYA 5dkXAQPwZKiserJCXtBChInljRaqz98zByIN3D/ESWg5oTXmC8Xp4n3u/NAeAy8phh/Eym cYkoNqcKlqlemGq1edPCepFP8/sq0lE= Received: from mimecast-mx02.redhat.com (mx3-rdu2.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-399-qKtsGPptMGm18tLLs81clg-1; Fri, 23 Jun 2023 18:34:30 -0400 X-MC-Unique: qKtsGPptMGm18tLLs81clg-1 Received: from smtp.corp.redhat.com (int-mx01.intmail.prod.int.rdu2.redhat.com [10.11.54.1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id A39BC1C03D8A; Fri, 23 Jun 2023 22:34:29 +0000 (UTC) Received: from warthog.procyon.org.uk (unknown [10.42.28.4]) by smtp.corp.redhat.com (Postfix) with ESMTP id AE10140C2063; Fri, 23 Jun 2023 22:34:28 +0000 (UTC) Organization: Red Hat UK Ltd. Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SI4 1TE, United Kingdom. Registered in England and Wales under Company Registration No. 3798903 From: David Howells <dhowells@redhat.com> To: Linus Torvalds <torvalds@linux-foundation.org> cc: dhowells@redhat.com, Franck Grosjean <fgrosjea@redhat.com>, Phil Auld <pauld@redhat.com>, Alexander Viro <viro@zeniv.linux.org.uk>, Christian Brauner <brauner@kernel.org>, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH] pipe: Make a partially-satisfied blocking read wait for more MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-ID: <2730510.1687559668.1@warthog.procyon.org.uk> Date: Fri, 23 Jun 2023 23:34:28 +0100 Message-ID: <2730511.1687559668@warthog.procyon.org.uk> X-Scanned-By: MIMEDefang 3.1 on 10.11.54.1 X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H5,RCVD_IN_MSPIKE_WL,SPF_HELO_NONE,SPF_NONE, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1769536006071217747?= X-GMAIL-MSGID: =?utf-8?q?1769536006071217747?= |
Series |
pipe: Make a partially-satisfied blocking read wait for more
|
|
Commit Message
David Howells
June 23, 2023, 10:34 p.m. UTC
Hi Linus, Can you consider merging something like the attached patch? Unfortunately, there are applications out there that depend on a read from pipe() waiting until the buffer is full under some circumstances. Patch a28c8b9db8a1 removed the conditionality on there being an attached writer. I'm not sure this is the best solution though as it goes over the other way and will now block reads for which there isn't an active writer - and I'm sure that, somewhere, there's an app that will break on tht. Thanks, David --- pipe: Make a partially-satisfied blocking read wait for more data A read on a pipe may return short after reading some data from a pipe, even though the pipe isn't non-blocking. This is stated in the read(2) manual page: ... It is not an error if this number is smaller than the number of bytes requested; this may happen for example because fewer bytes are actually available right now (maybe because we were close to end-of-file, or because we are reading from a pipe, or from a terminal)... However, some applications depend on a blocking read on a pipe not returning until it fills the buffer unless it hits EOF or a signal occurs - at least as long as there's an active writer on the other end. Fix the pipe reader to restore this behaviour by only breaking out with a short read in the non-block (and signal) cases. Here's a reproducer for it: #include <fcntl.h> #include <stdio.h> #include <unistd.h> #include <stdlib.h> #include <sys/uio.h> #define F_GETPIPE_SZ 1032 int main(int argc, char *argv[]) { int fildes[2]; if (pipe(fildes) == -1) { perror("in pipe"); return -1; } printf("%d %d\n", fcntl(fildes[0], F_GETPIPE_SZ), fcntl(fildes[1], F_GETPIPE_SZ)); if (fork() != 0) { void *tata = malloc(100000); int res = read(fildes[0], tata, 100000); printf("could read %d bytes\n", res); return -1; } void *toto = malloc(100000); struct iovec iov; iov.iov_base = toto; iov.iov_len = 100000; int d = writev(fildes[1], &iov, 1); if (d == -1) { perror("in writev"); return -1; } printf("could write %d bytes\n", d); sleep(1); return 0; } It should show the same amount read as written, but shows a short read because the pipe capacity isn't sufficient. Fixes: a28c8b9db8a1 ("pipe: remove 'waiting_writers' merging logic") Reported-by: Franck Grosjean <fgrosjea@redhat.com> Signed-off-by: David Howells <dhowells@redhat.com> Tested-by: Phil Auld <pauld@redhat.com> cc: Linus Torvalds <torvalds@linux-foundation.org> cc: Alexander Viro <viro@zeniv.linux.org.uk> cc: Christian Brauner <brauner@kernel.org> cc: linux-fsdevel@vger.kernel.org --- fs/pipe.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-)
Comments
On Fri, 23 Jun 2023 at 15:34, David Howells <dhowells@redhat.com> wrote: > > Can you consider merging something like the attached patch? Unfortunately, > there are applications out there that depend on a read from pipe() waiting > until the buffer is full under some circumstances. Patch a28c8b9db8a1 > removed the conditionality on there being an attached writer. This patch seems actively wrong, in that now it's possibly waiting for data that won't come, even if it's nonblocking. What are these alleged broken apps? That commit a28c8b9db8a1 ("pipe: remove 'waiting_writers' merging logic") is 3+ years old, and we haven't heard people complain about it. POSIX guarantees PIPE_BUF data, but that's 4kB. Your made-up test-case is not valid, and never has been. Yes, we used to do that write merging for performance reasons to avoid extra system calls. And yes, we'll maintain semantics if people actually end up having broken apps that depend on them, but I want to know *what* broken app depends on this before I re-instate the write merging. And if we do re-instate it, I'm afraid we will have to do so with that whole "waiting_writers" logic, so that we don't have the "reader waits for data that might not come". Because that patch of yours seems really broken. Nobody has *ever* said "a read() on a pipe will always satisfy the whole buffer". That's just completely bogus. So let's name and shame the code that actually depended on it. And maybe we'll have to revert commit a28c8b9db8a1, but after three+ years of nobody reporting it I'm not really super-convinced. Linus
On Fri, 23 Jun 2023 at 15:41, Linus Torvalds <torvalds@linux-foundation.org> wrote: > > This patch seems actively wrong, in that now it's possibly waiting for > data that won't come, even if it's nonblocking. In fact, I'd expect that patch to fail immediately on a perfectly normal program that passes a token around by doing a small write to a pipe, and have the "token reader" do a bigger write. Blocking on read(), waiting for more data, would be blocking forever. The read already got the token, there isn't going to be anything else. So I'm pretty sure that patch is completely wrong, and whatever program is "fixed" by it is very very buggy. Again - we do have the rule that regressions are regressions even for buggy user space, but when it's been 3+ years and you don't even mention the broken app, I am not impressed. Linus
On Fri, 23 Jun 2023 at 16:08, Linus Torvalds <torvalds@linux-foundation.org> wrote: > > In fact, I'd expect that patch to fail immediately on a perfectly > normal program that passes a token around by doing a small write to a > pipe, and have the "token reader" do a bigger write. Bigger _read_, of course. This might be hidden by such programs typically doing a single byte write and a single byte read, but I could easily imagine situations where people actually depend on the POSIX atomicity guarantees, ie you write a "token packet" that might be variable-sized, and the reader then just does a maximally sized read, knowing that it will get a full packet or nothing. So a read() of a pipe absolutely has to return when it has gotten *any* data. Except if it can know that there is a writer that is still busy and still in the process of writing more data. Which was that old 'pipe->waiting_writers' logic - it basically counted "there are <N> active writers that still have more data to write, but the buffer filled up". That logic went back to ancient times, when our pipe buffer was just a single page - so it helped throughput immensely if we had writers that did big writes, and readers would continue to read even when the small buffer was completely used up (rather than return data just one page at a time for each read() system call). Linus
From: Linus Torvalds > Sent: 24 June 2023 00:32 > > On Fri, 23 Jun 2023 at 16:08, Linus Torvalds > <torvalds@linux-foundation.org> wrote: > > > > In fact, I'd expect that patch to fail immediately on a perfectly > > normal program that passes a token around by doing a small write to a > > pipe, and have the "token reader" do a bigger write. > > Bigger _read_, of course. > > This might be hidden by such programs typically doing a single byte > write and a single byte read, but I could easily imagine situations > where people actually depend on the POSIX atomicity guarantees, ie you > write a "token packet" that might be variable-sized, and the reader > then just does a maximally sized read, knowing that it will get a full > packet or nothing. There are definitely programs that just do a large read in order to consume all the single byte 'wakeup' writes. (The 'must check' on these reads is a right PITA.) They ought to set the pipe non-blocking, but I suspect many don't - because it all works anyway. David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)
From: Linus Torvalds > Sent: 23 June 2023 23:42 > > On Fri, 23 Jun 2023 at 15:34, David Howells <dhowells@redhat.com> wrote: > > > > Can you consider merging something like the attached patch? Unfortunately, > > there are applications out there that depend on a read from pipe() waiting > > until the buffer is full under some circumstances. Patch a28c8b9db8a1 > > removed the conditionality on there being an attached writer. > > This patch seems actively wrong, in that now it's possibly waiting for > data that won't come, even if it's nonblocking. I think it pretty much breaks: command | tee file where 'command' is careful to fflush(stdout). David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)
diff --git a/fs/pipe.c b/fs/pipe.c index 2d88f73f585a..c5c992f19d28 100644 --- a/fs/pipe.c +++ b/fs/pipe.c @@ -340,11 +340,10 @@ pipe_read(struct kiocb *iocb, struct iov_iter *to) if (!pipe->writers) break; - if (ret) - break; if ((filp->f_flags & O_NONBLOCK) || (iocb->ki_flags & IOCB_NOWAIT)) { - ret = -EAGAIN; + if (!ret) + ret = -EAGAIN; break; } __pipe_unlock(pipe);