From patchwork Mon Jun 26 01:12:09 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: =?utf-8?q?Ahelenia_Ziemia=C5=84ska?= X-Patchwork-Id: 112688 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:994d:0:b0:3d9:f83d:47d9 with SMTP id k13csp7182766vqr; Sun, 25 Jun 2023 18:29:27 -0700 (PDT) X-Google-Smtp-Source: ACHHUZ50whLF8hmMMows0hrS5hGhjmZpEH1FovA6HrOlehc9WnEwyLzt9TYwwv3bUEpNlgzpPBcM X-Received: by 2002:a05:6808:6397:b0:39c:767e:bfc6 with SMTP id ec23-20020a056808639700b0039c767ebfc6mr25702160oib.10.1687742967035; Sun, 25 Jun 2023 18:29:27 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1687742967; cv=none; d=google.com; s=arc-20160816; b=Q+v6XPUQ3nCYEGuxNXjHfIIMfK5jCcn1Ab8xks7c2iIRXa5wFUK2PNIPgLQ+U/kRg0 2+Iq7rtX2KBkWoQ+1OOANzfpc/IYo/fvR+3XLKI0CPANzXmpmMuBZGhAnfupQA3NCPJw 9j33+Q8IJrOEyX7GBup2l7g9GcAtsZkMUF2Pjs46LwXBjDR+oXmO3VGIEsu2eHfvG322 oX9ZoWLsMi8l485k/D0RNhoPICv0SfrBsdlWMzUt6Lzc+7/jjQErX3OSulUukeU7o8qN llRFFSc/QhrXK8g+N+RmWF3rPVmzv7khztoS+r9nVYpdFDWuUeWV7w6E7Rg9P/ERty5V iZ1w== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:user-agent:content-disposition:mime-version :message-id:subject:to:from:date:dkim-signature; bh=QU1610WbVyZSZ86sWpXs/Om4sV/ealLtN9xqXvEU3kg=; fh=c1hve9LK7bidUltaxotdOOkNtWZ8Drk/KOyPdUWU/OI=; b=R3MJdV0Sq9T5ST8KhVn8u3ZfpARAGfCjVQIQ3Z1oxKIhC17SsxQpCVP5Cb0MVK9grJ IOYyN6CCJW6MxVT2qojoh358IfHqPZbgHGXAXBm0NXu76eBzf/yCi6XC012AR9rZ6G41 Z9KNJxS7KLDMbjegATKseIDCX19ShyTEt8SD8YhG5NtCTcAogLyWcR29uq3tt+Hrl5YW XqREW+crCl8tyrthIe7zEtHZXZLO2mTUlHIs4q3qsW9bSZ5t0MUG2XF6sYX+fGm0woWB HTq5j+nU9miP7iLJwKp1u+IX5Y6NBzW+JT8mJLnmmoLBViOAWmMHNSuTw5Efu/Gnou5H FxAA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@nabijaczleweli.xyz header.s=202305 header.b=iXxfdo8L; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=nabijaczleweli.xyz Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id ix19-20020a170902f81300b001b80b3106basi382459plb.564.2023.06.25.18.29.13; Sun, 25 Jun 2023 18:29:26 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@nabijaczleweli.xyz header.s=202305 header.b=iXxfdo8L; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=nabijaczleweli.xyz Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230102AbjFZBMQ (ORCPT + 99 others); Sun, 25 Jun 2023 21:12:16 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:51288 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229683AbjFZBMO (ORCPT ); Sun, 25 Jun 2023 21:12:14 -0400 Received: from tarta.nabijaczleweli.xyz (unknown [139.28.40.42]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id F2400194; Sun, 25 Jun 2023 18:12:12 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=nabijaczleweli.xyz; s=202305; t=1687741930; bh=pqnPYGTfh1ZIMZQeiwgOUejdA3LSVPWbpNYvcMysFf8=; h=Date:From:To:Subject:From; b=iXxfdo8LgtiVhBj7vyKKmHRzb/590CG4zTFOGSqiTockvak6aMAkWhkHOhoKXoKcZ lzt13zDH66SgwQUNyi7DZLBK1yDoCGlKaBHxzmIfWmDEHRl2xM9GiSgFWeG25k/m5j 7JbQLes6Kst46omDevVJ9Id7YVj59Q9WeydH6Bi7UYSHyUH7rgFry7AweJg5NS5CIv GlmHDe9UO+C6Q+RNJqSgdr5ZgukGNISPVMwxVzYlsdfbm1tk9+K5U9v0nIvxPe6qjT gv+bpHkoFdWmKdcPXfwFvDrsHAFFEiUEmJ7mwz+E4NTdZougael8cT1FbM6D2rOo9s Kr15xH3VfZ6Tw== Received: from tarta.nabijaczleweli.xyz (unknown [192.168.1.250]) by tarta.nabijaczleweli.xyz (Postfix) with ESMTPSA id 957CA12AE; Mon, 26 Jun 2023 03:12:10 +0200 (CEST) Date: Mon, 26 Jun 2023 03:12:09 +0200 From: Ahelenia =?utf-8?q?Ziemia=C5=84ska?= To: Alexander Viro , Christian Brauner , linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org Subject: Pending splice(file -> FIFO) always blocks read(FIFO), regardless of O_NONBLOCK on read side? Message-ID: MIME-Version: 1.0 Content-Disposition: inline User-Agent: NeoMutt/20230517 X-Spam-Status: No, score=-1.1 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,PDS_RDNS_DYNAMIC_FP, RDNS_DYNAMIC,SPF_HELO_PASS,SPF_PASS,T_SCC_BODY_TEXT_LINE autolearn=no autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1769726769609381065?= X-GMAIL-MSGID: =?utf-8?q?1769726769609381065?= Hi! (starting with get_maintainers.pl fs/splice.c, idk if that's right though) Per fs/splice.c: * The traditional unix read/write is extended with a "splice()" operation * that transfers data buffers to or from a pipe buffer. so I expect splice() to work just about the same as read()/write() (and, to a large extent, it does so). Thus, a refresher on pipe read() semantics (quoting Issue 8 Draft 3; Linux when writing with write()): 60746 When attempting to read from an empty pipe or FIFO: 60747 • If no process has the pipe open for writing, read( ) shall return 0 to indicate end-of-file. 60748 • If some process has the pipe open for writing and O_NONBLOCK is set, read( ) shall return 60749 −1 and set errno to [EAGAIN]. 60750 • If some process has the pipe open for writing and O_NONBLOCK is clear, read( ) shall 60751 block the calling thread until some data is written or the pipe is closed by all processes that 60752 had the pipe open for writing. However, I've observed that this is not the case when splicing from something that sleeps on read to a pipe, and that in that case all readers block, /including/ ones that are reading from fds with O_NONBLOCK set! As an example, consider these two programs: -- >8 -- // wr.c #define _GNU_SOURCE #include #include int main() { while (splice(0, 0, 1, 0, 128 * 1024 * 1024, 0) > 0) ; fprintf(stderr, "wr: %m\n"); } -- >8 -- -- >8 -- // rd.c #define _GNU_SOURCE #include #include #include #include int main() { fcntl(0, F_SETFL, fcntl(0, F_GETFL) | O_NONBLOCK); char buf[64 * 1024] = {}; for (ssize_t rd;;) { #if 1 while ((rd = read(0, buf, sizeof(buf))) == -1 && errno == EINTR) ; #else while ((rd = splice(0, 0, 1, 0, 128 * 1024 * 1024, 0)) == -1 && errno == EINTR) ; #endif fprintf(stderr, "rd=%zd: %m\n", rd); write(1, buf, rd); errno = 0; sleep(1); } } -- >8 -- Thus: -- >8 -- a$ make rd wr a$ mkfifo fifo a$ ./rd < fifo b$ echo qwe > fifo rd=4: Success qwe rd=0: Success rd=0: Success b$ sleep 2 > fifo rd=-1: Resource temporarily unavailable rd=-1: Resource temporarily unavailable rd=0: Success rd=0: Success rd=-1: Resource temporarily unavailable b$ /bin/cat > fifo rd=-1: Resource temporarily unavailable rd=4: Success abc abc rd=-1: Resource temporarily unavailable rd=4: Success def def rd=0: Success ^D rd=0: Success rd=0: Success b$ ./wr > fifo -- >8 -- and nothing. Until you actually type a line (or a few) into teletype b so that the splice completes, at which point so does the read. An even simpler case is -- >8 -- $ ./wr | ./rd abc def rd=8: Success abc def ghi jkl rd=8: Success ghi jkl ^D wr: Success rd=-1: Resource temporarily unavailable rd=0: Success rd=0: Success -- >8 -- splice flags don't do anything. Tested on bookworm (6.1.27-1) and Linus' HEAD (v6.4-rc7-234-g547cc9be86f4). You could say this is a "denial of service", since this is a valid way of following pipes (and, sans SIGIO, the only portable one), and this makes it so the reader is completely blocked, and impervious to all deadly signals (incl. SIGKILL). I've also observed strace hanging (but it responded to SIGKILL). Rudimentary analysis shows that sys_splice() -> __do_splice() -> do_splice() -> {!(ipipe && opipe) -> !(ipipe) -> (ipipe)}: splice_file_to_pipe which then does -- >8 -- pipe_lock(opipe); ret = wait_for_space(opipe, flags); if (!ret) ret = do_splice_to(in, offset, opipe, len, flags); pipe_unlock(opipe); -- >8 -- conversely: -- >8 -- static ssize_t pipe_read(struct kiocb *iocb, struct iov_iter *to) { size_t total_len = iov_iter_count(to); struct file *filp = iocb->ki_filp; struct pipe_inode_info *pipe = filp->private_data; bool was_full, wake_next_reader = false; ssize_t ret; /* Null read succeeds. */ if (unlikely(total_len == 0)) return 0; ret = 0; __pipe_lock(pipe); -- >8 -- so, naturally(?), all non-empty reads wait for the splice to end. To validate this, I've applied the following (which I'm 100% sure is wrong and breaks unrelated stuff): -- >8 -- (naturally this now means it always EAGAINs even if there is data to be read since the splice() loop keeps it locked, but you get the picture). Signed-off-by: Ahelenia Ziemiańska diff --git a/fs/pipe.c b/fs/pipe.c index 2d88f73f585a..a76535839d32 100644 --- a/fs/pipe.c +++ b/fs/pipe.c @@ -240,6 +240,11 @@ pipe_read(struct kiocb *iocb, struct iov_iter *to) if (unlikely(total_len == 0)) return 0; + if ((filp->f_flags & O_NONBLOCK) || (iocb->ki_flags & IOCB_NOWAIT)) { + if (mutex_is_locked(&pipe->mutex)) + return -EAGAIN; + } + ret = 0; __pipe_lock(pipe); -- >8 -- which does make the aforementioned cases less pathological