[RERESEND,10/11] splice: file->pipe: -EINVAL for non-regular files w/o FMODE_NOWAIT
Message ID | 25974c79b84c0b3aad566ff7c33b082f90ac5f17e.1697486714.git.nabijaczleweli@nabijaczleweli.xyz |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:a05:7300:3b04:b0:fb:cd0c:d3e with SMTP id c4csp8767873dys; Thu, 14 Dec 2023 10:46:10 -0800 (PST) X-Google-Smtp-Source: AGHT+IHGjGaod/aJ5DoNJN3BKnA5EthUEkawINKOBAEeeMDDJ6tZA0qVRgJ66DsCBTrJgiw4m2cg X-Received: by 2002:a05:6a00:1388:b0:6ce:fa73:a046 with SMTP id t8-20020a056a00138800b006cefa73a046mr11809030pfg.50.1702579569940; Thu, 14 Dec 2023 10:46:09 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1702579569; cv=none; d=google.com; s=arc-20160816; b=GaMgNvhkWaLlX6tIMdELMm+8Ai0ZsA8BCkUgWcn/sukhaPFVyioAJEtDF7vLMuNgWA 95KEEt2WVDDcm97aJsEgaJPTwKfkS/DXYDG5MnAXE4kIQPw11izKgqQ/YO0sye5+KQhf Nb5pqk0JhCCph8y+Nzl167TyVao4lWE72URu4ULNE0dHDJIxVs+qbDV888YXX5AjJBwK +Tk/Pkkc/lCc06PmJygS8TCpOh5nkzWx1xPn00j54suNal0xJx2Huo77p5Z983cSJsPb a3k2Ma8WkutwCANSzrDfKnSbN8FjL7CQVa7hMKnomEX6OQtLBjae6o78pkt3ySVeXpy8 Pu0g== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:to:in-reply-to:content-disposition:mime-version :references:user-agent:message-id:subject:cc:from:date :dkim-signature; bh=dOsvCvvko640GyLs3c2ueoX0r+kYs6M9vD3YPA0Zpbo=; fh=F3CYGF5tLt2QRCGz6RnC5dSNkSKVVQOih9/vnGahG3Q=; b=M6LO468CEcwpdORsofRpBrE8An2resszKWNdOLJtc31KFImcYLGTWMrgjIuIUCxUMm GTm5kUm0ddE54wnYLB+F74xrZR3xJIXwMmMCqs59RWhKncm0PIr+izXAiwEy/JYvajIp QdNm7HWMnhtAf5kGesVCUo9gTlZvOGd6xaSGwrgI1UJbdUevW13u309rWEvNln1VOfGu OnjFLSul7qbDjyW6sDEvAxRHx71SSvMbK5k/AJQY7Z0vMS9YJoH49aZmgHLQisYO+vpR PeXdUC6vI+7aoVE2BSUV52rjU+8LIPPAKNo1cFApcWlhYrj8nejmq+JUUm5emZ3UKdrL OKNw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@nabijaczleweli.xyz header.s=202305 header.b=bU9lIeGD; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.38 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=nabijaczleweli.xyz Received: from fry.vger.email (fry.vger.email. [23.128.96.38]) by mx.google.com with ESMTPS id u12-20020a056a00124c00b006cd8454cbb4si11791449pfi.337.2023.12.14.10.46.09 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 14 Dec 2023 10:46:09 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.38 as permitted sender) client-ip=23.128.96.38; Authentication-Results: mx.google.com; dkim=pass header.i=@nabijaczleweli.xyz header.s=202305 header.b=bU9lIeGD; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.38 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=nabijaczleweli.xyz Received: from out1.vger.email (depot.vger.email [IPv6:2620:137:e000::3:0]) by fry.vger.email (Postfix) with ESMTP id 2C81D8200CB2; Thu, 14 Dec 2023 10:46:07 -0800 (PST) X-Virus-Status: Clean X-Virus-Scanned: clamav-milter 0.103.11 at fry.vger.email Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1573132AbjLNSpo (ORCPT <rfc822;winker.wchi@gmail.com> + 99 others); Thu, 14 Dec 2023 13:45:44 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:58108 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1573260AbjLNSpR (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Thu, 14 Dec 2023 13:45:17 -0500 Received: from tarta.nabijaczleweli.xyz (tarta.nabijaczleweli.xyz [139.28.40.42]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D3FF610C4; Thu, 14 Dec 2023 10:45:17 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=nabijaczleweli.xyz; s=202305; t=1702579516; bh=nDrwqDx6g17tp76N5xVUDx8sUEDcvzOb97eotkFu3/k=; h=Date:From:Cc:Subject:References:In-Reply-To:From; b=bU9lIeGDIS4sb4eigHq8GkEC2G0QMzGoWlh/3T6TCHg4E7CiLMx6bQhTrB2uPpk8j ssozwnxjkVhZyEpDAVSf+GeMQ2iDxQaNQB9ZJF33EZtkloiW25yAV6e2kfwdpVNQBg rAo1xySnVgDvdqKVSOX/Hx4LKdqcTnlQaI84JLOg6QMzXXlCbUAe6Ced9wsyGUJjqU ybnPvgcZ4j8Qo80Ln49MG0FiNV3kUv2kbj6z9B/91NC+JUFPl13k8ro6EBaBE/uYLt aYiwtqxmm6DHMe24nl6WsxT3vvR/q0x+MAbHsLbLdGqvfOlyB/Lo/b3On3E8BUVJEp V6V+Z3IxNwrbA== Received: from tarta.nabijaczleweli.xyz (unknown [192.168.1.250]) by tarta.nabijaczleweli.xyz (Postfix) with ESMTPSA id 4786313990; Thu, 14 Dec 2023 19:45:16 +0100 (CET) Date: Thu, 14 Dec 2023 19:45:16 +0100 From: Ahelenia =?utf-8?q?Ziemia=C5=84ska?= <nabijaczleweli@nabijaczleweli.xyz> Cc: Jens Axboe <axboe@kernel.dk>, Christian Brauner <brauner@kernel.org>, Alexander Viro <viro@zeniv.linux.org.uk>, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH RERESEND 10/11] splice: file->pipe: -EINVAL for non-regular files w/o FMODE_NOWAIT Message-ID: <25974c79b84c0b3aad566ff7c33b082f90ac5f17e.1697486714.git.nabijaczleweli@nabijaczleweli.xyz> User-Agent: NeoMutt/20231103 References: <2cover.1697486714.git.nabijaczleweli@nabijaczleweli.xyz> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha512; protocol="application/pgp-signature"; boundary="nahgatr7vznet5uy" Content-Disposition: inline In-Reply-To: <2cover.1697486714.git.nabijaczleweli@nabijaczleweli.xyz> X-Spam-Status: No, score=-0.9 required=5.0 tests=DKIM_SIGNED,DKIM_VALID, DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI, SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on fry.vger.email To: unlisted-recipients:; (no To-header on input) Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.6.4 (fry.vger.email [0.0.0.0]); Thu, 14 Dec 2023 10:46:07 -0800 (PST) X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1785284074993824989 X-GMAIL-MSGID: 1785284074993824989 |
Series |
splice(file<>pipe) I/O on file as-if O_NONBLOCK
|
|
Commit Message
Ahelenia Ziemiańska
Dec. 14, 2023, 6:45 p.m. UTC
We request non-blocking I/O in the generic implementation, but some
files ‒ ttys ‒ only check O_NONBLOCK. Refuse them here, lest we
risk sleeping with the pipe locked for indeterminate lengths of
time.
This also masks inconsistent wake-ups (usually every second line)
when splicing from ttys in icanon mode.
Regular files don't /have/ a distinct O_NONBLOCK mode,
because they always behave non-blockingly, and for them FMODE_NOWAIT is
used in the purest sense of
/* File is capable of returning -EAGAIN if I/O will block */
which is not set by the vast majority of filesystems,
and it's not the semantic we want here.
Link: https://lore.kernel.org/linux-fsdevel/qk6hjuam54khlaikf2ssom6custxf5is2ekkaequf4hvode3ls@zgf7j5j4ubvw/t/#u
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
---
fs/splice.c | 2 ++
1 file changed, 2 insertions(+)
Comments
On 12/14/23 11:45 AM, Ahelenia Ziemia?ska wrote: > We request non-blocking I/O in the generic implementation, but some > files ? ttys ? only check O_NONBLOCK. Refuse them here, lest we > risk sleeping with the pipe locked for indeterminate lengths of > time. A worthy goal here is ensuring that _everybody_ honors IOCB_NOWAIT, rather than just rely on O_NONBLOCK. This does involve converting to ->read_iter/->write_iter if the driver isn't already using it, but some of them already have that, yet don't check IOCB_NOWAIT or treat it the same as O_NONBLOCK. Adding special checks like this is not a good idea, imho. > This also masks inconsistent wake-ups (usually every second line) > when splicing from ttys in icanon mode. > > Regular files don't /have/ a distinct O_NONBLOCK mode, > because they always behave non-blockingly, and for them FMODE_NOWAIT is > used in the purest sense of > /* File is capable of returning -EAGAIN if I/O will block */ > which is not set by the vast majority of filesystems, > and it's not the semantic we want here. The main file systems do very much set it, like btrfs, ext4, and xfs. If you look at total_file_systems / ones_flagging_it the ratio may be high, but in terms of installed userbase, the majority definitely will have it. Also see comment on cover letter for addressing this IOCB_NOWAIT confusion.
On Fri, Dec 15, 2023 at 08:47:15AM -0700, Jens Axboe wrote: > On 12/14/23 11:45 AM, Ahelenia Ziemiańska wrote: > > We request non-blocking I/O in the generic implementation, but some > > files ‒ ttys ‒ only check O_NONBLOCK. Refuse them here, lest we > > risk sleeping with the pipe locked for indeterminate lengths of > > time. > A worthy goal here is ensuring that _everybody_ honors IOCB_NOWAIT, > rather than just rely on O_NONBLOCK. This does involve converting to > ->read_iter/->write_iter if the driver isn't already using it, but some > of them already have that, yet don't check IOCB_NOWAIT or treat it the > same as O_NONBLOCK. This doesn't really mean much to me, sorry. > Adding special checks like this is not a good idea, imho. That's what Linus said I should do so that's what I did https://lore.kernel.org/linux-fsdevel/CAHk-=wimmqG_wvSRtMiKPeGGDL816n65u=Mq2+H3-=uM2U6FmA@mail.gmail.com/ I can't fix the tty layer :/ > > This also masks inconsistent wake-ups (usually every second line) > > when splicing from ttys in icanon mode. > > > > Regular files don't /have/ a distinct O_NONBLOCK mode, > > because they always behave non-blockingly, and for them FMODE_NOWAIT is > > used in the purest sense of > > /* File is capable of returning -EAGAIN if I/O will block */ > > which is not set by the vast majority of filesystems, > > and it's not the semantic we want here. > > The main file systems do very much set it, like btrfs, ext4, and xfs. If > you look at total_file_systems / ones_flagging_it the ratio may be high, > but in terms of installed userbase, the majority definitely will have > it. Also see comment on cover letter for addressing this IOCB_NOWAIT > confusion. Reassessing [1] https://lore.kernel.org/linux-fsdevel/5osglsw36dla3mubtpsmdwdid4fsdacplyd6acx2igo4atogdg@yur3idyim3cc/ I see FMODE_NOWAIT in blockdevs /dev/{null,zero,random,urandom} btrfs/ext4/f2fs/ocfs2/xfs eventfd pipes sockets tun/tap which means that vfat/fuse/nfs/tmpfs/ramfs/procfs/sysfs don't. (zfs also doesn't, but that's not for this list.) I don't know if that's actually a "majority" in a meaningful sense, I agree, but I think I primarily committed to this exclusion because tmpfs/ramfs didn't. I s'pose ramfs can already be tagged since it already returns -EAGAIN when I/O would block (never). tmpfs not being spliceable is also questionable. But this'd also mean effectively deleting afs_file_splice_read ceph_splice_read coda_file_splice_read ecryptfs_splice_read_update_atime fuse_dev_splice_read nfs_file_splice_read orangefs_file_splice_read shmem_file_splice_read v9fs_file_splice_read (not to mention the many others (adfs/affs/bfs/bcachefs/cramfs/erofs/fat/hfs*/hostfs/hpfs/jffs2/jfs/minix/nilfs/ntfs/omfs/reiserfs/isofs/sysv/ubifs/udf/ufs/vboxsf/squashfs/romfs) which just use the filemap impl verbatim). There's no point to restricting splice access on a per-filesystem level (which this'd do), since to mount a malicious network filesystem you need to be root. A denial of service attack makes no sense if you're already root. (Maybe except for fuse, which people typically run suid; that I could see potentially making sense to disable..) I have indeed managed to confuse myself into the NOWAIT hole, but this is actually about "not letting unprivileged users escalate into hanging system daemons by writing to a pipe" rather than "if we ever hold the pipe lock for >2µs we die instantly". O_NONBLOCK filtered by FMODE_NOWAIT is used as a semantic proxy for the 10 different types of files anyone can create that we know are safe. Anyone can open a socket and not write to it, so we must refuse to splice from a socket with no data in it. But only root can mount filesystems, so a regular file is always safe. And, actually defining a slightly-heuristic per-file policy in the syscall itself is stupid, you've talked me out of this. This check only actually applies to the generic copy_splice_read() implementation, since the "real"/non-generic splices (fiemap_splice_read/per-filesystem ‒ all the others that this patchset touches) are already known to be safe (and aren't reads so FMODE_NOWAIT doesn't factor in at all). I've dropped this patch and have instead added this to 01/11: diff --git a/fs/splice.c b/fs/splice.c index f8bfc9cf8cdc..6d369d7d56d5 100644 --- a/fs/splice.c +++ b/fs/splice.c @@ -331,0 +332,7 @@ ssize_t copy_splice_read(struct file *in, loff_t *ppos, + /* + * This generic implementation is only safe w.r.t. the pipe lock + * if the file actually respects IOCB_NOWAIT, which ttys don't. + */ + if (!(in->f_mode & FMODE_NOWAIT)) + return -EINVAL; (Indeed, in many ways, Linus' post to which I reply in [1] pretty much says this explcitly. Actually he literally says this. I just don't realise and instead of adding the snippet to copy_splice_read(), which he already diffed and talks about, I copied it to the syscall.) Now I just need to re-consider the prose in a way that avoids this deeply embarrassing IOCB_NOWAIT/regular-file nonsense, and this series ends up being just "fixing splice implementations" without also special-casing the syscall itself. Thanks for asking the right questions. Sorry for longposting.
diff --git a/fs/splice.c b/fs/splice.c index 9d29664f23ee..81788bf7daa1 100644 --- a/fs/splice.c +++ b/fs/splice.c @@ -1300,6 +1300,8 @@ long do_splice(struct file *in, loff_t *off_in, struct file *out, } else if (opipe) { if (off_out) return -ESPIPE; + if (!((in->f_mode & FMODE_NOWAIT) || S_ISREG(in->f_inode->i_mode))) + return -EINVAL; if (off_in) { if (!(in->f_mode & FMODE_PREAD)) return -EINVAL;