[for-linus,1/3] eventfs: Have the inodes all for files and directories all be the same
Message ID | 20240117143810.531966508@goodmis.org |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel+bounces-29085-ouuuleilei=gmail.com@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:a05:7300:42cf:b0:101:a8e8:374 with SMTP id q15csp950602dye; Wed, 17 Jan 2024 06:37:41 -0800 (PST) X-Google-Smtp-Source: AGHT+IEn8C2vbmvFEXPD9DIHf4iuoTpV0tHShkHcFRVSfY2O6538K4/3HEvQUnYxlcvqgjF5Cq6f X-Received: by 2002:a17:907:bb87:b0:a2e:98af:ba3e with SMTP id xo7-20020a170907bb8700b00a2e98afba3emr1248010ejc.115.1705502261001; Wed, 17 Jan 2024 06:37:41 -0800 (PST) ARC-Seal: i=2; a=rsa-sha256; t=1705502260; cv=pass; d=google.com; s=arc-20160816; b=nfzIlvAFK4XxTOGwcHeiT3oqDK4/BpetnFPo2dRwPC37PAh3CsRA7NYf2KdiK4HOss VJFQ14LT2d/HZnK8ubZN8judvvhZVOdP9XjnDlykOEEG9w1zLjORe87xi7WfQXFrgtLU tG0nibOHWNKchkhM6xS18N6ac4h6oyv1Kfdaz4ArPk3cPQ8Z/k82hc8j9UUywWLtjezt pZPOTmGeVkCMMfrHBNRqSy8Y0tGFTt1u8XVDeWyNFjspQiDl5+xHyjSBf2y5+Folc6lj 7xCNPIBHw0+gqiofJEMPYPE0nwIcKB4NclCDrv+NVfIjnnSdTZf5I1Qxc/mwLsbPa0Br ibgQ== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=mime-version:list-unsubscribe:list-subscribe:list-id:precedence :references:subject:cc:to:from:date:user-agent:message-id; bh=I1A9XLc5OJzUlzQ1jciqsKfoXBmPjrac43W3isgcF3c=; fh=Onpp8nv7sOQgdX5WIAYpdraNIVCq2xlr/LuOORw6Jp0=; b=Krs4hU99wh/7zAdgulTbVyw+QCSTSldOhn1uNNlALEg6wEN2ZQ50tiA0g6uyFCmjUJ G3Nrv/Kq540HKn2/c/Ry0bmDO5rvsHUm6Yr9cc1jj+R2PcrKxWNiUsdxjIbgZsIaiZ6I HQ7vdAy+IlZ3c2H5ueWHqbN7ZrLBLmkYq2vrGU7eZ/85IOG8STGFvlSWs/luljaTh8yc mgvJj7tceM5+OW0cDCRObFz3sdK3peOIsUGhfokSqiiJa2iHreLpQcyjmCZOY7AnpC5M rPSloe2xgxOL9jBG89dghhB5gm4p3m8V2D471GxBTPjNB6RVmv83XO5817qRu+GTMUCp WUlg== ARC-Authentication-Results: i=2; mx.google.com; arc=pass (i=1); spf=pass (google.com: domain of linux-kernel+bounces-29085-ouuuleilei=gmail.com@vger.kernel.org designates 2604:1380:4601:e00::3 as permitted sender) smtp.mailfrom="linux-kernel+bounces-29085-ouuuleilei=gmail.com@vger.kernel.org" Received: from am.mirrors.kernel.org (am.mirrors.kernel.org. [2604:1380:4601:e00::3]) by mx.google.com with ESMTPS id bl7-20020a170906c24700b00a2adc52f10fsi5979468ejb.876.2024.01.17.06.37.40 for <ouuuleilei@gmail.com> (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 17 Jan 2024 06:37:40 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel+bounces-29085-ouuuleilei=gmail.com@vger.kernel.org designates 2604:1380:4601:e00::3 as permitted sender) client-ip=2604:1380:4601:e00::3; Authentication-Results: mx.google.com; arc=pass (i=1); spf=pass (google.com: domain of linux-kernel+bounces-29085-ouuuleilei=gmail.com@vger.kernel.org designates 2604:1380:4601:e00::3 as permitted sender) smtp.mailfrom="linux-kernel+bounces-29085-ouuuleilei=gmail.com@vger.kernel.org" Received: from smtp.subspace.kernel.org (wormhole.subspace.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by am.mirrors.kernel.org (Postfix) with ESMTPS id 7007F1F22589 for <ouuuleilei@gmail.com>; Wed, 17 Jan 2024 14:37:40 +0000 (UTC) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 870392136C; Wed, 17 Jan 2024 14:36:55 +0000 (UTC) Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 03CDD20334 for <linux-kernel@vger.kernel.org>; Wed, 17 Jan 2024 14:36:53 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1705502214; cv=none; b=FCnPZXKvLa1KoPAoXkBtdcYlScU5QXRZVMoXJxf85jg9OOM5u34Vdsvkhua1cG/+R0AtIuhV0CrIPZjGDheVBUAmBbB6oZVvUqFV+lOVgfdgg4hs4v7K2RNbQIbUE9tBK7obXKaegC0v+EHUWG4EbyVNc7d7L8svzEfo/YspCoE= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1705502214; c=relaxed/simple; bh=+l/2J1wuNvwwox8H/9diOMBy2Vp1lI/iE/4NnUV0Ybw=; h=Received:Received:Message-ID:User-Agent:Date:From:To:Cc:Subject: References:MIME-Version:Content-Type; b=FZNMTsUO2lvJrMLYd/9JDY4wuxXJ9/MB8Yfkb217f+wgowVEs/9Q4YMgoA4HKRXR3PnXU8g+/Bg5+m53nx4R3WC1/LgK3LoDJ71FZM/5A3id7/czR6lf+WgRrSWNxjeidDqWxT3yMsy0lCpOip8KsPlVNsxyXQzb89usGYjlxk0= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 Received: by smtp.kernel.org (Postfix) with ESMTPSA id 9275BC43399; Wed, 17 Jan 2024 14:36:53 +0000 (UTC) Received: from rostedt by gandalf with local (Exim 4.97) (envelope-from <rostedt@goodmis.org>) id 1rQ73S-00000001XY4-2pb8; Wed, 17 Jan 2024 09:38:10 -0500 Message-ID: <20240117143810.531966508@goodmis.org> User-Agent: quilt/0.67 Date: Wed, 17 Jan 2024 09:35:49 -0500 From: Steven Rostedt <rostedt@goodmis.org> To: linux-kernel@vger.kernel.org Cc: Masami Hiramatsu <mhiramat@kernel.org>, Mark Rutland <mark.rutland@arm.com>, Mathieu Desnoyers <mathieu.desnoyers@efficios.com>, Andrew Morton <akpm@linux-foundation.org>, Christian Brauner <brauner@kernel.org>, Al Viro <viro@ZenIV.linux.org.uk>, Ajay Kaher <ajay.kaher@broadcom.com>, Linus Torvalds <torvalds@linux-foundation.org> Subject: [for-linus][PATCH 1/3] eventfs: Have the inodes all for files and directories all be the same References: <20240117143548.595884070@goodmis.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: <linux-kernel.vger.kernel.org> List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org> List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org> MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1788348738711658798 X-GMAIL-MSGID: 1788348738711658798 |
Series |
eventfs: A few more fixes for 6.8
|
|
Commit Message
Steven Rostedt
Jan. 17, 2024, 2:35 p.m. UTC
From: "Steven Rostedt (Google)" <rostedt@goodmis.org> The dentries and inodes are created in the readdir for the sole purpose of getting a consistent inode number. Linus stated that is unnecessary, and that all inodes can have the same inode number. For a virtual file system they are pretty meaningless. Instead use a single unique inode number for all files and one for all directories. Link: https://lore.kernel.org/all/20240116133753.2808d45e@gandalf.local.home/ Link: https://lore.kernel.org/linux-trace-kernel/20240116211353.412180363@goodmis.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Ajay Kaher <ajay.kaher@broadcom.com> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> --- fs/tracefs/event_inode.c | 10 ++++++++++ 1 file changed, 10 insertions(+)
Comments
On Mon, 22 Jan 2024 at 08:46, Steven Rostedt <rostedt@goodmis.org> wrote: > > I can add this patch to make sure directory inodes are unique, as it causes > a regression in find, but keep the file inodes the same. Yeah, limiting it to directories will at least somewhat help the address leaking. However, I also note that you never did the "set i_nlink to one" trick, which is the traditional thing to do to tell 'find' that it cannot do its directory optimization thing. I'm not sure that the nlink trick disables this part of the find sanity checks, but the *first* thing to check would be something like this --- a/fs/tracefs/inode.c +++ b/fs/tracefs/inode.c @@ -182,6 +182,7 @@ static int tracefs_getattr(struct mnt_idmap *idmap, set_tracefs_inode_owner(inode); generic_fillattr(idmap, request_mask, inode, stat); + stat->nlink = 1; return 0; } because it might just fix the issue. Having nlink == 1 is how non-unix filesystems (like FAT etc) indicate that you can't try to count directory entries to optimize traversal. And it is possible that that is where the whole find thing comes from, but who knows, it could be a generic loop detector that runs independently of the usual link detection. Linus
On 2024-01-22 12:50, Steven Rostedt wrote: > On Mon, 22 Jan 2024 12:14:36 -0500 > Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote: [...] >> On my 6.1.0 kernel: >> >> find /sys/kernel/tracing | wc -l >> 15598 >> >> (mainly due to TRACE_EVENT ABI files) >> >> Hashing risks: >> >> - Exposing kernel addresses if the hashing algorithm is broken, > > Well this was my biggest concern, but if I truncate at least a nibble, with > the unique salt to the algorithm for each file, how easily does that expose > kernel addresses. > > The ei itself, is created from kmalloc() so you would at best get a heap > address. But with the missing nibble (if I mask it with ((1 << 28) - 1), > and much more taken away for 64 bit systems), and the added unique salt, is > it possible for this to expose anything that could be used in an attack? I don't know, which is why I am concerned about it. But I don't think we should spend time trying to understand all possible attack scenarios associated with hashing of kernel addresses when there are much simpler options available. > >> - Collisions if users are unlucky (which could trigger those >> 'find' errors). >> >> Those 15598 inode values fit within a single page (bitmap of >> 1922 bytes). >> >> So I would recommend simply adding a bitmap per tracefs filesystem >> instance to keep track of inode number allocation. > > And how do I recover this bit after the inode is freed, but then referenced > again? You would keep the allocated inode number value within your data structure associated with the inode. If you never free inodes, then you can just use a static increment as Linus suggested. But AFAIU there are cases where you free inodes, hence my suggestion of bitmap. When the inode is freed, you know which inode number is associated from the field in your data structure, so you can clear this bit in the bitmap. On the next inode allocation, you find-first-zero-bit in the bitmap, and set it to one to reserve it. > >> >> Creation/removal of files/directories in tracefs should not be >> a fast-path anyway, so who cares about the speed of a find first >> bit within a single page ? >> > > When an inode is no longer referenced, it is freed. When it is referenced > again, I want it to be recreated with the same inode number it had > previously. How would having a bitmask help with that? I need a way to map > an ei structure with a unique number without adding another 4 bytes to the > structure itself. As discussed in a separate exchange with Linus, why do you care so much about not adding a 4 bytes field to the structure ? Thanks, Mathieu
On January 22, 2024 10:19:12 AM PST, Linus Torvalds <torvalds@linux-foundation.org> wrote: >On Mon, 22 Jan 2024 at 09:39, Linus Torvalds ><torvalds@linux-foundation.org> wrote: >> >> Actually, why not juist add an inode number to your data structures, >> at least for directories? And just do a static increment on it as they >> get registered? Yeah, this is what I'd suggest too. It avoids all the hash complexity. Is wrap-around realistic for it? -Kees
On Mon, Jan 22, 2024 at 02:44:43PM -0500, Steven Rostedt wrote: > On Mon, 22 Jan 2024 10:19:12 -0800 > Linus Torvalds <torvalds@linux-foundation.org> wrote: > > > On Mon, 22 Jan 2024 at 09:39, Linus Torvalds > > <torvalds@linux-foundation.org> wrote: > > > > > > Actually, why not juist add an inode number to your data structures, > > > at least for directories? And just do a static increment on it as they > > > get registered? > > > > > > That avoids the whole issue with possibly leaking kernel address data. > > > > The 'nlink = 1' thing doesn't seem to make 'find' any happier for this > > case, sadly. > > > > But the inode number in the 'struct eventfs_inode' looks trivial. And > > doesn't even grow that structure on 64-bit architectures at least, > > because the struct is already 64-bit aligned, and had only one 32-bit > > entry at the end. > > > > On 32-bit architectures the structure size grows, but I'm not sure the > > allocation size grows. Our kmalloc() is quantized at odd numbers. > > > > IOW, this trivial patch seems to be much safer than worrying about > > some pointer exposure. > > I originally wanted to avoid the addition of the 4 bytes, but your comment > about it not making a difference on 64bit due to alignment makes sense. Non-unique inode numbers aren't exactly great for userspace and there are a surprisingly small yet large enough number of tools that trip over this in various ways. So if that can be avoided then great. But luckily no one is probably going to tar up tracefs. ;)
diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c index fdff53d5a1f8..5edf0b96758b 100644 --- a/fs/tracefs/event_inode.c +++ b/fs/tracefs/event_inode.c @@ -32,6 +32,10 @@ */ static DEFINE_MUTEX(eventfs_mutex); +/* Choose something "unique" ;-) */ +#define EVENTFS_FILE_INODE_INO 0x12c4e37 +#define EVENTFS_DIR_INODE_INO 0x134b2f5 + /* * The eventfs_inode (ei) itself is protected by SRCU. It is released from * its parent's list and will have is_freed set (under eventfs_mutex). @@ -352,6 +356,9 @@ static struct dentry *create_file(const char *name, umode_t mode, inode->i_fop = fop; inode->i_private = data; + /* All files will have the same inode number */ + inode->i_ino = EVENTFS_FILE_INODE_INO; + ti = get_tracefs(inode); ti->flags |= TRACEFS_EVENT_INODE; d_instantiate(dentry, inode); @@ -388,6 +395,9 @@ static struct dentry *create_dir(struct eventfs_inode *ei, struct dentry *parent inode->i_op = &eventfs_root_dir_inode_operations; inode->i_fop = &eventfs_file_operations; + /* All directories will have the same inode number */ + inode->i_ino = EVENTFS_DIR_INODE_INO; + ti = get_tracefs(inode); ti->flags |= TRACEFS_EVENT_INODE;