From patchwork Sun Nov 5 15:56:31 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Steven Rostedt X-Patchwork-Id: 161654 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:8f47:0:b0:403:3b70:6f57 with SMTP id j7csp2193021vqu; Sun, 5 Nov 2023 08:02:43 -0800 (PST) X-Google-Smtp-Source: AGHT+IFqTgGJeoFmZx2QAAvJ8mN1lQW7OnLUc9T0XRJmzOsrvZkG+sQzbTDBbBC+3rC9SksZW0hF X-Received: by 2002:a05:6a20:6a0d:b0:160:a752:59e with SMTP id p13-20020a056a206a0d00b00160a752059emr31897051pzk.40.1699200163385; Sun, 05 Nov 2023 08:02:43 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1699200163; cv=none; d=google.com; s=arc-20160816; b=W+kJV3wV8301dQ/YbIvQN4fVD3AlTxHH/H+oEKggMYvXLzFR4QeXiEgK3YwQzeMPnm ItukJtyUbIKd2VDEl8ME9JqY8K6eW7CeBNdW1nyDg51X7FzofAPt6RDEwObkBF8B9c/Y qQBDbHihLz3w1PTqE9ZMSWzFQs49gQB2AgcMoA7GX7c6NpQCafxXPM5ZzgJ6jK0/vx9w p7lpkzvlqLceE7Uvv8w9garaJjg1Gext+RpzA0Te60j9HB51crYkfmE+SFTHrjWgmtJn apN8l8/nfo+HEBKDIKPeX5WyEz9OPGPCswaRndZ8dik3XSrm7w0pvbyCTbD/cJuYDMFo DaiQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:mime-version:references:subject:cc:to:from:date :user-agent:message-id; bh=3RHJ+Ozgs653WNV/+Inf7Ha8qDIU4cdLEW20aKnjiSo=; fh=aBYeIKHEaEgUqOkB4TyzM4Nr+wTssKZX9qr+dhXH1II=; b=DBCAxc7QjWwOHr/0W0De2MlopOfFJWbh8jA4gWv3eRj/2x5H8NY9lGSqX4zH57ZEdh mGhJhpNHXSZQE5VFMoDni8TAqPazKDTMsu0xyBOr6/Ar/EWvj2UROxCJQ/FjElywTQ7p 5TjTs2NKTKp7CmRDd9SStbwtA8PgNguqPGxA2x3jAU7zJDL6MPUTfWzZGXq9nFP69BWg giM0AM+iXby+FVrxon2GpCtD5gH9FYmh6KwtYw2kLn9olkCXxhvlkDqa2DZzbQmM73Ac YwJ5ep7xYO0Gh8xOCdw5Ra4/2LvW0O7VZXeu5D0pA5EUoVip3iHUgl3LEVDqPMKzGkLu N1yA== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.35 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: from groat.vger.email (groat.vger.email. 
[23.128.96.35]) by mx.google.com with ESMTPS id c68-20020a633547000000b005b982b4f5a9si6146228pga.429.2023.11.05.08.02.43 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 05 Nov 2023 08:02:43 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.35 as permitted sender) client-ip=23.128.96.35; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.35 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: from out1.vger.email (depot.vger.email [IPv6:2620:137:e000::3:0]) by groat.vger.email (Postfix) with ESMTP id 9CD588092C91; Sun, 5 Nov 2023 08:02:25 -0800 (PST) X-Virus-Status: Clean X-Virus-Scanned: clamav-milter 0.103.10 at groat.vger.email Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229668AbjKEQBv (ORCPT + 34 others); Sun, 5 Nov 2023 11:01:51 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46894 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229505AbjKEQBm (ORCPT ); Sun, 5 Nov 2023 11:01:42 -0500 Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 68E20F2; Sun, 5 Nov 2023 08:01:38 -0800 (PST) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 02F93C433CA; Sun, 5 Nov 2023 16:01:38 +0000 (UTC) Received: from rostedt by gandalf with local (Exim 4.97-RC3) (envelope-from ) id 1qzfZD-00000000Cgk-20l0; Sun, 05 Nov 2023 11:01:39 -0500 Message-ID: <20231105160139.346174799@goodmis.org> User-Agent: quilt/0.67 Date: Sun, 05 Nov 2023 10:56:31 -0500 From: Steven Rostedt To: linux-kernel@vger.kernel.org, stable@vger.kernel.org, Cc: Masami Hiramatsu , Mark Rutland , Andrew Morton , Beau Belgrave Subject: [v6.6][PATCH 1/5] tracing: Have trace_event_file have ref counters References: <20231105155630.925114107@goodmis.org> MIME-Version: 1.0 
X-Spam-Status: No, score=-0.8 required=5.0 tests=HEADER_FROM_DIFFERENT_DOMAINS, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on groat.vger.email Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.6.4 (groat.vger.email [0.0.0.0]); Sun, 05 Nov 2023 08:02:25 -0800 (PST) X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1781740510637604774 X-GMAIL-MSGID: 1781740510637604774 From: "Steven Rostedt (Google)" commit bb32500fb9b78215e4ef6ee8b4345c5f5d7eafb4 upstream The following can crash the kernel: # cd /sys/kernel/tracing # echo 'p:sched schedule' > kprobe_events # exec 5>>events/kprobes/sched/enable # > kprobe_events # exec 5>&- The above commands: 1. Change directory to the tracefs directory 2. Create a kprobe event (doesn't matter what one) 3. Open bash file descriptor 5 on the enable file of the kprobe event 4. Delete the kprobe event (removes the files too) 5. Close the bash file descriptor 5 The above causes a crash! BUG: kernel NULL pointer dereference, address: 0000000000000028 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: 0000 [#1] PREEMPT SMP PTI CPU: 6 PID: 877 Comm: bash Not tainted 6.5.0-rc4-test-00008-g2c6b6b1029d4-dirty #186 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014 RIP: 0010:tracing_release_file_tr+0xc/0x50 What happens here is that the kprobe event creates a trace_event_file "file" descriptor that represents the file in tracefs to the event. It maintains state of the event (is it enabled for the given instance?). Opening the "enable" file gets a reference to the event "file" descriptor via the open file descriptor. 
When the kprobe event is deleted, the file is also deleted from the tracefs system which also frees the event "file" descriptor. But as the tracefs file is still opened by user space, it will not be totally removed until the final dput() is called on it. But this is not true with the event "file" descriptor that is already freed. If the user does a write to or simply closes the file descriptor it will reference the event "file" descriptor that was just freed, causing a use-after-free bug. To solve this, add a ref count to the event "file" descriptor as well as a new flag called "FREED". The "file" will not be freed until the last reference is released. But the FREED flag will be set when the event is removed to prevent any more modifications to that event from happening, even if there's still a reference to the event "file" descriptor. Link: https://lore.kernel.org/linux-trace-kernel/20231031000031.1e705592@gandalf.local.home/ Link: https://lore.kernel.org/linux-trace-kernel/20231031122453.7a48b923@gandalf.local.home Cc: stable@vger.kernel.org Cc: Mark Rutland Fixes: f5ca233e2e66d ("tracing: Increase trace array ref count on enable and filter files") Reported-by: Beau Belgrave Tested-by: Beau Belgrave Reviewed-by: Masami Hiramatsu (Google) Signed-off-by: Steven Rostedt (Google) --- include/linux/trace_events.h | 4 ++++ kernel/trace/trace.c | 15 +++++++++++++++ kernel/trace/trace.h | 3 +++ kernel/trace/trace_events.c | 31 ++++++++++++++++++++++++++---- kernel/trace/trace_events_filter.c | 3 +++ 5 files changed, 52 insertions(+), 4 deletions(-) diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h index 21ae37e49319..cf9f0c61796e 100644 --- a/include/linux/trace_events.h +++ b/include/linux/trace_events.h @@ -492,6 +492,7 @@ enum { EVENT_FILE_FL_TRIGGER_COND_BIT, EVENT_FILE_FL_PID_FILTER_BIT, EVENT_FILE_FL_WAS_ENABLED_BIT, + EVENT_FILE_FL_FREED_BIT, }; extern struct trace_event_file *trace_get_event_file(const char *instance, @@ -630,6 +631,7 @@
extern int __kprobe_event_add_fields(struct dynevent_cmd *cmd, ...); * TRIGGER_COND - When set, one or more triggers has an associated filter * PID_FILTER - When set, the event is filtered based on pid * WAS_ENABLED - Set when enabled to know to clear trace on module removal + * FREED - File descriptor is freed, all fields should be considered invalid */ enum { EVENT_FILE_FL_ENABLED = (1 << EVENT_FILE_FL_ENABLED_BIT), @@ -643,6 +645,7 @@ enum { EVENT_FILE_FL_TRIGGER_COND = (1 << EVENT_FILE_FL_TRIGGER_COND_BIT), EVENT_FILE_FL_PID_FILTER = (1 << EVENT_FILE_FL_PID_FILTER_BIT), EVENT_FILE_FL_WAS_ENABLED = (1 << EVENT_FILE_FL_WAS_ENABLED_BIT), + EVENT_FILE_FL_FREED = (1 << EVENT_FILE_FL_FREED_BIT), }; struct trace_event_file { @@ -671,6 +674,7 @@ struct trace_event_file { * caching and such. Which is mostly OK ;-) */ unsigned long flags; + atomic_t ref; /* ref count for opened files */ atomic_t sm_ref; /* soft-mode reference counter */ atomic_t tm_ref; /* trigger-mode reference counter */ }; diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c index abaaf516fcae..a40d6baf101f 100644 --- a/kernel/trace/trace.c +++ b/kernel/trace/trace.c @@ -4986,6 +4986,20 @@ int tracing_open_file_tr(struct inode *inode, struct file *filp) if (ret) return ret; + mutex_lock(&event_mutex); + + /* Fail if the file is marked for removal */ + if (file->flags & EVENT_FILE_FL_FREED) { + trace_array_put(file->tr); + ret = -ENODEV; + } else { + event_file_get(file); + } + + mutex_unlock(&event_mutex); + if (ret) + return ret; + filp->private_data = inode->i_private; return 0; @@ -4996,6 +5010,7 @@ int tracing_release_file_tr(struct inode *inode, struct file *filp) struct trace_event_file *file = inode->i_private; trace_array_put(file->tr); + event_file_put(file); return 0; } diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h index 77debe53f07c..d608f6128704 100644 --- a/kernel/trace/trace.h +++ b/kernel/trace/trace.h @@ -1664,6 +1664,9 @@ extern void event_trigger_unregister(struct 
event_command *cmd_ops, char *glob, struct event_trigger_data *trigger_data); +extern void event_file_get(struct trace_event_file *file); +extern void event_file_put(struct trace_event_file *file); + /** * struct event_trigger_ops - callbacks for trace event triggers * diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c index f49d6ddb6342..82cb22ad6d61 100644 --- a/kernel/trace/trace_events.c +++ b/kernel/trace/trace_events.c @@ -990,13 +990,35 @@ static void remove_subsystem(struct trace_subsystem_dir *dir) } } +void event_file_get(struct trace_event_file *file) +{ + atomic_inc(&file->ref); +} + +void event_file_put(struct trace_event_file *file) +{ + if (WARN_ON_ONCE(!atomic_read(&file->ref))) { + if (file->flags & EVENT_FILE_FL_FREED) + kmem_cache_free(file_cachep, file); + return; + } + + if (atomic_dec_and_test(&file->ref)) { + /* Count should only go to zero when it is freed */ + if (WARN_ON_ONCE(!(file->flags & EVENT_FILE_FL_FREED))) + return; + kmem_cache_free(file_cachep, file); + } +} + static void remove_event_file_dir(struct trace_event_file *file) { eventfs_remove(file->ef); list_del(&file->list); remove_subsystem(file->system); free_event_filter(file->filter); - kmem_cache_free(file_cachep, file); + file->flags |= EVENT_FILE_FL_FREED; + event_file_put(file); } /* @@ -1369,7 +1391,7 @@ event_enable_read(struct file *filp, char __user *ubuf, size_t cnt, flags = file->flags; mutex_unlock(&event_mutex); - if (!file) + if (!file || flags & EVENT_FILE_FL_FREED) return -ENODEV; if (flags & EVENT_FILE_FL_ENABLED && @@ -1407,7 +1429,7 @@ event_enable_write(struct file *filp, const char __user *ubuf, size_t cnt, ret = -ENODEV; mutex_lock(&event_mutex); file = event_file_data(filp); - if (likely(file)) + if (likely(file && !(file->flags & EVENT_FILE_FL_FREED))) ret = ftrace_event_enable_disable(file, val); mutex_unlock(&event_mutex); break; @@ -1681,7 +1703,7 @@ event_filter_read(struct file *filp, char __user *ubuf, size_t cnt, 
mutex_lock(&event_mutex); file = event_file_data(filp); - if (file) + if (file && !(file->flags & EVENT_FILE_FL_FREED)) print_event_filter(file, s); mutex_unlock(&event_mutex); @@ -2803,6 +2825,7 @@ trace_create_new_event(struct trace_event_call *call, atomic_set(&file->tm_ref, 0); INIT_LIST_HEAD(&file->triggers); list_add(&file->list, &tr->events); + event_file_get(file); return file; } diff --git a/kernel/trace/trace_events_filter.c b/kernel/trace/trace_events_filter.c index 33264e510d16..0c611b281a5b 100644 --- a/kernel/trace/trace_events_filter.c +++ b/kernel/trace/trace_events_filter.c @@ -2349,6 +2349,9 @@ int apply_event_filter(struct trace_event_file *file, char *filter_string) struct event_filter *filter = NULL; int err; + if (file->flags & EVENT_FILE_FL_FREED) + return -ENODEV; + if (!strcmp(strstrip(filter_string), "0")) { filter_disable(file); filter = event_filter(file); From patchwork Sun Nov 5 15:56:32 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Steven Rostedt X-Patchwork-Id: 161652 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:8f47:0:b0:403:3b70:6f57 with SMTP id j7csp2192545vqu; Sun, 5 Nov 2023 08:02:11 -0800 (PST) X-Google-Smtp-Source: AGHT+IEcAEE82TvJPcnsj56hu/mCBQJLAk0oOpdh9/xj0CWeGlD4ad5Bs3OyN22DtGImJE5SW8Ts X-Received: by 2002:a05:6602:3c8:b0:7a9:b1c9:4380 with SMTP id g8-20020a05660203c800b007a9b1c94380mr33292619iov.1.1699200131421; Sun, 05 Nov 2023 08:02:11 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1699200131; cv=none; d=google.com; s=arc-20160816; b=NLebQ+1mzaZDS7vN/KzFINNTSNgVUuNfThSVTT5xkJhLZZJnfvWCa7ano81i9ZBq+b hknr+KMRTObRwrVLN8cK/HUANP+xLNZB9rejsJKCxvEGLbP93Hp55p97M80chKCn6Sao f7+gInghATDIw3ys6zHAqZQn9CwP0M8Wtmiee2oDm2PhD4TVilyz4Rs9I1ic0EwTb1Gc TTGZAjj27IOZ5YfLFTlb70k9X0j6Hos+3DgX4JuC6OfFlCy0ZhbGPwFCtc64KOmYW3ri ArkFX1gFvKt1TSzTQZCfiQFyet8sLVxmUENy7ztIdFp5JTvPXvKOI3SiiLcOP6tumj2B /lTA== ARC-Message-Signature: i=1; a=rsa-sha256; 
c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:mime-version:references:subject:cc:to:from:date :user-agent:message-id; bh=D/8FuqLXvQ6kxvXl9z2YJntTwAsEsmPzTC65coR6OXs=; fh=5fXiwuINd15usHF5af0bVHUWftYpivNmVKkJ+nuVsWg=; b=ft6tAkZGFiIAD7xn3kxGPYk44mPa6voLQu+Xr7sWfsWyy95ed+lMi6B9XJEE8HYE+f jmI1r1vNAJTMaLRiQk8OVAPCRQcAwc9Lvv6AGxKOs7P7TVWurl9d4wvcDcoAKvfJ7Wrk KSeXoaaxS6i/KpSYRtxP7kPotwcO1dwlLLgtKZ9CUEnWNjBeIZtgaYJqzwt6i5NCPAdI gMsBHCaxL7yz+autxYoTqFxrF0KvGv6H0sr56FTGlP5TXqpkv9FWwnGs6y32tffJBWi2 5/Xf37SjKBy5bfcM6U2zghRuA29cgVMgrzqOm6vEOP1WN3kuOFt+yIY6XAx0xlBhcB26 rXqw== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.37 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: from snail.vger.email (snail.vger.email. [23.128.96.37]) by mx.google.com with ESMTPS id ay26-20020a056638411a00b0043d281279a5si2948839jab.8.2023.11.05.08.02.10 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 05 Nov 2023 08:02:11 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.37 as permitted sender) client-ip=23.128.96.37; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.37 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: from out1.vger.email (depot.vger.email [IPv6:2620:137:e000::3:0]) by snail.vger.email (Postfix) with ESMTP id 1DFD880AEB08; Sun, 5 Nov 2023 08:01:55 -0800 (PST) X-Virus-Status: Clean X-Virus-Scanned: clamav-milter 0.103.10 at snail.vger.email Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229599AbjKEQBp (ORCPT + 34 others); Sun, 5 Nov 2023 11:01:45 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46864 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229379AbjKEQBl (ORCPT ); Sun, 5 
Nov 2023 11:01:41 -0500 Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 526BBE1; Sun, 5 Nov 2023 08:01:38 -0800 (PST) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 02DECC433C8; Sun, 5 Nov 2023 16:01:38 +0000 (UTC) Received: from rostedt by gandalf with local (Exim 4.97-RC3) (envelope-from ) id 1qzfZD-00000000ChE-2gI6; Sun, 05 Nov 2023 11:01:39 -0500 Message-ID: <20231105160139.498444992@goodmis.org> User-Agent: quilt/0.67 Date: Sun, 05 Nov 2023 10:56:32 -0500 From: Steven Rostedt To: linux-kernel@vger.kernel.org, stable@vger.kernel.org, Cc: Masami Hiramatsu , Mark Rutland , Andrew Morton , Ajay Kaher Subject: [v6.6][PATCH 2/5] eventfs: Remove "is_freed" union with rcu head References: <20231105155630.925114107@goodmis.org> MIME-Version: 1.0 X-Spam-Status: No, score=-4.0 required=5.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,RCVD_IN_DNSWL_MED,SPF_HELO_NONE,SPF_PASS, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.6.4 (snail.vger.email [0.0.0.0]); Sun, 05 Nov 2023 08:01:55 -0800 (PST) X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1781740476773277993 X-GMAIL-MSGID: 1781740476773277993 From: "Steven Rostedt (Google)" commit f2f496370afcbc5227d7002da28c74b91fed12ff upstream The eventfs_inode->is_freed was a union with the rcu_head with the assumption that when it was on the srcu list the head would contain a pointer which would make "is_freed" true. But that was a wrong assumption as the rcu head is a single link list where the last element is NULL. Instead, split the nr_entries integer so that "is_freed" is one bit and the nr_entries is the next 31 bits. 
As there shouldn't be more than 10 (currently there's at most 5 to 7 depending on the config), this should not be a problem. Link: https://lkml.kernel.org/r/20231101172649.049758712@goodmis.org Cc: stable@vger.kernel.org Cc: Mark Rutland Cc: Andrew Morton Cc: Ajay Kaher Fixes: 63940449555e7 ("eventfs: Implement eventfs lookup, read, open functions") Reviewed-by: Masami Hiramatsu (Google) Signed-off-by: Steven Rostedt (Google) --- fs/tracefs/event_inode.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c index 8c8d64e76103..a64d8fa39e54 100644 --- a/fs/tracefs/event_inode.c +++ b/fs/tracefs/event_inode.c @@ -38,6 +38,7 @@ struct eventfs_inode { * @fop: file_operations for file or directory * @iop: inode_operations for file or directory * @data: something that the caller will want to get to later on + * @is_freed: Flag set if the eventfs is on its way to be freed * @mode: the permission that the file or directory should have */ struct eventfs_file { @@ -52,15 +53,14 @@ struct eventfs_file { * Union - used for deletion * @del_list: list of eventfs_file to delete * @rcu: eventfs_file to delete in RCU - * @is_freed: node is freed if one of the above is set */ union { struct list_head del_list; struct rcu_head rcu; - unsigned long is_freed; }; void *data; - umode_t mode; + unsigned int is_freed:1; + unsigned int mode:31; }; static DEFINE_MUTEX(eventfs_mutex); @@ -814,6 +814,8 @@ static void eventfs_remove_rec(struct eventfs_file *ef, struct list_head *head, } } + ef->is_freed = 1; + list_del_rcu(&ef->list); list_add_tail(&ef->del_list, head); } From patchwork Sun Nov 5 15:56:33 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Steven Rostedt X-Patchwork-Id: 161655 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:8f47:0:b0:403:3b70:6f57 with SMTP id j7csp2193123vqu; Sun, 5 Nov 2023 08:02:53 -0800 (PST) 
X-Google-Smtp-Source: AGHT+IG26D1IKEcbT1bco9VpYO7NZtL8feCnO7OK16JEt8ekj+iWAnKe5yGoDy9Iq9BnizFVC6xO X-Received: by 2002:a05:6358:52c8:b0:169:a54e:eb25 with SMTP id z8-20020a05635852c800b00169a54eeb25mr18309901rwz.19.1699200173526; Sun, 05 Nov 2023 08:02:53 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1699200173; cv=none; d=google.com; s=arc-20160816; b=OudtxQTOUHaOsqUnqAURbzpLsKmCwp1Dj98D2TR1ivGOVmmo6wfTp3nbV71YyfXNyQ Z+omZZLpZEJ6g8fxUpiQVkcSXduF2QoHL3UOm8Tt3oq9iUBWq0kqM1j08uH1rO7yAn3F AV4RcDge/SMg5BnJPglekdzuWg1C5OCC2CGO71dGFpexN5NqdG3Hp0qiOTQ7wd6fYZkZ HotyA61CJc3ReJ4tQmfeR48H5L+g89Sm80D3kdv9laHRLw4CYKZLYXdR4kKL1LxN5hHa ZTIoZs0skpx6wEXNUoGTpRTruPdTRfOPdE2ClqQ8yEAn3gXs7r28prc6uoob7T1P47n4 fcgg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:mime-version:references:subject:cc:to:from:date :user-agent:message-id; bh=zHrnAKqwqjXEnYCZy3ZoDgtoLaHTm17OmiJa7WL49Is=; fh=5fXiwuINd15usHF5af0bVHUWftYpivNmVKkJ+nuVsWg=; b=JHWaIxPp2f0mVKVjCMuDFvb8bxODJ0e09bIEcPEIcxVmrAeQTvQQJauZv65KIvIjUb FzEb5PKGSI1U6PShb8p5KpYK+7nFTc0oI2QWKos4kygqXlg6D1IgkZvi1xWDFoyn6h17 +oTkVCr2UGbCdNX77L17n5WaXRyrdCcdKQ3qYTRyjl3tP00V0fIuvrq5XkOpqf36e2C8 AuEQrKCYwgivtUkjQxXfmSYIGrEDAksgSyBKZNRFLThTsateFly5cefqvWOnwNd+l5eE h+0nY9eY/N9HAtcmZ6fhaRxe1O8ms8/MA6YDp3KtKoMJUyjnBl2qb8NGNjon2mEHlmNC M7AQ== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.33 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: from lipwig.vger.email (lipwig.vger.email. 
[23.128.96.33]) by mx.google.com with ESMTPS id 6-20020a630906000000b005b999968b87si6001359pgj.580.2023.11.05.08.02.53 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 05 Nov 2023 08:02:53 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.33 as permitted sender) client-ip=23.128.96.33; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.33 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: from out1.vger.email (depot.vger.email [IPv6:2620:137:e000::3:0]) by lipwig.vger.email (Postfix) with ESMTP id 7670880568E7; Sun, 5 Nov 2023 08:02:50 -0800 (PST) X-Virus-Status: Clean X-Virus-Scanned: clamav-milter 0.103.10 at lipwig.vger.email Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229468AbjKEQBs (ORCPT + 34 others); Sun, 5 Nov 2023 11:01:48 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46906 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229515AbjKEQBm (ORCPT ); Sun, 5 Nov 2023 11:01:42 -0500 Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A7D9CEB; Sun, 5 Nov 2023 08:01:38 -0800 (PST) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 457F8C433D9; Sun, 5 Nov 2023 16:01:38 +0000 (UTC) Received: from rostedt by gandalf with local (Exim 4.97-RC3) (envelope-from ) id 1qzfZD-00000000Chi-3Maz; Sun, 05 Nov 2023 11:01:39 -0500 Message-ID: <20231105160139.660634360@goodmis.org> User-Agent: quilt/0.67 Date: Sun, 05 Nov 2023 10:56:33 -0500 From: Steven Rostedt To: linux-kernel@vger.kernel.org, stable@vger.kernel.org, Cc: Masami Hiramatsu , Mark Rutland , Andrew Morton , Ajay Kaher Subject: [v6.6][PATCH 3/5] eventfs: Save ownership and mode References: <20231105155630.925114107@goodmis.org> MIME-Version: 1.0 X-Spam-Status: No, score=-0.8 
required=5.0 tests=HEADER_FROM_DIFFERENT_DOMAINS, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lipwig.vger.email Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.6.4 (lipwig.vger.email [0.0.0.0]); Sun, 05 Nov 2023 08:02:50 -0800 (PST) X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1781740520907198262 X-GMAIL-MSGID: 1781740520907198262 From: "Steven Rostedt (Google)" commit 28e12c09f5aa081b2d13d1340e3610070b6c624d upstream Now that inodes and dentries are created on the fly, they are also reclaimed on memory pressure. Since the ownership and file mode are saved in the inode, any changes to them are lost when the inode is freed. To counter this, if the user changes the permissions or ownership, save them, and when creating the inodes again, restore those changes.
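The save-and-restore idea can be illustrated with a small user-space sketch. This is not the kernel code: struct saved_attr, struct fake_inode, save_attr() and restore_attr() are hypothetical names, and plain unsigned int fields stand in for kuid_t and kgid_t. The only assumption carried over is that the mode fits in 16 bits, so the bits above it can record which attributes the user changed.

```c
#include <assert.h>
#include <stddef.h>

/* The mode fits in 16 bits, so the upper bits can carry "saved" flags. */
#define SAVE_MODE (1u << 16)
#define SAVE_UID  (1u << 17)
#define SAVE_GID  (1u << 18)
#define MODE_MASK (SAVE_MODE - 1)

/* Hypothetical stand-in for the persistent per-file descriptor. */
struct saved_attr {
	unsigned int mode;	/* low 16 bits: mode, upper bits: flags */
	unsigned int uid;
	unsigned int gid;
};

/* Hypothetical stand-in for an inode that can be reclaimed and rebuilt. */
struct fake_inode {
	unsigned int mode;
	unsigned int uid;
	unsigned int gid;
};

/*
 * Record a user-requested change so it survives inode reclaim.
 * A NULL pointer means "this attribute was not changed".
 */
static void save_attr(struct saved_attr *sa, const unsigned int *mode,
		      const unsigned int *uid, const unsigned int *gid)
{
	assert(sa != NULL);

	if (mode)
		sa->mode = (sa->mode & ~MODE_MASK) |
			   (*mode & MODE_MASK) | SAVE_MODE;
	if (uid) {
		sa->mode |= SAVE_UID;
		sa->uid = *uid;
	}
	if (gid) {
		sa->mode |= SAVE_GID;
		sa->gid = *gid;
	}
}

/* Apply the saved values when the inode is created again. */
static void restore_attr(struct fake_inode *inode, const struct saved_attr *sa)
{
	assert(inode != NULL && sa != NULL);

	inode->mode = sa->mode & MODE_MASK;
	if (sa->mode & SAVE_UID)
		inode->uid = sa->uid;
	if (sa->mode & SAVE_GID)
		inode->gid = sa->gid;
}
```

In this sketch the mode is always restored from the descriptor, while uid and gid are only overwritten when their flag is set, so untouched ownership keeps whatever default the freshly created inode received.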
Link: https://lkml.kernel.org/r/20231101172649.691841445@goodmis.org Cc: stable@vger.kernel.org Cc: Ajay Kaher Cc: Masami Hiramatsu Cc: Mark Rutland Cc: Andrew Morton Fixes: 63940449555e7 ("eventfs: Implement eventfs lookup, read, open functions") Reviewed-by: Masami Hiramatsu (Google) Signed-off-by: Steven Rostedt (Google) --- fs/tracefs/event_inode.c | 107 +++++++++++++++++++++++++++++++++------ 1 file changed, 91 insertions(+), 16 deletions(-) diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c index a64d8fa39e54..6a3f7502310c 100644 --- a/fs/tracefs/event_inode.c +++ b/fs/tracefs/event_inode.c @@ -40,6 +40,8 @@ struct eventfs_inode { * @data: something that the caller will want to get to later on * @is_freed: Flag set if the eventfs is on its way to be freed * @mode: the permission that the file or directory should have + * @uid: saved uid if changed + * @gid: saved gid if changed */ struct eventfs_file { const char *name; @@ -61,11 +63,22 @@ struct eventfs_file { void *data; unsigned int is_freed:1; unsigned int mode:31; + kuid_t uid; + kgid_t gid; }; static DEFINE_MUTEX(eventfs_mutex); DEFINE_STATIC_SRCU(eventfs_srcu); +/* Mode is unsigned short, use the upper bits for flags */ +enum { + EVENTFS_SAVE_MODE = BIT(16), + EVENTFS_SAVE_UID = BIT(17), + EVENTFS_SAVE_GID = BIT(18), +}; + +#define EVENTFS_MODE_MASK (EVENTFS_SAVE_MODE - 1) + static struct dentry *eventfs_root_lookup(struct inode *dir, struct dentry *dentry, unsigned int flags); @@ -73,8 +86,54 @@ static int dcache_dir_open_wrapper(struct inode *inode, struct file *file); static int dcache_readdir_wrapper(struct file *file, struct dir_context *ctx); static int eventfs_release(struct inode *inode, struct file *file); +static void update_attr(struct eventfs_file *ef, struct iattr *iattr) +{ + unsigned int ia_valid = iattr->ia_valid; + + if (ia_valid & ATTR_MODE) { + ef->mode = (ef->mode & ~EVENTFS_MODE_MASK) | + (iattr->ia_mode & EVENTFS_MODE_MASK) | + EVENTFS_SAVE_MODE; + } + if (ia_valid & 
ATTR_UID) { + ef->mode |= EVENTFS_SAVE_UID; + ef->uid = iattr->ia_uid; + } + if (ia_valid & ATTR_GID) { + ef->mode |= EVENTFS_SAVE_GID; + ef->gid = iattr->ia_gid; + } +} + +static int eventfs_set_attr(struct mnt_idmap *idmap, struct dentry *dentry, + struct iattr *iattr) +{ + struct eventfs_file *ef; + int ret; + + mutex_lock(&eventfs_mutex); + ef = dentry->d_fsdata; + /* The LSB is set when the eventfs_inode is being freed */ + if (((unsigned long)ef & 1UL) || ef->is_freed) { + /* Do not allow changes if the event is about to be removed. */ + mutex_unlock(&eventfs_mutex); + return -ENODEV; + } + + ret = simple_setattr(idmap, dentry, iattr); + if (!ret) + update_attr(ef, iattr); + mutex_unlock(&eventfs_mutex); + return ret; +} + static const struct inode_operations eventfs_root_dir_inode_operations = { .lookup = eventfs_root_lookup, + .setattr = eventfs_set_attr, +}; + +static const struct inode_operations eventfs_file_inode_operations = { + .setattr = eventfs_set_attr, }; static const struct file_operations eventfs_file_operations = { @@ -85,10 +144,20 @@ static const struct file_operations eventfs_file_operations = { .release = eventfs_release, }; +static void update_inode_attr(struct inode *inode, struct eventfs_file *ef) +{ + inode->i_mode = ef->mode & EVENTFS_MODE_MASK; + + if (ef->mode & EVENTFS_SAVE_UID) + inode->i_uid = ef->uid; + + if (ef->mode & EVENTFS_SAVE_GID) + inode->i_gid = ef->gid; +} + /** * create_file - create a file in the tracefs filesystem - * @name: the name of the file to create. - * @mode: the permission that the file should have. + * @ef: the eventfs_file * @parent: parent dentry for this file. * @data: something that the caller will want to get to later on. * @fop: struct file_operations that should be used for this file. @@ -104,7 +173,7 @@ static const struct file_operations eventfs_file_operations = { * If tracefs is not enabled in the kernel, the value -%ENODEV will be * returned. 
*/ -static struct dentry *create_file(const char *name, umode_t mode, +static struct dentry *create_file(struct eventfs_file *ef, struct dentry *parent, void *data, const struct file_operations *fop) { @@ -112,13 +181,13 @@ static struct dentry *create_file(const char *name, umode_t mode, struct dentry *dentry; struct inode *inode; - if (!(mode & S_IFMT)) - mode |= S_IFREG; + if (!(ef->mode & S_IFMT)) + ef->mode |= S_IFREG; - if (WARN_ON_ONCE(!S_ISREG(mode))) + if (WARN_ON_ONCE(!S_ISREG(ef->mode))) return NULL; - dentry = eventfs_start_creating(name, parent); + dentry = eventfs_start_creating(ef->name, parent); if (IS_ERR(dentry)) return dentry; @@ -127,7 +196,10 @@ static struct dentry *create_file(const char *name, umode_t mode, if (unlikely(!inode)) return eventfs_failed_creating(dentry); - inode->i_mode = mode; + /* If the user updated the directory's attributes, use them */ + update_inode_attr(inode, ef); + + inode->i_op = &eventfs_file_inode_operations; inode->i_fop = fop; inode->i_private = data; @@ -140,7 +212,7 @@ static struct dentry *create_file(const char *name, umode_t mode, /** * create_dir - create a dir in the tracefs filesystem - * @name: the name of the file to create. + * @ei: the eventfs_inode that represents the directory to create * @parent: parent dentry for this file. * @data: something that the caller will want to get to later on. * @@ -155,13 +227,14 @@ static struct dentry *create_file(const char *name, umode_t mode, * If tracefs is not enabled in the kernel, the value -%ENODEV will be * returned. 
*/ -static struct dentry *create_dir(const char *name, struct dentry *parent, void *data) +static struct dentry *create_dir(struct eventfs_file *ef, + struct dentry *parent, void *data) { struct tracefs_inode *ti; struct dentry *dentry; struct inode *inode; - dentry = eventfs_start_creating(name, parent); + dentry = eventfs_start_creating(ef->name, parent); if (IS_ERR(dentry)) return dentry; @@ -169,7 +242,8 @@ static struct dentry *create_dir(const char *name, struct dentry *parent, void * if (unlikely(!inode)) return eventfs_failed_creating(dentry); - inode->i_mode = S_IFDIR | S_IRWXU | S_IRUGO | S_IXUGO; + update_inode_attr(inode, ef); + inode->i_op = &eventfs_root_dir_inode_operations; inode->i_fop = &eventfs_file_operations; inode->i_private = data; @@ -306,10 +380,9 @@ create_dentry(struct eventfs_file *ef, struct dentry *parent, bool lookup) inode_lock(parent->d_inode); if (ef->ei) - dentry = create_dir(ef->name, parent, ef->data); + dentry = create_dir(ef, parent, ef->data); else - dentry = create_file(ef->name, ef->mode, parent, - ef->data, ef->fop); + dentry = create_file(ef, parent, ef->data, ef->fop); if (!lookup) inode_unlock(parent->d_inode); @@ -475,6 +548,7 @@ static int dcache_dir_open_wrapper(struct inode *inode, struct file *file) if (d) { struct dentry **tmp; + tmp = krealloc(dentries, sizeof(d) * (cnt + 2), GFP_KERNEL); if (!tmp) break; @@ -549,13 +623,14 @@ static struct eventfs_file *eventfs_prepare_ef(const char *name, umode_t mode, return ERR_PTR(-ENOMEM); } INIT_LIST_HEAD(&ef->ei->e_top_files); + ef->mode = S_IFDIR | S_IRWXU | S_IRUGO | S_IXUGO; } else { ef->ei = NULL; + ef->mode = mode; } ef->iop = iop; ef->fop = fop; - ef->mode = mode; ef->data = data; return ef; } From patchwork Sun Nov 5 15:56:34 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Steven Rostedt X-Patchwork-Id: 161656 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 
2002:a59:8f47:0:b0:403:3b70:6f57 with SMTP id j7csp2193147vqu; Sun, 5 Nov 2023 08:02:55 -0800 (PST) X-Google-Smtp-Source: AGHT+IE5hH/MCh14He8c9f/U9tUCx2wJpIw+oBZKJ7wD1ggkhNU4JvtpWxn4CWP3YJck5fyhb/C2 X-Received: by 2002:a4a:df58:0:b0:581:f2de:25f8 with SMTP id j24-20020a4adf58000000b00581f2de25f8mr24664165oou.0.1699200175679; Sun, 05 Nov 2023 08:02:55 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1699200175; cv=none; d=google.com; s=arc-20160816; b=lKOsSL2jjAAEgqKKruvjwA8aoL+x6OvsfedJLXr024qJuRdOvjvntyBzVdZwndzBe+ nsI66tysl3iXsU2gCASNe+87EuPsxWAgleMixTsL9VuIZefLEtG7IW/LIlKXaeHKznU1 QplX16CsK9fA63jqyUt+qOHzUEF2rGDXj0tpoqTMsgKIJFT0D5C2DbTpFqe3OFN1JHSB tec2uYkYGWu4wKVi+O83nlHCeUy0gF/3i3TUfLoGnj7A8RoZ9PjbNfofNRaGCf1AHy+b sV+Lc+i0KaFVaK/LkYhYMOCGpJUKZRBBzqzOdDXzN50Xl69e2kaZbNH8afZgjaJ/l5bZ 25zQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:mime-version:references:subject:cc:to:from:date :user-agent:message-id; bh=/qfgYmNoqcFfOxiNZnt1Q2KtxlwE65KhVVkh22rVT9c=; fh=5fXiwuINd15usHF5af0bVHUWftYpivNmVKkJ+nuVsWg=; b=Py3mGhIbSVUcRlKVzP/J3OY0OLQ4ys5K7E00JxGXBTYwocaMeA/qKUIyL/XvEK2i8O IcIOtuiQ1r4qaIJw1MLUfHre+x87de24s2GLSda1cZwF3dJg2innOi3JFZF7/fTcjSRl Wevn+ByRTwhu8oyNeVeaFb1+5XiFIIxUetwjP6ySlHPz5oM5NyfkN3ZyaFjMA6i4nt7t 4/rKveUC/lAlyb4LArDqVkBxA8TfsilFpl495A71XMtIFTULut0iAfh1ygh+hr60s2Or hAY2fcTG85cphb7bNtR8d8VxwDiplAPymeMo+j0JLhNqYbxrqX2KQNCxnbWs5Q8qzwc2 UIkg== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.33 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: from lipwig.vger.email (lipwig.vger.email. 
[23.128.96.33]) by mx.google.com with ESMTPS id r6-20020a4a4e06000000b0057b80b61a12si2366626ooa.86.2023.11.05.08.02.55 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 05 Nov 2023 08:02:55 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.33 as permitted sender) client-ip=23.128.96.33; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.33 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: from out1.vger.email (depot.vger.email [IPv6:2620:137:e000::3:0]) by lipwig.vger.email (Postfix) with ESMTP id CA0E08056999; Sun, 5 Nov 2023 08:02:52 -0800 (PST) X-Virus-Status: Clean X-Virus-Scanned: clamav-milter 0.103.10 at lipwig.vger.email Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229731AbjKEQB6 (ORCPT + 34 others); Sun, 5 Nov 2023 11:01:58 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46894 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229562AbjKEQBn (ORCPT ); Sun, 5 Nov 2023 11:01:43 -0500 Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 3DBBE100; Sun, 5 Nov 2023 08:01:40 -0800 (PST) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 8866EC433C7; Sun, 5 Nov 2023 16:01:38 +0000 (UTC) Received: from rostedt by gandalf with local (Exim 4.97-RC3) (envelope-from ) id 1qzfZD-00000000CiC-41xO; Sun, 05 Nov 2023 11:01:39 -0500 Message-ID: <20231105160139.821908367@goodmis.org> User-Agent: quilt/0.67 Date: Sun, 05 Nov 2023 10:56:34 -0500 From: Steven Rostedt To: linux-kernel@vger.kernel.org, stable@vger.kernel.org, Cc: Masami Hiramatsu , Mark Rutland , Andrew Morton , Ajay Kaher Subject: [v6.6][PATCH 4/5] eventfs: Delete eventfs_inode when the last dentry is freed References: <20231105155630.925114107@goodmis.org> MIME-Version: 1.0 
X-Spam-Status: No, score=-0.8 required=5.0 tests=HEADER_FROM_DIFFERENT_DOMAINS, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lipwig.vger.email Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.6.4 (lipwig.vger.email [0.0.0.0]); Sun, 05 Nov 2023 08:02:52 -0800 (PST) X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1781740523731889458 X-GMAIL-MSGID: 1781740523731889458 From: "Steven Rostedt (Google)" commit 020010fbfa202aa528a52743eba4ab0da3400a4e upstream There exists a race between holding a reference of an eventfs_inode dentry and the freeing of the eventfs_inode. If user space has a dentry held long enough, it may still be able to access the dentry's eventfs_inode after it has been freed. To prevent this, have the eventfs_inode freed via the last dput() (or via RCU if the eventfs_inode does not have a dentry). This means reintroducing the eventfs_inode del_list field as a temporary place to put the eventfs_inode. The removal needs to mark the eventfs_inode as freed (via the list) but also must invalidate the dentry immediately, since eventfs_remove_dir() is expected to return with the dentries already invalidated. But the dentry invalidation must not be called under the eventfs_mutex, so it must be done after the eventfs_inode is marked as free (put on a deletion list).
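For illustration only, the ordering described above can be modeled in user space. This is a minimal sketch, not the kernel code: struct node, node_get(), node_put(), node_remove() and the nodes_freed counter are hypothetical names, the reference count stands in for the dentry reference count, and locking, RCU and the workqueue are left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for an eventfs_inode; not the kernel structure. */
struct node {
	int refs;       /* models references held on the dentry */
	bool is_freed;  /* set once the node has been removed */
};

static int nodes_freed; /* counts frees so the ordering can be observed */

static struct node *node_alloc(void)
{
	return calloc(1, sizeof(struct node));
}

static void node_free(struct node *n)
{
	nodes_freed++;
	free(n);
}

/* Take a reference, as user space holding a dentry open would. */
static void node_get(struct node *n)
{
	n->refs++;
}

/*
 * Drop a reference. Only the last drop frees the node, and only if it
 * was already marked as removed. Returns true when the node was freed.
 */
static bool node_put(struct node *n)
{
	if (--n->refs == 0 && n->is_freed) {
		node_free(n);
		return true;
	}
	return false;
}

/*
 * Mark the node as removed. It is freed right away only when nobody
 * holds a reference (the "no dentry" case); otherwise the free is left
 * to the final node_put(). Returns true when the node was freed here.
 */
static bool node_remove(struct node *n)
{
	n->is_freed = true;
	if (n->refs == 0) {
		node_free(n);
		return true;
	}
	return false;
}
```

With one reference held, node_remove() only sets is_freed and the memory stays valid until the holder calls node_put(). That is the property the patch is after: a holder of the dentry can no longer reach a freed eventfs_inode.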
Link: https://lkml.kernel.org/r/20231101172650.123479767@goodmis.org Cc: stable@vger.kernel.org Cc: Masami Hiramatsu Cc: Mark Rutland Cc: Andrew Morton Cc: Ajay Kaher Fixes: 5bdcd5f5331a2 ("eventfs: Implement removal of meta data from eventfs") Signed-off-by: Steven Rostedt (Google) --- fs/tracefs/event_inode.c | 150 +++++++++++++++++++-------------------- 1 file changed, 74 insertions(+), 76 deletions(-) diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c index 6a3f7502310c..7aa92b8ebc51 100644 --- a/fs/tracefs/event_inode.c +++ b/fs/tracefs/event_inode.c @@ -53,10 +53,12 @@ struct eventfs_file { const struct inode_operations *iop; /* * Union - used for deletion + * @llist: for calling dput() if needed after RCU * @del_list: list of eventfs_file to delete * @rcu: eventfs_file to delete in RCU */ union { + struct llist_node llist; struct list_head del_list; struct rcu_head rcu; }; @@ -113,8 +115,7 @@ static int eventfs_set_attr(struct mnt_idmap *idmap, struct dentry *dentry, mutex_lock(&eventfs_mutex); ef = dentry->d_fsdata; - /* The LSB is set when the eventfs_inode is being freed */ - if (((unsigned long)ef & 1UL) || ef->is_freed) { + if (ef->is_freed) { /* Do not allow changes if the event is about to be removed. 
*/ mutex_unlock(&eventfs_mutex); return -ENODEV; @@ -258,6 +259,13 @@ static struct dentry *create_dir(struct eventfs_file *ef, return eventfs_end_creating(dentry); } +static void free_ef(struct eventfs_file *ef) +{ + kfree(ef->name); + kfree(ef->ei); + kfree(ef); +} + /** * eventfs_set_ef_status_free - set the ef->status to free * @ti: the tracefs_inode of the dentry @@ -270,34 +278,20 @@ void eventfs_set_ef_status_free(struct tracefs_inode *ti, struct dentry *dentry) { struct tracefs_inode *ti_parent; struct eventfs_inode *ei; - struct eventfs_file *ef, *tmp; + struct eventfs_file *ef; /* The top level events directory may be freed by this */ if (unlikely(ti->flags & TRACEFS_EVENT_TOP_INODE)) { - LIST_HEAD(ef_del_list); - mutex_lock(&eventfs_mutex); - ei = ti->private; - /* Record all the top level files */ - list_for_each_entry_srcu(ef, &ei->e_top_files, list, - lockdep_is_held(&eventfs_mutex)) { - list_add_tail(&ef->del_list, &ef_del_list); - } - /* Nothing should access this, but just in case! */ ti->private = NULL; - mutex_unlock(&eventfs_mutex); - /* Now safely free the top level files and their children */ - list_for_each_entry_safe(ef, tmp, &ef_del_list, del_list) { - list_del(&ef->del_list); - eventfs_remove(ef); - } - - kfree(ei); + ef = dentry->d_fsdata; + if (ef) + free_ef(ef); return; } @@ -311,16 +305,13 @@ void eventfs_set_ef_status_free(struct tracefs_inode *ti, struct dentry *dentry) if (!ef) goto out; - /* - * If ef was freed, then the LSB bit is set for d_fsdata. - * But this should not happen, as it should still have a - * ref count that prevents it. Warn in case it does. 
- */ - if (WARN_ON_ONCE((unsigned long)ef & 1)) - goto out; + if (ef->is_freed) { + free_ef(ef); + } else { + ef->dentry = NULL; + } dentry->d_fsdata = NULL; - ef->dentry = NULL; out: mutex_unlock(&eventfs_mutex); } @@ -847,13 +838,53 @@ int eventfs_add_file(const char *name, umode_t mode, return 0; } -static void free_ef(struct rcu_head *head) +static LLIST_HEAD(free_list); + +static void eventfs_workfn(struct work_struct *work) +{ + struct eventfs_file *ef, *tmp; + struct llist_node *llnode; + + llnode = llist_del_all(&free_list); + llist_for_each_entry_safe(ef, tmp, llnode, llist) { + /* This should only get here if it had a dentry */ + if (!WARN_ON_ONCE(!ef->dentry)) + dput(ef->dentry); + } +} + +static DECLARE_WORK(eventfs_work, eventfs_workfn); + +static void free_rcu_ef(struct rcu_head *head) { struct eventfs_file *ef = container_of(head, struct eventfs_file, rcu); - kfree(ef->name); - kfree(ef->ei); - kfree(ef); + if (ef->dentry) { + /* Do not free the ef until all references of dentry are gone */ + if (llist_add(&ef->llist, &free_list)) + queue_work(system_unbound_wq, &eventfs_work); + return; + } + + free_ef(ef); +} + +static void unhook_dentry(struct dentry *dentry) +{ + if (!dentry) + return; + + /* Keep the dentry from being freed yet (see eventfs_workfn()) */ + dget(dentry); + + dentry->d_fsdata = NULL; + d_invalidate(dentry); + mutex_lock(&eventfs_mutex); + /* dentry should now have at least a single reference */ + WARN_ONCE((int)d_count(dentry) < 1, + "dentry %px (%s) less than one reference (%d) after invalidate\n", + dentry, dentry->d_name.name, d_count(dentry)); + mutex_unlock(&eventfs_mutex); } /** @@ -905,58 +936,25 @@ void eventfs_remove(struct eventfs_file *ef) { struct eventfs_file *tmp; LIST_HEAD(ef_del_list); - struct dentry *dentry_list = NULL; - struct dentry *dentry; if (!ef) return; + /* + * Move the deleted eventfs_inodes onto the ei_del_list + * which will also set the is_freed value. 
Note, this has to be + * done under the eventfs_mutex, but the deletions of + * the dentries must be done outside the eventfs_mutex. + * Hence moving them to this temporary list. + */ mutex_lock(&eventfs_mutex); eventfs_remove_rec(ef, &ef_del_list, 0); - list_for_each_entry_safe(ef, tmp, &ef_del_list, del_list) { - if (ef->dentry) { - unsigned long ptr = (unsigned long)dentry_list; - - /* Keep the dentry from being freed yet */ - dget(ef->dentry); - - /* - * Paranoid: The dget() above should prevent the dentry - * from being freed and calling eventfs_set_ef_status_free(). - * But just in case, set the link list LSB pointer to 1 - * and have eventfs_set_ef_status_free() check that to - * make sure that if it does happen, it will not think - * the d_fsdata is an event_file. - * - * For this to work, no event_file should be allocated - * on a odd space, as the ef should always be allocated - * to be at least word aligned. Check for that too. - */ - WARN_ON_ONCE(ptr & 1); - - ef->dentry->d_fsdata = (void *)(ptr | 1); - dentry_list = ef->dentry; - ef->dentry = NULL; - } - call_srcu(&eventfs_srcu, &ef->rcu, free_ef); - } mutex_unlock(&eventfs_mutex); - while (dentry_list) { - unsigned long ptr; - - dentry = dentry_list; - ptr = (unsigned long)dentry->d_fsdata & ~1UL; - dentry_list = (struct dentry *)ptr; - dentry->d_fsdata = NULL; - d_invalidate(dentry); - mutex_lock(&eventfs_mutex); - /* dentry should now have at least a single reference */ - WARN_ONCE((int)d_count(dentry) < 1, - "dentry %p less than one reference (%d) after invalidate\n", - dentry, d_count(dentry)); - mutex_unlock(&eventfs_mutex); - dput(dentry); + list_for_each_entry_safe(ef, tmp, &ef_del_list, del_list) { + unhook_dentry(ef->dentry); + list_del(&ef->del_list); + call_srcu(&eventfs_srcu, &ef->rcu, free_rcu_ef); } } From patchwork Sun Nov 5 15:56:35 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Steven Rostedt X-Patchwork-Id: 
161653 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:8f47:0:b0:403:3b70:6f57 with SMTP id j7csp2192951vqu; Sun, 5 Nov 2023 08:02:39 -0800 (PST) X-Google-Smtp-Source: AGHT+IEeonYTu2veg+ujmJvPMzgDHMZ5Fh/7MR1aG4mRbM1Z8jF12BYS1H6mAXZ3Dju/ZGCjVjj4 X-Received: by 2002:a05:6e02:3087:b0:359:3ac2:5123 with SMTP id bf7-20020a056e02308700b003593ac25123mr20830485ilb.23.1699200159215; Sun, 05 Nov 2023 08:02:39 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1699200159; cv=none; d=google.com; s=arc-20160816; b=Fa7G483WacxMvRcuqgAeXAU1LpLaKnXOmVUqSdf31qNewP/whbcYySvhzmkYZ2TSSK 73AaWiXZhw+9VLUM8sot3m+66liW97eq3ZTDvJ5JBJKTac8U0/IC5VmLTHQz94J+THU1 R1bWyPYc06qREIpRLFv59jpl4gT+Btm5GZfpNB7dSgPqYVVZThUztOp1skx4kxHe6BnQ gu+8xuC5XG2xkbJyhPPVwN06gdghW0ggazFNczl7R89SNBXjFdtxHIqZTBD6ivPsr5C4 PqD+C7P/OUGLXX9TiH7z+PHXfraPBvp/hMnivdjitRbyfftQMQCovephkRK2TE6WI/lY rBuA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:mime-version:references:subject:cc:to:from:date :user-agent:message-id; bh=YnGcfoArDvJIWUQwwpfWLT211M9R/4UfCgO3lxKoxKo=; fh=H2ddzoSW94J1X7NPeAymXsOW7KW5kImL6WmZcQn1f2g=; b=SJ+h7YPJ1uoU9MobjVyS3J5z/RaKyNfW0ip34hfYSNq2EN0Q61tQgZLzxw2dIjWD6s 4+hwoBmOtGrf2Lo96rsqnsZLE1ObdLI6IEaWa6IPzUraL9fP2h6S5O73t2iOU0VKjn8f G99kYmBpGq9QxvpHZLPnsDW6E2sNhnYzv88c5oKl0B3GT2ss58HM7vbpf1Q/o3tFUEg3 0fXffbrNftXjOHAolirgz1raR5Z/ezht3ZrX/3ExPMfqHVwbzrc7EshmbNVtOJWVcl91 +qNxhzGxo6aCOQhIx2REJ/dYKEQK3+dsGFokTvSQbW7eI2biUifbMrLqQQzpPAg4uU7T tvXg== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:2 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: from agentk.vger.email (agentk.vger.email. 
[2620:137:e000::3:2]) by mx.google.com with ESMTPS id k15-20020a056e02156f00b00357ea6dfd86si2609166ilu.11.2023.11.05.08.02.38 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 05 Nov 2023 08:02:39 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:2 as permitted sender) client-ip=2620:137:e000::3:2; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:2 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: from out1.vger.email (depot.vger.email [IPv6:2620:137:e000::3:0]) by agentk.vger.email (Postfix) with ESMTP id 1B07D805844B; Sun, 5 Nov 2023 08:02:31 -0800 (PST) X-Virus-Status: Clean X-Virus-Scanned: clamav-milter 0.103.10 at agentk.vger.email Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229721AbjKEQB4 (ORCPT + 34 others); Sun, 5 Nov 2023 11:01:56 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46906 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229537AbjKEQBn (ORCPT ); Sun, 5 Nov 2023 11:01:43 -0500 Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id DFF85FF; Sun, 5 Nov 2023 08:01:38 -0800 (PST) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 8F2DBC43391; Sun, 5 Nov 2023 16:01:38 +0000 (UTC) Received: from rostedt by gandalf with local (Exim 4.97-RC3) (envelope-from ) id 1qzfZE-00000000Cig-0WIE; Sun, 05 Nov 2023 11:01:40 -0500 Message-ID: <20231105160139.983291500@goodmis.org> User-Agent: quilt/0.67 Date: Sun, 05 Nov 2023 10:56:35 -0500 From: Steven Rostedt To: linux-kernel@vger.kernel.org, stable@vger.kernel.org, Cc: Masami Hiramatsu , Mark Rutland , Andrew Morton , Al Viro Subject: [v6.6][PATCH 5/5] eventfs: Use simple_recursive_removal() to clean up dentries References: <20231105155630.925114107@goodmis.org> 
MIME-Version: 1.0 X-Spam-Status: No, score=-0.8 required=5.0 tests=HEADER_FROM_DIFFERENT_DOMAINS, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on agentk.vger.email Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.6.4 (agentk.vger.email [0.0.0.0]); Sun, 05 Nov 2023 08:02:31 -0800 (PST) X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1781740505810731335 X-GMAIL-MSGID: 1781740505810731335 From: "Steven Rostedt (Google)" commit 407c6726ca71b33330d2d6345d9ea7ebc02575e9 upstream Looking at how a dentry is removed via the tracefs system, I found that eventfs does not do everything that it did under tracefs. The tracefs removal of a dentry calls simple_recursive_removal(), which does a lot more than a simple d_invalidate(). It should be a requirement that if any eventfs_inode has a dentry, so does its parent. When removing an eventfs_inode, if it has a dentry, a call to simple_recursive_removal() on that dentry should clean up all the dentries underneath it. Add WARN_ON_ONCE() to check for the parent having a dentry if any children do.
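For illustration only, the parent/child requirement stated above can be written as a small user-space check. This is a sketch under assumptions, not the kernel implementation: struct entry, invariant_holds() and remove_rec() are hypothetical names, a boolean stands in for "has a dentry", and no locking or SRCU is modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_CHILDREN 4

/* Hypothetical tree node standing in for an eventfs entry. */
struct entry {
	bool has_dentry;
	bool is_freed;
	struct entry *child[MAX_CHILDREN];
	size_t nr_children;
};

/*
 * Walk the subtree and report whether the requirement holds: no child
 * may have a dentry unless its parent has one too. This is the kind of
 * condition a WARN_ON_ONCE() would flag.
 */
static bool invariant_holds(const struct entry *e)
{
	for (size_t i = 0; i < e->nr_children; i++) {
		const struct entry *c = e->child[i];

		if (c->has_dentry && !e->has_dentry)
			return false;
		if (!invariant_holds(c))
			return false;
	}
	return true;
}

/*
 * Mark an entry and everything below it as removed, children first.
 * Returns the number of entries that were marked.
 */
static int remove_rec(struct entry *e)
{
	int marked = 1;

	for (size_t i = 0; i < e->nr_children; i++)
		marked += remove_rec(e->child[i]);

	e->is_freed = true;
	e->has_dentry = false;
	return marked;
}
```

Because the top entry of a subtree must have a dentry whenever anything below it does, a single recursive removal started from that top dentry is enough to reach every dentry underneath it.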
Link: https://lore.kernel.org/all/20231101022553.GE1957730@ZenIV/ Link: https://lkml.kernel.org/r/20231101172650.552471568@goodmis.org Cc: stable@vger.kernel.org Cc: Masami Hiramatsu Cc: Mark Rutland Cc: Andrew Morton Cc: Al Viro Fixes: 5bdcd5f5331a2 ("eventfs: Implement removal of meta data from eventfs") Signed-off-by: Steven Rostedt (Google) --- fs/tracefs/event_inode.c | 71 +++++++++++++++++++--------------------- 1 file changed, 33 insertions(+), 38 deletions(-) diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c index 7aa92b8ebc51..5fcfb634fec2 100644 --- a/fs/tracefs/event_inode.c +++ b/fs/tracefs/event_inode.c @@ -54,12 +54,10 @@ struct eventfs_file { /* * Union - used for deletion * @llist: for calling dput() if needed after RCU - * @del_list: list of eventfs_file to delete * @rcu: eventfs_file to delete in RCU */ union { struct llist_node llist; - struct list_head del_list; struct rcu_head rcu; }; void *data; @@ -276,7 +274,6 @@ static void free_ef(struct eventfs_file *ef) */ void eventfs_set_ef_status_free(struct tracefs_inode *ti, struct dentry *dentry) { - struct tracefs_inode *ti_parent; struct eventfs_inode *ei; struct eventfs_file *ef; @@ -297,10 +294,6 @@ void eventfs_set_ef_status_free(struct tracefs_inode *ti, struct dentry *dentry) mutex_lock(&eventfs_mutex); - ti_parent = get_tracefs(dentry->d_parent->d_inode); - if (!ti_parent || !(ti_parent->flags & TRACEFS_EVENT_INODE)) - goto out; - ef = dentry->d_fsdata; if (!ef) goto out; @@ -873,30 +866,29 @@ static void unhook_dentry(struct dentry *dentry) { if (!dentry) return; - - /* Keep the dentry from being freed yet (see eventfs_workfn()) */ + /* + * Need to add a reference to the dentry that is expected by + * simple_recursive_removal(), which will include a dput(). 
+ */ dget(dentry); - dentry->d_fsdata = NULL; - d_invalidate(dentry); - mutex_lock(&eventfs_mutex); - /* dentry should now have at least a single reference */ - WARN_ONCE((int)d_count(dentry) < 1, - "dentry %px (%s) less than one reference (%d) after invalidate\n", - dentry, dentry->d_name.name, d_count(dentry)); - mutex_unlock(&eventfs_mutex); + /* + * Also add a reference for the dput() in eventfs_workfn(). + * That is required as that dput() will free the ei after + * the SRCU grace period is over. + */ + dget(dentry); } /** * eventfs_remove_rec - remove eventfs dir or file from list * @ef: eventfs_file to be removed. - * @head: to create list of eventfs_file to be deleted * @level: to check recursion depth * * The helper function eventfs_remove_rec() is used to clean up and free the * associated data from eventfs for both of the added functions. */ -static void eventfs_remove_rec(struct eventfs_file *ef, struct list_head *head, int level) +static void eventfs_remove_rec(struct eventfs_file *ef, int level) { struct eventfs_file *ef_child; @@ -916,14 +908,16 @@ static void eventfs_remove_rec(struct eventfs_file *ef, struct list_head *head, /* search for nested folders or files */ list_for_each_entry_srcu(ef_child, &ef->ei->e_top_files, list, lockdep_is_held(&eventfs_mutex)) { - eventfs_remove_rec(ef_child, head, level + 1); + eventfs_remove_rec(ef_child, level + 1); } } ef->is_freed = 1; + unhook_dentry(ef->dentry); + list_del_rcu(&ef->list); - list_add_tail(&ef->del_list, head); + call_srcu(&eventfs_srcu, &ef->rcu, free_rcu_ef); } /** @@ -934,28 +928,22 @@ static void eventfs_remove_rec(struct eventfs_file *ef, struct list_head *head, */ void eventfs_remove(struct eventfs_file *ef) { - struct eventfs_file *tmp; - LIST_HEAD(ef_del_list); + struct dentry *dentry; if (!ef) return; - /* - * Move the deleted eventfs_inodes onto the ei_del_list - * which will also set the is_freed value. 
Note, this has to be - * done under the eventfs_mutex, but the deletions of - * the dentries must be done outside the eventfs_mutex. - * Hence moving them to this temporary list. - */ mutex_lock(&eventfs_mutex); - eventfs_remove_rec(ef, &ef_del_list, 0); + dentry = ef->dentry; + eventfs_remove_rec(ef, 0); mutex_unlock(&eventfs_mutex); - list_for_each_entry_safe(ef, tmp, &ef_del_list, del_list) { - unhook_dentry(ef->dentry); - list_del(&ef->del_list); - call_srcu(&eventfs_srcu, &ef->rcu, free_rcu_ef); - } + /* + * If any of the ei children has a dentry, then the ei itself + * must have a dentry. + */ + if (dentry) + simple_recursive_removal(dentry, NULL); } /** @@ -966,6 +954,8 @@ void eventfs_remove(struct eventfs_file *ef) */ void eventfs_remove_events_dir(struct dentry *dentry) { + struct eventfs_file *ef_child; + struct eventfs_inode *ei; struct tracefs_inode *ti; if (!dentry || !dentry->d_inode) @@ -975,6 +965,11 @@ void eventfs_remove_events_dir(struct dentry *dentry) if (!ti || !(ti->flags & TRACEFS_EVENT_INODE)) return; - d_invalidate(dentry); - dput(dentry); + mutex_lock(&eventfs_mutex); + ei = ti->private; + list_for_each_entry_srcu(ef_child, &ei->e_top_files, list, + lockdep_is_held(&eventfs_mutex)) { + eventfs_remove_rec(ef_child, 0); + } + mutex_unlock(&eventfs_mutex); }