From patchwork Sun Nov 5 15:56:31 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Steven Rostedt X-Patchwork-Id: 161654 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:8f47:0:b0:403:3b70:6f57 with SMTP id j7csp2193021vqu; Sun, 5 Nov 2023 08:02:43 -0800 (PST) X-Google-Smtp-Source: AGHT+IFqTgGJeoFmZx2QAAvJ8mN1lQW7OnLUc9T0XRJmzOsrvZkG+sQzbTDBbBC+3rC9SksZW0hF X-Received: by 2002:a05:6a20:6a0d:b0:160:a752:59e with SMTP id p13-20020a056a206a0d00b00160a752059emr31897051pzk.40.1699200163385; Sun, 05 Nov 2023 08:02:43 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1699200163; cv=none; d=google.com; s=arc-20160816; b=W+kJV3wV8301dQ/YbIvQN4fVD3AlTxHH/H+oEKggMYvXLzFR4QeXiEgK3YwQzeMPnm ItukJtyUbIKd2VDEl8ME9JqY8K6eW7CeBNdW1nyDg51X7FzofAPt6RDEwObkBF8B9c/Y qQBDbHihLz3w1PTqE9ZMSWzFQs49gQB2AgcMoA7GX7c6NpQCafxXPM5ZzgJ6jK0/vx9w p7lpkzvlqLceE7Uvv8w9garaJjg1Gext+RpzA0Te60j9HB51crYkfmE+SFTHrjWgmtJn apN8l8/nfo+HEBKDIKPeX5WyEz9OPGPCswaRndZ8dik3XSrm7w0pvbyCTbD/cJuYDMFo DaiQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:mime-version:references:subject:cc:to:from:date :user-agent:message-id; bh=3RHJ+Ozgs653WNV/+Inf7Ha8qDIU4cdLEW20aKnjiSo=; fh=aBYeIKHEaEgUqOkB4TyzM4Nr+wTssKZX9qr+dhXH1II=; b=DBCAxc7QjWwOHr/0W0De2MlopOfFJWbh8jA4gWv3eRj/2x5H8NY9lGSqX4zH57ZEdh mGhJhpNHXSZQE5VFMoDni8TAqPazKDTMsu0xyBOr6/Ar/EWvj2UROxCJQ/FjElywTQ7p 5TjTs2NKTKp7CmRDd9SStbwtA8PgNguqPGxA2x3jAU7zJDL6MPUTfWzZGXq9nFP69BWg giM0AM+iXby+FVrxon2GpCtD5gH9FYmh6KwtYw2kLn9olkCXxhvlkDqa2DZzbQmM73Ac YwJ5ep7xYO0Gh8xOCdw5Ra4/2LvW0O7VZXeu5D0pA5EUoVip3iHUgl3LEVDqPMKzGkLu N1yA== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.35 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: from groat.vger.email (groat.vger.email. [23.128.96.35]) by mx.google.com with ESMTPS id c68-20020a633547000000b005b982b4f5a9si6146228pga.429.2023.11.05.08.02.43 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 05 Nov 2023 08:02:43 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.35 as permitted sender) client-ip=23.128.96.35; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.35 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: from out1.vger.email (depot.vger.email [IPv6:2620:137:e000::3:0]) by groat.vger.email (Postfix) with ESMTP id 9CD588092C91; Sun, 5 Nov 2023 08:02:25 -0800 (PST) X-Virus-Status: Clean X-Virus-Scanned: clamav-milter 0.103.10 at groat.vger.email Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229668AbjKEQBv (ORCPT + 34 others); Sun, 5 Nov 2023 11:01:51 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46894 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229505AbjKEQBm (ORCPT ); Sun, 5 Nov 2023 11:01:42 -0500 Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 68E20F2; Sun, 5 Nov 2023 08:01:38 -0800 (PST) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 02F93C433CA; Sun, 5 Nov 2023 16:01:38 +0000 (UTC) Received: from rostedt by gandalf with local (Exim 4.97-RC3) (envelope-from ) id 1qzfZD-00000000Cgk-20l0; Sun, 05 Nov 2023 11:01:39 -0500 Message-ID: <20231105160139.346174799@goodmis.org> User-Agent: quilt/0.67 Date: Sun, 05 Nov 2023 10:56:31 -0500 From: Steven Rostedt To: linux-kernel@vger.kernel.org, stable@vger.kernel.org, Cc: Masami Hiramatsu , Mark Rutland , Andrew Morton , Beau Belgrave Subject: [v6.6][PATCH 1/5] tracing: Have trace_event_file have ref counters References: <20231105155630.925114107@goodmis.org> MIME-Version: 1.0 X-Spam-Status: No, score=-0.8 required=5.0 tests=HEADER_FROM_DIFFERENT_DOMAINS, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on groat.vger.email Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.6.4 (groat.vger.email [0.0.0.0]); Sun, 05 Nov 2023 08:02:25 -0800 (PST) X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1781740510637604774 X-GMAIL-MSGID: 1781740510637604774 From: "Steven Rostedt (Google)" commit bb32500fb9b78215e4ef6ee8b4345c5f5d7eafb4 upstream The following can crash the kernel: # cd /sys/kernel/tracing # echo 'p:sched schedule' > kprobe_events # exec 5>>events/kprobes/sched/enable # > kprobe_events # exec 5>&- The above commands: 1. Change directory to the tracefs directory 2. Create a kprobe event (doesn't matter what one) 3. Open bash file descriptor 5 on the enable file of the kprobe event 4. Delete the kprobe event (removes the files too) 5. Close the bash file descriptor 5 The above causes a crash! BUG: kernel NULL pointer dereference, address: 0000000000000028 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: 0000 [#1] PREEMPT SMP PTI CPU: 6 PID: 877 Comm: bash Not tainted 6.5.0-rc4-test-00008-g2c6b6b1029d4-dirty #186 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014 RIP: 0010:tracing_release_file_tr+0xc/0x50 What happens here is that the kprobe event creates a trace_event_file "file" descriptor that represents the file in tracefs to the event. It maintains state of the event (is it enabled for the given instance?). Opening the "enable" file gets a reference to the event "file" descriptor via the open file descriptor. When the kprobe event is deleted, the file is also deleted from the tracefs system which also frees the event "file" descriptor. But as the tracefs file is still opened by user space, it will not be totally removed until the final dput() is called on it. But this is not true with the event "file" descriptor that is already freed. If the user does a write to or simply closes the file descriptor it will reference the event "file" descriptor that was just freed, causing a use-after-free bug. To solve this, add a ref count to the event "file" descriptor as well as a new flag called "FREED". The "file" will not be freed until the last reference is released. But the FREE flag will be set when the event is removed to prevent any more modifications to that event from happening, even if there's still a reference to the event "file" descriptor. Link: https://lore.kernel.org/linux-trace-kernel/20231031000031.1e705592@gandalf.local.home/ Link: https://lore.kernel.org/linux-trace-kernel/20231031122453.7a48b923@gandalf.local.home Cc: stable@vger.kernel.org Cc: Mark Rutland Fixes: f5ca233e2e66d ("tracing: Increase trace array ref count on enable and filter files") Reported-by: Beau Belgrave Tested-by: Beau Belgrave Reviewed-by: Masami Hiramatsu (Google) Signed-off-by: Steven Rostedt (Google) --- include/linux/trace_events.h | 4 ++++ kernel/trace/trace.c | 15 +++++++++++++++ kernel/trace/trace.h | 3 +++ kernel/trace/trace_events.c | 31 ++++++++++++++++++++++++++---- kernel/trace/trace_events_filter.c | 3 +++ 5 files changed, 52 insertions(+), 4 deletions(-) diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h index 21ae37e49319..cf9f0c61796e 100644 --- a/include/linux/trace_events.h +++ b/include/linux/trace_events.h @@ -492,6 +492,7 @@ enum { EVENT_FILE_FL_TRIGGER_COND_BIT, EVENT_FILE_FL_PID_FILTER_BIT, EVENT_FILE_FL_WAS_ENABLED_BIT, + EVENT_FILE_FL_FREED_BIT, }; extern struct trace_event_file *trace_get_event_file(const char *instance, @@ -630,6 +631,7 @@ extern int __kprobe_event_add_fields(struct dynevent_cmd *cmd, ...); * TRIGGER_COND - When set, one or more triggers has an associated filter * PID_FILTER - When set, the event is filtered based on pid * WAS_ENABLED - Set when enabled to know to clear trace on module removal + * FREED - File descriptor is freed, all fields should be considered invalid */ enum { EVENT_FILE_FL_ENABLED = (1 << EVENT_FILE_FL_ENABLED_BIT), @@ -643,6 +645,7 @@ enum { EVENT_FILE_FL_TRIGGER_COND = (1 << EVENT_FILE_FL_TRIGGER_COND_BIT), EVENT_FILE_FL_PID_FILTER = (1 << EVENT_FILE_FL_PID_FILTER_BIT), EVENT_FILE_FL_WAS_ENABLED = (1 << EVENT_FILE_FL_WAS_ENABLED_BIT), + EVENT_FILE_FL_FREED = (1 << EVENT_FILE_FL_FREED_BIT), }; struct trace_event_file { @@ -671,6 +674,7 @@ struct trace_event_file { * caching and such. Which is mostly OK ;-) */ unsigned long flags; + atomic_t ref; /* ref count for opened files */ atomic_t sm_ref; /* soft-mode reference counter */ atomic_t tm_ref; /* trigger-mode reference counter */ }; diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c index abaaf516fcae..a40d6baf101f 100644 --- a/kernel/trace/trace.c +++ b/kernel/trace/trace.c @@ -4986,6 +4986,20 @@ int tracing_open_file_tr(struct inode *inode, struct file *filp) if (ret) return ret; + mutex_lock(&event_mutex); + + /* Fail if the file is marked for removal */ + if (file->flags & EVENT_FILE_FL_FREED) { + trace_array_put(file->tr); + ret = -ENODEV; + } else { + event_file_get(file); + } + + mutex_unlock(&event_mutex); + if (ret) + return ret; + filp->private_data = inode->i_private; return 0; @@ -4996,6 +5010,7 @@ int tracing_release_file_tr(struct inode *inode, struct file *filp) struct trace_event_file *file = inode->i_private; trace_array_put(file->tr); + event_file_put(file); return 0; } diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h index 77debe53f07c..d608f6128704 100644 --- a/kernel/trace/trace.h +++ b/kernel/trace/trace.h @@ -1664,6 +1664,9 @@ extern void event_trigger_unregister(struct event_command *cmd_ops, char *glob, struct event_trigger_data *trigger_data); +extern void event_file_get(struct trace_event_file *file); +extern void event_file_put(struct trace_event_file *file); + /** * struct event_trigger_ops - callbacks for trace event triggers * diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c index f49d6ddb6342..82cb22ad6d61 100644 --- a/kernel/trace/trace_events.c +++ b/kernel/trace/trace_events.c @@ -990,13 +990,35 @@ static void remove_subsystem(struct trace_subsystem_dir *dir) } } +void event_file_get(struct trace_event_file *file) +{ + atomic_inc(&file->ref); +} + +void event_file_put(struct trace_event_file *file) +{ + if (WARN_ON_ONCE(!atomic_read(&file->ref))) { + if (file->flags & EVENT_FILE_FL_FREED) + kmem_cache_free(file_cachep, file); + return; + } + + if (atomic_dec_and_test(&file->ref)) { + /* Count should only go to zero when it is freed */ + if (WARN_ON_ONCE(!(file->flags & EVENT_FILE_FL_FREED))) + return; + kmem_cache_free(file_cachep, file); + } +} + static void remove_event_file_dir(struct trace_event_file *file) { eventfs_remove(file->ef); list_del(&file->list); remove_subsystem(file->system); free_event_filter(file->filter); - kmem_cache_free(file_cachep, file); + file->flags |= EVENT_FILE_FL_FREED; + event_file_put(file); } /* @@ -1369,7 +1391,7 @@ event_enable_read(struct file *filp, char __user *ubuf, size_t cnt, flags = file->flags; mutex_unlock(&event_mutex); - if (!file) + if (!file || flags & EVENT_FILE_FL_FREED) return -ENODEV; if (flags & EVENT_FILE_FL_ENABLED && @@ -1407,7 +1429,7 @@ event_enable_write(struct file *filp, const char __user *ubuf, size_t cnt, ret = -ENODEV; mutex_lock(&event_mutex); file = event_file_data(filp); - if (likely(file)) + if (likely(file && !(file->flags & EVENT_FILE_FL_FREED))) ret = ftrace_event_enable_disable(file, val); mutex_unlock(&event_mutex); break; @@ -1681,7 +1703,7 @@ event_filter_read(struct file *filp, char __user *ubuf, size_t cnt, mutex_lock(&event_mutex); file = event_file_data(filp); - if (file) + if (file && !(file->flags & EVENT_FILE_FL_FREED)) print_event_filter(file, s); mutex_unlock(&event_mutex); @@ -2803,6 +2825,7 @@ trace_create_new_event(struct trace_event_call *call, atomic_set(&file->tm_ref, 0); INIT_LIST_HEAD(&file->triggers); list_add(&file->list, &tr->events); + event_file_get(file); return file; } diff --git a/kernel/trace/trace_events_filter.c b/kernel/trace/trace_events_filter.c index 33264e510d16..0c611b281a5b 100644 --- a/kernel/trace/trace_events_filter.c +++ b/kernel/trace/trace_events_filter.c @@ -2349,6 +2349,9 @@ int apply_event_filter(struct trace_event_file *file, char *filter_string) struct event_filter *filter = NULL; int err; + if (file->flags & EVENT_FILE_FL_FREED) + return -ENODEV; + if (!strcmp(strstrip(filter_string), "0")) { filter_disable(file); filter = event_filter(file);