Message ID | 20221116192552.1066630-1-rafaelmendsr@gmail.com |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:adf:f944:0:0:0:0:0 with SMTP id q4csp23160wrr; Wed, 16 Nov 2022 11:28:24 -0800 (PST) X-Google-Smtp-Source: AA0mqf6Tzp8vOq1P65PQLMRpIMUfG85gy6FiC2CvWQbSHWMEZrbAvNFTaLhLd+MDSSMxbaEQhVDj X-Received: by 2002:a17:902:ce8b:b0:186:a3a0:601c with SMTP id f11-20020a170902ce8b00b00186a3a0601cmr10282988plg.105.1668626904416; Wed, 16 Nov 2022 11:28:24 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1668626904; cv=none; d=google.com; s=arc-20160816; b=yiYyp1xjAopF70nf5ONZJ5/cG7U9OGomOOmkM/eo8ozRAqrGEH/KQMW+NPQ0nZf+8r Fa1+3Dk7q2LoZjYjWiRUSUYIrxksPkybDE0DFDNKxKHP5GGB0d0mDN7ouYR04J6erxVn 9XzjVUc2KpkKa1TlfpIvEEZXsKV/Xzx4e8nKM8AsOCi13zwux6n2WORvOR1kUSh0hIwr eCmHhOhhzeWRp0wT28ltCj7ixtCf2zK41Pu8Yf+HaRA2C4jt7/Ian+pellvdWRsJy1R5 8RK81VtQbNqQRKNHldjkuvofC9Ixkt7sEegOqg5a1Qr8siaQNYLp6baSzLIfXdP9iM38 6X+w== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :message-id:date:subject:cc:to:from:dkim-signature; bh=N0wiStqSoT+QOxZQv7R/HenLizI45o2PTA9zLLh/ev8=; b=K+l6eBpaXG9fNPrvfngdek4SYptyyj1OCchdAHDKG8GXARNDvSSQggvMNF1tiSHDro jGTigDN93Troj5Yb60YR6Ts3BCBHkBBjIZMNVh+22tPOmcLBKpkoVIOjd9PqMaItVIOz 8DAaUAxPfR6CGYVJy9h5mJmUoLT38XYfjy+E4xCIJqLlCMmJ7hPh1F65sdKDBj78k01t zCNdH+P6NbrTfe73VIuuSmpdY8Q8RlJskIxMjHozJQV4xo/BBH4gBmE9w2/zRaTYmqTA GUqq0mr0dtB4ZyfvQ/93N4cehBKBA15Lx4hQViNKuDGgugQ6pf4JuHgfP+BaW9kiA5wV FEIQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@gmail.com header.s=20210112 header.b=gGbUWPpA; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id d10-20020a63734a000000b0047018d5b375si16041492pgn.561.2022.11.16.11.28.10; Wed, 16 Nov 2022 11:28:24 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@gmail.com header.s=20210112 header.b=gGbUWPpA; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234141AbiKPT0M (ORCPT <rfc822;just.gull.subs@gmail.com> + 99 others); Wed, 16 Nov 2022 14:26:12 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:42524 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232126AbiKPT0K (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Wed, 16 Nov 2022 14:26:10 -0500 Received: from mail-oa1-x34.google.com (mail-oa1-x34.google.com [IPv6:2001:4860:4864:20::34]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 294E42E9C3; Wed, 16 Nov 2022 11:26:09 -0800 (PST) Received: by mail-oa1-x34.google.com with SMTP id 586e51a60fabf-141ca09c2fbso11679606fac.6; Wed, 16 Nov 2022 11:26:09 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=content-transfer-encoding:mime-version:message-id:date:subject:cc :to:from:from:to:cc:subject:date:message-id:reply-to; bh=N0wiStqSoT+QOxZQv7R/HenLizI45o2PTA9zLLh/ev8=; b=gGbUWPpAiVSwFwgF33WUTuRMAtG+O8HEFi2Dv7Cx6iUghNNBAhk/JvuPhj92J37CqX jQF/Japrf7u5hsfz+DO6SpWbAOYRgkMYxDft5ItE/0bzc53bicX1KLW04oRYyP2sKk7b rVPGpbYnRAyrJ8vpwFPwCFfr9GlhQT/CDdiGh2rUaRT+OD2F90koTCObzxunoz7oaXjZ IMIgXDPU9gBnB2Tqs3sUr8d2MsDS0cIj4UYuaBn3duyHkP3I+rXwt+S0dT+BHDFwXjH+ nuMH0YDWIhiz0947sJZBt4pPVFQCMOpLpmK39FETeWqp/3wse1AZh03CKFqShlWjIHl6 BKqQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=content-transfer-encoding:mime-version:message-id:date:subject:cc :to:from:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=N0wiStqSoT+QOxZQv7R/HenLizI45o2PTA9zLLh/ev8=; b=E74QYsnIkwgeD8ibnF9vn54wejLQB6t8LFmeuZWeXFh3S7umbtP8s4FRxtFusQNLNj tzTblgG7ywFe7jNk60IFdXHUoHjf+zmAM6i0AzAjJrDY5wBFWvZcbZb0DLzGcbbAaRTJ 0hoMT752GSolNu7oLH9jyaLAHa0yLxKjGvauEGr3hzoBtbwK+htawLVVYV9mydAOuvPm /1bU26UHQxPZ/r9bRYC4GTBBKBnwmZmBcvTwQI8A49EhRr9MHWy6SaKqL/okI0SvRD5j i7oGMiFISoqnQZ/E60heCJT6ITWi5Lwh+jglUfLDxpZ+uPJOAEcmE3NbU28/iyM3W2iJ orGw== X-Gm-Message-State: ANoB5pkKYinZeRUioNCYKpBU9kX8PEubkEi8s6AvjuAtfZsuSewQGpvy wJEmzYFlxXZkgOL+vM1Y9jU= X-Received: by 2002:a05:6870:5b96:b0:132:2bef:f802 with SMTP id em22-20020a0568705b9600b001322beff802mr2715192oab.249.1668626767405; Wed, 16 Nov 2022 11:26:07 -0800 (PST) Received: from macondo.. ([2804:431:e7cc:5fdd:4ced:7530:d9f1:360b]) by smtp.gmail.com with ESMTPSA id o39-20020a056870912700b0012d939eb0bfsm8414531oae.34.2022.11.16.11.26.05 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 16 Nov 2022 11:26:06 -0800 (PST) From: Rafael Mendonca <rafaelmendsr@gmail.com> To: Steven Rostedt <rostedt@goodmis.org>, Masami Hiramatsu <mhiramat@kernel.org>, "Tzvetomir Stoyanov (VMware)" <tz.stoyanov@gmail.com> Cc: Rafael Mendonca <rafaelmendsr@gmail.com>, linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org Subject: [PATCH] tracing/eprobe: Update cond flag before enabling trigger Date: Wed, 16 Nov 2022 16:25:51 -0300 Message-Id: <20221116192552.1066630-1-rafaelmendsr@gmail.com> X-Mailer: git-send-email 2.34.1 MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM, RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1749682125047047538?= X-GMAIL-MSGID: =?utf-8?q?1749682125047047538?= |
Series |
tracing/eprobe: Update cond flag before enabling trigger
|
|
Commit Message
Rafael Mendonca
Nov. 16, 2022, 7:25 p.m. UTC
The eprobe's trigger, for the event it is attached to, needs access to the
trace record in order to perform its function. This need is indicated by
the TRIGGER_COND flag and properly set by the update_cond_flag() function.
Currently, the cond flag is set after calling
trace_event_trigger_enable_disable(), which leaves a small time gap in
which the eprobe's trigger could be invoked without a trace record, causing
a NULL pointer dereference issue.
This issue can be easily reproduced by enabling an event probe for
kmem/mm_page_alloc using any of its fields as such:
root@localhost:/sys/kernel/tracing# echo 'e kmem/mm_page_alloc $gfp_flags' > dynamic_events
root@localhost:/sys/kernel/tracing# echo 1 > events/eprobes/enable
[ 30.847000] general protection fault, probably for non-canonical address 0xdffffc0000000003: 0000 [#1] PREEMPT SMP KASAN PTI
[ 30.848618] KASAN: null-ptr-deref in range [0x0000000000000018-0x000000000000001f]
[ 30.849498] CPU: 0 PID: 223 Comm: bash Not tainted 6.1.0-rc5+ #98
[...] stripped
[ 30.850623] RIP: 0010:get_event_field.isra.0+0x1b8/0x3f0
[...] stripped
[ 30.856173] Call Trace:
[ 30.856340] <TASK>
[ 30.856489] process_fetch_insn+0x1289/0x19b0
[ 30.856789] ? write_comp_data+0x2f/0x90
[ 30.857061] ? __sanitizer_cov_trace_pc+0x25/0x60
[ 30.857379] ? __trace_eprobe_create+0x1d30/0x1d30
[ 30.857701] ? __sanitizer_cov_trace_pc+0x25/0x60
[ 30.858019] ? write_comp_data+0x2f/0x90
[ 30.858289] ? write_comp_data+0x2f/0x90
[ 30.858562] eprobe_trigger_func+0x1253/0x2180
[ 30.858867] ? kasan_poison.part.0+0x3c/0x50
[ 30.859165] ? process_fetch_insn+0x19b0/0x19b0
[ 30.859473] ? get_page_from_freelist+0x10c1/0x29c0
[ 30.859806] ? rcu_read_lock_bh_held+0xd0/0xd0
[ 30.860114] ? write_comp_data+0x2f/0x90
[ 30.860387] event_triggers_call+0x293/0x410
[ 30.860683] __trace_trigger_soft_disabled+0xbb/0xe0
[ 30.861019] trace_event_raw_event_mm_page_alloc+0xb5/0xc0
[ 30.861389] __alloc_pages+0x4cd/0x760
[ 30.861651] ? __alloc_pages_slowpath.constprop.0+0x2480/0x2480
[ 30.862047] ? tracepoint_probe_register_prio_may_exist+0x100/0x100
[ 30.862464] ? __sanitizer_cov_trace_pc+0x25/0x60
[ 30.862780] ? write_comp_data+0x2f/0x90
[ 30.863055] trace_buffered_event_enable+0xc6/0x440
[ 30.863381] __ftrace_event_enable_disable+0x16a/0x880
[ 30.863728] trace_event_enable_disable+0x2a/0x40
[ 30.864043] trace_event_trigger_enable_disable.part.0+0x70/0x90
[ 30.864446] trace_event_trigger_enable_disable+0x31/0xc0
[ 30.864806] eprobe_register+0x49a/0xdf0
[ 30.865080] ? disable_eprobe+0x370/0x370
[ 30.865360] ? mutex_unlock+0x16/0x20
[ 30.865618] ? write_comp_data+0x2f/0x90
[ 30.865897] __ftrace_event_enable_disable+0x4c1/0x880
[ 30.866247] __ftrace_set_clr_event_nolock+0x297/0x390
[ 30.866598] __ftrace_set_clr_event+0x43/0x70
[ 30.866901] system_enable_write+0x1f7/0x2a0
[ 30.867214] ? event_enable_write+0x290/0x290
[ 30.867521] ? rcu_read_lock_held+0xc0/0xc0
[ 30.867842] ? write_comp_data+0x2f/0x90
[ 30.868176] ? write_comp_data+0x2f/0x90
[ 30.868467] vfs_write+0x311/0xe50
[ 30.868752] ? event_enable_write+0x290/0x290
[ 30.869067] ? kernel_write+0x5b0/0x5b0
[ 30.869329] ? lockdep_hardirqs_on_prepare+0x410/0x410
[ 30.869675] ? lock_is_held_type+0xaf/0x120
[ 30.869954] ? close_fd+0x6e/0xb0
[ 30.870182] ? write_comp_data+0x2f/0x90
[ 30.870453] ? __sanitizer_cov_trace_pc+0x25/0x60
[ 30.870772] ? write_comp_data+0x2f/0x90
[ 30.871043] ? write_comp_data+0x2f/0x90
[ 30.871315] ksys_write+0x158/0x2a0
[ 30.871558] ? __ia32_sys_read+0xc0/0xc0
[ 30.871827] ? syscall_enter_from_user_mode+0x25/0x70
[ 30.872169] ? do_syscall_64+0x3b/0x90
[ 30.872432] __x64_sys_write+0x7c/0xc0
[ 30.872691] ? syscall_enter_from_user_mode+0x25/0x70
[ 30.873028] do_syscall_64+0x60/0x90
[ 30.873279] ? syscall_exit_to_user_mode+0x4a/0x60
[ 30.873597] ? do_syscall_64+0x6d/0x90
[ 30.873857] ? fpregs_assert_state_consistent+0x90/0xf0
[ 30.874214] ? syscall_exit_to_user_mode+0x4a/0x60
[ 30.874535] ? do_syscall_64+0x6d/0x90
[ 30.874794] ? do_syscall_64+0x6d/0x90
[ 30.875118] ? do_syscall_64+0x6d/0x90
[ 30.875379] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 30.875712] RIP: 0033:0x7f7cc14da8f3
[...] stripped
[ 30.879875] </TASK>
[...] stripped
[ 30.881930] ---[ end trace 0000000000000000 ]---
[ 30.882237] RIP: 0010:get_event_field.isra.0+0x1b8/0x3f0
[...] stripped
[ 30.887734] note: bash[223] exited with preempt_count 2
That happens because enable_eprobe() will eventually trigger the
kmem/mm_page_alloc trace event:
- enable_eprobe [trace_eprobe.c]
- trace_event_trigger_enable_disable [trace_events_trigger.c]
- trace_event_enable_disable [trace_events.c]
- __ftrace_event_enable_disable [trace_events.c]
- trace_buffered_event_enable [trace.c]
- alloc_pages_node [gfp.h]
...
- __alloc_pages [page_alloc.c]
- trace_mm_page_alloc // eprobe event file without TRIGGER_COND bit set
By the time kmem/mm_page_alloc trace event is hit, the eprobe event file
does not have the TRIGGER_COND flag set yet, which causes the eprobe's
trigger to be invoked (through the trace_trigger_soft_disabled() path)
without a trace record, causing a NULL pointer dereference when fetching
the event fields.
Fix this by setting the cond flag beforehand when enabling the eprobe's
trigger.
Fixes: 7491e2c44278 ("tracing: Add a probe that attaches to trace events")
Signed-off-by: Rafael Mendonca <rafaelmendsr@gmail.com>
---
kernel/trace/trace_eprobe.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
Comments
On Wed, 16 Nov 2022 16:25:51 -0300 Rafael Mendonca <rafaelmendsr@gmail.com> wrote: > That happens because enable_eprobe() will eventually trigger the > kmem/mm_page_alloc trace event: > > - enable_eprobe [trace_eprobe.c] > - trace_event_trigger_enable_disable [trace_events_trigger.c] > - trace_event_enable_disable [trace_events.c] > - __ftrace_event_enable_disable [trace_events.c] > - trace_buffered_event_enable [trace.c] > - alloc_pages_node [gfp.h] > ... > - __alloc_pages [page_alloc.c] > - trace_mm_page_alloc // eprobe event file without TRIGGER_COND bit set > > By the time kmem/mm_page_alloc trace event is hit, the eprobe event file > does not have the TRIGGER_COND flag set yet, which causes the eprobe's > trigger to be invoked (through the trace_trigger_soft_disabled() path) > without a trace record, causing a NULL pointer dereference when fetching > the event fields. > > Fix this by setting the cond flag beforehand when enabling the eprobe's > trigger. > > Fixes: 7491e2c44278 ("tracing: Add a probe that attaches to trace events") > Signed-off-by: Rafael Mendonca <rafaelmendsr@gmail.com> > --- Thanks for the report, but I'm worried that this isn't enough because of how memory ordering can happen on different architectures. That is, just because you switch the order of updates, doesn't mean that the architecture will honor it. I don't want to add memory barriers in the fast path, but instead we can simply check if rec is NULL in the handler. So basically: static void eprobe_trigger_func(struct event_trigger_data *data, struct trace_buffer *buffer, void *rec, struct ring_buffer_event *rbe) { struct eprobe_data *edata = data->private_data; if (!rec) return; __eprobe_trace_func(edata, rec); } And this should be documented. -- Steve
On Thu, 17 Nov 2022 21:17:26 -0500 Steven Rostedt <rostedt@goodmis.org> wrote: > On Wed, 16 Nov 2022 16:25:51 -0300 > Rafael Mendonca <rafaelmendsr@gmail.com> wrote: > > > That happens because enable_eprobe() will eventually trigger the > > kmem/mm_page_alloc trace event: > > > > - enable_eprobe [trace_eprobe.c] > > - trace_event_trigger_enable_disable [trace_events_trigger.c] > > - trace_event_enable_disable [trace_events.c] > > - __ftrace_event_enable_disable [trace_events.c] > > - trace_buffered_event_enable [trace.c] > > - alloc_pages_node [gfp.h] > > ... > > - __alloc_pages [page_alloc.c] > > - trace_mm_page_alloc // eprobe event file without TRIGGER_COND bit set > > > > By the time kmem/mm_page_alloc trace event is hit, the eprobe event file > > does not have the TRIGGER_COND flag set yet, which causes the eprobe's > > trigger to be invoked (through the trace_trigger_soft_disabled() path) > > without a trace record, causing a NULL pointer dereference when fetching > > the event fields. > > > > Fix this by setting the cond flag beforehand when enabling the eprobe's > > trigger. > > > > Fixes: 7491e2c44278 ("tracing: Add a probe that attaches to trace events") > > Signed-off-by: Rafael Mendonca <rafaelmendsr@gmail.com> > > --- > > Thanks for the report, but I'm worried that this isn't enough because of > how memory ordering can happen on different architectures. That is, just > because you switch the order of updates, doesn't mean that the architecture > will honor it. > > I don't want to add memory barriers in the fast path, but instead we can > simply check if rec is NULL in the handler. > > So basically: > > > static void eprobe_trigger_func(struct event_trigger_data *data, > struct trace_buffer *buffer, void *rec, > struct ring_buffer_event *rbe) > { > struct eprobe_data *edata = data->private_data; > > if (!rec) > return; > > __eprobe_trace_func(edata, rec); > } > > And this should be documented. > In fact, does this fix it for you? I'm going to take this patch and reference you as a reported-by, as I have a lot of urgent code that needs to got upstream, and I need to start testing it. Thanks! -- Steve diff --git a/kernel/trace/trace_eprobe.c b/kernel/trace/trace_eprobe.c index 5dd0617e5df6..6b31b74954d9 100644 --- a/kernel/trace/trace_eprobe.c +++ b/kernel/trace/trace_eprobe.c @@ -563,6 +563,9 @@ static void eprobe_trigger_func(struct event_trigger_data *data, { struct eprobe_data *edata = data->private_data; + if (!rec) + return; + __eprobe_trace_func(edata, rec); }
On Thu, Nov 17, 2022 at 09:17:26PM -0500, Steven Rostedt wrote: > On Wed, 16 Nov 2022 16:25:51 -0300 > Rafael Mendonca <rafaelmendsr@gmail.com> wrote: > > > That happens because enable_eprobe() will eventually trigger the > > kmem/mm_page_alloc trace event: > > > > - enable_eprobe [trace_eprobe.c] > > - trace_event_trigger_enable_disable [trace_events_trigger.c] > > - trace_event_enable_disable [trace_events.c] > > - __ftrace_event_enable_disable [trace_events.c] > > - trace_buffered_event_enable [trace.c] > > - alloc_pages_node [gfp.h] > > ... > > - __alloc_pages [page_alloc.c] > > - trace_mm_page_alloc // eprobe event file without TRIGGER_COND bit set > > > > By the time kmem/mm_page_alloc trace event is hit, the eprobe event file > > does not have the TRIGGER_COND flag set yet, which causes the eprobe's > > trigger to be invoked (through the trace_trigger_soft_disabled() path) > > without a trace record, causing a NULL pointer dereference when fetching > > the event fields. > > > > Fix this by setting the cond flag beforehand when enabling the eprobe's > > trigger. > > > > Fixes: 7491e2c44278 ("tracing: Add a probe that attaches to trace events") > > Signed-off-by: Rafael Mendonca <rafaelmendsr@gmail.com> > > --- > > Thanks for the report, but I'm worried that this isn't enough because of > how memory ordering can happen on different architectures. That is, just > because you switch the order of updates, doesn't mean that the architecture > will honor it. > > I don't want to add memory barriers in the fast path, but instead we can > simply check if rec is NULL in the handler. Ok. That was my first approach but then looking for the same pattern in trace_events_trigger.c and trace_events_hist.c I saw commit 4e4a4d75700d ("tracing: Update cond flag when enabling or disabling a trigger") [1] and I thought the same would apply here without much thought. [1] https://lore.kernel.org/lkml/20151126022048.111221642@goodmis.org/ > > So basically: > > > static void eprobe_trigger_func(struct event_trigger_data *data, > struct trace_buffer *buffer, void *rec, > struct ring_buffer_event *rbe) > { > struct eprobe_data *edata = data->private_data; > > if (!rec) > return; > > __eprobe_trace_func(edata, rec); > } > > And this should be documented. > > -- Steve
On Thu, Nov 17, 2022 at 09:31:09PM -0500, Steven Rostedt wrote: > On Thu, 17 Nov 2022 21:17:26 -0500 > Steven Rostedt <rostedt@goodmis.org> wrote: > > > On Wed, 16 Nov 2022 16:25:51 -0300 > > Rafael Mendonca <rafaelmendsr@gmail.com> wrote: > > > > > That happens because enable_eprobe() will eventually trigger the > > > kmem/mm_page_alloc trace event: > > > > > > - enable_eprobe [trace_eprobe.c] > > > - trace_event_trigger_enable_disable [trace_events_trigger.c] > > > - trace_event_enable_disable [trace_events.c] > > > - __ftrace_event_enable_disable [trace_events.c] > > > - trace_buffered_event_enable [trace.c] > > > - alloc_pages_node [gfp.h] > > > ... > > > - __alloc_pages [page_alloc.c] > > > - trace_mm_page_alloc // eprobe event file without TRIGGER_COND bit set > > > > > > By the time kmem/mm_page_alloc trace event is hit, the eprobe event file > > > does not have the TRIGGER_COND flag set yet, which causes the eprobe's > > > trigger to be invoked (through the trace_trigger_soft_disabled() path) > > > without a trace record, causing a NULL pointer dereference when fetching > > > the event fields. > > > > > > Fix this by setting the cond flag beforehand when enabling the eprobe's > > > trigger. > > > > > > Fixes: 7491e2c44278 ("tracing: Add a probe that attaches to trace events") > > > Signed-off-by: Rafael Mendonca <rafaelmendsr@gmail.com> > > > --- > > > > Thanks for the report, but I'm worried that this isn't enough because of > > how memory ordering can happen on different architectures. That is, just > > because you switch the order of updates, doesn't mean that the architecture > > will honor it. > > > > I don't want to add memory barriers in the fast path, but instead we can > > simply check if rec is NULL in the handler. > > > > So basically: > > > > > > static void eprobe_trigger_func(struct event_trigger_data *data, > > struct trace_buffer *buffer, void *rec, > > struct ring_buffer_event *rbe) > > { > > struct eprobe_data *edata = data->private_data; > > > > if (!rec) > > return; > > > > __eprobe_trace_func(edata, rec); > > } > > > > And this should be documented. > > > > In fact, does this fix it for you? It does. I found this while enabling eprobe for all events in my setup. Doing the same test wih the proposed patch it did not trigger any issue. Thanks. > > I'm going to take this patch and reference you as a reported-by, as I have > a lot of urgent code that needs to got upstream, and I need to start > testing it. > > Thanks! > > -- Steve > > diff --git a/kernel/trace/trace_eprobe.c b/kernel/trace/trace_eprobe.c > index 5dd0617e5df6..6b31b74954d9 100644 > --- a/kernel/trace/trace_eprobe.c > +++ b/kernel/trace/trace_eprobe.c > @@ -563,6 +563,9 @@ static void eprobe_trigger_func(struct event_trigger_data *data, > { > struct eprobe_data *edata = data->private_data; > > + if (!rec) > + return; > + > __eprobe_trace_func(edata, rec); > } > >
On Fri, Nov 18, 2022 at 09:40:09AM -0300, Rafael Mendonca wrote: > On Thu, Nov 17, 2022 at 09:31:09PM -0500, Steven Rostedt wrote: > > On Thu, 17 Nov 2022 21:17:26 -0500 > > Steven Rostedt <rostedt@goodmis.org> wrote: > > > > > On Wed, 16 Nov 2022 16:25:51 -0300 > > > Rafael Mendonca <rafaelmendsr@gmail.com> wrote: > > > > > > > That happens because enable_eprobe() will eventually trigger the > > > > kmem/mm_page_alloc trace event: > > > > > > > > - enable_eprobe [trace_eprobe.c] > > > > - trace_event_trigger_enable_disable [trace_events_trigger.c] > > > > - trace_event_enable_disable [trace_events.c] > > > > - __ftrace_event_enable_disable [trace_events.c] > > > > - trace_buffered_event_enable [trace.c] > > > > - alloc_pages_node [gfp.h] > > > > ... > > > > - __alloc_pages [page_alloc.c] > > > > - trace_mm_page_alloc // eprobe event file without TRIGGER_COND bit set > > > > > > > > By the time kmem/mm_page_alloc trace event is hit, the eprobe event file > > > > does not have the TRIGGER_COND flag set yet, which causes the eprobe's > > > > trigger to be invoked (through the trace_trigger_soft_disabled() path) > > > > without a trace record, causing a NULL pointer dereference when fetching > > > > the event fields. > > > > > > > > Fix this by setting the cond flag beforehand when enabling the eprobe's > > > > trigger. > > > > > > > > Fixes: 7491e2c44278 ("tracing: Add a probe that attaches to trace events") > > > > Signed-off-by: Rafael Mendonca <rafaelmendsr@gmail.com> > > > > --- > > > > > > Thanks for the report, but I'm worried that this isn't enough because of > > > how memory ordering can happen on different architectures. That is, just > > > because you switch the order of updates, doesn't mean that the architecture > > > will honor it. > > > > > > I don't want to add memory barriers in the fast path, but instead we can > > > simply check if rec is NULL in the handler. > > > > > > So basically: > > > > > > > > > static void eprobe_trigger_func(struct event_trigger_data *data, > > > struct trace_buffer *buffer, void *rec, > > > struct ring_buffer_event *rbe) > > > { > > > struct eprobe_data *edata = data->private_data; > > > > > > if (!rec) > > > return; > > > > > > __eprobe_trace_func(edata, rec); > > > } > > > > > > And this should be documented. > > > > > > > In fact, does this fix it for you? > > It does. I found this while enabling eprobe for all events in my setup. > Doing the same test wih the proposed patch it did not trigger any issue. > Thanks. It did not trigger the NULL pointer issue to be more specific. When creating event probe for all events I was unable to create any event for the xhci-hcd system: root@localhost:/sys/kernel/tracing# echo 'e xhci-hcd/xhci_add_endpoint' > dynamic_events -bash: echo: write error: Invalid argument Debugging the issue it seems that the problem is in the is_good_name() check, which returns false for "xhci-hcd". Should we sanitize it by converting '-' into '_'? > > > > > I'm going to take this patch and reference you as a reported-by, as I have > > a lot of urgent code that needs to got upstream, and I need to start > > testing it. > > > > Thanks! > > > > -- Steve > > > > diff --git a/kernel/trace/trace_eprobe.c b/kernel/trace/trace_eprobe.c > > index 5dd0617e5df6..6b31b74954d9 100644 > > --- a/kernel/trace/trace_eprobe.c > > +++ b/kernel/trace/trace_eprobe.c > > @@ -563,6 +563,9 @@ static void eprobe_trigger_func(struct event_trigger_data *data, > > { > > struct eprobe_data *edata = data->private_data; > > > > + if (!rec) > > + return; > > + > > __eprobe_trace_func(edata, rec); > > } > > > >
On Fri, 18 Nov 2022 10:34:40 -0300 Rafael Mendonca <rafaelmendsr@gmail.com> wrote: > It did not trigger the NULL pointer issue to be more specific. When > creating event probe for all events I was unable to create any event for > the xhci-hcd system: > > root@localhost:/sys/kernel/tracing# echo 'e xhci-hcd/xhci_add_endpoint' > dynamic_events > -bash: echo: write error: Invalid argument > > Debugging the issue it seems that the problem is in the is_good_name() > check, which returns false for "xhci-hcd". Should we sanitize it by Ouch. I didn't realize that. > converting '-' into '_'? Actually, it's just the system name that's an issue. I tested this patch and it appears to work. -- Steve diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h index 54ee5711c729..a16fb4c9642e 100644 --- a/kernel/trace/trace.h +++ b/kernel/trace/trace.h @@ -1955,17 +1955,30 @@ static __always_inline void trace_iterator_reset(struct trace_iterator *iter) } /* Check the name is good for event/group/fields */ -static inline bool is_good_name(const char *name) +static inline bool __is_good_name(const char *name, bool hash_ok) { - if (!isalpha(*name) && *name != '_') + if (!isalpha(*name) && *name != '_' && (!hash_ok || *name != '-')) return false; while (*++name != '\0') { - if (!isalpha(*name) && !isdigit(*name) && *name != '_') + if (!isalpha(*name) && !isdigit(*name) && *name != '_' && + (!hash_ok || *name != '-')) return false; } return true; } +/* Check the name is good for event/group/fields */ +static inline bool is_good_name(const char *name) +{ + return __is_good_name(name, false); +} + +/* Check the name is good for system */ +static inline bool is_good_system_name(const char *name) +{ + return __is_good_name(name, true); +} + /* Convert certain expected symbols into '_' when generating event names */ static inline void sanitize_event_name(char *name) { diff --git a/kernel/trace/trace_probe.c b/kernel/trace/trace_probe.c index 36dff277de46..bb2f95d7175c 100644 --- a/kernel/trace/trace_probe.c +++ b/kernel/trace/trace_probe.c @@ -246,7 +246,7 @@ int traceprobe_parse_event_name(const char **pevent, const char **pgroup, return -EINVAL; } strlcpy(buf, event, slash - event + 1); - if (!is_good_name(buf)) { + if (!is_good_system_name(buf)) { trace_probe_log_err(offset, BAD_GROUP_NAME); return -EINVAL; }
On Fri, 18 Nov 2022 11:19:40 -0500 Steven Rostedt <rostedt@goodmis.org> wrote: > On Fri, 18 Nov 2022 10:34:40 -0300 > Rafael Mendonca <rafaelmendsr@gmail.com> wrote: > > > It did not trigger the NULL pointer issue to be more specific. When > > creating event probe for all events I was unable to create any event for > > the xhci-hcd system: > > > > root@localhost:/sys/kernel/tracing# echo 'e xhci-hcd/xhci_add_endpoint' > dynamic_events > > -bash: echo: write error: Invalid argument > > > > Debugging the issue it seems that the problem is in the is_good_name() > > check, which returns false for "xhci-hcd". Should we sanitize it by > > Ouch. I didn't realize that. Maybe we need better error message so that user can notice which character caused the error. > > > converting '-' into '_'? > > Actually, it's just the system name that's an issue. I tested this patch > and it appears to work. Ah, the system name is more flexible than the event name because it has TRACE_SYSTEM_VAR. Steve, can you send me the below patch? BTW, TRACE_DEFINE_ENUM() and TRACE_DEFINE_SIZEOF() macros are using TRACE_SYSTEM instead of TRACE_SYSTEM_VAR. Should those use TRACE_SYSTEM_VAR for defining a variable? Thank you, > > -- Steve > > > diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h > index 54ee5711c729..a16fb4c9642e 100644 > --- a/kernel/trace/trace.h > +++ b/kernel/trace/trace.h > @@ -1955,17 +1955,30 @@ static __always_inline void trace_iterator_reset(struct trace_iterator *iter) > } > > /* Check the name is good for event/group/fields */ > -static inline bool is_good_name(const char *name) > +static inline bool __is_good_name(const char *name, bool hash_ok) > { > - if (!isalpha(*name) && *name != '_') > + if (!isalpha(*name) && *name != '_' && (!hash_ok || *name != '-')) > return false; > while (*++name != '\0') { > - if (!isalpha(*name) && !isdigit(*name) && *name != '_') > + if (!isalpha(*name) && !isdigit(*name) && *name != '_' && > + (!hash_ok || *name != '-')) > return false; > } > return true; > } > > +/* Check the name is good for event/group/fields */ > +static inline bool is_good_name(const char *name) > +{ > + return __is_good_name(name, false); > +} > + > +/* Check the name is good for system */ > +static inline bool is_good_system_name(const char *name) > +{ > + return __is_good_name(name, true); > +} > + > /* Convert certain expected symbols into '_' when generating event names */ > static inline void sanitize_event_name(char *name) > { > diff --git a/kernel/trace/trace_probe.c b/kernel/trace/trace_probe.c > index 36dff277de46..bb2f95d7175c 100644 > --- a/kernel/trace/trace_probe.c > +++ b/kernel/trace/trace_probe.c > @@ -246,7 +246,7 @@ int traceprobe_parse_event_name(const char **pevent, const char **pgroup, > return -EINVAL; > } > strlcpy(buf, event, slash - event + 1); > - if (!is_good_name(buf)) { > + if (!is_good_system_name(buf)) { > trace_probe_log_err(offset, BAD_GROUP_NAME); > return -EINVAL; > }
On Thu, 24 Nov 2022 01:01:52 +0900 Masami Hiramatsu (Google) <mhiramat@kernel.org> wrote: > On Fri, 18 Nov 2022 11:19:40 -0500 > Steven Rostedt <rostedt@goodmis.org> wrote: > > > On Fri, 18 Nov 2022 10:34:40 -0300 > > Rafael Mendonca <rafaelmendsr@gmail.com> wrote: > > > > > It did not trigger the NULL pointer issue to be more specific. When > > > creating event probe for all events I was unable to create any event for > > > the xhci-hcd system: > > > > > > root@localhost:/sys/kernel/tracing# echo 'e xhci-hcd/xhci_add_endpoint' > dynamic_events > > > -bash: echo: write error: Invalid argument > > > > > > Debugging the issue it seems that the problem is in the is_good_name() > > > check, which returns false for "xhci-hcd". Should we sanitize it by > > > > Ouch. I didn't realize that. > > Maybe we need better error message so that user can notice which character > caused the error. > > > > > > converting '-' into '_'? > > > > Actually, it's just the system name that's an issue. I tested this patch > > and it appears to work. > > Ah, the system name is more flexible than the event name because it has > TRACE_SYSTEM_VAR. > > Steve, can you send me the below patch? I already did ;-) https://patchwork.kernel.org/project/linux-trace-kernel/patch/20221122122345.160f5077@gandalf.local.home/ > > BTW, TRACE_DEFINE_ENUM() and TRACE_DEFINE_SIZEOF() macros are using > TRACE_SYSTEM instead of TRACE_SYSTEM_VAR. Should those use TRACE_SYSTEM_VAR > for defining a variable? Hmm, good question. Probably. But since nothing that has a bad name uses it, we can fix that whenever (but before a system with a hyphen uses those macros). Do you want to make the change, or shall I? -- Steve
diff --git a/kernel/trace/trace_eprobe.c b/kernel/trace/trace_eprobe.c index 5dd0617e5df6..5004c6f6f232 100644 --- a/kernel/trace/trace_eprobe.c +++ b/kernel/trace/trace_eprobe.c @@ -677,8 +677,8 @@ static int enable_eprobe(struct trace_eprobe *ep, list_add_tail_rcu(&trigger->list, &file->triggers); - trace_event_trigger_enable_disable(file, 1); update_cond_flag(file); + trace_event_trigger_enable_disable(file, 1); return 0; }