Message ID | 20231002135242.247536-1-asavkov@redhat.com |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:a05:612c:2a8e:b0:403:3b70:6f57 with SMTP id in14csp1643351vqb; Mon, 2 Oct 2023 12:24:54 -0700 (PDT) X-Google-Smtp-Source: AGHT+IFvw8+mlMPUSNxY4NsM5WLrtRPxKzHLw20t1ajNmcqcjJt5f5oZ/MhIrmtbxlJh6r2TeUIx X-Received: by 2002:a05:6a20:8e1f:b0:13a:fa9e:787b with SMTP id y31-20020a056a208e1f00b0013afa9e787bmr12379556pzj.12.1696274694278; Mon, 02 Oct 2023 12:24:54 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1696274694; cv=none; d=google.com; s=arc-20160816; b=0+OIDTIVRox/fK4A17Fch2LqGsdJJpIwC3k7yKXeZs6YzIgc/PYSWVzAZLKx3aBAlr urjFO4QeliLQt/52wUxK4Y652G1XSc4hPbUK4wwxt/0bSOB9ZN8Gkc+DTmKrZJawBLTO S284mHFLeFw3MuE0qdG3yeA3foBNSbJhEnRHY3XGqP0bv/NnAXUj7gcFF9WvpPdps3Uz bjpltKuGN7jq1QrVtkaCPwo6RfbsfHRqy8l+3skaeyMkMlx9BV9dU5OuFBF6FLimIoFt wk6TCM2IZfG2y1Zm5ND8bRL7L738sqPyl8TaVgUhL3sGcA5nZ50v1nnzrkPIT7qXKD1z Dtew== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :message-id:date:subject:cc:to:from:dkim-signature; bh=vpLoGghBwfAF/DR0ZHV2Bh2/WPx4zdRILR3KFZtAXWE=; fh=8qqxUtUzTJ2/nK+9pCI2oYgW2DNX19re5Deay+Aodsc=; b=kyQe7O4lCHj1Kz27XjHwLfjI9AgGeLFCsegdlhSKLnJFAgOyvGe8a+IrWQqGKk6dDt IP1tw+gTdUQgePVok9oEWUpzg0bfShqKuOZfbWHUr0Ip4ZHA3LveNz/jP6+7K1DRcRs4 b2rcndLLaUrnI7+mbJs23VEp5COHdtxdbm29XvgWULyvao0xMxAGlbShluuV8EjKZ3YA Ed6c6438eLovS8HddZfg6RF5GD8iCnN6G/va8yKfsKDl0avZkwzTDvj/a/ZuhI4bRaHR X2zxyUfTxWuTvdSs2HcS9WpmQBnjRw2ZzAq7+8T3Zdc3L1jwsyyM6byUOxOzgwdKO5Tw KarQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=Q2sVPwcX; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.35 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: from groat.vger.email (groat.vger.email. [23.128.96.35]) by mx.google.com with ESMTPS id j9-20020a056a00234900b00690d7d99e2fsi29702131pfj.212.2023.10.02.12.24.53 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 02 Oct 2023 12:24:54 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.35 as permitted sender) client-ip=23.128.96.35; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=Q2sVPwcX; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.35 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: from out1.vger.email (depot.vger.email [IPv6:2620:137:e000::3:0]) by groat.vger.email (Postfix) with ESMTP id CAEEB8076677; Mon, 2 Oct 2023 06:54:18 -0700 (PDT) X-Virus-Status: Clean X-Virus-Scanned: clamav-milter 0.103.10 at groat.vger.email Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S237573AbjJBNx5 (ORCPT <rfc822;pusanteemu@gmail.com> + 18 others); Mon, 2 Oct 2023 09:53:57 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:38668 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S237571AbjJBNxy (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Mon, 2 Oct 2023 09:53:54 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 798FDAD for <linux-kernel@vger.kernel.org>; Mon, 2 Oct 2023 06:53:07 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1696254786; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding; bh=vpLoGghBwfAF/DR0ZHV2Bh2/WPx4zdRILR3KFZtAXWE=; b=Q2sVPwcXhGGCHl7ofFNf/p/AkY5nAA/RZoqA8pMj1aERHSjXVrats1fuxU9bERkkd3oVAH r3Hy95JGvkpEUO9DRp/AstnT7icwUjXE/sNbs/5OLn/ltpNlekThuBodvzm4GilJPaDE+c I31aLIPSmkClYqYa28fqjK+PHOmTigU= Received: from mimecast-mx02.redhat.com (mx-ext.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-607-p2jhS79bNEmnpPnmm7HB3g-1; Mon, 02 Oct 2023 09:52:48 -0400 X-MC-Unique: p2jhS79bNEmnpPnmm7HB3g-1 Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.rdu2.redhat.com [10.11.54.2]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 1F52B3C176E8; Mon, 2 Oct 2023 13:52:47 +0000 (UTC) Received: from alecto.usersys.redhat.com (unknown [10.43.17.26]) by smtp.corp.redhat.com (Postfix) with ESMTP id 494FB40C6EBF; Mon, 2 Oct 2023 13:52:45 +0000 (UTC) From: Artem Savkov <asavkov@redhat.com> To: Steven Rostedt <rostedt@goodmis.org>, Masami Hiramatsu <mhiramat@kernel.org> Cc: linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org, Alexei Starovoitov <ast@kernel.org>, Daniel Borkmann <daniel@iogearbox.net>, Andrii Nakryiko <andrii@kernel.org>, bpf@vger.kernel.org, netdev@vger.kernel.org, Thomas Gleixner <tglx@linutronix.de>, linux-rt-users@vger.kernel.org, Jiri Olsa <jolsa@kernel.org>, Artem Savkov <asavkov@redhat.com> Subject: [RFC PATCH] tracing: change syscall number type in struct syscall_trace_* Date: Mon, 2 Oct 2023 15:52:42 +0200 Message-ID: <20231002135242.247536-1-asavkov@redhat.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Scanned-By: MIMEDefang 3.1 on 10.11.54.2 X-Spam-Status: No, score=2.7 required=5.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, MAILING_LIST_MULTI,RCVD_IN_SBL_CSS,SPF_HELO_NONE,SPF_PASS autolearn=no autolearn_force=no version=3.4.6 X-Spam-Level: ** X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on groat.vger.email Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.6.4 (groat.vger.email [0.0.0.0]); Mon, 02 Oct 2023 06:54:18 -0700 (PDT) X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1778672933984380209 X-GMAIL-MSGID: 1778672933984380209 |
Series |
[RFC] tracing: change syscall number type in struct syscall_trace_*
|
|
Commit Message
Artem Savkov
Oct. 2, 2023, 1:52 p.m. UTC
linux-rt-devel tree contains a patch that adds an extra member to struct
trace_entry. This causes the offset of args field in struct
trace_event_raw_sys_enter be different from the one in struct
syscall_trace_enter:
struct trace_event_raw_sys_enter {
struct trace_entry ent; /* 0 12 */
/* XXX last struct has 3 bytes of padding */
/* XXX 4 bytes hole, try to pack */
long int id; /* 16 8 */
long unsigned int args[6]; /* 24 48 */
/* --- cacheline 1 boundary (64 bytes) was 8 bytes ago --- */
char __data[]; /* 72 0 */
/* size: 72, cachelines: 2, members: 4 */
/* sum members: 68, holes: 1, sum holes: 4 */
/* paddings: 1, sum paddings: 3 */
/* last cacheline: 8 bytes */
};
struct syscall_trace_enter {
struct trace_entry ent; /* 0 12 */
/* XXX last struct has 3 bytes of padding */
int nr; /* 12 4 */
long unsigned int args[]; /* 16 0 */
/* size: 16, cachelines: 1, members: 3 */
/* paddings: 1, sum paddings: 3 */
/* last cacheline: 16 bytes */
};
This, in turn, causes perf_event_set_bpf_prog() fail while running bpf
test_profiler testcase because max_ctx_offset is calculated based on the
former struct, while off on the latter:
10488 if (is_tracepoint || is_syscall_tp) {
10489 int off = trace_event_get_offsets(event->tp_event);
10490
10491 if (prog->aux->max_ctx_offset > off)
10492 return -EACCES;
10493 }
This patch changes the type of nr member in syscall_trace_* structs to
be long so that "args" offset is equal to that in struct
trace_event_raw_sys_enter.
Signed-off-by: Artem Savkov <asavkov@redhat.com>
---
kernel/trace/trace.h | 4 ++--
kernel/trace/trace_syscalls.c | 7 ++++---
2 files changed, 6 insertions(+), 5 deletions(-)
Comments
On Mon, Oct 2, 2023 at 6:53 AM Artem Savkov <asavkov@redhat.com> wrote: > > linux-rt-devel tree contains a patch that adds an extra member to struct can you please point to the patch itself that makes that change? > trace_entry. This causes the offset of args field in struct > trace_event_raw_sys_enter be different from the one in struct > syscall_trace_enter: > > struct trace_event_raw_sys_enter { > struct trace_entry ent; /* 0 12 */ > > /* XXX last struct has 3 bytes of padding */ > /* XXX 4 bytes hole, try to pack */ > > long int id; /* 16 8 */ > long unsigned int args[6]; /* 24 48 */ > /* --- cacheline 1 boundary (64 bytes) was 8 bytes ago --- */ > char __data[]; /* 72 0 */ > > /* size: 72, cachelines: 2, members: 4 */ > /* sum members: 68, holes: 1, sum holes: 4 */ > /* paddings: 1, sum paddings: 3 */ > /* last cacheline: 8 bytes */ > }; > > struct syscall_trace_enter { > struct trace_entry ent; /* 0 12 */ > > /* XXX last struct has 3 bytes of padding */ > > int nr; /* 12 4 */ > long unsigned int args[]; /* 16 0 */ > > /* size: 16, cachelines: 1, members: 3 */ > /* paddings: 1, sum paddings: 3 */ > /* last cacheline: 16 bytes */ > }; > > This, in turn, causes perf_event_set_bpf_prog() fail while running bpf > test_profiler testcase because max_ctx_offset is calculated based on the > former struct, while off on the latter: > > 10488 if (is_tracepoint || is_syscall_tp) { > 10489 int off = trace_event_get_offsets(event->tp_event); > 10490 > 10491 if (prog->aux->max_ctx_offset > off) > 10492 return -EACCES; > 10493 } > > This patch changes the type of nr member in syscall_trace_* structs to > be long so that "args" offset is equal to that in struct > trace_event_raw_sys_enter. > > Signed-off-by: Artem Savkov <asavkov@redhat.com> > --- > kernel/trace/trace.h | 4 ++-- > kernel/trace/trace_syscalls.c | 7 ++++--- > 2 files changed, 6 insertions(+), 5 deletions(-) > > diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h > index 77debe53f07cf..cd1d24df85364 100644 > --- a/kernel/trace/trace.h > +++ b/kernel/trace/trace.h > @@ -135,13 +135,13 @@ enum trace_type { > */ > struct syscall_trace_enter { > struct trace_entry ent; > - int nr; > + long nr; > unsigned long args[]; > }; > > struct syscall_trace_exit { > struct trace_entry ent; > - int nr; > + long nr; > long ret; > }; > > diff --git a/kernel/trace/trace_syscalls.c b/kernel/trace/trace_syscalls.c > index de753403cdafb..c26939119f2e4 100644 > --- a/kernel/trace/trace_syscalls.c > +++ b/kernel/trace/trace_syscalls.c > @@ -101,7 +101,7 @@ find_syscall_meta(unsigned long syscall) > return NULL; > } > > -static struct syscall_metadata *syscall_nr_to_meta(int nr) > +static struct syscall_metadata *syscall_nr_to_meta(long nr) > { > if (IS_ENABLED(CONFIG_HAVE_SPARSE_SYSCALL_NR)) > return xa_load(&syscalls_metadata_sparse, (unsigned long)nr); > @@ -132,7 +132,8 @@ print_syscall_enter(struct trace_iterator *iter, int flags, > struct trace_entry *ent = iter->ent; > struct syscall_trace_enter *trace; > struct syscall_metadata *entry; > - int i, syscall; > + int i; > + long syscall; > > trace = (typeof(trace))ent; > syscall = trace->nr; > @@ -177,7 +178,7 @@ print_syscall_exit(struct trace_iterator *iter, int flags, > struct trace_seq *s = &iter->seq; > struct trace_entry *ent = iter->ent; > struct syscall_trace_exit *trace; > - int syscall; > + long syscall; > struct syscall_metadata *entry; > > trace = (typeof(trace))ent; > -- > 2.41.0 > >
On Mon, 2 Oct 2023 15:52:42 +0200 Artem Savkov <asavkov@redhat.com> wrote: > linux-rt-devel tree contains a patch that adds an extra member to struct > trace_entry. This causes the offset of args field in struct > trace_event_raw_sys_enter be different from the one in struct > syscall_trace_enter: This patch looks like it's fixing the symptom and not the issue. No code should rely on the two event structures to be related. That's an unwanted coupling, that will likely cause issues down the road (like the RT patch you mentioned). > > struct trace_event_raw_sys_enter { > struct trace_entry ent; /* 0 12 */ > > /* XXX last struct has 3 bytes of padding */ > /* XXX 4 bytes hole, try to pack */ > > long int id; /* 16 8 */ > long unsigned int args[6]; /* 24 48 */ > /* --- cacheline 1 boundary (64 bytes) was 8 bytes ago --- */ > char __data[]; /* 72 0 */ > > /* size: 72, cachelines: 2, members: 4 */ > /* sum members: 68, holes: 1, sum holes: 4 */ > /* paddings: 1, sum paddings: 3 */ > /* last cacheline: 8 bytes */ > }; > > struct syscall_trace_enter { > struct trace_entry ent; /* 0 12 */ > > /* XXX last struct has 3 bytes of padding */ > > int nr; /* 12 4 */ > long unsigned int args[]; /* 16 0 */ > > /* size: 16, cachelines: 1, members: 3 */ > /* paddings: 1, sum paddings: 3 */ > /* last cacheline: 16 bytes */ > }; > > This, in turn, causes perf_event_set_bpf_prog() fail while running bpf > test_profiler testcase because max_ctx_offset is calculated based on the > former struct, while off on the latter: The above appears to be pointing to the real bug. The "is calculated based on the former struct while off on the latter" Why are the two being used together? They are supposed to be *unrelated*! > > 10488 if (is_tracepoint || is_syscall_tp) { > 10489 int off = trace_event_get_offsets(event->tp_event); So basically this is clumping together the raw_syscalls with the syscalls events as if they are the same. But the are not. They are created differently. It's basically like using one structure to get the offsets of another structure. That would be a bug anyplace else in the kernel. Sounds like it's a bug here too. I think the issue is with this code, not the tracing code. We could expose the struct syscall_trace_enter and syscall_trace_exit if the offsets to those are needed. -- Steve > 10490 > 10491 if (prog->aux->max_ctx_offset > off) > 10492 return -EACCES; > 10493 } > > This patch changes the type of nr member in syscall_trace_* structs to > be long so that "args" offset is equal to that in struct > trace_event_raw_sys_enter. >
On Tue, Oct 03, 2023 at 03:11:15PM -0700, Andrii Nakryiko wrote: > On Mon, Oct 2, 2023 at 6:53 AM Artem Savkov <asavkov@redhat.com> wrote: > > > > linux-rt-devel tree contains a patch that adds an extra member to struct > > can you please point to the patch itself that makes that change? Of course, some context would be useful. The patch in question is b1773eac3f29c ("sched: Add support for lazy preemption") from rt-devel tree [0]. It came up a couple of times before: [1] [2] [3] [4]. [0] https://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-rt-devel.git/commit/?id=b1773eac3f29cbdcdfd16e0339f1a164066e9f71 [1] https://lore.kernel.org/linux-rt-users/20200221153541.681468-1-jolsa@kernel.org/t/#u [2] https://github.com/iovisor/bpftrace/commit/a2e3d5dbc03ceb49b776cf5602d31896158844a7 [3] https://lore.kernel.org/bpf/xunyjzy64q9b.fsf@redhat.com/t/#u [4] https://lore.kernel.org/bpf/20230727150647.397626-1-ykaliuta@redhat.com/t/#u > > trace_entry. This causes the offset of args field in struct > > trace_event_raw_sys_enter be different from the one in struct > > syscall_trace_enter: > > > > struct trace_event_raw_sys_enter { > > struct trace_entry ent; /* 0 12 */ > > > > /* XXX last struct has 3 bytes of padding */ > > /* XXX 4 bytes hole, try to pack */ > > > > long int id; /* 16 8 */ > > long unsigned int args[6]; /* 24 48 */ > > /* --- cacheline 1 boundary (64 bytes) was 8 bytes ago --- */ > > char __data[]; /* 72 0 */ > > > > /* size: 72, cachelines: 2, members: 4 */ > > /* sum members: 68, holes: 1, sum holes: 4 */ > > /* paddings: 1, sum paddings: 3 */ > > /* last cacheline: 8 bytes */ > > }; > > > > struct syscall_trace_enter { > > struct trace_entry ent; /* 0 12 */ > > > > /* XXX last struct has 3 bytes of padding */ > > > > int nr; /* 12 4 */ > > long unsigned int args[]; /* 16 0 */ > > > > /* size: 16, cachelines: 1, members: 3 */ > > /* paddings: 1, sum paddings: 3 */ > > /* last cacheline: 16 bytes */ > > }; > > > > This, in turn, causes perf_event_set_bpf_prog() fail while running bpf > > test_profiler testcase because max_ctx_offset is calculated based on the > > former struct, while off on the latter: > > > > 10488 if (is_tracepoint || is_syscall_tp) { > > 10489 int off = trace_event_get_offsets(event->tp_event); > > 10490 > > 10491 if (prog->aux->max_ctx_offset > off) > > 10492 return -EACCES; > > 10493 } > > > > This patch changes the type of nr member in syscall_trace_* structs to > > be long so that "args" offset is equal to that in struct > > trace_event_raw_sys_enter. > > > > Signed-off-by: Artem Savkov <asavkov@redhat.com> > > --- > > kernel/trace/trace.h | 4 ++-- > > kernel/trace/trace_syscalls.c | 7 ++++--- > > 2 files changed, 6 insertions(+), 5 deletions(-) > > > > diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h > > index 77debe53f07cf..cd1d24df85364 100644 > > --- a/kernel/trace/trace.h > > +++ b/kernel/trace/trace.h > > @@ -135,13 +135,13 @@ enum trace_type { > > */ > > struct syscall_trace_enter { > > struct trace_entry ent; > > - int nr; > > + long nr; > > unsigned long args[]; > > }; > > > > struct syscall_trace_exit { > > struct trace_entry ent; > > - int nr; > > + long nr; > > long ret; > > }; > > > > diff --git a/kernel/trace/trace_syscalls.c b/kernel/trace/trace_syscalls.c > > index de753403cdafb..c26939119f2e4 100644 > > --- a/kernel/trace/trace_syscalls.c > > +++ b/kernel/trace/trace_syscalls.c > > @@ -101,7 +101,7 @@ find_syscall_meta(unsigned long syscall) > > return NULL; > > } > > > > -static struct syscall_metadata *syscall_nr_to_meta(int nr) > > +static struct syscall_metadata *syscall_nr_to_meta(long nr) > > { > > if (IS_ENABLED(CONFIG_HAVE_SPARSE_SYSCALL_NR)) > > return xa_load(&syscalls_metadata_sparse, (unsigned long)nr); > > @@ -132,7 +132,8 @@ print_syscall_enter(struct trace_iterator *iter, int flags, > > struct trace_entry *ent = iter->ent; > > struct syscall_trace_enter *trace; > > struct syscall_metadata *entry; > > - int i, syscall; > > + int i; > > + long syscall; > > > > trace = (typeof(trace))ent; > > syscall = trace->nr; > > @@ -177,7 +178,7 @@ print_syscall_exit(struct trace_iterator *iter, int flags, > > struct trace_seq *s = &iter->seq; > > struct trace_entry *ent = iter->ent; > > struct syscall_trace_exit *trace; > > - int syscall; > > + long syscall; > > struct syscall_metadata *entry; > > > > trace = (typeof(trace))ent; > > -- > > 2.41.0 > > > > >
On Tue, Oct 03, 2023 at 09:38:44PM -0400, Steven Rostedt wrote: > On Mon, 2 Oct 2023 15:52:42 +0200 > Artem Savkov <asavkov@redhat.com> wrote: > > > linux-rt-devel tree contains a patch that adds an extra member to struct > > trace_entry. This causes the offset of args field in struct > > trace_event_raw_sys_enter be different from the one in struct > > syscall_trace_enter: > > This patch looks like it's fixing the symptom and not the issue. No code > should rely on the two event structures to be related. That's an unwanted > coupling, that will likely cause issues down the road (like the RT patch > you mentioned). I agree, but I didn't see a better solution and that was my way of starting conversation, thus the RFC. > > > > struct trace_event_raw_sys_enter { > > struct trace_entry ent; /* 0 12 */ > > > > /* XXX last struct has 3 bytes of padding */ > > /* XXX 4 bytes hole, try to pack */ > > > > long int id; /* 16 8 */ > > long unsigned int args[6]; /* 24 48 */ > > /* --- cacheline 1 boundary (64 bytes) was 8 bytes ago --- */ > > char __data[]; /* 72 0 */ > > > > /* size: 72, cachelines: 2, members: 4 */ > > /* sum members: 68, holes: 1, sum holes: 4 */ > > /* paddings: 1, sum paddings: 3 */ > > /* last cacheline: 8 bytes */ > > }; > > > > struct syscall_trace_enter { > > struct trace_entry ent; /* 0 12 */ > > > > /* XXX last struct has 3 bytes of padding */ > > > > int nr; /* 12 4 */ > > long unsigned int args[]; /* 16 0 */ > > > > /* size: 16, cachelines: 1, members: 3 */ > > /* paddings: 1, sum paddings: 3 */ > > /* last cacheline: 16 bytes */ > > }; > > > > This, in turn, causes perf_event_set_bpf_prog() fail while running bpf > > test_profiler testcase because max_ctx_offset is calculated based on the > > former struct, while off on the latter: > > The above appears to be pointing to the real bug. The "is calculated based > on the former struct while off on the latter" Why are the two being used > together? They are supposed to be *unrelated*! > > > > > > 10488 if (is_tracepoint || is_syscall_tp) { > > 10489 int off = trace_event_get_offsets(event->tp_event); > > So basically this is clumping together the raw_syscalls with the syscalls > events as if they are the same. But the are not. They are created > differently. It's basically like using one structure to get the offsets of > another structure. That would be a bug anyplace else in the kernel. Sounds > like it's a bug here too. > > I think the issue is with this code, not the tracing code. > > We could expose the struct syscall_trace_enter and syscall_trace_exit if > the offsets to those are needed. I don't think we need syscall_trace_* offsets, looks like trace_event_get_offsets() should return offset trace_event_raw_sys_enter instead. I am still trying to figure out how all of this works together. Maybe Alexei or Andrii have more context here.
On Wed, Oct 04, 2023 at 02:55:47PM +0200, Artem Savkov wrote: > On Tue, Oct 03, 2023 at 09:38:44PM -0400, Steven Rostedt wrote: > > On Mon, 2 Oct 2023 15:52:42 +0200 > > Artem Savkov <asavkov@redhat.com> wrote: > > > > > linux-rt-devel tree contains a patch that adds an extra member to struct > > > trace_entry. This causes the offset of args field in struct > > > trace_event_raw_sys_enter be different from the one in struct > > > syscall_trace_enter: > > > > This patch looks like it's fixing the symptom and not the issue. No code > > should rely on the two event structures to be related. That's an unwanted > > coupling, that will likely cause issues down the road (like the RT patch > > you mentioned). > > I agree, but I didn't see a better solution and that was my way of > starting conversation, thus the RFC. > > > > > > > struct trace_event_raw_sys_enter { > > > struct trace_entry ent; /* 0 12 */ > > > > > > /* XXX last struct has 3 bytes of padding */ > > > /* XXX 4 bytes hole, try to pack */ > > > > > > long int id; /* 16 8 */ > > > long unsigned int args[6]; /* 24 48 */ > > > /* --- cacheline 1 boundary (64 bytes) was 8 bytes ago --- */ > > > char __data[]; /* 72 0 */ > > > > > > /* size: 72, cachelines: 2, members: 4 */ > > > /* sum members: 68, holes: 1, sum holes: 4 */ > > > /* paddings: 1, sum paddings: 3 */ > > > /* last cacheline: 8 bytes */ > > > }; > > > > > > struct syscall_trace_enter { > > > struct trace_entry ent; /* 0 12 */ > > > > > > /* XXX last struct has 3 bytes of padding */ > > > > > > int nr; /* 12 4 */ > > > long unsigned int args[]; /* 16 0 */ > > > > > > /* size: 16, cachelines: 1, members: 3 */ > > > /* paddings: 1, sum paddings: 3 */ > > > /* last cacheline: 16 bytes */ > > > }; > > > > > > This, in turn, causes perf_event_set_bpf_prog() fail while running bpf > > > test_profiler testcase because max_ctx_offset is calculated based on the > > > former struct, while off on the latter: > > > > The above appears to be pointing to the real bug. The "is calculated based > > on the former struct while off on the latter" Why are the two being used > > together? They are supposed to be *unrelated*! > > > > > > > > > > 10488 if (is_tracepoint || is_syscall_tp) { > > > 10489 int off = trace_event_get_offsets(event->tp_event); > > > > So basically this is clumping together the raw_syscalls with the syscalls > > events as if they are the same. But the are not. They are created > > differently. It's basically like using one structure to get the offsets of > > another structure. That would be a bug anyplace else in the kernel. Sounds > > like it's a bug here too. > > > > I think the issue is with this code, not the tracing code. > > > > We could expose the struct syscall_trace_enter and syscall_trace_exit if > > the offsets to those are needed. > > I don't think we need syscall_trace_* offsets, looks like > trace_event_get_offsets() should return offset trace_event_raw_sys_enter > instead. I am still trying to figure out how all of this works together. > Maybe Alexei or Andrii have more context here. Turns out it is even more confusing. The tests dereference the context as struct trace_event_raw_sys_enter so bpf verifier sets max_ctx_offset based on that, then perf_event_set_bpf_prog() checks this offset against the one in struct syscall_trace_enter, but what bpf prog really gets is a pointer to struct syscall_tp_t from kernel/trace/trace_syscalls.c. I don't know the history behind these decisions, but should the tests dereference context as struct syscall_trace_enter instead and struct syscall_tp_t be changed to have syscall_nr as int?
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h index 77debe53f07cf..cd1d24df85364 100644 --- a/kernel/trace/trace.h +++ b/kernel/trace/trace.h @@ -135,13 +135,13 @@ enum trace_type { */ struct syscall_trace_enter { struct trace_entry ent; - int nr; + long nr; unsigned long args[]; }; struct syscall_trace_exit { struct trace_entry ent; - int nr; + long nr; long ret; }; diff --git a/kernel/trace/trace_syscalls.c b/kernel/trace/trace_syscalls.c index de753403cdafb..c26939119f2e4 100644 --- a/kernel/trace/trace_syscalls.c +++ b/kernel/trace/trace_syscalls.c @@ -101,7 +101,7 @@ find_syscall_meta(unsigned long syscall) return NULL; } -static struct syscall_metadata *syscall_nr_to_meta(int nr) +static struct syscall_metadata *syscall_nr_to_meta(long nr) { if (IS_ENABLED(CONFIG_HAVE_SPARSE_SYSCALL_NR)) return xa_load(&syscalls_metadata_sparse, (unsigned long)nr); @@ -132,7 +132,8 @@ print_syscall_enter(struct trace_iterator *iter, int flags, struct trace_entry *ent = iter->ent; struct syscall_trace_enter *trace; struct syscall_metadata *entry; - int i, syscall; + int i; + long syscall; trace = (typeof(trace))ent; syscall = trace->nr; @@ -177,7 +178,7 @@ print_syscall_exit(struct trace_iterator *iter, int flags, struct trace_seq *s = &iter->seq; struct trace_entry *ent = iter->ent; struct syscall_trace_exit *trace; - int syscall; + long syscall; struct syscall_metadata *entry; trace = (typeof(trace))ent;