Message ID | 20230221211143.574-12-beaub@linux.microsoft.com |
---|---|
State | New |
Headers |
From: Beau Belgrave <beaub@linux.microsoft.com>
To: rostedt@goodmis.org, mhiramat@kernel.org, mathieu.desnoyers@efficios.com, dcook@linux.microsoft.com, alanau@linux.microsoft.com, brauner@kernel.org, akpm@linux-foundation.org, ebiederm@xmission.com, keescook@chromium.org, tglx@linutronix.de
Cc: linux-trace-devel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: [PATCH v8 11/11] tracing/user_events: Limit global user_event count
Date: Tue, 21 Feb 2023 13:11:43 -0800
Message-Id: <20230221211143.574-12-beaub@linux.microsoft.com>
In-Reply-To: <20230221211143.574-1-beaub@linux.microsoft.com>
References: <20230221211143.574-1-beaub@linux.microsoft.com> |
Series | tracing/user_events: Remote write ABI |
Commit Message
Beau Belgrave
Feb. 21, 2023, 9:11 p.m. UTC
Operators want to be able to ensure enough tracepoints exist on the
system for kernel components as well as for user components. Since there
are only up to 64K events, by default allow up to half to be used by
user events.
Add a boot parameter (user_events_max=%d) and a kernel sysctl parameter
(kernel.user_events_max) to set a global limit that is honored among all
groups on the system. This ensures hard limits can be set up to prevent
user processes from consuming all event IDs on the system.
Signed-off-by: Beau Belgrave <beaub@linux.microsoft.com>
---
kernel/trace/trace_events_user.c | 59 ++++++++++++++++++++++++++++++++
1 file changed, 59 insertions(+)
Comments
Hi Beau,

On Tue, 21 Feb 2023 13:11:43 -0800
Beau Belgrave <beaub@linux.microsoft.com> wrote:

> Operators want to be able to ensure enough tracepoints exist on the
> system for kernel components as well as for user components. Since there
> are only up to 64K events, by default allow up to half to be used by
> user events.
>
> Add a boot parameter (user_events_max=%d) and a kernel sysctl parameter
> (kernel.user_events_max) to set a global limit that is honored among all
> groups on the system. This ensures hard limits can be setup to prevent
> user processes from consuming all event IDs on the system.

sysctl is good to me, but do we really need the kernel parameter?
user_events only starts being used once user space is up, so I think
setting the limit with sysctl is enough.

BTW, Vlastimil tried to add 'sysctl.*' kernel parameter support(*). If we
need kernel cmdline support, I think this is a more generic way. But it
seems the discussion has stopped.

(*) https://patchwork.kernel.org/project/linux-mm/patch/20200427180433.7029-2-vbabka@suse.cz/

Thank you,
On 3/24/23 01:18, Masami Hiramatsu (Google) wrote:
> Hi Beau,
>
> On Tue, 21 Feb 2023 13:11:43 -0800
> Beau Belgrave <beaub@linux.microsoft.com> wrote:
>
>> Operators want to be able to ensure enough tracepoints exist on the
>> system for kernel components as well as for user components. Since there
>> are only up to 64K events, by default allow up to half to be used by
>> user events.
>>
>> Add a boot parameter (user_events_max=%d) and a kernel sysctl parameter
>> (kernel.user_events_max) to set a global limit that is honored among all
>> groups on the system. This ensures hard limits can be setup to prevent
>> user processes from consuming all event IDs on the system.
>
> sysctl is good to me, but would we really need the kernel parameter?
> The user_events starts using when user-space is up, so I think setting
> the limit with sysctl is enough.
>
> BTW, Vlastimil tried to add 'sysctl.*' kernel parameter support(*). If we
> need a kernel cmdline support, I think this is more generic way. But it
> seems the discussion has been stopped.

It was actually merged in 5.8. So sysctl should be sufficient with that.
But maybe it's weird to start adding sysctls, when the rest of tracing
tunables is AFAIK under /sys/kernel/tracing/ ?

> (*) https://patchwork.kernel.org/project/linux-mm/patch/20200427180433.7029-2-vbabka@suse.cz/
>
> Thank you,
On Fri, Mar 24, 2023 at 09:54:48AM +0100, Vlastimil Babka wrote:
> On 3/24/23 01:18, Masami Hiramatsu (Google) wrote:
> > Hi Beau,
> >
> > On Tue, 21 Feb 2023 13:11:43 -0800
> > Beau Belgrave <beaub@linux.microsoft.com> wrote:
> >
> >> Operators want to be able to ensure enough tracepoints exist on the
> >> system for kernel components as well as for user components. Since there
> >> are only up to 64K events, by default allow up to half to be used by
> >> user events.
> >>
> >> Add a boot parameter (user_events_max=%d) and a kernel sysctl parameter
> >> (kernel.user_events_max) to set a global limit that is honored among all
> >> groups on the system. This ensures hard limits can be setup to prevent
> >> user processes from consuming all event IDs on the system.
> >
> > sysctl is good to me, but would we really need the kernel parameter?
> > The user_events starts using when user-space is up, so I think setting
> > the limit with sysctl is enough.
> >
> > BTW, Vlastimil tried to add 'sysctl.*' kernel parameter support(*). If we
> > need a kernel cmdline support, I think this is more generic way. But it
> > seems the discussion has been stopped.
>
> It was actually merged in 5.8. So sysctl should be sufficient with that.
> But maybe it's weird to start adding sysctls, when the rest of tracing
> tunables is AFAIK under /sys/kernel/tracing/ ?
>

During the TraceFS meetings Steven runs I was asked to add a boot
parameter and sysctl for user_events to limit the max.

To me, it seems when user_events moves toward namespace awareness
sysctl might be easier to use from within a namespace to turn knobs.

Happy to change to whatever, but I want to see Steven and Masami agree
on the approach before doing so.

Steven, do you agree with Masami to move to just sysctl?

Thanks,
-Beau

> > (*) https://patchwork.kernel.org/project/linux-mm/patch/20200427180433.7029-2-vbabka@suse.cz/
> >
> > Thank you,
On Fri, 24 Mar 2023 09:43:53 -0700
Beau Belgrave <beaub@linux.microsoft.com> wrote:

> > It was actually merged in 5.8. So sysctl should be sufficient with that.
> > But maybe it's weird to start adding sysctls, when the rest of tracing
> > tunables is AFAIK under /sys/kernel/tracing/ ?
> >
>
> During the TraceFS meetings Steven runs I was asked to add a boot
> parameter and sysctl for user_events to limit the max.
>
> To me, it seems when user_events moves toward namespace awareness
> sysctl might be easier to use from within a namespace to turn knobs.
>
> Happy to change to whatever, but I want to see Steven and Masami agree
> on the approach before doing so.
>
> Steven, do you agree with Masami to move to just sysctl?

We do have some tracing related sysctls already:

 # cd /proc/sys/kernel
 # ls *trace*
ftrace_dump_on_oops  oops_all_cpu_backtrace  traceoff_on_warning
ftrace_enabled       stack_tracer_enabled    tracepoint_printk

Although I would love to deprecate ftrace_enabled, as that now has a
control in tracefs, it's not unprecedented to have tracing tunables as
sysctl.

And if we get cmdline boot parameters for free from sysctls then all the
better.

-- Steve
On Fri, 24 Mar 2023 13:06:59 -0400
Steven Rostedt <rostedt@goodmis.org> wrote:

> On Fri, 24 Mar 2023 09:43:53 -0700
> Beau Belgrave <beaub@linux.microsoft.com> wrote:
>
> > > It was actually merged in 5.8. So sysctl should be sufficient with that.
> > > But maybe it's weird to start adding sysctls, when the rest of tracing
> > > tunables is AFAIK under /sys/kernel/tracing/ ?
> > >
> >
> > During the TraceFS meetings Steven runs I was asked to add a boot
> > parameter and sysctl for user_events to limit the max.
> >
> > To me, it seems when user_events moves toward namespace awareness
> > sysctl might be easier to use from within a namespace to turn knobs.
> >
> > Happy to change to whatever, but I want to see Steven and Masami agree
> > on the approach before doing so.
> >
> > Steven, do you agree with Masami to move to just sysctl?
>
> We do have some tracing related sysctls already:
>
> # cd /proc/sys/kernel
> # ls *trace*
> ftrace_dump_on_oops oops_all_cpu_backtrace traceoff_on_warning
> ftrace_enabled stack_tracer_enabled tracepoint_printk
>
> Although I would love to deprecated ftrace_enable as that now has a
> control in tracefs, but it's not unprecedented to have tracing tunables as
> sysctl.
>
> And if we get cmdline boot parameters for free from sysctls then all the
> better.

Yeah, I confirmed that sysctl can be set via kernel parameter.
So it is OK for me to add a sysctl.

Thank you,

>
> -- Steve
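Since the thread settles on the sysctl as the knob, a small sketch of reading it from user space may help. Assumptions: the path below is what the proposed kernel.user_events_max name would map to under /proc/sys, it would exist only on a kernel carrying this patch, and `read_uint_file()` is a hypothetical helper written for this illustration.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

/* Assumed procfs path for the proposed kernel.user_events_max sysctl. */
#define USER_EVENTS_MAX_PATH "/proc/sys/kernel/user_events_max"

/*
 * Read one unsigned decimal value from a procfs-style file.
 * Returns 0 on success or a negative errno value on failure.
 */
static int read_uint_file(const char *path, unsigned int *out)
{
	char buf[32];
	char *end;
	unsigned long val;
	FILE *f;

	assert(path != NULL && out != NULL);

	f = fopen(path, "r");
	if (!f)
		return -errno;

	if (!fgets(buf, sizeof(buf), f)) {
		fclose(f);
		return -EIO;
	}
	fclose(f);

	errno = 0;
	val = strtoul(buf, &end, 10);
	if (end == buf || errno == ERANGE || val > UINT_MAX)
		return -EINVAL;

	*out = (unsigned int)val;
	return 0;
}
```

A caller would pass USER_EVENTS_MAX_PATH and treat -ENOENT as "this kernel does not have the sysctl". For the boot-time case discussed above, the generic 'sysctl.*' kernel parameter would presumably take the form sysctl.kernel.user_events_max=<value>.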
diff --git a/kernel/trace/trace_events_user.c b/kernel/trace/trace_events_user.c
index 222f2eb59c7c..6a5ebe243999 100644
--- a/kernel/trace/trace_events_user.c
+++ b/kernel/trace/trace_events_user.c
@@ -20,6 +20,7 @@
 #include <linux/types.h>
 #include <linux/uaccess.h>
 #include <linux/highmem.h>
+#include <linux/init.h>
 #include <linux/user_events.h>
 #include "trace.h"
 #include "trace_dynevent.h"
@@ -61,6 +62,12 @@ struct user_event_group {
 /* Group for init_user_ns mapping, top-most group */
 static struct user_event_group *init_group;
 
+/* Max allowed events for the whole system */
+static unsigned int max_user_events = 32768;
+
+/* Current number of events on the whole system */
+static unsigned int current_user_events;
+
 /*
  * Stores per-event properties, as users register events
  * within a file a user_event might be created if it does not
@@ -1241,6 +1248,8 @@ static int destroy_user_event(struct user_event *user)
 {
 	int ret = 0;
 
+	lockdep_assert_held(&event_mutex);
+
 	/* Must destroy fields before call removal */
 	user_event_destroy_fields(user);
 
@@ -1257,6 +1266,11 @@ static int destroy_user_event(struct user_event *user)
 	kfree(EVENT_NAME(user));
 	kfree(user);
 
+	if (current_user_events > 0)
+		current_user_events--;
+	else
+		pr_alert("BUG: Bad current_user_events\n");
+
 	return ret;
 }
 
@@ -1744,6 +1758,11 @@ static int user_event_parse(struct user_event_group *group, char *name,
 
 	mutex_lock(&event_mutex);
 
+	if (current_user_events >= max_user_events) {
+		ret = -EMFILE;
+		goto put_user_lock;
+	}
+
 	ret = user_event_trace_register(user);
 
 	if (ret)
@@ -1755,6 +1774,7 @@ static int user_event_parse(struct user_event_group *group, char *name,
 	dyn_event_init(&user->devent, &user_event_dops);
 	dyn_event_add(&user->devent, &user->call);
 	hash_add(group->register_table, &user->node, key);
+	current_user_events++;
 
 	mutex_unlock(&event_mutex);
 
@@ -2386,6 +2406,43 @@ static int create_user_tracefs(void)
 	return -ENODEV;
 }
 
+static int __init set_max_user_events(char *str)
+{
+	if (!str)
+		return 0;
+
+	if (kstrtouint(str, 0, &max_user_events))
+		return 0;
+
+	return 1;
+}
+__setup("user_events_max=", set_max_user_events);
+
+static int set_max_user_events_sysctl(struct ctl_table *table, int write,
+				      void *buffer, size_t *lenp, loff_t *ppos)
+{
+	int ret;
+
+	mutex_lock(&event_mutex);
+
+	ret = proc_douintvec(table, write, buffer, lenp, ppos);
+
+	mutex_unlock(&event_mutex);
+
+	return ret;
+}
+
+static struct ctl_table user_event_sysctls[] = {
+	{
+		.procname	= "user_events_max",
+		.data		= &max_user_events,
+		.maxlen		= sizeof(unsigned int),
+		.mode		= 0644,
+		.proc_handler	= set_max_user_events_sysctl,
+	},
+	{}
+};
+
 static int __init trace_events_user_init(void)
 {
 	int ret;
@@ -2415,6 +2472,8 @@ static int __init trace_events_user_init(void)
 	if (dyn_event_register(&user_event_dops))
 		pr_warn("user_events could not register with dyn_events\n");
 
+	register_sysctl_init("kernel", user_event_sysctls);
+
 	return 0;
 }
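To make the boot-parameter handler in the diff above easier to experiment with, here is a rough user-space approximation. It is only a sketch: `parse_uint_strict()` and `sketch_set_max_user_events()` are hypothetical names, strtoul() stands in for the kernel's kstrtouint(), and the kernel helper remains the authority on exactly which strings are accepted. The return convention follows the patch: 1 when the value was taken, 0 otherwise.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

static unsigned int max_user_events = 32768;

/*
 * Rough imitation of kstrtouint(str, 0, out): base auto-detected,
 * one trailing newline tolerated, anything else after the number rejected.
 */
static int parse_uint_strict(const char *str, unsigned int *out)
{
	char *end;
	unsigned long val;

	assert(out != NULL);

	/* strtoul() would skip whitespace and accept '-'; reject both here. */
	if (!str || (!isdigit((unsigned char)str[0]) && str[0] != '+'))
		return -EINVAL;

	errno = 0;
	val = strtoul(str, &end, 0);
	if (end == str)
		return -EINVAL;
	if (errno == ERANGE || val > UINT_MAX)
		return -ERANGE;

	if (*end == '\n')
		end++;
	if (*end != '\0')
		return -EINVAL;

	*out = (unsigned int)val;
	return 0;
}

/* Analogue of set_max_user_events(): 1 if handled, 0 if not. */
static int sketch_set_max_user_events(const char *str)
{
	unsigned int val;

	if (!str)
		return 0;

	if (parse_uint_strict(str, &val))
		return 0;

	max_user_events = val;
	return 1;
}
```

In this sketch a rejected string leaves the previous limit untouched, so a malformed command line falls back to the 32768 default.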