Message ID | 20230120230518.17697-1-beaub@linux.microsoft.com |
---|---|
Headers |
From: Beau Belgrave <beaub@linux.microsoft.com>
To: rostedt@goodmis.org, mhiramat@kernel.org, mathieu.desnoyers@efficios.com, dcook@linux.microsoft.com, alanau@linux.microsoft.com, brauner@kernel.org, akpm@linux-foundation.org
Cc: linux-trace-devel@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [PATCH v7 00/11] tracing/user_events: Remote write ABI
Date: Fri, 20 Jan 2023 15:05:07 -0800
Message-Id: <20230120230518.17697-1-beaub@linux.microsoft.com>
X-Mailer: git-send-email 2.25.1
List-ID: <linux-kernel.vger.kernel.org> |
Series |
tracing/user_events: Remote write ABI
|
|
Message
Beau Belgrave
Jan. 20, 2023, 11:05 p.m. UTC
As part of the discussions about aligning user_events with user space tracers, it was determined that user programs should register an aligned value in which a bit is set or cleared when an event becomes enabled or disabled. Currently a shared page is used, which requires mmap(). Remove the shared page implementation and move to a user-registered address implementation.

In this new model, user programs specify 3 new values during event registration:
1. The address to update when the event is either enabled or disabled.
2. The bit to set/clear to reflect the event being enabled.
3. The size of the value at the specified address.

This allows a local 32/64-bit value in user programs to support both kernel and user tracers. As an example, setting bit 31 for kernel tracers when the event becomes enabled allows user tracers to use the other bits for ref counts or other flags. The kernel updates the bit atomically; user programs must also update these values atomically.

User-provided addresses must be naturally aligned. This allows checking a single page and prevents odd behaviors such as an enable value straddling 2 pages instead of residing in a single page.

When page faults are encountered, they are handled asynchronously via a workqueue. If the page faults back in, the write is attempted again. If the page cannot be faulted in, the failure is logged and the kernel waits until the next time the event is enabled/disabled. This prevents possible infinite loops caused by misbehaving user processes that unmap the address or change its protection after registering it.

Change history
V7:
  - Rebase to 6.2-rc4.
  - Added flags to the register ioctl; they are validated to be 0 for now. Future patches will enable other types of formats/options as needed.
V6:
  - Rebase to 6.2-rc2.
  - Fixed small typos and code style.
  - Changed from synchronize_rcu() to queue_rcu_work() to allow an asynchronous RCU delay when the mm is being removed, and to run in an appropriate context for mmdrop().
V5:
  - GFP_NOWAIT is still needed in user_event_enabler_dup(), due to the RCU lock.
V4:
  - Rebase to 6.1-rc7.
  - Moved user_events_fork() out of the task signal lock and dropped use of GFP_NOWAIT. All allocations are now GFP_KERNEL or GFP_KERNEL_ACCOUNT.
  - Added boot parameter user_events_max= to limit global events.
  - Added sysctl value kernel.user_events_max to limit global events.
  - Added cgroup tracking of memory allocated for events.
V3:
  - Rebase to 6.1-rc6.
  - Removed RFC tag on series.
  - Updated documentation to reflect ABI changes.
  - Added self-test for ABI-specific clone/fork cases.
  - Moved user_event_mm removal into do_exit() to ensure RSS task accounting is done properly in async fault paths. This also lets us remove the delayed mmdrop(), saving memory in each user_event_mm struct.
  - Fixed a timing window where a task exits while a write could still be in progress. During exit we now take mmap_write_lock to ensure writes are drained.
V2:
  - Rebase to 6.1-rc5.
  - Added various comments based on feedback.
  - Added enable_size to the register struct. This allows 32/64-bit addresses as long as the enable_bit fits and the address is naturally aligned.
  - Changed user_event_enabler_write to accept a new flag indicating whether a fault fixup should be done. This allows user_event_enabler_create to return failures to the user ioctl reg call and to retry faulting in data.
  - Added tracking of fork/exec/exit of tasks so that the user_event_mm lifetime is tied more closely to the task than to the file. This came with extra requirements around when locks can be taken, such as softirq cases, as well as an RCU pattern to ensure fork/exec/exit take minimal lock times.
  - Changed enablers to use a single word-aligned value for saving the bit to set and any flags, such as faulting asynchronously or being freed. This was required to ensure atomic bit set/test for fork cases where taking the event_mutex is not a good scalability decision.
  - Added unregister IOCTL, since file lifetime no longer limits the enable time for any events (the mm does).
  - Updated sample code to reflect the new remote write based ABI.
  - Updated self-test code to reflect the new remote write based ABI.

Beau Belgrave (11):
  tracing/user_events: Split header into uapi and kernel
  tracing/user_events: Track fork/exec/exit for mm lifetime
  tracing/user_events: Use remote writes for event enablement
  tracing/user_events: Fixup enable faults asyncly
  tracing/user_events: Add ioctl for disabling addresses
  tracing/user_events: Update self-tests to write ABI
  tracing/user_events: Add ABI self-test
  tracing/user_events: Use write ABI in example
  tracing/user_events: Update documentation for ABI
  tracing/user_events: Charge event allocs to cgroups
  tracing/user_events: Limit global user_event count

 Documentation/trace/user_events.rst           | 177 ++--
 fs/exec.c                                     |   2 +
 include/linux/sched.h                         |   5 +
 include/linux/user_events.h                   | 101 +-
 include/uapi/linux/user_events.h              |  81 ++
 kernel/exit.c                                 |   2 +
 kernel/fork.c                                 |   2 +
 kernel/trace/Kconfig                          |   5 +-
 kernel/trace/trace_events_user.c              | 863 +++++++++++++++---
 samples/user_events/example.c                 |  47 +-
 tools/testing/selftests/user_events/Makefile  |   2 +-
 .../testing/selftests/user_events/abi_test.c  | 226 +++++
 .../testing/selftests/user_events/dyn_test.c  |   2 +-
 .../selftests/user_events/ftrace_test.c       | 162 ++--
 .../testing/selftests/user_events/perf_test.c |  39 +-
 15 files changed, 1317 insertions(+), 399 deletions(-)
 create mode 100644 include/uapi/linux/user_events.h
 create mode 100644 tools/testing/selftests/user_events/abi_test.c

base-commit: 5dc4c995db9eb45f6373a956eb1f69460e69e6d4
Comments
On Fri, 20 Jan 2023 15:05:07 -0800
Beau Belgrave <beaub@linux.microsoft.com> wrote:

> Documentation/trace/user_events.rst           | 177 ++--
> fs/exec.c                                     |   2 +
> include/linux/sched.h                         |   5 +
> include/linux/user_events.h                   | 101 +-
> include/uapi/linux/user_events.h              |  81 ++
> kernel/exit.c                                 |   2 +
> kernel/fork.c                                 |   2 +

There's several files that are touched outside of the tracing subsystem. You may need to run get_maintainers on this to get their input. I started playing a little with this, but it won't mean anything if we get push back from these maintainers.

-- Steve
On Mon, Feb 20, 2023 at 05:01:35PM -0500, Steven Rostedt wrote:
> There's several files that are touched outside of the tracing
> subsystem. You may need to run get_maintainers on this to get their
> input. I started playing a little with this, but it won't mean anything
> if we get push back from these maintainers.
>
> -- Steve

Would you prefer I start another version and include the key maintainers from fs/exec.c, kernel/exit.c, and kernel/fork.c?

I've added akpm and brauner in these patches. I've pinged akpm privately about these, but didn't get any responses.

It seems like Eric Biederman, Kees Cook, and linux-mm would be good folks to add here from get_maintainers outputs.

Thoughts?

Thanks,
-Beau
On Tue, 21 Feb 2023 09:42:51 -0800
Beau Belgrave <beaub@linux.microsoft.com> wrote:

> Would you prefer I start another version and include the key maintainers
> from fs/exec.c, kernel/exit.c, and kernel/fork.c?

Yeah, you could just do a "[RESEND]" patch set, if nothing has changed (or maybe just rebase if needed).

> I've added akpm and brauner in these patches. I've pinged akpm privately
> about these, but didn't get any responses.

Yeah, I think he'd rather see what others think before doing anything.

> It seems like Eric Biederman, Kees Cook, and linux-mm would be good
> folks to add here from get_maintainers outputs.

Sure. And yes, definitely include linux-mm.

-- Steve