From patchwork Thu Aug 17 15:37:23 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Arnaldo Carvalho de Melo X-Patchwork-Id: 135963 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:b82d:0:b0:3f2:4152:657d with SMTP id z13csp1054072vqi; Thu, 17 Aug 2023 17:00:02 -0700 (PDT) X-Google-Smtp-Source: AGHT+IGXpQysAeb9ckWt0ZRYKh6RdYfzM0qzkvxPdpIfq9RiuxHEr21K//9sIhxn8pm11TFk1Sza X-Received: by 2002:a05:6a21:1f16:b0:133:cf5c:4d2e with SMTP id ry22-20020a056a211f1600b00133cf5c4d2emr1134149pzb.20.1692316802360; Thu, 17 Aug 2023 17:00:02 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1692316802; cv=none; d=google.com; s=arc-20160816; b=GY8MuBw/tSV1zbI1iBX7CSc1l+oLtNbr0XbHm5QmXm92Gwps7+wyW0SHhq8cXH12QS 2E4ASMIC2UzCqV8DnskiYy/nwmB9jHR+u7GmxmdjuPy8BX2Ro3wTudddz8T/fIHKbnCw sfNApySy4X9BXQ6eqYYTKpwdOpGksybdXPM3BhHyho2XplWkGHDq0dBXADz6IcSljh38 fYvJTEdcLyt7eSL9JSe4lUyufjTI6u4e7gM0WQsXXLMV9NTLHyyjrQVOVCBwQfgWO6iT BCJzReSuCQUHBlqChlJzFWnSYNSJQp8EurzJIuCTIXuHCDwUcLIbTXzJbUWYx6Vo12sL Laeg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-disposition:mime-version:message-id :subject:cc:to:from:date:dkim-signature; bh=B/gd+RL3e+5AvWbCMQ/31cY3CiGZI64tNJ47MwquAYM=; fh=O+rUN+ifSwssi4FXrEII0Ys+27U3tMXgRGa11U+atsU=; b=lGNFzs3bLC8sVESkmylrc5Z+i3fXVuyNYTgyqpL8qfwcdSFPIkw3O4PDBDy1CryPIz nip5cYXHB/KxcAw6CJe+HOJ8ae+YCZLN2CzhdqjYwUX7vE5wbCv43gwnQ1CmYX4eoH8h moorzm+An7MrijOmMRnBBfE1UmZVkNDjeEWaSWKAgeqg3s7s8gEbI8kkTqfQY91itaGI byFkEp5zmM0KA1Xr00sLx8CLIbfRtgenyHPmMReOkJlwUziJ1atRPNc+k4JBVFYTf7fs 9EEl23gZfOn2SO1WoEBfMsnu8H2ggOWCzzm+DhuA/gLtI5Oowd70FXevmHC7MbSKd5YP dDGA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=uE51BkgL; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id kp13-20020a170903280d00b001b25e9a76d7si448851plb.316.2023.08.17.16.59.48; Thu, 17 Aug 2023 17:00:02 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=uE51BkgL; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1352654AbjHQPhu (ORCPT + 99 others); Thu, 17 Aug 2023 11:37:50 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46552 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1353050AbjHQPh3 (ORCPT ); Thu, 17 Aug 2023 11:37:29 -0400 Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 0102F2D6D for ; Thu, 17 Aug 2023 08:37:27 -0700 (PDT) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 84A3266396 for ; Thu, 17 Aug 2023 15:37:27 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 8759AC433C9; Thu, 17 Aug 2023 15:37:26 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1692286646; bh=FsTOQy1J00h09RqORDICXkOwRWchRTxx4z6I0fSFUFo=; h=Date:From:To:Cc:Subject:From; b=uE51BkgLq5R2nrgeqCvtJgTcvmmzvUBASge7HSt89nmd8mvPSRvSNydYnNd0wU6N8 j0eJZWNEuNC3BkoAxS0Ev0GulJxwdnD0xdzh4vhnFdLo/W1r6cCM9v16BuBPJlNqF9 LShOhR9Fxrec+DNc+0vJvpC58c4JTFXZAtbORgnO1EZepqyufpdZPlRsq1maBGZgtn VeIuI6q8wxWHW0D0kDqcLBdcPlUwiaCYgGyQtzLwHUV0blwsZdT1Lc1WFc1VW/iU0z B+eQOjQBleKRevzOWOmTfvscWIskHar38lVn0vBHH7n/1srto/MImsyR1KdjmKYnyD f7UmniOxDsKig== Received: by quaco.ghostprotocols.net (Postfix, from userid 1000) id 986DD404DF; Thu, 17 Aug 2023 12:37:23 -0300 (-03) Date: Thu, 17 Aug 2023 12:37:23 -0300 From: Arnaldo Carvalho de Melo To: Ian Rogers , Namhyung Kim Cc: Adrian Hunter , Alexander Shishkin , Andi Kleen , Andrii Nakryiko , Anshuman Khandual , Athira Jajeev , bpf@vger.kernel.org, Brendan Gregg , Carsten Haitzler , Eduard Zingerman , Fangrui Song , He Kuang , Ingo Molnar , James Clark , Jiri Olsa , Kan Liang , Leo Yan , llvm@lists.linux.dev, Madhavan Srinivasan , Mark Rutland , Nathan Chancellor , "Naveen N. Rao" , Nick Desaulniers , Peter Zijlstra , Ravi Bangoria , Rob Herring , Tiezhu Yang , Tom Rix , Wang Nan , Wang ShaoBo , Yang Jihong , Yonghong Song , YueHaibing , Linux Kernel Mailing List Subject: [PATCH 1/1] perf trace: Use the augmented_raw_syscall BPF skel only for tracing syscalls Message-ID: MIME-Version: 1.0 Content-Disposition: inline X-Spam-Status: No, score=-1.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,HEXHASH_WORD, RCVD_IN_DNSWL_BLOCKED,SPF_HELO_NONE,SPF_PASS autolearn=no autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1774522783331691018 X-GMAIL-MSGID: 1774522783331691018 It is possible to use 'perf trace' with tracepoints and in that case we can't initialize/use the augmented_raw_syscalls BPF skel. For instance, this usecase: # perf trace -e sched:*exec --max-events=5 ? ( ): NetworkManager/1183 ... [continued]: poll()) = 1 0.043 ( 0.007 ms): NetworkManager/1183 epoll_wait(epfd: 17, events: 0x55555f90e920, maxevents: 6) = 0 0.060 ( 0.007 ms): NetworkManager/1183 write(fd: 3, buf: 0x7ffc5a27cd30, count: 8) = 8 0.073 ( 0.005 ms): NetworkManager/1183 epoll_wait(epfd: 24, events: 0x7ffc5a27cd20, maxevents: 2) = 1 0.082 ( 0.010 ms): NetworkManager/1183 recvmmsg(fd: 26, mmsg: 0x7ffc5a27caa0, vlen: 8) = 1 # Where we want to trace just some sched tracepoints ending in 'exec' ends up tracing all syscalls. Fix it by checking existing trace->trace_syscalls boolean to see if we need the augmenter. A followup patch will move those sections of code used only with the augmenter to separate functions, to get it cleaner and remove the goto, done just for reviewing purposes. With this patch in place the previous behaviour is restored: no syscalls when we have other events and no syscall names: [root@quaco ~]# perf probe do_filp_open "filename=pathname->name:string" Added new event: probe:do_filp_open (on do_filp_open with filename=pathname->name:string) You can now use it in all perf tools, such as: perf record -e probe:do_filp_open -aR sleep 1 [root@quaco ~]# perf trace --max-events=10 -e probe:do_filp_open sleep 1 0.000 sleep/455122 probe:do_filp_open(__probe_ip: -1186560412, filename: "/etc/ld.so.cache") 0.056 sleep/455122 probe:do_filp_open(__probe_ip: -1186560412, filename: "/lib64/libc.so.6") 0.481 sleep/455122 probe:do_filp_open(__probe_ip: -1186560412, filename: "/usr/lib/locale/locale-archive") 0.501 sleep/455122 probe:do_filp_open(__probe_ip: -1186560412, filename: "/usr/share/locale/locale.alias") 0.572 sleep/455122 probe:do_filp_open(__probe_ip: -1186560412, filename: "/usr/lib/locale/en_US.UTF-8/LC_IDENTIFICATION") 0.581 sleep/455122 probe:do_filp_open(__probe_ip: -1186560412, filename: "/usr/lib/locale/en_US.utf8/LC_IDENTIFICATION") 0.616 sleep/455122 probe:do_filp_open(__probe_ip: -1186560412, filename: "/usr/lib64/gconv/gconv-modules.cache") 0.656 sleep/455122 probe:do_filp_open(__probe_ip: -1186560412, filename: "/usr/lib/locale/en_US.UTF-8/LC_MEASUREMENT") 0.664 sleep/455122 probe:do_filp_open(__probe_ip: -1186560412, filename: "/usr/lib/locale/en_US.utf8/LC_MEASUREMENT") 0.696 sleep/455122 probe:do_filp_open(__probe_ip: -1186560412, filename: "/usr/lib/locale/en_US.UTF-8/LC_TELEPHONE") [root@quaco ~]# As well as mixing syscalls with tracepoints, getting the syscall tracepoints used augmented using the BPF skel: [root@quaco ~]# perf trace --max-events=10 -e open*,probe:do_filp_open sleep 1 0.000 ( ): sleep/455124 openat(dfd: CWD, filename: "/etc/ld.so.cache", flags: RDONLY|CLOEXEC) ... 0.005 ( ): sleep/455124 probe:do_filp_open(__probe_ip: -1186560412, filename: "/etc/ld.so.cache") 0.000 ( 0.011 ms): sleep/455124 ... [continued]: openat()) = 3 0.031 ( ): sleep/455124 openat(dfd: CWD, filename: "/lib64/libc.so.6", flags: RDONLY|CLOEXEC) ... 0.033 ( ): sleep/455124 probe:do_filp_open(__probe_ip: -1186560412, filename: "/lib64/libc.so.6") 0.031 ( 0.006 ms): sleep/455124 ... [continued]: openat()) = 3 0.258 ( ): sleep/455124 openat(dfd: CWD, filename: "/usr/lib/locale/locale-archive", flags: RDONLY|CLOEXEC) ... 0.261 ( ): sleep/455124 probe:do_filp_open(__probe_ip: -1186560412, filename: "/usr/lib/locale/locale-archive") 0.258 ( 0.006 ms): sleep/455124 ... [continued]: openat()) = -1 ENOENT (No such file or directory) 0.272 ( ): sleep/455124 openat(dfd: CWD, filename: "/usr/share/locale/locale.alias", flags: RDONLY|CLOEXEC) ... 0.273 ( ): sleep/455124 probe:do_filp_open(__probe_ip: -1186560412, filename: "/usr/share/locale/locale.alias") A final note: the probe:do_filp_open uses a kprobe (probably optimized as its in the start of a function) that uses the kprobe_tracer mechanism in the kernel to collect the pathname->name string and stash it into the tracepoint created by 'perf probe' for that: [root@quaco ~]# cat /sys/kernel/debug/tracing/kprobe_events p:probe/do_filp_open _text+4621920 filename=+0(+0(%si)):string [root@quaco ~]# While the syscalls:sys_enter_openat tracepoint gets its string from a BPF program attached to raw_syscalls:sys_enter that tail calls into another BPF program that knows the types for the openat syscall args and thus can bpf_probe_read it right after the normal sys_enter/sys_enter_openat tracepoint payload that comes prefixed with whatever perf_event_open asked for (CPU, timestamp, etc): [root@quaco ~]# bpftool prog | grep -E "sys_enter |sys_enter_opena" -A3 3176: tracepoint name sys_enter tag 0bc3fc9d11754ba1 gpl loaded_at 2023-08-17T12:32:20-0300 uid 0 xlated 272B jited 257B memlock 4096B map_ids 2462,2466,2463 btf_id 2976 -- 3180: tracepoint name sys_enter_opena tag 19dd077f00ec2f58 gpl loaded_at 2023-08-17T12:32:20-0300 uid 0 xlated 328B jited 206B memlock 4096B map_ids 2466,2465 btf_id 2976 [root@quaco ~]# Fixes: 42963c8bedeb864b ("perf trace: Migrate BPF augmentation to use a skeleton") Cc: Adrian Hunter Cc: Alexander Shishkin Cc: Andi Kleen Cc: Andrii Nakryiko Cc: Anshuman Khandual Cc: Athira Jajeev Cc: bpf@vger.kernel.org Cc: Brendan Gregg Cc: Carsten Haitzler Cc: Eduard Zingerman Cc: Fangrui Song Cc: He Kuang Cc: Ian Rogers Cc: Ingo Molnar Cc: James Clark Cc: Jiri Olsa Cc: Kan Liang Cc: Leo Yan Cc: llvm@lists.linux.dev Cc: Madhavan Srinivasan Cc: Mark Rutland Cc: Namhyung Kim Cc: Nathan Chancellor Cc: Naveen N. Rao Cc: Nick Desaulniers Cc: Peter Zijlstra Cc: Ravi Bangoria Cc: Rob Herring Cc: Tiezhu Yang Cc: Tom Rix Cc: Wang Nan Cc: Wang ShaoBo Cc: Yang Jihong Cc: Yonghong Song Cc: YueHaibing Link: https://lore.kernel.org/lkml/ Signed-off-by: Arnaldo Carvalho de Melo --- tools/perf/builtin-trace.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/tools/perf/builtin-trace.c b/tools/perf/builtin-trace.c index 0ebfa95895e0bf4d..3964cf44cdbcb3e8 100644 --- a/tools/perf/builtin-trace.c +++ b/tools/perf/builtin-trace.c @@ -3895,7 +3895,7 @@ static int trace__run(struct trace *trace, int argc, const char **argv) if (err < 0) goto out_error_open; #ifdef HAVE_BPF_SKEL - { + if (trace->syscalls.events.bpf_output) { struct perf_cpu cpu; /* @@ -3916,7 +3916,7 @@ static int trace__run(struct trace *trace, int argc, const char **argv) goto out_error_mem; #ifdef HAVE_BPF_SKEL - if (trace->skel->progs.sys_enter) + if (trace->skel && trace->skel->progs.sys_enter) trace__init_syscalls_bpf_prog_array_maps(trace); #endif @@ -4850,6 +4850,9 @@ int cmd_trace(int argc, const char **argv) } #ifdef HAVE_BPF_SKEL + if (!trace.trace_syscalls) + goto skip_augmentation; + trace.skel = augmented_raw_syscalls_bpf__open(); if (!trace.skel) { pr_debug("Failed to open augmented syscalls BPF skeleton"); @@ -4884,6 +4887,7 @@ int cmd_trace(int argc, const char **argv) } trace.syscalls.events.bpf_output = evlist__last(trace.evlist); assert(!strcmp(evsel__name(trace.syscalls.events.bpf_output), "__augmented_syscalls__")); +skip_augmentation: #endif err = -1;