[v3,4/7] perf record: Track sideband events for all CPUs when tracing selected CPUs
Message ID | 20230722093219.174898-5-yangjihong1@huawei.com |
---|---|
State | New |
Headers | From: Yang Jihong <yangjihong1@huawei.com>; To: <peterz@infradead.org>, <mingo@redhat.com>, <acme@kernel.org>, <mark.rutland@arm.com>, <alexander.shishkin@linux.intel.com>, <jolsa@kernel.org>, <namhyung@kernel.org>, <irogers@google.com>, <adrian.hunter@intel.com>, <kan.liang@linux.intel.com>, <james.clark@arm.com>, <tmricht@linux.ibm.com>, <ak@linux.intel.com>, <anshuman.khandual@arm.com>, <linux-kernel@vger.kernel.org>, <linux-perf-users@vger.kernel.org>; CC: <yangjihong1@huawei.com>; Subject: [PATCH v3 4/7] perf record: Track sideband events for all CPUs when tracing selected CPUs; Date: Sat, 22 Jul 2023 09:32:16 +0000; Message-ID: <20230722093219.174898-5-yangjihong1@huawei.com>; In-Reply-To: <20230722093219.174898-1-yangjihong1@huawei.com>; References: <20230722093219.174898-1-yangjihong1@huawei.com> |
Series | perf record: Track sideband events for all CPUs when tracing selected CPUs |
Commit Message
Yang Jihong
July 22, 2023, 9:32 a.m. UTC
User space tasks can migrate between CPUs, so we need to track sideband
events for all CPUs.
The specific scenario is as follows:
         CPU0                                 CPU1
  perf record -C 0 start
                              taskA starts to be created and executed
                                -> PERF_RECORD_COMM and PERF_RECORD_MMAP
                                   events only deliver to CPU1
                              ......
                                |
                          migrate to CPU0
                                |
  Running on CPU0    <----------/
  ...

  perf record -C 0 stop
Now perf samples the PC of taskA. However, perf does not record the
PERF_RECORD_COMM and PERF_RECORD_MMAP events of taskA.
Therefore, the comm and symbols of taskA cannot be parsed.
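The failure mode can be illustrated with a small model. This is only a sketch: `struct sb_event`, `enum sb_type` and `comm_recorded()` are hypothetical names invented for the illustration, not perf data structures. It assumes a recorder keeps a sideband record only if it was delivered on a CPU it tracks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a sideband record (not a perf type). */
enum sb_type { SB_COMM, SB_MMAP };

struct sb_event {
	enum sb_type type;
	int cpu;	/* CPU on which the record was delivered */
	int tid;	/* task the record describes */
};

/*
 * Return true if a COMM record for @tid was delivered on a CPU whose bit is
 * set in @tracked_cpus, i.e. the recorder would have captured it.
 */
static bool comm_recorded(const struct sb_event *events, size_t nr,
			  uint64_t tracked_cpus, int tid)
{
	assert(events != NULL || nr == 0);

	for (size_t i = 0; i < nr; i++) {
		const struct sb_event *ev = &events[i];

		if (ev->type != SB_COMM || ev->tid != tid)
			continue;
		if (ev->cpu >= 0 && ev->cpu < 64 &&
		    (tracked_cpus & (UINT64_C(1) << ev->cpu)))
			return true;
	}
	return false;
}
```

With taskA's COMM record delivered on CPU1, a recorder tracking only CPU0 (mask `1 << 0`) never sees it, while a recorder tracking every CPU does.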
The solution is to record sideband events for all CPUs when tracing
selected CPUs. Because this modifies the default behavior, add related
comments to the perf record man page.
The resulting sys_perf_event_open calls are as follows:
# perf --debug verbose=3 record -e cpu-clock -C 1 true
<SNIP>
Opening: cpu-clock
------------------------------------------------------------
perf_event_attr:
type 1 (PERF_TYPE_SOFTWARE)
size 136
config 0 (PERF_COUNT_SW_CPU_CLOCK)
{ sample_period, sample_freq } 4000
sample_type IP|TID|TIME|CPU|PERIOD|IDENTIFIER
read_format ID|LOST
disabled 1
inherit 1
freq 1
sample_id_all 1
exclude_guest 1
------------------------------------------------------------
sys_perf_event_open: pid -1 cpu 1 group_fd -1 flags 0x8 = 5
Opening: dummy:u
------------------------------------------------------------
perf_event_attr:
type 1 (PERF_TYPE_SOFTWARE)
size 136
config 0x9 (PERF_COUNT_SW_DUMMY)
{ sample_period, sample_freq } 1
sample_type IP|TID|TIME|CPU|IDENTIFIER
read_format ID|LOST
inherit 1
exclude_kernel 1
exclude_hv 1
mmap 1
comm 1
task 1
sample_id_all 1
exclude_guest 1
mmap2 1
comm_exec 1
ksymbol 1
bpf_event 1
------------------------------------------------------------
sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 6
sys_perf_event_open: pid -1 cpu 1 group_fd -1 flags 0x8 = 7
sys_perf_event_open: pid -1 cpu 2 group_fd -1 flags 0x8 = 9
sys_perf_event_open: pid -1 cpu 3 group_fd -1 flags 0x8 = 10
sys_perf_event_open: pid -1 cpu 4 group_fd -1 flags 0x8 = 11
sys_perf_event_open: pid -1 cpu 5 group_fd -1 flags 0x8 = 12
sys_perf_event_open: pid -1 cpu 6 group_fd -1 flags 0x8 = 13
sys_perf_event_open: pid -1 cpu 7 group_fd -1 flags 0x8 = 14
<SNIP>
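For readers less familiar with the raw interface, the `dummy:u` attribute dump above corresponds roughly to the following setup. This is a sketch, not the code perf uses: `init_sideband_attr()` is a hypothetical helper, `size` is taken from the local headers instead of the 136 shown in the trace, and `PERF_FORMAT_LOST`, `ksymbol` and `bpf_event` are left out so that the example also builds against older kernel headers.

```c
#include <assert.h>
#include <string.h>
#include <linux/perf_event.h>

/* Hypothetical helper: fill @attr like the dummy:u tracking event in the trace. */
static void init_sideband_attr(struct perf_event_attr *attr)
{
	assert(attr != NULL);

	memset(attr, 0, sizeof(*attr));
	attr->size = sizeof(*attr);		/* trace shows 136; depends on headers */
	attr->type = PERF_TYPE_SOFTWARE;
	attr->config = PERF_COUNT_SW_DUMMY;	/* 0x9 in the trace */
	attr->sample_period = 1;
	attr->sample_type = PERF_SAMPLE_IP | PERF_SAMPLE_TID | PERF_SAMPLE_TIME |
			    PERF_SAMPLE_CPU | PERF_SAMPLE_IDENTIFIER;
	attr->read_format = PERF_FORMAT_ID;	/* trace also sets LOST */

	/* The ":u" modifier: count user space only. */
	attr->exclude_kernel = 1;
	attr->exclude_hv = 1;
	attr->exclude_guest = 1;

	/* Sideband records needed to resolve comm and symbols. */
	attr->mmap = 1;
	attr->mmap2 = 1;
	attr->comm = 1;
	attr->comm_exec = 1;
	attr->task = 1;

	attr->inherit = 1;
	attr->sample_id_all = 1;
}
```

In the trace, this attribute is opened once per CPU with `pid -1` and `flags 0x8` (`PERF_FLAG_FD_CLOEXEC`), whereas the `cpu-clock` sampling event is opened only on CPU 1.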
Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
---
tools/perf/Documentation/perf-record.txt | 3 +++
tools/perf/builtin-record.c | 14 +++++++++++++-
2 files changed, 16 insertions(+), 1 deletion(-)
Comments
On 22/07/23 12:32, Yang Jihong wrote:
<SNIP>
> diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
> index 3ff9d972225e..4e8e97928f05 100644
> --- a/tools/perf/builtin-record.c
> +++ b/tools/perf/builtin-record.c
> @@ -912,6 +912,7 @@ static int record__config_tracking_events(struct record *rec)
>  {
>  	struct record_opts *opts = &rec->opts;
>  	struct evlist *evlist = rec->evlist;
> +	bool system_wide = false;
>  	struct evsel *evsel;
>
>  	/*
> @@ -921,7 +922,18 @@ static int record__config_tracking_events(struct record *rec)
>  	 */
>  	if (opts->target.initial_delay || target__has_cpu(&opts->target) ||
>  	    perf_pmus__num_core_pmus() > 1) {
> -		evsel = evlist__findnew_tracking_event(evlist, false);
> +
> +		/*
> +		 * User space tasks can migrate between CPUs, so when tracing
> +		 * selected CPUs, sideband for all CPUs is still needed.
> +		 *
> +		 * If all (non-dummy) evsel have exclude_user,
> +		 * system_wide is not needed.
> +		 */
> +		if (!!opts->target.cpu_list && !opts->all_kernel)

Not everyone uses all-kernel. Can we check the evsels are either dummy
or exclude_user?

> +			system_wide = true;
> +
> +		evsel = evlist__findnew_tracking_event(evlist, system_wide);
>  		if (!evsel)
>  			return -ENOMEM;
>
Hello,

On 2023/7/31 19:08, Adrian Hunter wrote:
> On 22/07/23 12:32, Yang Jihong wrote:
<SNIP>
>> +		/*
>> +		 * User space tasks can migrate between CPUs, so when tracing
>> +		 * selected CPUs, sideband for all CPUs is still needed.
>> +		 *
>> +		 * If all (non-dummy) evsel have exclude_user,
>> +		 * system_wide is not needed.
>> +		 */
>> +		if (!!opts->target.cpu_list && !opts->all_kernel)
>
> Not everyone uses all-kernel. Can we check the evsels are either dummy
> or exclude_user?

For perf_record, exclude_user of all evsels is set in evsel__config(), and
record__config_tracking_events() is before evsel__config().

Uh..., it seems that only opts->all_kernel can be used to check exclude_user of evsels.

void evsel__config()
{
	...
	if (opts->all_kernel) {
		attr->exclude_kernel = 0;
		attr->exclude_user   = 1;
	}
	...
}

Thanks,
Yang
On 31/07/23 15:38, Yang Jihong wrote:
> Hello,
>
> On 2023/7/31 19:08, Adrian Hunter wrote:
<SNIP>
>>> +		if (!!opts->target.cpu_list && !opts->all_kernel)
>>
>> Not everyone uses all-kernel. Can we check the evsels are either dummy
>> or exclude_user?
> For perf_record, exclude_user of all evsels is set in evsel__config(), and
> record__config_tracking_events() is before evsel__config().
>
> Uh..., it seems that only opts->all_kernel can be used to check exclude_user of evsels.
>
> void evsel__config()
> {
> 	...
> 	if (opts->all_kernel) {
> 		attr->exclude_kernel = 0;
> 		attr->exclude_user   = 1;
> 	}
> 	...
> }

The parser updates attr in accordance with ":k" etc. I guess
opts->all_kernel or opts->all_user override that as well.
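The interaction discussed here, where event modifiers are applied first and `--all-user` or `--all-kernel` are applied on top, can be modelled in a few lines. This is a simplified sketch under stated assumptions: `struct attr_model`, `apply_modifier()` and `apply_record_opts()` are hypothetical names, the modifier rule is a simplification of what the event parser does, and the `all_user` branch is assumed to mirror the `all_kernel` branch quoted from `evsel__config()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical three-bit model of the privilege filters in perf_event_attr. */
struct attr_model {
	bool exclude_user;
	bool exclude_kernel;
	bool exclude_hv;
};

/*
 * Assumed modifier rule: the first of 'u', 'k' or 'h' excludes every level,
 * then each letter re-enables its own level.
 */
static void apply_modifier(struct attr_model *attr, const char *mod)
{
	bool seen = false;

	assert(attr != NULL && mod != NULL);

	for (; *mod != '\0'; mod++) {
		if (*mod != 'u' && *mod != 'k' && *mod != 'h')
			continue;
		if (!seen) {
			attr->exclude_user = true;
			attr->exclude_kernel = true;
			attr->exclude_hv = true;
			seen = true;
		}
		if (*mod == 'u')
			attr->exclude_user = false;
		else if (*mod == 'k')
			attr->exclude_kernel = false;
		else
			attr->exclude_hv = false;
	}
}

/* Applied after parsing, so it overrides whatever the modifier chose. */
static void apply_record_opts(struct attr_model *attr, bool all_user,
			      bool all_kernel)
{
	assert(attr != NULL);

	if (all_user) {		/* assumed symmetric to the all_kernel branch */
		attr->exclude_kernel = true;
		attr->exclude_user = false;
	}
	if (all_kernel) {
		attr->exclude_kernel = false;
		attr->exclude_user = true;
	}
}
```

In this model, an event written as `cpu-clock:u` and recorded with `--all-kernel` ends up with `exclude_user` set and `exclude_kernel` cleared, which is why a check that runs before `evsel__config()` cannot rely on the parsed attribute alone.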
Hello,

On 2023/7/31 21:01, Adrian Hunter wrote:
> On 31/07/23 15:38, Yang Jihong wrote:
<SNIP>
>> Uh..., it seems that only opts->all_kernel can be used to check exclude_user of evsels.
>>
>> void evsel__config()
>> {
>> 	...
>> 	if (opts->all_kernel) {
>> 		attr->exclude_kernel = 0;
>> 		attr->exclude_user   = 1;
>> 	}
>> 	...
>> }
>
> The parser updates attr in accordance with ":k" etc. I guess

Yes, the ":k" situation also needs to be considered.

> opts->all_kernel or opts->all_user override that as well.

Yes, opts->all_kernel and opts->all_user will overwrite the original attr, see [1].

may need to check all_user, all_kernel and non-dummy exclude_user at the same time:

if ((all_user && one_non_dummy_exist) ||
    (!all_user && !all_kernel && one_non_dummy_without_exclude_user)) {
	system_wide = true;
}

[1]
# perf --debug verbose=2 record -e cpu-clock:u --all-kernel true
<SNIP>
------------------------------------------------------------
perf_event_attr:
type 1 (PERF_TYPE_SOFTWARE)
size 136
config 0 (PERF_COUNT_SW_CPU_CLOCK)
{ sample_period, sample_freq } 4000
sample_type IP|TID|TIME|PERIOD
read_format ID|LOST
disabled 1
inherit 1
exclude_user 1
exclude_hv 1
mmap 1
comm 1
freq 1
enable_on_exec 1
task 1
sample_id_all 1
exclude_guest 1
mmap2 1
comm_exec 1
ksymbol 1
bpf_event 1
------------------------------------------------------------
<SNIP>

Thanks,
Yang
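The condition proposed in this reply can be written out as a standalone predicate to check its truth table. This is only a sketch of the pseudo-condition from the thread: `struct evsel_model` and `want_system_wide_sideband()` are hypothetical names, and the struct is a two-field stand-in, not perf's `struct evsel`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical two-field stand-in for an evsel (not perf's struct evsel). */
struct evsel_model {
	bool is_dummy;		/* tracking-only event, produces no samples */
	bool exclude_user;	/* as set by the parser, e.g. via ":k" */
};

/*
 * Evaluate the condition from the thread:
 *   (all_user && one_non_dummy_exist) ||
 *   (!all_user && !all_kernel && one_non_dummy_without_exclude_user)
 */
static bool want_system_wide_sideband(const struct evsel_model *evsels,
				      size_t nr, bool all_user, bool all_kernel)
{
	bool one_non_dummy_exist = false;
	bool one_non_dummy_without_exclude_user = false;

	assert(evsels != NULL || nr == 0);

	for (size_t i = 0; i < nr; i++) {
		if (evsels[i].is_dummy)
			continue;
		one_non_dummy_exist = true;
		if (!evsels[i].exclude_user)
			one_non_dummy_without_exclude_user = true;
	}

	return (all_user && one_non_dummy_exist) ||
	       (!all_user && !all_kernel && one_non_dummy_without_exclude_user);
}
```

Under this predicate, a plain `cpu-clock` event wants system-wide sideband, a `cpu-clock:k` event does not unless `--all-user` overrides it, and `--all-kernel` without `--all-user` always yields false.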
diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt
index 680396c56bd1..dac53ece51ab 100644
--- a/tools/perf/Documentation/perf-record.txt
+++ b/tools/perf/Documentation/perf-record.txt
@@ -388,6 +388,9 @@ comma-separated list with no space: 0,1. Ranges of CPUs are specified with -: 0-
 In per-thread mode with inheritance mode on (default), samples are captured only when
 the thread executes on the designated CPUs. Default is to monitor all CPUs.
 
+User space tasks can migrate between CPUs, so when tracing selected CPUs,
+a dummy event is created to track sideband for all CPUs.
+
 -B::
 --no-buildid::
 Do not save the build ids of binaries in the perf.data files. This skips
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 3ff9d972225e..4e8e97928f05 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -912,6 +912,7 @@ static int record__config_tracking_events(struct record *rec)
 {
 	struct record_opts *opts = &rec->opts;
 	struct evlist *evlist = rec->evlist;
+	bool system_wide = false;
 	struct evsel *evsel;
 
 	/*
@@ -921,7 +922,18 @@ static int record__config_tracking_events(struct record *rec)
 	 */
 	if (opts->target.initial_delay || target__has_cpu(&opts->target) ||
 	    perf_pmus__num_core_pmus() > 1) {
-		evsel = evlist__findnew_tracking_event(evlist, false);
+
+		/*
+		 * User space tasks can migrate between CPUs, so when tracing
+		 * selected CPUs, sideband for all CPUs is still needed.
+		 *
+		 * If all (non-dummy) evsel have exclude_user,
+		 * system_wide is not needed.
+		 */
+		if (!!opts->target.cpu_list && !opts->all_kernel)
+			system_wide = true;
+
+		evsel = evlist__findnew_tracking_event(evlist, system_wide);
 		if (!evsel)
 			return -ENOMEM;
 