Message ID | 20231208210855.407580-1-kan.liang@linux.intel.com |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:bcd1:0:b0:403:3b70:6f57 with SMTP id r17csp5721671vqy; Fri, 8 Dec 2023 13:09:40 -0800 (PST) X-Google-Smtp-Source: AGHT+IGAJdda2MuSBiFm8DmvPBRdBIiGqLrqNkATx5sbIFwRZsNLcX8j7QGNJNGKKy852kHU/xeo X-Received: by 2002:a17:90a:6881:b0:288:79f3:8075 with SMTP id a1-20020a17090a688100b0028879f38075mr658436pjd.13.1702069780268; Fri, 08 Dec 2023 13:09:40 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1702069780; cv=none; d=google.com; s=arc-20160816; b=Nop6REQ3jPSunckkNRTcdGWTM28O+3I/KslznXrD0Ki2cU4gnXfnIN6sthLnMvyG2W 92tAVwkLMEAy8FLoxxDWsXp5aLvjH6As/gd0KNSZ7DwS1ZTh5AhHlA+ZtJWIRXAy4Jbq m3pvBxHo51Jgo4UOCgSKhgKaVQ2lH/v82ppBLpV5WUoNZgzmmpzct7Nmvxm2O8aIQcIj T0Ozrms0dwBqXoCoU1BAlG2DPz0oO+x4tMwHZWllkdCh5aBR3uaJAVUfHlqAlkp24x6T sKBmQqXme7TRDkT1xr4sgerqcJKEnmUXnUxGnwE7MMYD1UYONSQOFRI9dLh5CqSjoeNc HNuw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :message-id:date:subject:cc:to:from:dkim-signature; bh=46efbU48F6ODbzQwhx6hXSYLEVApLyKOl2PUe62TFLY=; fh=V97FFFi5RmpdUikarEoLTJWoOplAC1MpM9MGOX1xMoI=; b=HguWCHK13P/JyZAPCmuHUutJPhUbvDxD3PYm1nsR+AOuB3C6LOxKqeDZfONp8Np9X9 goBEhqeOMJseedkEgSmRVbjsvtVnORCnNMdVwEYUpYNN2ZR2tlkWYpwDfIxaUd40qEyJ Otvcui0O0z1P8Pe4vMXZGdC3pQX81VNM6p1mHeetqBLA+GmF5RgzumXgWYcFyy8Y10fU ynlaWjuG+5T3kubeCzzoD+F1BvQHNjdh1MK7ASI7UTYsQJuLC6lPcztlwpadOqQTxgWb WeO5OyAoPfxlz0rncUS866sYjiYyuWQH6w2XfbqgPZK4z8V5G+Lkq4fQMHi4sERCcwI+ iB3Q== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b="N/9O9tL/"; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:6 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from pete.vger.email (pete.vger.email. [2620:137:e000::3:6]) by mx.google.com with ESMTPS id gl13-20020a17090b120d00b002868ad0a84bsi3457203pjb.37.2023.12.08.13.09.39 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 08 Dec 2023 13:09:40 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:6 as permitted sender) client-ip=2620:137:e000::3:6; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b="N/9O9tL/"; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:6 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from out1.vger.email (depot.vger.email [IPv6:2620:137:e000::3:0]) by pete.vger.email (Postfix) with ESMTP id B959883593BA; Fri, 8 Dec 2023 13:09:37 -0800 (PST) X-Virus-Status: Clean X-Virus-Scanned: clamav-milter 0.103.11 at pete.vger.email Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1574769AbjLHVJL (ORCPT <rfc822;makky5685@gmail.com> + 99 others); Fri, 8 Dec 2023 16:09:11 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:47638 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229817AbjLHVJH (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Fri, 8 Dec 2023 16:09:07 -0500 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.8]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 4494A193; Fri, 8 Dec 2023 13:09:14 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1702069754; x=1733605754; h=from:to:cc:subject:date:message-id:mime-version: content-transfer-encoding; bh=/SGOKoA+Swz5M7d9MFKFLrftb660oNkZR5QqizVC2Sk=; b=N/9O9tL/walmMZXDf8xP9cXNMH5vAy8Cb5YB5Ts7Plecqb/AcKxeZiLh 4j0Ob6wllnMjmZ51yDdpdEUlCyHr0caqFEZ+PekikWcznDZKXkdrjaNfP wvah0O9IAzcNadby7bVw8m3tKEmkpAoJRE/LZ4/mSkkZbIw8hl4XPpvSF 7Yr3QVmOEBxe4qR+nUBoVRrVfxdhBcKBHYS4wnTBWtqHx3y/GGoHBliPm +7pCRBB4MKFCGNWaH7SxNK9U7PYT36boEBhSgLloVBDX+lytcdYmcOdNR 31h4YrVR0YX50l6brVKNLFIwvRfGcC5OFMjRUrruegC/YuWXkRRR8ddfv w==; X-IronPort-AV: E=McAfee;i="6600,9927,10918"; a="7791999" X-IronPort-AV: E=Sophos;i="6.04,261,1695711600"; d="scan'208";a="7791999" Received: from orsmga002.jf.intel.com ([10.7.209.21]) by fmvoesa102.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 08 Dec 2023 13:09:13 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10918"; a="772261155" X-IronPort-AV: E=Sophos;i="6.04,261,1695711600"; d="scan'208";a="772261155" Received: from kanliang-dev.jf.intel.com ([10.165.154.102]) by orsmga002.jf.intel.com with ESMTP; 08 Dec 2023 13:09:13 -0800 From: kan.liang@linux.intel.com To: acme@kernel.org, irogers@google.com Cc: namhyung@kernel.org, mark.rutland@arm.com, maz@kernel.org, marcan@marcan.st, linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org, Kan Liang <kan.liang@linux.intel.com> Subject: [PATCH] perf top: Use evsel's cpus to replace user_requested_cpus Date: Fri, 8 Dec 2023 13:08:55 -0800 Message-Id: <20231208210855.407580-1-kan.liang@linux.intel.com> X-Mailer: git-send-email 2.35.1 MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-0.8 required=5.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI, SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on pete.vger.email Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.6.4 (pete.vger.email [0.0.0.0]); Fri, 08 Dec 2023 13:09:37 -0800 (PST) X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1784749521742265544 X-GMAIL-MSGID: 1784749521742265544 |
Series |
perf top: Use evsel's cpus to replace user_requested_cpus
|
|
Commit Message
Liang, Kan
Dec. 8, 2023, 9:08 p.m. UTC
From: Kan Liang <kan.liang@linux.intel.com> perf top errors out on a hybrid machine $perf top Error: The cycles:P event is not supported. The user_requested_cpus may contain CPUs that are invalid for a hybrid PMU. It causes perf_event_open to fail. The commit ef91871c960e ("perf evlist: Propagate user CPU maps intersecting core PMU maps") already intersect the requested CPU map with the CPU map of the PMU. Use the evsel's cpus to replace user_requested_cpus. The evlist's threads is also propagated to evsel's threads in __perf_evlist__propagate_maps(). Replace it as well. Reported-by: Arnaldo Carvalho de Melo <acme@kernel.org> Closes: https://lore.kernel.org/linux-perf-users/ZXNnDrGKXbEELMXV@kernel.org/ Signed-off-by: Kan Liang <kan.liang@linux.intel.com> --- tools/perf/builtin-top.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
Comments
Em Fri, Dec 08, 2023 at 01:08:55PM -0800, kan.liang@linux.intel.com escreveu: > From: Kan Liang <kan.liang@linux.intel.com> > > perf top errors out on a hybrid machine > $perf top > > Error: > The cycles:P event is not supported. > > The user_requested_cpus may contain CPUs that are invalid for a hybrid > PMU. It causes perf_event_open to fail. ? All perf top expects is that the "cycles", the most basic one, be collected, on all CPUs in the system. > The commit ef91871c960e ("perf evlist: Propagate user CPU maps > intersecting core PMU maps") already intersect the requested CPU map > with the CPU map of the PMU. Use the evsel's cpus to replace > user_requested_cpus. > The evlist's threads is also propagated to evsel's threads in > __perf_evlist__propagate_maps(). Replace it as well. Thanks, but please try to add more detail to the fix so as to help others to consider looking at the patch. - Arnaldo > Reported-by: Arnaldo Carvalho de Melo <acme@kernel.org> > Closes: https://lore.kernel.org/linux-perf-users/ZXNnDrGKXbEELMXV@kernel.org/ > Signed-off-by: Kan Liang <kan.liang@linux.intel.com> > --- > tools/perf/builtin-top.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c > index ea8c7eca5eee..cce9350177e2 100644 > --- a/tools/perf/builtin-top.c > +++ b/tools/perf/builtin-top.c > @@ -1027,8 +1027,8 @@ static int perf_top__start_counters(struct perf_top *top) > > evlist__for_each_entry(evlist, counter) { > try_again: > - if (evsel__open(counter, top->evlist->core.user_requested_cpus, > - top->evlist->core.threads) < 0) { > + if (evsel__open(counter, counter->core.cpus, > + counter->core.threads) < 0) { > > /* > * Specially handle overwrite fall back. > -- > 2.35.1 >
On Fri, Dec 8, 2023 at 1:09 PM <kan.liang@linux.intel.com> wrote: > > From: Kan Liang <kan.liang@linux.intel.com> > > perf top errors out on a hybrid machine > $perf top > > Error: > The cycles:P event is not supported. > > The user_requested_cpus may contain CPUs that are invalid for a hybrid > PMU. It causes perf_event_open to fail. > > The commit ef91871c960e ("perf evlist: Propagate user CPU maps > intersecting core PMU maps") already intersect the requested CPU map > with the CPU map of the PMU. Use the evsel's cpus to replace > user_requested_cpus. > > The evlist's threads is also propagated to evsel's threads in > __perf_evlist__propagate_maps(). Replace it as well. > > Reported-by: Arnaldo Carvalho de Melo <acme@kernel.org> > Closes: https://lore.kernel.org/linux-perf-users/ZXNnDrGKXbEELMXV@kernel.org/ > Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Sorry I missed top doing the evlist__create_maps calls: https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/builtin-top.c?h=perf-tools-next#n1761 so it is right that the only divergence from record: https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/builtin-record.c?h=perf-tools-next#n1362 and stat: https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/stat.c?h=perf-tools-next#n797 So this is the right fix I believe. Reviewed-by: Ian Rogers <irogers@google.com> Thanks, Ian > --- > tools/perf/builtin-top.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c > index ea8c7eca5eee..cce9350177e2 100644 > --- a/tools/perf/builtin-top.c > +++ b/tools/perf/builtin-top.c > @@ -1027,8 +1027,8 @@ static int perf_top__start_counters(struct perf_top *top) > > evlist__for_each_entry(evlist, counter) { > try_again: > - if (evsel__open(counter, top->evlist->core.user_requested_cpus, > - top->evlist->core.threads) < 0) { > + if (evsel__open(counter, counter->core.cpus, > + counter->core.threads) < 0) { > > /* > * Specially handle overwrite fall back. > -- > 2.35.1 >
On 2023-12-11 4:13 p.m., Arnaldo Carvalho de Melo wrote: > Em Fri, Dec 08, 2023 at 01:08:55PM -0800, kan.liang@linux.intel.com escreveu: >> From: Kan Liang <kan.liang@linux.intel.com> >> >> perf top errors out on a hybrid machine >> $perf top >> >> Error: >> The cycles:P event is not supported. >> >> The user_requested_cpus may contain CPUs that are invalid for a hybrid >> PMU. It causes perf_event_open to fail. > > ? > > All perf top expects is that the "cycles", the most basic one, be > collected, on all CPUs in the system. > Yes, but for hybrid there is no single "cycles" event which can cover all CPUs. Perf has to split it into two cycles events, cpu_core/cycles/ and cpu_atom/cycles/. Each event has its own CPU mask. If a event is opened on the unsupported CPU. The open fails. That's the reason perf top fails. So perf should only open the cycles event on the corresponding CPU. >> The commit ef91871c960e ("perf evlist: Propagate user CPU maps >> intersecting core PMU maps") already intersect the requested CPU map >> with the CPU map of the PMU. Use the evsel's cpus to replace >> user_requested_cpus. > >> The evlist's threads is also propagated to evsel's threads in >> __perf_evlist__propagate_maps(). Replace it as well. > > Thanks, but please try to add more detail to the fix so as to help > others to consider looking at the patch. OK. For the threads, the same as other tools, e.g., perf record, perf appends a dummy for the system wide event. For a per-thread event, the evlist's thread_map is assigned to the evsel. So using the evsel's threads is appropriate and also be consistent with other tools. I will update the description and send a V2. Thanks, Kan > > - Arnaldo > >> Reported-by: Arnaldo Carvalho de Melo <acme@kernel.org> >> Closes: https://lore.kernel.org/linux-perf-users/ZXNnDrGKXbEELMXV@kernel.org/ >> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> >> --- >> tools/perf/builtin-top.c | 4 ++-- >> 1 file changed, 2 insertions(+), 2 deletions(-) >> >> diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c >> index ea8c7eca5eee..cce9350177e2 100644 >> --- a/tools/perf/builtin-top.c >> +++ b/tools/perf/builtin-top.c >> @@ -1027,8 +1027,8 @@ static int perf_top__start_counters(struct perf_top *top) >> >> evlist__for_each_entry(evlist, counter) { >> try_again: >> - if (evsel__open(counter, top->evlist->core.user_requested_cpus, >> - top->evlist->core.threads) < 0) { >> + if (evsel__open(counter, counter->core.cpus, >> + counter->core.threads) < 0) { >> >> /* >> * Specially handle overwrite fall back. >> -- >> 2.35.1 >> >
Em Tue, Dec 12, 2023 at 10:56:15AM -0500, Liang, Kan escreveu: > > > On 2023-12-11 4:13 p.m., Arnaldo Carvalho de Melo wrote: > > Em Fri, Dec 08, 2023 at 01:08:55PM -0800, kan.liang@linux.intel.com escreveu: > >> From: Kan Liang <kan.liang@linux.intel.com> > >> > >> perf top errors out on a hybrid machine > >> $perf top > >> > >> Error: > >> The cycles:P event is not supported. > >> > >> The user_requested_cpus may contain CPUs that are invalid for a hybrid > >> PMU. It causes perf_event_open to fail. > > ? > > All perf top expects is that the "cycles", the most basic one, be > > collected, on all CPUs in the system. > Yes, but for hybrid there is no single "cycles" event which can cover > all CPUs. Perf has to split it into two cycles events, cpu_core/cycles/ > and cpu_atom/cycles/. Each event has its own CPU mask. If a event is > opened on the unsupported CPU. The open fails. That's the reason perf > top fails. So perf should only open the cycles event on the > corresponding CPU. Great explanation, please make sure it is present in the fix, i.e. in the v2 you mentioned. > >> The commit ef91871c960e ("perf evlist: Propagate user CPU maps > >> intersecting core PMU maps") already intersect the requested CPU map > >> with the CPU map of the PMU. Use the evsel's cpus to replace > >> user_requested_cpus. > > > >> The evlist's threads is also propagated to evsel's threads in > >> __perf_evlist__propagate_maps(). Replace it as well. > > > > Thanks, but please try to add more detail to the fix so as to help > > others to consider looking at the patch. > > OK. For the threads, the same as other tools, e.g., perf record, perf > appends a dummy for the system wide event. For a per-thread event, the > evlist's thread_map is assigned to the evsel. So using the evsel's > threads is appropriate and also be consistent with other tools. > > I will update the description and send a V2. > > Thanks, > Kan > > > > > - Arnaldo > > > >> Reported-by: Arnaldo Carvalho de Melo <acme@kernel.org> > >> Closes: https://lore.kernel.org/linux-perf-users/ZXNnDrGKXbEELMXV@kernel.org/ > >> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> > >> --- > >> tools/perf/builtin-top.c | 4 ++-- > >> 1 file changed, 2 insertions(+), 2 deletions(-) > >> > >> diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c > >> index ea8c7eca5eee..cce9350177e2 100644 > >> --- a/tools/perf/builtin-top.c > >> +++ b/tools/perf/builtin-top.c > >> @@ -1027,8 +1027,8 @@ static int perf_top__start_counters(struct perf_top *top) > >> > >> evlist__for_each_entry(evlist, counter) { > >> try_again: > >> - if (evsel__open(counter, top->evlist->core.user_requested_cpus, > >> - top->evlist->core.threads) < 0) { > >> + if (evsel__open(counter, counter->core.cpus, > >> + counter->core.threads) < 0) { > >> > >> /* > >> * Specially handle overwrite fall back.
On Tue, Dec 12, 2023 at 7:56 AM Liang, Kan <kan.liang@linux.intel.com> wrote: > > > > On 2023-12-11 4:13 p.m., Arnaldo Carvalho de Melo wrote: > > Em Fri, Dec 08, 2023 at 01:08:55PM -0800, kan.liang@linux.intel.com escreveu: > >> From: Kan Liang <kan.liang@linux.intel.com> > >> > >> perf top errors out on a hybrid machine > >> $perf top > >> > >> Error: > >> The cycles:P event is not supported. > >> > >> The user_requested_cpus may contain CPUs that are invalid for a hybrid > >> PMU. It causes perf_event_open to fail. > > > > ? > > > > All perf top expects is that the "cycles", the most basic one, be > > collected, on all CPUs in the system. > > > > Yes, but for hybrid there is no single "cycles" event which can cover > all CPUs. Does that mean the kernel would reject the legacy "cycles" event on hybrid CPUs? Thanks, Namhyung
On Tue, Dec 12, 2023 at 9:23 AM Namhyung Kim <namhyung@kernel.org> wrote: > > On Tue, Dec 12, 2023 at 7:56 AM Liang, Kan <kan.liang@linux.intel.com> wrote: > > > > > > > > On 2023-12-11 4:13 p.m., Arnaldo Carvalho de Melo wrote: > > > Em Fri, Dec 08, 2023 at 01:08:55PM -0800, kan.liang@linux.intel.com escreveu: > > >> From: Kan Liang <kan.liang@linux.intel.com> > > >> > > >> perf top errors out on a hybrid machine > > >> $perf top > > >> > > >> Error: > > >> The cycles:P event is not supported. > > >> > > >> The user_requested_cpus may contain CPUs that are invalid for a hybrid > > >> PMU. It causes perf_event_open to fail. > > > > > > ? > > > > > > All perf top expects is that the "cycles", the most basic one, be > > > collected, on all CPUs in the system. > > > > > > > Yes, but for hybrid there is no single "cycles" event which can cover > > all CPUs. > > Does that mean the kernel would reject the legacy "cycles" event > on hybrid CPUs? I believe not. When the extended type isn't set on legacy cycles we often have the CPU and from that can determine the PMU. The issue is with the -1 any CPU perf_event_open option. As I was told, the PMU the event is opened on in this case is the first one registered in the kernel, on Intel hybrid this could be cpu_core or cpu_atom.. but IIRC it'll probably be cpu_core. On ARM ¯\_(ツ)_/¯. In any case you only have an event that will be enabled on a fraction of the CPU cores the thread you are monitoring could be scheduled on. This is why 2 or more events are needed (depending on how many core types you have). Thanks, Ian > Thanks, > Namhyung
On Tue, Dec 12, 2023 at 10:00:16AM -0800, Ian Rogers wrote: > On Tue, Dec 12, 2023 at 9:23 AM Namhyung Kim <namhyung@kernel.org> wrote: > > > > On Tue, Dec 12, 2023 at 7:56 AM Liang, Kan <kan.liang@linux.intel.com> wrote: > > > > > > > > > > > > On 2023-12-11 4:13 p.m., Arnaldo Carvalho de Melo wrote: > > > > Em Fri, Dec 08, 2023 at 01:08:55PM -0800, kan.liang@linux.intel.com escreveu: > > > >> From: Kan Liang <kan.liang@linux.intel.com> > > > >> > > > >> perf top errors out on a hybrid machine > > > >> $perf top > > > >> > > > >> Error: > > > >> The cycles:P event is not supported. > > > >> > > > >> The user_requested_cpus may contain CPUs that are invalid for a hybrid > > > >> PMU. It causes perf_event_open to fail. > > > > > > > > ? > > > > > > > > All perf top expects is that the "cycles", the most basic one, be > > > > collected, on all CPUs in the system. > > > > > > > > > > Yes, but for hybrid there is no single "cycles" event which can cover > > > all CPUs. > > > > Does that mean the kernel would reject the legacy "cycles" event > > on hybrid CPUs? > > I believe not. When the extended type isn't set on legacy cycles we > often have the CPU and from that can determine the PMU. The issue is > with the -1 any CPU perf_event_open option. As I was told, the PMU the > event is opened on in this case is the first one registered in the > kernel, on Intel hybrid this could be cpu_core or cpu_atom.. but IIRC > it'll probably be cpu_core. On ARM ¯\_(ツ)_/¯. On ARM it'll be essentially the same as on x86: if you open an event with type==PERF_EVENT_TYPE_HARDWARE (without the extended HW type pointing to a specific PMU), and with cpu==-1, it'll go to an arbitrary CPU PMU, whichever happens to be found by perf_init_event() when iterating over the 'pmus' list. If you open an event with type==PERF_EVENT_TYPE_HARDWARE and cpu!=-1, the event will opened on the appropriate CPU PMU, by virtue of being rejected by others when perf_init_event() iterates over the 'pmus' list. Mark.
On Tue, Dec 12, 2023 at 10:31 AM Mark Rutland <mark.rutland@arm.com> wrote: > > On Tue, Dec 12, 2023 at 10:00:16AM -0800, Ian Rogers wrote: > > On Tue, Dec 12, 2023 at 9:23 AM Namhyung Kim <namhyung@kernel.org> wrote: > > > > > > On Tue, Dec 12, 2023 at 7:56 AM Liang, Kan <kan.liang@linux.intel.com> wrote: > > > > > > > > > > > > > > > > On 2023-12-11 4:13 p.m., Arnaldo Carvalho de Melo wrote: > > > > > Em Fri, Dec 08, 2023 at 01:08:55PM -0800, kan.liang@linux.intel.com escreveu: > > > > >> From: Kan Liang <kan.liang@linux.intel.com> > > > > >> > > > > >> perf top errors out on a hybrid machine > > > > >> $perf top > > > > >> > > > > >> Error: > > > > >> The cycles:P event is not supported. > > > > >> > > > > >> The user_requested_cpus may contain CPUs that are invalid for a hybrid > > > > >> PMU. It causes perf_event_open to fail. > > > > > > > > > > ? > > > > > > > > > > All perf top expects is that the "cycles", the most basic one, be > > > > > collected, on all CPUs in the system. > > > > > > > > > > > > > Yes, but for hybrid there is no single "cycles" event which can cover > > > > all CPUs. > > > > > > Does that mean the kernel would reject the legacy "cycles" event > > > on hybrid CPUs? > > > > I believe not. When the extended type isn't set on legacy cycles we > > often have the CPU and from that can determine the PMU. The issue is > > with the -1 any CPU perf_event_open option. As I was told, the PMU the > > event is opened on in this case is the first one registered in the > > kernel, on Intel hybrid this could be cpu_core or cpu_atom.. but IIRC > > it'll probably be cpu_core. On ARM ¯\_(ツ)_/¯. > > On ARM it'll be essentially the same as on x86: if you open an event with > type==PERF_EVENT_TYPE_HARDWARE (without the extended HW type pointing to a > specific PMU), and with cpu==-1, it'll go to an arbitrary CPU PMU, whichever > happens to be found by perf_init_event() when iterating over the 'pmus' list. > > If you open an event with type==PERF_EVENT_TYPE_HARDWARE and cpu!=-1, the event > will opened on the appropriate CPU PMU, by virtue of being rejected by others > when perf_init_event() iterates over the 'pmus' list. Ok, that means "cycles" with cpu == -1 would not work well. I'm curious if it's possible to do some basic work at the event_init() like to preserve (common) resource and to do some other work at sched to config PMU on the current CPU. So that users can simply use "cycles" or "instructions" for their processes. Thanks, Namhyung
On 2023-12-12 1:49 p.m., Namhyung Kim wrote: > On Tue, Dec 12, 2023 at 10:31 AM Mark Rutland <mark.rutland@arm.com> wrote: >> >> On Tue, Dec 12, 2023 at 10:00:16AM -0800, Ian Rogers wrote: >>> On Tue, Dec 12, 2023 at 9:23 AM Namhyung Kim <namhyung@kernel.org> wrote: >>>> >>>> On Tue, Dec 12, 2023 at 7:56 AM Liang, Kan <kan.liang@linux.intel.com> wrote: >>>>> >>>>> >>>>> >>>>> On 2023-12-11 4:13 p.m., Arnaldo Carvalho de Melo wrote: >>>>>> Em Fri, Dec 08, 2023 at 01:08:55PM -0800, kan.liang@linux.intel.com escreveu: >>>>>>> From: Kan Liang <kan.liang@linux.intel.com> >>>>>>> >>>>>>> perf top errors out on a hybrid machine >>>>>>> $perf top >>>>>>> >>>>>>> Error: >>>>>>> The cycles:P event is not supported. >>>>>>> >>>>>>> The user_requested_cpus may contain CPUs that are invalid for a hybrid >>>>>>> PMU. It causes perf_event_open to fail. >>>>>> >>>>>> ? >>>>>> >>>>>> All perf top expects is that the "cycles", the most basic one, be >>>>>> collected, on all CPUs in the system. >>>>>> >>>>> >>>>> Yes, but for hybrid there is no single "cycles" event which can cover >>>>> all CPUs. >>>> >>>> Does that mean the kernel would reject the legacy "cycles" event >>>> on hybrid CPUs? >>> >>> I believe not. When the extended type isn't set on legacy cycles we >>> often have the CPU and from that can determine the PMU. The issue is >>> with the -1 any CPU perf_event_open option. As I was told, the PMU the >>> event is opened on in this case is the first one registered in the >>> kernel, on Intel hybrid this could be cpu_core or cpu_atom.. but IIRC >>> it'll probably be cpu_core. On ARM ¯\_(ツ)_/¯. >> >> On ARM it'll be essentially the same as on x86: if you open an event with >> type==PERF_EVENT_TYPE_HARDWARE (without the extended HW type pointing to a >> specific PMU), and with cpu==-1, it'll go to an arbitrary CPU PMU, whichever >> happens to be found by perf_init_event() when iterating over the 'pmus' list. >> >> If you open an event with type==PERF_EVENT_TYPE_HARDWARE and cpu!=-1, the event >> will opened on the appropriate CPU PMU, by virtue of being rejected by others >> when perf_init_event() iterates over the 'pmus' list. > > Ok, that means "cycles" with cpu == -1 would not work well. > Unless a PMU is specified. > I'm curious if it's possible to do some basic work at the event_init() > like to preserve (common) resource and to do some other work at > sched to config PMU on the current CPU. So that users can simply > use "cycles" or "instructions" for their processes. > The current code treats the hybrid as two standalone PMUs. To preserve the common resource in the other PMU, I think the only way is to create an event on the other PMU. It's what perf tool does now. I don't think we want to move the logic to the kernel. I think a possible way is to abstract a common PMU (cpu) which only includes common PMU features. It should be doable, because without the enabling code of hybrid, the default PMU is the common PMU. But I don't know how does it coexist with the other hybrid PMUs if we have both common PMU and hybrid PMUs available? It may just bring more complexity. Thanks, Kan
On Tue, Dec 12, 2023 at 10:49 AM Namhyung Kim <namhyung@kernel.org> wrote: > > On Tue, Dec 12, 2023 at 10:31 AM Mark Rutland <mark.rutland@arm.com> wrote: > > > > On Tue, Dec 12, 2023 at 10:00:16AM -0800, Ian Rogers wrote: > > > On Tue, Dec 12, 2023 at 9:23 AM Namhyung Kim <namhyung@kernel.org> wrote: > > > > > > > > On Tue, Dec 12, 2023 at 7:56 AM Liang, Kan <kan.liang@linux.intel.com> wrote: > > > > > > > > > > > > > > > > > > > > On 2023-12-11 4:13 p.m., Arnaldo Carvalho de Melo wrote: > > > > > > Em Fri, Dec 08, 2023 at 01:08:55PM -0800, kan.liang@linux.intel.com escreveu: > > > > > >> From: Kan Liang <kan.liang@linux.intel.com> > > > > > >> > > > > > >> perf top errors out on a hybrid machine > > > > > >> $perf top > > > > > >> > > > > > >> Error: > > > > > >> The cycles:P event is not supported. > > > > > >> > > > > > >> The user_requested_cpus may contain CPUs that are invalid for a hybrid > > > > > >> PMU. It causes perf_event_open to fail. > > > > > > > > > > > > ? > > > > > > > > > > > > All perf top expects is that the "cycles", the most basic one, be > > > > > > collected, on all CPUs in the system. > > > > > > > > > > > > > > > > Yes, but for hybrid there is no single "cycles" event which can cover > > > > > all CPUs. > > > > > > > > Does that mean the kernel would reject the legacy "cycles" event > > > > on hybrid CPUs? > > > > > > I believe not. When the extended type isn't set on legacy cycles we > > > often have the CPU and from that can determine the PMU. The issue is > > > with the -1 any CPU perf_event_open option. As I was told, the PMU the > > > event is opened on in this case is the first one registered in the > > > kernel, on Intel hybrid this could be cpu_core or cpu_atom.. but IIRC > > > it'll probably be cpu_core. On ARM ¯\_(ツ)_/¯. > > > > On ARM it'll be essentially the same as on x86: if you open an event with > > type==PERF_EVENT_TYPE_HARDWARE (without the extended HW type pointing to a > > specific PMU), and with cpu==-1, it'll go to an arbitrary CPU PMU, whichever > > happens to be found by perf_init_event() when iterating over the 'pmus' list. > > > > If you open an event with type==PERF_EVENT_TYPE_HARDWARE and cpu!=-1, the event > > will opened on the appropriate CPU PMU, by virtue of being rejected by others > > when perf_init_event() iterates over the 'pmus' list. > > Ok, that means "cycles" with cpu == -1 would not work well. > > I'm curious if it's possible to do some basic work at the event_init() > like to preserve (common) resource and to do some other work at > sched to config PMU on the current CPU. So that users can simply > use "cycles" or "instructions" for their processes. It should be possible to make things better, especially for standard events for cycles or instructions, for software that isn't aware of multiple core PMUs and the extended type field. There are legacy events like dTLB-speculative-read that may be supported by 1 PMU and not the other, so what should happen with thread scheduling there? On RISC-V there was discussion of the legacy to pmu/config mapping that happens in the driver being something that is handled in the tool. A lot of the legacy events end up being "<not supported>" which is a pretty bad user experience, like this on AMD: ``` # perf stat -d -a sleep 1 Performance counter stats for 'system wide': 259,256.21 msec cpu-clock # 257.445 CPUs utilized 11,931 context-switches # 46.020 /sec 264 cpu-migrations # 1.018 /sec 419 page-faults # 1.616 /sec 5,892,812,250 cycles # 0.023 GHz (62.43%) 1,159,909,806 stalled-cycles-frontend # 19.68% frontend cycles idle (62.48%) 831,668,644 stalled-cycles-backend # 14.11% backend cycles idle (62.52%) 2,825,351,898 instructions # 0.48 insn per cycle # 0.41 stalled cycles per insn (62.54%) 520,403,374 branches # 2.007 M/sec (62.57%) 12,050,970 branch-misses # 2.32% of all branches (62.55%) 1,117,382,209 L1-dcache-loads # 4.310 M/sec (62.48%) 61,060,457 L1-dcache-load-misses # 5.46% of all L1-dcache accesses (62.43%) <not supported> LLC-loads <not supported> LLC-load-misses 1.007033432 seconds time elapsed ``` So I think the move should be away from legacy events but that doesn't necessarily mean we can't make them better. Thanks, Ian > Thanks, > Namhyung
On Tue, Dec 12, 2023 at 02:22:49PM -0500, Liang, Kan wrote: > > > On 2023-12-12 1:49 p.m., Namhyung Kim wrote: > > On Tue, Dec 12, 2023 at 10:31 AM Mark Rutland <mark.rutland@arm.com> wrote: > >> > >> On Tue, Dec 12, 2023 at 10:00:16AM -0800, Ian Rogers wrote: > >>> On Tue, Dec 12, 2023 at 9:23 AM Namhyung Kim <namhyung@kernel.org> wrote: > >>>> > >>>> On Tue, Dec 12, 2023 at 7:56 AM Liang, Kan <kan.liang@linux.intel.com> wrote: > >>>>> > >>>>> > >>>>> > >>>>> On 2023-12-11 4:13 p.m., Arnaldo Carvalho de Melo wrote: > >>>>>> Em Fri, Dec 08, 2023 at 01:08:55PM -0800, kan.liang@linux.intel.com escreveu: > >>>>>>> From: Kan Liang <kan.liang@linux.intel.com> > >>>>>>> > >>>>>>> perf top errors out on a hybrid machine > >>>>>>> $perf top > >>>>>>> > >>>>>>> Error: > >>>>>>> The cycles:P event is not supported. > >>>>>>> > >>>>>>> The user_requested_cpus may contain CPUs that are invalid for a hybrid > >>>>>>> PMU. It causes perf_event_open to fail. > >>>>>> > >>>>>> ? > >>>>>> > >>>>>> All perf top expects is that the "cycles", the most basic one, be > >>>>>> collected, on all CPUs in the system. > >>>>>> > >>>>> > >>>>> Yes, but for hybrid there is no single "cycles" event which can cover > >>>>> all CPUs. > >>>> > >>>> Does that mean the kernel would reject the legacy "cycles" event > >>>> on hybrid CPUs? > >>> > >>> I believe not. When the extended type isn't set on legacy cycles we > >>> often have the CPU and from that can determine the PMU. The issue is > >>> with the -1 any CPU perf_event_open option. As I was told, the PMU the > >>> event is opened on in this case is the first one registered in the > >>> kernel, on Intel hybrid this could be cpu_core or cpu_atom.. but IIRC > >>> it'll probably be cpu_core. On ARM ¯\_(ツ)_/¯. > >> > >> On ARM it'll be essentially the same as on x86: if you open an event with > >> type==PERF_EVENT_TYPE_HARDWARE (without the extended HW type pointing to a > >> specific PMU), and with cpu==-1, it'll go to an arbitrary CPU PMU, whichever > >> happens to be found by perf_init_event() when iterating over the 'pmus' list. > >> > >> If you open an event with type==PERF_EVENT_TYPE_HARDWARE and cpu!=-1, the event > >> will opened on the appropriate CPU PMU, by virtue of being rejected by others > >> when perf_init_event() iterates over the 'pmus' list. > > > > Ok, that means "cycles" with cpu == -1 would not work well. > > Unless a PMU is specified. > > > I'm curious if it's possible to do some basic work at the event_init() > > like to preserve (common) resource and to do some other work at > > sched to config PMU on the current CPU. So that users can simply > > use "cycles" or "instructions" for their processes. > > The current code treats the hybrid as two standalone PMUs. To preserve > the common resource in the other PMU, I think the only way is to create > an event on the other PMU. It's what perf tool does now. I don't think > we want to move the logic to the kernel. Agreed. > I think a possible way is to abstract a common PMU (cpu) which only > includes common PMU features. It should be doable, because without the > enabling code of hybrid, the default PMU is the common PMU. But I don't > know how does it coexist with the other hybrid PMUs if we have both > common PMU and hybrid PMUs available? It may just bring more complexity. I think that brings a surprising amount of complexity, and I'm not entirely sure if that's practical (since you'd effectively end up with a logical PMU being dependent on multiple other logical PMUs). I also think that it's practically necessary to expose the counts to the user separately, even for common events. For example, the 'instructions' event may count differently (speculative vs architectural execution), and 'cycles' can be wildly different across microarchitectures due to realizable IPC, and blindly adding those up across PMUs is liable to produce a misleading figure (and/or one with massive variation). While it is ugly, I think that it's necessary for userspace to discover the set of CPU PMUs and open seperate events on them in order to produce useful data. Specifically for perf top, if one is monitoring all CPUs, it'd be fine to open a PERF_TYPE_HARDWARE event for each CPU; so long as cpu!=-1 it would go to the relevant PMU and be counted as expected. Thanks, Mark.
Em Tue, Dec 12, 2023 at 06:31:05PM +0000, Mark Rutland escreveu: > On ARM it'll be essentially the same as on x86: if you open an event with > type==PERF_EVENT_TYPE_HARDWARE (without the extended HW type pointing to a > specific PMU), and with cpu==-1, it'll go to an arbitrary CPU PMU, whichever > happens to be found by perf_init_event() when iterating over the 'pmus' list. > If you open an event with type==PERF_EVENT_TYPE_HARDWARE and cpu!=-1, the event > will opened on the appropriate CPU PMU, by virtue of being rejected by others > when perf_init_event() iterates over the 'pmus' list. The way that it is working non on my intel hybrid system, with Kan's patch, is equivalent to using this on the RK3399pc board I have: root@roc-rk3399-pc:~# perf top -e armv8_cortex_a72/cycles/P,armv8_cortex_a53/cycles/P Wouldn't be better to make 'perf top' on ARM work the way is being done in x86 now, i.e. default to opening the two events, one per PMU and allow the user to switch back and forth using the TUI/stdio? Kan, I also noticed that the name of the event is: 1K cpu_atom/cycles:P/ ◆ 11K cpu_core/cycles:P/ If I try to use that on the command line: root@number:~# perf top -e cpu_atom/cycles:P/ event syntax error: 'cpu_atom/cycles:P/' \___ Bad event or PMU Unable to find PMU or event on a PMU of 'cpu_atom' Initial error: event syntax error: 'cpu_atom/cycles:P/' \___ unknown term 'cycles:P' for pmu 'cpu_atom' valid terms: event,pc,edge,offcore_rsp,ldlat,inv,umask,cmask,config,config1,config2,config3,name,period,freq,branch_type,time,call-graph,stack-size,no-inherit,inherit,max-stack,nr,no-overwrite,overwrite,driver-config,percore,aux-output,aux-sample-size,metric-id,raw,legacy-cache,hardware Run 'perf list' for a list of valid events Usage: perf top [<options>] -e, --event <event> event selector. use 'perf list' to list available events root@number:~# It should be: "cpu_atom/cycles/P" - Arnaldo
On Fri, Dec 15, 2023 at 12:36:10PM -0300, Arnaldo Carvalho de Melo wrote: > Em Tue, Dec 12, 2023 at 06:31:05PM +0000, Mark Rutland escreveu: > > On ARM it'll be essentially the same as on x86: if you open an event with > > type==PERF_EVENT_TYPE_HARDWARE (without the extended HW type pointing to a > > specific PMU), and with cpu==-1, it'll go to an arbitrary CPU PMU, whichever > > happens to be found by perf_init_event() when iterating over the 'pmus' list. > > > If you open an event with type==PERF_EVENT_TYPE_HARDWARE and cpu!=-1, the event > > will opened on the appropriate CPU PMU, by virtue of being rejected by others > > when perf_init_event() iterates over the 'pmus' list. > > The way that it is working non on my intel hybrid system, with Kan's > patch, is equivalent to using this on the RK3399pc board I have: > > root@roc-rk3399-pc:~# perf top -e armv8_cortex_a72/cycles/P,armv8_cortex_a53/cycles/P > > Wouldn't be better to make 'perf top' on ARM work the way is being done > in x86 now, i.e. default to opening the two events, one per PMU and > allow the user to switch back and forth using the TUI/stdio? TBH, for perf top I don't know *which* behaviour is preferable, but I agree that it'd be good for x86 and arm to work in the same way. For design-cleanliness and consistency with other commands I can see that opening those separately is probably for the best, but for typical usage of perf top it's really nice to have those presented together without having to tab back-and-forth between the distinct PMUs, and so the existing behaviour of using CPU-bound PERF_EVENT_TYPE_HARDWARE events is arguably nicer for the user. I don't have a strong feeling either way; I'm personally happy so long as explicit pmu_name/event/ events don't get silently converted into PERF_EVENT_TYPE_HARDWARE events, and as long as we correctly set the extended HW type when we decide to use that. Thanks, Mark. > Kan, I also noticed that the name of the event is: > > 1K cpu_atom/cycles:P/ ◆ > 11K cpu_core/cycles:P/ > > If I try to use that on the command line: > > root@number:~# perf top -e cpu_atom/cycles:P/ > event syntax error: 'cpu_atom/cycles:P/' > \___ Bad event or PMU > > Unable to find PMU or event on a PMU of 'cpu_atom' > > Initial error: > event syntax error: 'cpu_atom/cycles:P/' > \___ unknown term 'cycles:P' for pmu 'cpu_atom' > > valid terms: event,pc,edge,offcore_rsp,ldlat,inv,umask,cmask,config,config1,config2,config3,name,period,freq,branch_type,time,call-graph,stack-size,no-inherit,inherit,max-stack,nr,no-overwrite,overwrite,driver-config,percore,aux-output,aux-sample-size,metric-id,raw,legacy-cache,hardware > Run 'perf list' for a list of valid events > > Usage: perf top [<options>] > > -e, --event <event> event selector. use 'perf list' to list available events > root@number:~# > > It should be: > > "cpu_atom/cycles/P" > > - Arnaldo
Em Fri, Dec 15, 2023 at 04:51:49PM +0000, Mark Rutland escreveu: > On Fri, Dec 15, 2023 at 12:36:10PM -0300, Arnaldo Carvalho de Melo wrote: > > Em Tue, Dec 12, 2023 at 06:31:05PM +0000, Mark Rutland escreveu: > > > On ARM it'll be essentially the same as on x86: if you open an event with > > > type==PERF_EVENT_TYPE_HARDWARE (without the extended HW type pointing to a > > > specific PMU), and with cpu==-1, it'll go to an arbitrary CPU PMU, whichever > > > happens to be found by perf_init_event() when iterating over the 'pmus' list. > > > If you open an event with type==PERF_EVENT_TYPE_HARDWARE and cpu!=-1, the event > > > will opened on the appropriate CPU PMU, by virtue of being rejected by others > > > when perf_init_event() iterates over the 'pmus' list. > > The way that it is working non on my intel hybrid system, with Kan's > > patch, is equivalent to using this on the RK3399pc board I have: > > root@roc-rk3399-pc:~# perf top -e armv8_cortex_a72/cycles/P,armv8_cortex_a53/cycles/P > > Wouldn't be better to make 'perf top' on ARM work the way is being done > > in x86 now, i.e. default to opening the two events, one per PMU and > > allow the user to switch back and forth using the TUI/stdio? > TBH, for perf top I don't know *which* behaviour is preferable, but I agree > that it'd be good for x86 and arm to work in the same way. Right, reducing the difference in the user experience accross arches. > For design-cleanliness and consistency with other commands I can see that > opening those separately is probably for the best, but for typical usage of > perf top it's really nice to have those presented together without having to > tab back-and-forth between the distinct PMUs, and so the existing behaviour of Humm, so you would want two histogram viewers, one for each PMU, side by side? > using CPU-bound PERF_EVENT_TYPE_HARDWARE events is arguably nicer for the user. So, on ARM64, start the following 'perf trace' session, then run the stock 'perf top': root@roc-rk3399-pc:~# perf trace -e perf_event_open <SNIP calls doing capability queries and setting up sideband stuff> 535.764 ( 0.015 ms): perf/15627 perf_event_open(attr_uptr: { type: 0 (PERF_TYPE_HARDWARE), size: 136, config: 0 (PERF_COUNT_HW_CPU_CYCLES), { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|CPU|PERIOD, read_format: ID|LOST, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, task: 1, precise_ip: 3, sample_id_all: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1 }, pid: -1, group_fd: -1, flags: FD_CLOEXEC) = 19 535.783 ( 0.067 ms): perf/15627 perf_event_open(attr_uptr: { type: 0 (PERF_TYPE_HARDWARE), size: 136, config: 0 (PERF_COUNT_HW_CPU_CYCLES), { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|CPU|PERIOD, read_format: ID|LOST, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, task: 1, precise_ip: 3, sample_id_all: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1 }, pid: -1, cpu: 1, group_fd: -1, flags: FD_CLOEXEC) = 28 535.854 ( 0.063 ms): perf/15627 perf_event_open(attr_uptr: { type: 0 (PERF_TYPE_HARDWARE), size: 136, config: 0 (PERF_COUNT_HW_CPU_CYCLES), { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|CPU|PERIOD, read_format: ID|LOST, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, task: 1, precise_ip: 3, sample_id_all: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1 }, pid: -1, cpu: 2, group_fd: -1, flags: FD_CLOEXEC) = 29 535.920 ( 0.015 ms): perf/15627 perf_event_open(attr_uptr: { type: 0 (PERF_TYPE_HARDWARE), size: 136, config: 0 (PERF_COUNT_HW_CPU_CYCLES), { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|CPU|PERIOD, read_format: ID|LOST, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, task: 1, precise_ip: 3, sample_id_all: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1 }, pid: -1, cpu: 3, group_fd: -1, flags: FD_CLOEXEC) = 30 535.939 ( 0.016 ms): perf/15627 perf_event_open(attr_uptr: { type: 0 (PERF_TYPE_HARDWARE), size: 136, config: 0 (PERF_COUNT_HW_CPU_CYCLES), { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|CPU|PERIOD, read_format: ID|LOST, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, task: 1, precise_ip: 3, sample_id_all: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1 }, pid: -1, cpu: 4, group_fd: -1, flags: FD_CLOEXEC) = 31 535.959 ( 0.011 ms): perf/15627 perf_event_open(attr_uptr: { type: 0 (PERF_TYPE_HARDWARE), size: 136, config: 0 (PERF_COUNT_HW_CPU_CYCLES), { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|CPU|PERIOD, read_format: ID|LOST, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, task: 1, precise_ip: 3, sample_id_all: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1 }, pid: -1, cpu: 5, group_fd: -1, flags: FD_CLOEXEC) = 32 root@roc-rk3399-pc:~# grep "CPU part" /proc/cpuinfo | uniq -c 4 CPU part : 0xd03 2 CPU part : 0xd08 root@roc-rk3399-pc:~# It is already doing what you suggest, right? PERF_TYPE_HARDWARE, one counter per CPU, maps to armv8_cortex_a72/cycles/P and armv8_cortex_a53/cycles/P. One thing I'm thinking is that we should split this per PMU at the hist_entry, so that we could show how many samples/events came from each of them... - Arnaldo > I don't have a strong feeling either way; I'm personally happy so long as > explicit pmu_name/event/ events don't get silently converted into > PERF_EVENT_TYPE_HARDWARE events, and as long as we correctly set the extended > HW type when we decide to use that. > > Thanks, > Mark. > > > Kan, I also noticed that the name of the event is: > > > > 1K cpu_atom/cycles:P/ ◆ > > 11K cpu_core/cycles:P/ > > > > If I try to use that on the command line: > > > > root@number:~# perf top -e cpu_atom/cycles:P/ > > event syntax error: 'cpu_atom/cycles:P/' > > \___ Bad event or PMU > > > > Unable to find PMU or event on a PMU of 'cpu_atom' > > > > Initial error: > > event syntax error: 'cpu_atom/cycles:P/' > > \___ unknown term 'cycles:P' for pmu 'cpu_atom' > > > > valid terms: event,pc,edge,offcore_rsp,ldlat,inv,umask,cmask,config,config1,config2,config3,name,period,freq,branch_type,time,call-graph,stack-size,no-inherit,inherit,max-stack,nr,no-overwrite,overwrite,driver-config,percore,aux-output,aux-sample-size,metric-id,raw,legacy-cache,hardware > > Run 'perf list' for a list of valid events > > > > Usage: perf top [<options>] > > > > -e, --event <event> event selector. use 'perf list' to list available events > > root@number:~# > > > > It should be: > > > > "cpu_atom/cycles/P"
On 2023-12-15 10:36 a.m., Arnaldo Carvalho de Melo wrote: > Kan, I also noticed that the name of the event is: > > 1K cpu_atom/cycles:P/ > > ◆ > 11K cpu_core/cycles:P/ > > If I try to use that on the command line: > > root@number:~# perf top -e cpu_atom/cycles:P/ > event syntax error: 'cpu_atom/cycles:P/' > \___ Bad event or PMU > > Unable to find PMU or event on a PMU of 'cpu_atom' > > Initial error: > event syntax error: 'cpu_atom/cycles:P/' > \___ unknown term 'cycles:P' for pmu > 'cpu_atom' > > valid terms: > event,pc,edge,offcore_rsp,ldlat,inv,umask,cmask,config,config1,config2,config3,name,period,freq,branch_type,time,call-graph,stack-size,no-inherit,inherit,max-stack,nr,no-overwrite,overwrite,driver-config,percore,aux-output,aux-sample-size,metric-id,raw,legacy-cache,hardware > Run > 'perf list' for a list of valid events > > Usage: perf top [<options>] > > -e, --event <event> event selector. use 'perf list' to list > available events > root@number:~# > > It should be: > > "cpu_atom/cycles/P" The issue also impacts the perf record and report as well. #perf record true [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.019 MB perf.data (16 samples) ] #perf report --header-only | grep event # event : name = cpu_atom/cycles:P/, , id = { 7360, 7361, 7362, 7363, 7364, 7365, 7366, 7367, 7368, 7369 }, type = 0 (PERF_TYPE_HARDWARE), size = 136, config = 0xa00000000, { sample_period, sample_freq } = 3000, sample_type = IP|TID|TIME|PERIOD|IDENTIFIER, read_format = ID|LOST, disabled = 1, inherit = 1, freq = 1, enable_on_exec = 1, precise_ip = 3, sample_id_all = 1 # event : name = cpu_core/cycles:P/, , id = { 7370, 7371, 7372, 7373, 7374, 7375, 7376, 7377, 7378, 7379, 7380, 7381 }, type = 0 (PERF_TYPE_HARDWARE), size = 136, config = 0x400000000, { sample_period, sample_freq } = 3000, sample_type = IP|TID|TIME|PERIOD|IDENTIFIER, read_format = ID|LOST, disabled = 1, inherit = 1, freq = 1, enable_on_exec = 1, precise_ip = 3, sample_id_all = 1 I think we should move all the modifiers after the "/". The below patch can fix it. https://lore.kernel.org/lkml/20231215175455.1300261-1-kan.liang@linux.intel.com/ Thanks, Kan
Em Fri, Dec 15, 2023 at 12:59:22PM -0500, Liang, Kan escreveu: > On 2023-12-15 10:36 a.m., Arnaldo Carvalho de Melo wrote: > > #perf report --header-only | grep event > # event : name = cpu_atom/cycles:P/, , id = { 7360, 7361, 7362, 7363, > 7364, 7365, 7366, 7367, 7368, 7369 }, type = 0 (PERF_TYPE_HARDWARE), > size = 136, config = 0xa00000000, { sample_period, sample_freq } = 3000, > sample_type = IP|TID|TIME|PERIOD|IDENTIFIER, read_format = ID|LOST, > disabled = 1, inherit = 1, freq = 1, enable_on_exec = 1, precise_ip = 3, > sample_id_all = 1 > # event : name = cpu_core/cycles:P/, , id = { 7370, 7371, 7372, 7373, > 7374, 7375, 7376, 7377, 7378, 7379, 7380, 7381 }, type = 0 > (PERF_TYPE_HARDWARE), size = 136, config = 0x400000000, { sample_period, > sample_freq } = 3000, sample_type = IP|TID|TIME|PERIOD|IDENTIFIER, > read_format = ID|LOST, disabled = 1, inherit = 1, freq = 1, > enable_on_exec = 1, precise_ip = 3, sample_id_all = 1 > > I think we should move all the modifiers after the "/". The below patch > can fix it. > > https://lore.kernel.org/lkml/20231215175455.1300261-1-kan.liang@linux.intel.com/ Right, I implemented it in a slightly different way, but end result should be the same: From 5dd1b7ab1ba69ebb8e070923dcc214b7b489ffc2 Mon Sep 17 00:00:00 2001 From: Arnaldo Carvalho de Melo <acme@redhat.com> Date: Fri, 15 Dec 2023 15:23:30 -0300 Subject: [PATCH 1/1] perf evlist: Move event attributes to after the / when uniquefying using the PMU name Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> --- tools/perf/util/evlist.c | 14 +++++++++++--- 1 file changed, 11 insertions(+), 3 deletions(-) diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c index 6f0892803c2249af..3a9505c99490b372 100644 --- a/tools/perf/util/evlist.c +++ b/tools/perf/util/evlist.c @@ -2522,7 +2522,7 @@ void evlist__warn_user_requested_cpus(struct evlist *evlist, const char *cpu_lis void evlist__uniquify_name(struct evlist *evlist) { struct evsel *pos; - char *new_name; + char *new_name, *attributes; int ret; if (perf_pmus__num_core_pmus() == 1) @@ -2535,8 +2535,16 @@ void evlist__uniquify_name(struct evlist *evlist) if (strchr(pos->name, '/')) continue; - ret = asprintf(&new_name, "%s/%s/", - pos->pmu_name, pos->name); + attributes = strchr(pos->name, ':'); + if (attributes) + *attributes = '\0'; + + ret = asprintf(&new_name, "%s/%s/%s", + pos->pmu_name, pos->name, attributes ? attributes + 1 : ""); + + if (attributes) + *attributes = ':'; + if (ret) { free(pos->name); pos->name = new_name;
On 2023-12-15 1:26 p.m., Arnaldo Carvalho de Melo wrote: > Em Fri, Dec 15, 2023 at 12:59:22PM -0500, Liang, Kan escreveu: >> On 2023-12-15 10:36 a.m., Arnaldo Carvalho de Melo wrote: >> >> #perf report --header-only | grep event >> # event : name = cpu_atom/cycles:P/, , id = { 7360, 7361, 7362, 7363, >> 7364, 7365, 7366, 7367, 7368, 7369 }, type = 0 (PERF_TYPE_HARDWARE), >> size = 136, config = 0xa00000000, { sample_period, sample_freq } = 3000, >> sample_type = IP|TID|TIME|PERIOD|IDENTIFIER, read_format = ID|LOST, >> disabled = 1, inherit = 1, freq = 1, enable_on_exec = 1, precise_ip = 3, >> sample_id_all = 1 >> # event : name = cpu_core/cycles:P/, , id = { 7370, 7371, 7372, 7373, >> 7374, 7375, 7376, 7377, 7378, 7379, 7380, 7381 }, type = 0 >> (PERF_TYPE_HARDWARE), size = 136, config = 0x400000000, { sample_period, >> sample_freq } = 3000, sample_type = IP|TID|TIME|PERIOD|IDENTIFIER, >> read_format = ID|LOST, disabled = 1, inherit = 1, freq = 1, >> enable_on_exec = 1, precise_ip = 3, sample_id_all = 1 >> >> I think we should move all the modifiers after the "/". The below patch >> can fix it. >> >> https://lore.kernel.org/lkml/20231215175455.1300261-1-kan.liang@linux.intel.com/ > > Right, I implemented it in a slightly different way, but end result > should be the same: > > From 5dd1b7ab1ba69ebb8e070923dcc214b7b489ffc2 Mon Sep 17 00:00:00 2001 > From: Arnaldo Carvalho de Melo <acme@redhat.com> > Date: Fri, 15 Dec 2023 15:23:30 -0300 > Subject: [PATCH 1/1] perf evlist: Move event attributes to after the / when > uniquefying using the PMU name > > Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Looks good to me and verified. Tested-by: Kan Liang <kan.liang@linux.intel.com> Thanks, Kan > --- > tools/perf/util/evlist.c | 14 +++++++++++--- > 1 file changed, 11 insertions(+), 3 deletions(-) > > diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c > index 6f0892803c2249af..3a9505c99490b372 100644 > --- a/tools/perf/util/evlist.c > +++ b/tools/perf/util/evlist.c > @@ -2522,7 +2522,7 @@ void evlist__warn_user_requested_cpus(struct evlist *evlist, const char *cpu_lis > void evlist__uniquify_name(struct evlist *evlist) > { > struct evsel *pos; > - char *new_name; > + char *new_name, *attributes; > int ret; > > if (perf_pmus__num_core_pmus() == 1) > @@ -2535,8 +2535,16 @@ void evlist__uniquify_name(struct evlist *evlist) > if (strchr(pos->name, '/')) > continue; > > - ret = asprintf(&new_name, "%s/%s/", > - pos->pmu_name, pos->name); > + attributes = strchr(pos->name, ':'); > + if (attributes) > + *attributes = '\0'; > + > + ret = asprintf(&new_name, "%s/%s/%s", > + pos->pmu_name, pos->name, attributes ? attributes + 1 : ""); > + > + if (attributes) > + *attributes = ':'; > + > if (ret) { > free(pos->name); > pos->name = new_name;
Em Fri, Dec 15, 2023 at 01:53:12PM -0500, Liang, Kan escreveu: > On 2023-12-15 1:26 p.m., Arnaldo Carvalho de Melo wrote: > > Right, I implemented it in a slightly different way, but end result > > should be the same: > > From: Arnaldo Carvalho de Melo <acme@redhat.com> > > Date: Fri, 15 Dec 2023 15:23:30 -0300 > > Subject: [PATCH 1/1] perf evlist: Move event attributes to after the / when uniquefying using the PMU name > Looks good to me and verified. > Tested-by: Kan Liang <kan.liang@linux.intel.com> I ended up with a bit more simplified version: From 22ecc4601e28a12661f14ca877e39348dab6be8e Mon Sep 17 00:00:00 2001 From: Arnaldo Carvalho de Melo <acme@redhat.com> Date: Fri, 15 Dec 2023 15:23:30 -0300 Subject: [PATCH 1/1] perf evlist: Move event attributes to after the / when uniquefying using the PMU name When turning an event with attributes to the format including the PMU we need to move the "event:attributes" format to "event/attributes/" so that we can copy the event displayed and use it in the command line, i.e. in 'perf top' we had: 1K cpu_atom/cycles:P/ 11K cpu_core/cycles:P/ If I try to use that on the command line: # perf top -e cpu_atom/cycles:P/ event syntax error: 'cpu_atom/cycles:P/' \___ Bad event or PMU Unable to find PMU or event on a PMU of 'cpu_atom' Initial error: event syntax error: 'cpu_atom/cycles:P/' \___ unknown term 'cycles:P' for pmu 'cpu_atom' valid terms: event,pc,edge,offcore_rsp,ldlat,inv,umask,cmask,config,config1,config2,config3,name,period,freq,branch_type,time,call-graph,stack-size,no-inherit,inherit,max-stack,nr,no-overwrite,overwrite ,driver-config,percore,aux-output,aux-sample-size,metric-id,raw,legacy-cache,hardware Run 'perf list' for a list of valid events Usage: perf top [<options>] -e, --event <event> event selector. use 'perf list' to list available events # Cc: Hector Martin <marcan@marcan.st> Cc: Ian Rogers <irogers@google.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/ZXxyanyZgWBTOnoK@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> --- tools/perf/util/evlist.c | 15 ++++++++++----- 1 file changed, 10 insertions(+), 5 deletions(-) diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c index 6f0892803c2249af..95f25e9fb994ab2a 100644 --- a/tools/perf/util/evlist.c +++ b/tools/perf/util/evlist.c @@ -2521,9 +2521,8 @@ void evlist__warn_user_requested_cpus(struct evlist *evlist, const char *cpu_lis void evlist__uniquify_name(struct evlist *evlist) { + char *new_name, empty_attributes[2] = ":", *attributes; struct evsel *pos; - char *new_name; - int ret; if (perf_pmus__num_core_pmus() == 1) return; @@ -2535,11 +2534,17 @@ void evlist__uniquify_name(struct evlist *evlist) if (strchr(pos->name, '/')) continue; - ret = asprintf(&new_name, "%s/%s/", - pos->pmu_name, pos->name); - if (ret) { + attributes = strchr(pos->name, ':'); + if (attributes) + *attributes = '\0'; + else + attributes = empty_attributes; + + if (asprintf(&new_name, "%s/%s/%s", pos->pmu_name, pos->name, attributes + 1)) { free(pos->name); pos->name = new_name; + } else { + *attributes = ':'; } } }
On 2023-12-18 3:23 p.m., Arnaldo Carvalho de Melo wrote: > Em Fri, Dec 15, 2023 at 01:53:12PM -0500, Liang, Kan escreveu: >> On 2023-12-15 1:26 p.m., Arnaldo Carvalho de Melo wrote: >>> Right, I implemented it in a slightly different way, but end result >>> should be the same: > >>> From: Arnaldo Carvalho de Melo <acme@redhat.com> >>> Date: Fri, 15 Dec 2023 15:23:30 -0300 >>> Subject: [PATCH 1/1] perf evlist: Move event attributes to after the / when uniquefying using the PMU name > >> Looks good to me and verified. > >> Tested-by: Kan Liang <kan.liang@linux.intel.com> > > I ended up with a bit more simplified version: > > From 22ecc4601e28a12661f14ca877e39348dab6be8e Mon Sep 17 00:00:00 2001 > From: Arnaldo Carvalho de Melo <acme@redhat.com> > Date: Fri, 15 Dec 2023 15:23:30 -0300 > Subject: [PATCH 1/1] perf evlist: Move event attributes to after the / when > uniquefying using the PMU name > > When turning an event with attributes to the format including the PMU we > need to move the "event:attributes" format to "event/attributes/" so > that we can copy the event displayed and use it in the command line, > i.e. in 'perf top' we had: > > 1K cpu_atom/cycles:P/ > 11K cpu_core/cycles:P/ > > If I try to use that on the command line: > > # perf top -e cpu_atom/cycles:P/ > event syntax error: 'cpu_atom/cycles:P/' > \___ Bad event or PMU > > Unable to find PMU or event on a PMU of 'cpu_atom' > > Initial error: > event syntax error: 'cpu_atom/cycles:P/' > \___ unknown term 'cycles:P' for pmu > 'cpu_atom' > > valid terms: > > event,pc,edge,offcore_rsp,ldlat,inv,umask,cmask,config,config1,config2,config3,name,period,freq,branch_type,time,call-graph,stack-size,no-inherit,inherit,max-stack,nr,no-overwrite,overwrite ,driver-config,percore,aux-output,aux-sample-size,metric-id,raw,legacy-cache,hardware > Run > 'perf list' for a list of valid events > > Usage: perf top [<options>] > > -e, --event <event> event selector. use 'perf list' to list available events > # > > Cc: Hector Martin <marcan@marcan.st> > Cc: Ian Rogers <irogers@google.com> > Cc: Kan Liang <kan.liang@linux.intel.com> > Cc: Marc Zyngier <maz@kernel.org> > Cc: Mark Rutland <mark.rutland@arm.com> > Cc: Namhyung Kim <namhyung@kernel.org> > Link: https://lore.kernel.org/lkml/ZXxyanyZgWBTOnoK@kernel.org > Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: Kan Liang <kan.liang@linux.intel.com> Thanks, Kan > --- > tools/perf/util/evlist.c | 15 ++++++++++----- > 1 file changed, 10 insertions(+), 5 deletions(-) > > diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c > index 6f0892803c2249af..95f25e9fb994ab2a 100644 > --- a/tools/perf/util/evlist.c > +++ b/tools/perf/util/evlist.c > @@ -2521,9 +2521,8 @@ void evlist__warn_user_requested_cpus(struct evlist *evlist, const char *cpu_lis > > void evlist__uniquify_name(struct evlist *evlist) > { > + char *new_name, empty_attributes[2] = ":", *attributes; > struct evsel *pos; > - char *new_name; > - int ret; > > if (perf_pmus__num_core_pmus() == 1) > return; > @@ -2535,11 +2534,17 @@ void evlist__uniquify_name(struct evlist *evlist) > if (strchr(pos->name, '/')) > continue; > > - ret = asprintf(&new_name, "%s/%s/", > - pos->pmu_name, pos->name); > - if (ret) { > + attributes = strchr(pos->name, ':'); > + if (attributes) > + *attributes = '\0'; > + else > + attributes = empty_attributes; > + > + if (asprintf(&new_name, "%s/%s/%s", pos->pmu_name, pos->name, attributes + 1)) { > free(pos->name); > pos->name = new_name; > + } else { > + *attributes = ':'; > } > } > }
On Fri, Dec 15, 2023 at 02:49:14PM -0300, Arnaldo Carvalho de Melo wrote: > Em Fri, Dec 15, 2023 at 04:51:49PM +0000, Mark Rutland escreveu: > > On Fri, Dec 15, 2023 at 12:36:10PM -0300, Arnaldo Carvalho de Melo wrote: > > > Em Tue, Dec 12, 2023 at 06:31:05PM +0000, Mark Rutland escreveu: > > > > On ARM it'll be essentially the same as on x86: if you open an event with > > > > type==PERF_EVENT_TYPE_HARDWARE (without the extended HW type pointing to a > > > > specific PMU), and with cpu==-1, it'll go to an arbitrary CPU PMU, whichever > > > > happens to be found by perf_init_event() when iterating over the 'pmus' list. > > > > > If you open an event with type==PERF_EVENT_TYPE_HARDWARE and cpu!=-1, the event > > > > will opened on the appropriate CPU PMU, by virtue of being rejected by others > > > > when perf_init_event() iterates over the 'pmus' list. > > > > The way that it is working non on my intel hybrid system, with Kan's > > > patch, is equivalent to using this on the RK3399pc board I have: > > > > root@roc-rk3399-pc:~# perf top -e armv8_cortex_a72/cycles/P,armv8_cortex_a53/cycles/P > > > > Wouldn't be better to make 'perf top' on ARM work the way is being done > > > in x86 now, i.e. default to opening the two events, one per PMU and > > > allow the user to switch back and forth using the TUI/stdio? > > > TBH, for perf top I don't know *which* behaviour is preferable, but I agree > > that it'd be good for x86 and arm to work in the same way. > > Right, reducing the difference in the user experience accross arches. > > > For design-cleanliness and consistency with other commands I can see that > > opening those separately is probably for the best, but for typical usage of > > perf top it's really nice to have those presented together without having to > > tab back-and-forth between the distinct PMUs, and so the existing behaviour of > > Humm, so you would want two histogram viewers, one for each PMU, side by > side? I had meant as an aggregated view (the same as what you'd get if you opened one plain PERF_TYPE_HARDWARE event per cpu); I hadn't considered side-by-side views. To be clear, I'm personally happy to tab between per-pmu views, and if that's how x86 works today for heterogeneous PMUs, I'm fine with the same for arm/arm64. I was trying to say that I didn't have a strong preference. > > using CPU-bound PERF_EVENT_TYPE_HARDWARE events is arguably nicer for the user. > > So, on ARM64, start the following 'perf trace' session, then run the > stock 'perf top': > > root@roc-rk3399-pc:~# perf trace -e perf_event_open > <SNIP calls doing capability queries and setting up sideband stuff> > 535.764 ( 0.015 ms): perf/15627 perf_event_open(attr_uptr: { type: 0 (PERF_TYPE_HARDWARE), size: 136, config: 0 (PERF_COUNT_HW_CPU_CYCLES), { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|CPU|PERIOD, read_format: ID|LOST, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, task: 1, precise_ip: 3, sample_id_all: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1 }, pid: -1, group_fd: -1, flags: FD_CLOEXEC) = 19 > 535.783 ( 0.067 ms): perf/15627 perf_event_open(attr_uptr: { type: 0 (PERF_TYPE_HARDWARE), size: 136, config: 0 (PERF_COUNT_HW_CPU_CYCLES), { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|CPU|PERIOD, read_format: ID|LOST, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, task: 1, precise_ip: 3, sample_id_all: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1 }, pid: -1, cpu: 1, group_fd: -1, flags: FD_CLOEXEC) = 28 > 535.854 ( 0.063 ms): perf/15627 perf_event_open(attr_uptr: { type: 0 (PERF_TYPE_HARDWARE), size: 136, config: 0 (PERF_COUNT_HW_CPU_CYCLES), { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|CPU|PERIOD, read_format: ID|LOST, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, task: 1, precise_ip: 3, sample_id_all: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1 }, pid: -1, cpu: 2, group_fd: -1, flags: FD_CLOEXEC) = 29 > 535.920 ( 0.015 ms): perf/15627 perf_event_open(attr_uptr: { type: 0 (PERF_TYPE_HARDWARE), size: 136, config: 0 (PERF_COUNT_HW_CPU_CYCLES), { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|CPU|PERIOD, read_format: ID|LOST, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, task: 1, precise_ip: 3, sample_id_all: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1 }, pid: -1, cpu: 3, group_fd: -1, flags: FD_CLOEXEC) = 30 > 535.939 ( 0.016 ms): perf/15627 perf_event_open(attr_uptr: { type: 0 (PERF_TYPE_HARDWARE), size: 136, config: 0 (PERF_COUNT_HW_CPU_CYCLES), { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|CPU|PERIOD, read_format: ID|LOST, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, task: 1, precise_ip: 3, sample_id_all: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1 }, pid: -1, cpu: 4, group_fd: -1, flags: FD_CLOEXEC) = 31 > 535.959 ( 0.011 ms): perf/15627 perf_event_open(attr_uptr: { type: 0 (PERF_TYPE_HARDWARE), size: 136, config: 0 (PERF_COUNT_HW_CPU_CYCLES), { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|CPU|PERIOD, read_format: ID|LOST, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, task: 1, precise_ip: 3, sample_id_all: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1 }, pid: -1, cpu: 5, group_fd: -1, flags: FD_CLOEXEC) = 32 > > root@roc-rk3399-pc:~# grep "CPU part" /proc/cpuinfo | uniq -c > 4 CPU part : 0xd03 > 2 CPU part : 0xd08 > root@roc-rk3399-pc:~# > > It is already doing what you suggest, right? PERF_TYPE_HARDWARE, one > counter per CPU, maps to armv8_cortex_a72/cycles/P and > armv8_cortex_a53/cycles/P. Sounds like it; as above I'm happy for that to change to per-PMU views. > One thing I'm thinking is that we should split this per PMU at the > hist_entry, so that we could show how many samples/events came from each > of them... That sounds sensible to me. Mark. > > - Arnaldo > > > I don't have a strong feeling either way; I'm personally happy so long as > > explicit pmu_name/event/ events don't get silently converted into > > PERF_EVENT_TYPE_HARDWARE events, and as long as we correctly set the extended > > HW type when we decide to use that. > > > > Thanks, > > Mark. > > > > > Kan, I also noticed that the name of the event is: > > > > > > 1K cpu_atom/cycles:P/ ◆ > > > 11K cpu_core/cycles:P/ > > > > > > If I try to use that on the command line: > > > > > > root@number:~# perf top -e cpu_atom/cycles:P/ > > > event syntax error: 'cpu_atom/cycles:P/' > > > \___ Bad event or PMU > > > > > > Unable to find PMU or event on a PMU of 'cpu_atom' > > > > > > Initial error: > > > event syntax error: 'cpu_atom/cycles:P/' > > > \___ unknown term 'cycles:P' for pmu 'cpu_atom' > > > > > > valid terms: event,pc,edge,offcore_rsp,ldlat,inv,umask,cmask,config,config1,config2,config3,name,period,freq,branch_type,time,call-graph,stack-size,no-inherit,inherit,max-stack,nr,no-overwrite,overwrite,driver-config,percore,aux-output,aux-sample-size,metric-id,raw,legacy-cache,hardware > > > Run 'perf list' for a list of valid events > > > > > > Usage: perf top [<options>] > > > > > > -e, --event <event> event selector. use 'perf list' to list available events > > > root@number:~# > > > > > > It should be: > > > > > > "cpu_atom/cycles/P"
diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c index ea8c7eca5eee..cce9350177e2 100644 --- a/tools/perf/builtin-top.c +++ b/tools/perf/builtin-top.c @@ -1027,8 +1027,8 @@ static int perf_top__start_counters(struct perf_top *top) evlist__for_each_entry(evlist, counter) { try_again: - if (evsel__open(counter, top->evlist->core.user_requested_cpus, - top->evlist->core.threads) < 0) { + if (evsel__open(counter, counter->core.cpus, + counter->core.threads) < 0) { /* * Specially handle overwrite fall back.