Message ID | 20230616031420.3751973-1-kan.liang@linux.intel.com |
---|---|
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:994d:0:b0:3d9:f83d:47d9 with SMTP id k13csp1068406vqr; Thu, 15 Jun 2023 20:45:54 -0700 (PDT) X-Google-Smtp-Source: ACHHUZ5Uo43nxf1VgXPTPrad29qE2EU8Zp1V4hyRQJDq/QDF22VUwJ+gxb7tFtsTfmGHF/VVLjAq X-Received: by 2002:a05:6a20:2583:b0:106:5dff:5dc6 with SMTP id k3-20020a056a20258300b001065dff5dc6mr1560177pzd.16.1686887154534; Thu, 15 Jun 2023 20:45:54 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1686887154; cv=none; d=google.com; s=arc-20160816; b=LVoGy3sTa2mIEsk1AbRkz0Mv6UI/RadRziKzYfGSmd3Nzo9pMI9xnF+hfynr50Dsfl CcYJJ7dYXicqkpe3nEP2mks0zQDWzEL8L1F/1LgWL2GBg8DljKfU/+Bgbbf9vT4UQAE+ Ox2e3krhQ3nO0q/we97L52dBq4sLzVd9rdGujCIZu4aIw44/vll0TVn3KRc9sMIibp4A ywuGuTkk0AepJ2XsV0wAeQR2/fMF1bGP/1x/kqYwCi13eDOjEV4kjg830+vKiAHu4Pru JyunM0EtZ+dyCm4pTE9wo21u7RZtt+sKHK+twvZ9fcyWKWFaFWZn0bnPeDWWLnlFnUIs RY5w== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :message-id:date:subject:cc:to:from:dkim-signature; bh=m46cXj5j/5L0gSqRcg9NA6d9tKrC5XXh46exhtAmtI4=; b=lIKhGOggiff+/keAIZZYvHH287aFWPtPk9aUCQyiToWV8E3JmPfKNyun2RoMaUyP5i p55ItCF5Xvpo8NXG/lqDNXpLdgfdbWUXPUOhmplAjaoh9ITHmVPMbSXsq6FhdARL/Ous P485pavLJ0iNxwpvWQNn4H1R7DKkPwRaQj1itSU33/Wtb7UkqworFi/W+8FTW8KxZmYr jGe4iDk8oJ7iQptpcwjjoO23ZA3z3u1pmNzaVKlx1KlC+0eFExxtfjmfdajnah+EdcCY kdQTsNqqYqUQohl57aL4/4anjdCoovOS/YgF045um072tJrMxjQ8x0yJMIc4G9mUn+jW lsEA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=R9mPMSBl; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id i62-20020a625441000000b00665a749200asi643031pfb.70.2023.06.15.20.45.40; Thu, 15 Jun 2023 20:45:54 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=R9mPMSBl; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S241799AbjFPDPi (ORCPT <rfc822;maxin.john@gmail.com> + 99 others); Thu, 15 Jun 2023 23:15:38 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:40356 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S241838AbjFPDO4 (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Thu, 15 Jun 2023 23:14:56 -0400 Received: from mga12.intel.com (mga12.intel.com [192.55.52.136]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 9A2AB2132; Thu, 15 Jun 2023 20:14:48 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1686885288; x=1718421288; h=from:to:cc:subject:date:message-id:mime-version: content-transfer-encoding; bh=j6V55rCFU2w6lIcgKhvWcpNfOzmEBuL+14YH+IvcGk4=; b=R9mPMSBlwTxzeKvzIcRLv5SvBXH69W/MwstUVU/Xbfp+JMnQcBGw506o OBpzm9aip2f8nb9BRz9mOnyJOC0hQNS96zXy8UPOsyeW5C9enTWqImAVA GOWfzcR9TM+ozLKdMUJTmBPaaIFyAowfCiXzNFfKGKGe3TGOjeZ9q7eEl RD0uJ/dysJujyjwejfJQlyZ54nKyK2nYosnXy6YM+jkWGhNv/8f2Yc8QL +vCxZ+WW+DZolRg2Wq9fuXKcvNespziAbm3dwHH8iKxkcFjSDguCGvi4i aUm+PfAx2/83/gGiQ9ToUs7x3+nACPBl2XCZPOzMMrpGAvn96T0uDZqBH Q==; X-IronPort-AV: E=McAfee;i="6600,9927,10742"; a="338718185" X-IronPort-AV: E=Sophos;i="6.00,246,1681196400"; d="scan'208";a="338718185" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 15 Jun 2023 20:14:48 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10742"; a="825555582" X-IronPort-AV: E=Sophos;i="6.00,246,1681196400"; d="scan'208";a="825555582" Received: from kanliang-dev.jf.intel.com ([10.165.154.102]) by fmsmga002.fm.intel.com with ESMTP; 15 Jun 2023 20:14:47 -0700 From: kan.liang@linux.intel.com To: acme@kernel.org, mingo@redhat.com, peterz@infradead.org, irogers@google.com, namhyung@kernel.org, jolsa@kernel.org, adrian.hunter@intel.com, linux-perf-users@vger.kernel.org, linux-kernel@vger.kernel.org Cc: ak@linux.intel.com, eranian@google.com, ahmad.yasin@intel.com, Kan Liang <kan.liang@linux.intel.com> Subject: [PATCH V4 0/5] New metricgroup output in perf stat default mode Date: Thu, 15 Jun 2023 20:14:15 -0700 Message-Id: <20230616031420.3751973-1-kan.liang@linux.intel.com> X-Mailer: git-send-email 2.35.1 MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-4.3 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_EF,RCVD_IN_DNSWL_MED,SPF_HELO_PASS, SPF_NONE,T_SCC_BODY_TEXT_LINE,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1768779336236297414?= X-GMAIL-MSGID: =?utf-8?q?1768829384850483676?= |
Series |
New metricgroup output in perf stat default mode
|
|
Message
Liang, Kan
June 16, 2023, 3:14 a.m. UTC
From: Kan Liang <kan.liang@linux.intel.com>
Changes since V3:
- Move the full name (PMU + metricgroup name) generation from the metric
code to the output code. (Ian)
- Add default tags for Hisi hip08 L1 metrics (John)
- Some patches have been merged. Drop them from the V4.
Changes since V2:
- Fixes memory leak (Ian)
(Ian, I cannot reproduce the memory leak on all my machines. Please
check whether the fix works on your side. Thanks.)
- Add Reviewed-by tags for several patches.
Changes since V1:
- Remove EVSEL_EVENT_MASK and use the __evsel__match which is suggested
by Ian.
- Support TopdownL1 on both e-core and p-core of ADL in the default
mode. (Ian)
- Have separate patches for the modifications of metricgroup and output.
(Ian)
- Does 2nd sort for the Default metricgroup. Remove the logic of
changing the associated metric event. (Ian)
- Move all the metric related code to stat-shadow (Ian)
- Move the commong functions between stat+csv_output and stat+std_output
to the lib directory (Ian)
In the default mode, the current output of the metricgroup include both
events and metrics, which is not necessary and makes the output hard to
read. Also, different ARCHs (even different generations of the ARCH) may
have a different output format because of the different events in a
metrics.
The patch proposes a new output format which only outputting the value
of each metric and the metricgroup name. It can brings a clean and
consistent output format among ARCHs and generations.
The patches 1-2 introduce the new metricgroup output.
The patches 3-4 improve the tests to cover the default mode.
The patch 5 update the event list for Hisi hip08.
Here are some examples for the new output.
STD output:
On SPR
perf stat -a sleep 1
Performance counter stats for 'system wide':
226,054.13 msec cpu-clock # 224.588 CPUs utilized
932 context-switches # 4.123 /sec
224 cpu-migrations # 0.991 /sec
76 page-faults # 0.336 /sec
45,940,682 cycles # 0.000 GHz
36,676,047 instructions # 0.80 insn per cycle
7,044,516 branches # 31.163 K/sec
62,169 branch-misses # 0.88% of all branches
TopdownL1 # 68.7 % tma_backend_bound
# 3.1 % tma_bad_speculation
# 13.0 % tma_frontend_bound
# 15.2 % tma_retiring
TopdownL2 # 2.7 % tma_branch_mispredicts
# 19.6 % tma_core_bound
# 4.8 % tma_fetch_bandwidth
# 8.3 % tma_fetch_latency
# 2.9 % tma_heavy_operations
# 12.3 % tma_light_operations
# 0.4 % tma_machine_clears
# 49.1 % tma_memory_bound
1.006529767 seconds time elapsed
perf stat -a sleep 1
Performance counter stats for 'system wide':
32,127.99 msec cpu-clock # 31.992 CPUs utilized
240 context-switches # 7.470 /sec
32 cpu-migrations # 0.996 /sec
74 page-faults # 2.303 /sec
6,313,960 cpu_core/cycles/ # 0.000 GHz
257,711,907 cpu_atom/cycles/ # 0.008 GHz (54.18%)
4,477,162 cpu_core/instructions/ # 0.71 insn per cycle
37,721,481 cpu_atom/instructions/ # 5.97 insn per cycle (63.33%)
809,747 cpu_core/branches/ # 25.204 K/sec
6,621,226 cpu_atom/branches/ # 206.089 K/sec (63.32%)
39,667 cpu_core/branch-misses/ # 4.90% of all branches
1,032,146 cpu_atom/branch-misses/ # 127.47% of all branches (63.33%)
TopdownL1 (cpu_core) # nan % tma_backend_bound
# 0.0 % tma_bad_speculation
# nan % tma_frontend_bound
# nan % tma_retiring
TopdownL1 (cpu_atom) # 13.6 % tma_bad_speculation (63.36%)
# 41.1 % tma_frontend_bound (63.54%)
# 39.2 % tma_backend_bound
# 39.2 % tma_backend_bound_aux (63.93%)
# 5.4 % tma_retiring (64.15%)
1.004244114 seconds time elapsed
JSON output
on SPR
perf stat --json -a sleep 1
{"counter-value" : "225904.823297", "unit" : "msec", "event" : "cpu-clock", "event-runtime" : 225904323425, "pcnt-running" : 100.00, "metric-value" : "224.456872", "metric-unit" : "CPUs utilized"}
{"counter-value" : "986.000000", "unit" : "", "event" : "context-switches", "event-runtime" : 225904108985, "pcnt-running" : 100.00, "metric-value" : "4.364670", "metric-unit" : "/sec"}
{"counter-value" : "224.000000", "unit" : "", "event" : "cpu-migrations", "event-runtime" : 225904016141, "pcnt-running" : 100.00, "metric-value" : "0.991568", "metric-unit" : "/sec"}
{"counter-value" : "76.000000", "unit" : "", "event" : "page-faults", "event-runtime" : 225903913270, "pcnt-running" : 100.00, "metric-value" : "0.336425", "metric-unit" : "/sec"}
{"counter-value" : "48433482.000000", "unit" : "", "event" : "cycles", "event-runtime" : 225903792732, "pcnt-running" : 100.00, "metric-value" : "0.000214", "metric-unit" : "GHz"}
{"counter-value" : "38620409.000000", "unit" : "", "event" : "instructions", "event-runtime" : 225903657830, "pcnt-running" : 100.00, "metric-value" : "0.797391", "metric-unit" : "insn per cycle"}
{"counter-value" : "7369473.000000", "unit" : "", "event" : "branches", "event-runtime" : 225903464328, "pcnt-running" : 100.00, "metric-value" : "32.622026", "metric-unit" : "K/sec"}
{"counter-value" : "54747.000000", "unit" : "", "event" : "branch-misses", "event-runtime" : 225903234523, "pcnt-running" : 100.00, "metric-value" : "0.742889", "metric-unit" : "of all branches"}
{"event-runtime" : 225902840555, "pcnt-running" : 100.00, "metricgroup" : "TopdownL1"}
{"metric-value" : "69.950631", "metric-unit" : "% tma_backend_bound"}
{"metric-value" : "2.771783", "metric-unit" : "% tma_bad_speculation"}
{"metric-value" : "12.026074", "metric-unit" : "% tma_frontend_bound"}
{"metric-value" : "15.251513", "metric-unit" : "% tma_retiring"}
{"event-runtime" : 225902840555, "pcnt-running" : 100.00, "metricgroup" : "TopdownL2"}
{"metric-value" : "2.351757", "metric-unit" : "% tma_branch_mispredicts"}
{"metric-value" : "19.729771", "metric-unit" : "% tma_core_bound"}
{"metric-value" : "4.555207", "metric-unit" : "% tma_fetch_bandwidth"}
{"metric-value" : "7.470867", "metric-unit" : "% tma_fetch_latency"}
{"metric-value" : "2.938808", "metric-unit" : "% tma_heavy_operations"}
{"metric-value" : "12.312705", "metric-unit" : "% tma_light_operations"}
{"metric-value" : "0.420026", "metric-unit" : "% tma_machine_clears"}
{"metric-value" : "50.220860", "metric-unit" : "% tma_memory_bound"}
On hybrid
perf stat --json -a sleep 1
{"counter-value" : "32131.530625", "unit" : "msec", "event" : "cpu-clock", "event-runtime" : 32131536951, "pcnt-running" : 100.00, "metric-value" : "31.992642", "metric-unit" : "CPUs utilized"}
{"counter-value" : "328.000000", "unit" : "", "event" : "context-switches", "event-runtime" : 32131525778, "pcnt-running" : 100.00, "metric-value" : "10.208042", "metric-unit" : "/sec"}
{"counter-value" : "32.000000", "unit" : "", "event" : "cpu-migrations", "event-runtime" : 32131515104, "pcnt-running" : 100.00, "metric-value" : "0.995906", "metric-unit" : "/sec"}
{"counter-value" : "353.000000", "unit" : "", "event" : "page-faults", "event-runtime" : 32131501396, "pcnt-running" : 100.00, "metric-value" : "10.986094", "metric-unit" : "/sec"}
{"counter-value" : "18685492.000000", "unit" : "", "event" : "cpu_core/cycles/", "event-runtime" : 16061585292, "pcnt-running" : 100.00, "metric-value" : "0.000582", "metric-unit" : "GHz"}
{"counter-value" : "255620352.000000", "unit" : "", "event" : "cpu_atom/cycles/", "event-runtime" : 8690268422, "pcnt-running" : 54.00, "metric-value" : "0.007955", "metric-unit" : "GHz"}
{"counter-value" : "15489913.000000", "unit" : "", "event" : "cpu_core/instructions/", "event-runtime" : 16061582200, "pcnt-running" : 100.00, "metric-value" : "0.828981", "metric-unit" : "insn per cycle"}
{"counter-value" : "38790161.000000", "unit" : "", "event" : "cpu_atom/instructions/", "event-runtime" : 10163133324, "pcnt-running" : 63.00, "metric-value" : "2.075951", "metric-unit" : "insn per cycle"}
{"counter-value" : "2908031.000000", "unit" : "", "event" : "cpu_core/branches/", "event-runtime" : 16061563416, "pcnt-running" : 100.00, "metric-value" : "90.503967", "metric-unit" : "K/sec"}
{"counter-value" : "6814948.000000", "unit" : "", "event" : "cpu_atom/branches/", "event-runtime" : 10161711336, "pcnt-running" : 63.00, "metric-value" : "212.095343", "metric-unit" : "K/sec"}
{"counter-value" : "97638.000000", "unit" : "", "event" : "cpu_core/branch-misses/", "event-runtime" : 16061535261, "pcnt-running" : 100.00, "metric-value" : "3.357530", "metric-unit" : "of all branches"}
{"counter-value" : "1017066.000000", "unit" : "", "event" : "cpu_atom/branch-misses/", "event-runtime" : 10159971797, "pcnt-running" : 63.00, "metric-value" : "34.974386", "metric-unit" : "of all branches"}
{"event-runtime" : 16061513607, "pcnt-running" : 100.00, "metricgroup" : "TopdownL1 (cpu_core)"}
{"metric-value" : "nan", "metric-unit" : "% tma_backend_bound"}
{"metric-value" : "0.000000", "metric-unit" : "% tma_bad_speculation"}
{"metric-value" : "nan", "metric-unit" : "% tma_frontend_bound"}
{"metric-value" : "nan", "metric-unit" : "% tma_retiring"}
{"event-runtime" : 10157398501, "pcnt-running" : 63.00, "metricgroup" : "TopdownL1 (cpu_atom)"}
{"metric-value" : "13.719821", "metric-unit" : "% tma_bad_speculation"}
{"event-runtime" : 10178698656, "pcnt-running" : 63.00, "metric-value" : "41.016738", "metric-unit" : "% tma_frontend_bound"}
{"event-runtime" : 10240582902, "pcnt-running" : 63.00, "metric-value" : "39.327764", "metric-unit" : "% tma_backend_bound"}
{"metric-value" : "39.327764", "metric-unit" : "% tma_backend_bound_aux"}
{"event-runtime" : 10284284920, "pcnt-running" : 64.00, "metric-value" : "5.374638", "metric-unit" : "% tma_retiring"}
CSV output
On SPR
perf stat -x, -a sleep 1
225851.20,msec,cpu-clock,225850700108,100.00,224.431,CPUs utilized
976,,context-switches,225850504803,100.00,4.321,/sec
224,,cpu-migrations,225850410336,100.00,0.992,/sec
76,,page-faults,225850304155,100.00,0.337,/sec
52288305,,cycles,225850188531,100.00,0.000,GHz
37977214,,instructions,225850071251,100.00,0.73,insn per cycle
7299859,,branches,225849890722,100.00,32.322,K/sec
51102,,branch-misses,225849672536,100.00,0.70,of all branches
,225849327050,100.00,,,,TopdownL1
,,,,,70.1,% tma_backend_bound
,,,,,2.7,% tma_bad_speculation
,,,,,12.5,% tma_frontend_bound
,,,,,14.6,% tma_retiring
,225849327050,100.00,,,,TopdownL2
,,,,,2.3,% tma_branch_mispredicts
,,,,,19.6,% tma_core_bound
,,,,,4.6,% tma_fetch_bandwidth
,,,,,7.9,% tma_fetch_latency
,,,,,2.9,% tma_heavy_operations
,,,,,11.7,% tma_light_operations
,,,,,0.5,% tma_machine_clears
,,,,,50.5,% tma_memory_bound
On Hybrid
perf stat -x, -a sleep 1
32139.34,msec,cpu-clock,32139351409,100.00,32.001,CPUs utilized
225,,context-switches,32139342672,100.00,7.001,/sec
32,,cpu-migrations,32139337772,100.00,0.996,/sec
72,,page-faults,32139328384,100.00,2.240,/sec
6766433,,cpu_core/cycles/,16067551558,100.00,0.000,GHz
256500230,,cpu_atom/cycles/,8695757391,54.00,0.008,GHz
4688595,,cpu_core/instructions/,16067558976,100.00,0.69,insn per cycle
37487490,,cpu_atom/instructions/,10165193856,63.00,5.54,insn per cycle
845211,,cpu_core/branches/,16067540225,100.00,26.298,K/sec
6571193,,cpu_atom/branches/,10155940853,63.00,204.459,K/sec
41359,,cpu_core/branch-misses/,16067516493,100.00,4.89,of all branches
1020231,,cpu_atom/branch-misses/,10159363620,63.00,120.71,of all branches
,16067494476,100.00,,,,TopdownL1 (cpu_core)
,,,,,,% tma_backend_bound
,,,,,0.0,% tma_bad_speculation
,,,,,,% tma_frontend_bound
,,,,,,% tma_retiring
,10160989992,63.00,,,,TopdownL1 (cpu_atom)
,,,,,13.8,% tma_bad_speculation
,10188319019,63.00,,,41.3,% tma_frontend_bound
,10258326591,63.00,,,38.6,% tma_backend_bound
,,,,,38.6,% tma_backend_bound_aux
,10282689488,64.00,,,5.4,% tma_retiring
Kan Liang (5):
perf metrics: Sort the Default metricgroup
perf stat: New metricgroup output for the default mode
perf test: Move all the check functions of stat csv output to lib
perf test: Add test case for the standard perf stat output
perf vendor events arm64: Add default tags for Hisi hip08 L1 metrics
tools/perf/builtin-stat.c | 1 +
.../arch/arm64/hisilicon/hip08/metrics.json | 12 +-
tools/perf/tests/shell/lib/stat_output.sh | 169 ++++++++++++++++
tools/perf/tests/shell/stat+csv_output.sh | 188 ++----------------
tools/perf/tests/shell/stat+std_output.sh | 108 ++++++++++
tools/perf/util/evsel.h | 1 +
tools/perf/util/metricgroup.c | 26 +++
tools/perf/util/metricgroup.h | 3 +
tools/perf/util/stat-display.c | 108 +++++++++-
tools/perf/util/stat-shadow.c | 131 ++++++++++--
tools/perf/util/stat.h | 15 ++
11 files changed, 563 insertions(+), 199 deletions(-)
create mode 100755 tools/perf/tests/shell/lib/stat_output.sh
create mode 100755 tools/perf/tests/shell/stat+std_output.sh
Comments
On Thu, Jun 15, 2023 at 8:14 PM <kan.liang@linux.intel.com> wrote: > > From: Kan Liang <kan.liang@linux.intel.com> > > Changes since V3: > - Move the full name (PMU + metricgroup name) generation from the metric > code to the output code. (Ian) > - Add default tags for Hisi hip08 L1 metrics (John) > - Some patches have been merged. Drop them from the V4. > > Changes since V2: > - Fixes memory leak (Ian) > (Ian, I cannot reproduce the memory leak on all my machines. Please > check whether the fix works on your side. Thanks.) > - Add Reviewed-by tags for several patches. > > Changes since V1: > - Remove EVSEL_EVENT_MASK and use the __evsel__match which is suggested > by Ian. > - Support TopdownL1 on both e-core and p-core of ADL in the default > mode. (Ian) > - Have separate patches for the modifications of metricgroup and output. > (Ian) > - Does 2nd sort for the Default metricgroup. Remove the logic of > changing the associated metric event. (Ian) > - Move all the metric related code to stat-shadow (Ian) > - Move the commong functions between stat+csv_output and stat+std_output > to the lib directory (Ian) > > In the default mode, the current output of the metricgroup include both > events and metrics, which is not necessary and makes the output hard to > read. Also, different ARCHs (even different generations of the ARCH) may > have a different output format because of the different events in a > metrics. > > The patch proposes a new output format which only outputting the value > of each metric and the metricgroup name. It can brings a clean and > consistent output format among ARCHs and generations. > > The patches 1-2 introduce the new metricgroup output. > > The patches 3-4 improve the tests to cover the default mode. > > The patch 5 update the event list for Hisi hip08. > > Here are some examples for the new output. > > STD output: > > On SPR > > perf stat -a sleep 1 > > Performance counter stats for 'system wide': > > 226,054.13 msec cpu-clock # 224.588 CPUs utilized > 932 context-switches # 4.123 /sec > 224 cpu-migrations # 0.991 /sec > 76 page-faults # 0.336 /sec > 45,940,682 cycles # 0.000 GHz > 36,676,047 instructions # 0.80 insn per cycle > 7,044,516 branches # 31.163 K/sec > 62,169 branch-misses # 0.88% of all branches > TopdownL1 # 68.7 % tma_backend_bound > # 3.1 % tma_bad_speculation > # 13.0 % tma_frontend_bound > # 15.2 % tma_retiring > TopdownL2 # 2.7 % tma_branch_mispredicts > # 19.6 % tma_core_bound > # 4.8 % tma_fetch_bandwidth > # 8.3 % tma_fetch_latency > # 2.9 % tma_heavy_operations > # 12.3 % tma_light_operations > # 0.4 % tma_machine_clears > # 49.1 % tma_memory_bound > > 1.006529767 seconds time elapsed > > perf stat -a sleep 1 > > Performance counter stats for 'system wide': > > 32,127.99 msec cpu-clock # 31.992 CPUs utilized > 240 context-switches # 7.470 /sec > 32 cpu-migrations # 0.996 /sec > 74 page-faults # 2.303 /sec > 6,313,960 cpu_core/cycles/ # 0.000 GHz > 257,711,907 cpu_atom/cycles/ # 0.008 GHz (54.18%) > 4,477,162 cpu_core/instructions/ # 0.71 insn per cycle > 37,721,481 cpu_atom/instructions/ # 5.97 insn per cycle (63.33%) > 809,747 cpu_core/branches/ # 25.204 K/sec > 6,621,226 cpu_atom/branches/ # 206.089 K/sec (63.32%) > 39,667 cpu_core/branch-misses/ # 4.90% of all branches > 1,032,146 cpu_atom/branch-misses/ # 127.47% of all branches (63.33%) > TopdownL1 (cpu_core) # nan % tma_backend_bound > # 0.0 % tma_bad_speculation > # nan % tma_frontend_bound > # nan % tma_retiring > TopdownL1 (cpu_atom) # 13.6 % tma_bad_speculation (63.36%) > # 41.1 % tma_frontend_bound (63.54%) > # 39.2 % tma_backend_bound > # 39.2 % tma_backend_bound_aux (63.93%) > # 5.4 % tma_retiring (64.15%) > > 1.004244114 seconds time elapsed > > JSON output > > on SPR > > perf stat --json -a sleep 1 > {"counter-value" : "225904.823297", "unit" : "msec", "event" : "cpu-clock", "event-runtime" : 225904323425, "pcnt-running" : 100.00, "metric-value" : "224.456872", "metric-unit" : "CPUs utilized"} > {"counter-value" : "986.000000", "unit" : "", "event" : "context-switches", "event-runtime" : 225904108985, "pcnt-running" : 100.00, "metric-value" : "4.364670", "metric-unit" : "/sec"} > {"counter-value" : "224.000000", "unit" : "", "event" : "cpu-migrations", "event-runtime" : 225904016141, "pcnt-running" : 100.00, "metric-value" : "0.991568", "metric-unit" : "/sec"} > {"counter-value" : "76.000000", "unit" : "", "event" : "page-faults", "event-runtime" : 225903913270, "pcnt-running" : 100.00, "metric-value" : "0.336425", "metric-unit" : "/sec"} > {"counter-value" : "48433482.000000", "unit" : "", "event" : "cycles", "event-runtime" : 225903792732, "pcnt-running" : 100.00, "metric-value" : "0.000214", "metric-unit" : "GHz"} > {"counter-value" : "38620409.000000", "unit" : "", "event" : "instructions", "event-runtime" : 225903657830, "pcnt-running" : 100.00, "metric-value" : "0.797391", "metric-unit" : "insn per cycle"} > {"counter-value" : "7369473.000000", "unit" : "", "event" : "branches", "event-runtime" : 225903464328, "pcnt-running" : 100.00, "metric-value" : "32.622026", "metric-unit" : "K/sec"} > {"counter-value" : "54747.000000", "unit" : "", "event" : "branch-misses", "event-runtime" : 225903234523, "pcnt-running" : 100.00, "metric-value" : "0.742889", "metric-unit" : "of all branches"} > {"event-runtime" : 225902840555, "pcnt-running" : 100.00, "metricgroup" : "TopdownL1"} > {"metric-value" : "69.950631", "metric-unit" : "% tma_backend_bound"} > {"metric-value" : "2.771783", "metric-unit" : "% tma_bad_speculation"} > {"metric-value" : "12.026074", "metric-unit" : "% tma_frontend_bound"} > {"metric-value" : "15.251513", "metric-unit" : "% tma_retiring"} > {"event-runtime" : 225902840555, "pcnt-running" : 100.00, "metricgroup" : "TopdownL2"} > {"metric-value" : "2.351757", "metric-unit" : "% tma_branch_mispredicts"} > {"metric-value" : "19.729771", "metric-unit" : "% tma_core_bound"} > {"metric-value" : "4.555207", "metric-unit" : "% tma_fetch_bandwidth"} > {"metric-value" : "7.470867", "metric-unit" : "% tma_fetch_latency"} > {"metric-value" : "2.938808", "metric-unit" : "% tma_heavy_operations"} > {"metric-value" : "12.312705", "metric-unit" : "% tma_light_operations"} > {"metric-value" : "0.420026", "metric-unit" : "% tma_machine_clears"} > {"metric-value" : "50.220860", "metric-unit" : "% tma_memory_bound"} > > On hybrid > > perf stat --json -a sleep 1 > {"counter-value" : "32131.530625", "unit" : "msec", "event" : "cpu-clock", "event-runtime" : 32131536951, "pcnt-running" : 100.00, "metric-value" : "31.992642", "metric-unit" : "CPUs utilized"} > {"counter-value" : "328.000000", "unit" : "", "event" : "context-switches", "event-runtime" : 32131525778, "pcnt-running" : 100.00, "metric-value" : "10.208042", "metric-unit" : "/sec"} > {"counter-value" : "32.000000", "unit" : "", "event" : "cpu-migrations", "event-runtime" : 32131515104, "pcnt-running" : 100.00, "metric-value" : "0.995906", "metric-unit" : "/sec"} > {"counter-value" : "353.000000", "unit" : "", "event" : "page-faults", "event-runtime" : 32131501396, "pcnt-running" : 100.00, "metric-value" : "10.986094", "metric-unit" : "/sec"} > {"counter-value" : "18685492.000000", "unit" : "", "event" : "cpu_core/cycles/", "event-runtime" : 16061585292, "pcnt-running" : 100.00, "metric-value" : "0.000582", "metric-unit" : "GHz"} > {"counter-value" : "255620352.000000", "unit" : "", "event" : "cpu_atom/cycles/", "event-runtime" : 8690268422, "pcnt-running" : 54.00, "metric-value" : "0.007955", "metric-unit" : "GHz"} > {"counter-value" : "15489913.000000", "unit" : "", "event" : "cpu_core/instructions/", "event-runtime" : 16061582200, "pcnt-running" : 100.00, "metric-value" : "0.828981", "metric-unit" : "insn per cycle"} > {"counter-value" : "38790161.000000", "unit" : "", "event" : "cpu_atom/instructions/", "event-runtime" : 10163133324, "pcnt-running" : 63.00, "metric-value" : "2.075951", "metric-unit" : "insn per cycle"} > {"counter-value" : "2908031.000000", "unit" : "", "event" : "cpu_core/branches/", "event-runtime" : 16061563416, "pcnt-running" : 100.00, "metric-value" : "90.503967", "metric-unit" : "K/sec"} > {"counter-value" : "6814948.000000", "unit" : "", "event" : "cpu_atom/branches/", "event-runtime" : 10161711336, "pcnt-running" : 63.00, "metric-value" : "212.095343", "metric-unit" : "K/sec"} > {"counter-value" : "97638.000000", "unit" : "", "event" : "cpu_core/branch-misses/", "event-runtime" : 16061535261, "pcnt-running" : 100.00, "metric-value" : "3.357530", "metric-unit" : "of all branches"} > {"counter-value" : "1017066.000000", "unit" : "", "event" : "cpu_atom/branch-misses/", "event-runtime" : 10159971797, "pcnt-running" : 63.00, "metric-value" : "34.974386", "metric-unit" : "of all branches"} > {"event-runtime" : 16061513607, "pcnt-running" : 100.00, "metricgroup" : "TopdownL1 (cpu_core)"} > {"metric-value" : "nan", "metric-unit" : "% tma_backend_bound"} > {"metric-value" : "0.000000", "metric-unit" : "% tma_bad_speculation"} > {"metric-value" : "nan", "metric-unit" : "% tma_frontend_bound"} > {"metric-value" : "nan", "metric-unit" : "% tma_retiring"} > {"event-runtime" : 10157398501, "pcnt-running" : 63.00, "metricgroup" : "TopdownL1 (cpu_atom)"} > {"metric-value" : "13.719821", "metric-unit" : "% tma_bad_speculation"} > {"event-runtime" : 10178698656, "pcnt-running" : 63.00, "metric-value" : "41.016738", "metric-unit" : "% tma_frontend_bound"} > {"event-runtime" : 10240582902, "pcnt-running" : 63.00, "metric-value" : "39.327764", "metric-unit" : "% tma_backend_bound"} > {"metric-value" : "39.327764", "metric-unit" : "% tma_backend_bound_aux"} > {"event-runtime" : 10284284920, "pcnt-running" : 64.00, "metric-value" : "5.374638", "metric-unit" : "% tma_retiring"} > > CSV output > > On SPR > > perf stat -x, -a sleep 1 > 225851.20,msec,cpu-clock,225850700108,100.00,224.431,CPUs utilized > 976,,context-switches,225850504803,100.00,4.321,/sec > 224,,cpu-migrations,225850410336,100.00,0.992,/sec > 76,,page-faults,225850304155,100.00,0.337,/sec > 52288305,,cycles,225850188531,100.00,0.000,GHz > 37977214,,instructions,225850071251,100.00,0.73,insn per cycle > 7299859,,branches,225849890722,100.00,32.322,K/sec > 51102,,branch-misses,225849672536,100.00,0.70,of all branches > ,225849327050,100.00,,,,TopdownL1 > ,,,,,70.1,% tma_backend_bound > ,,,,,2.7,% tma_bad_speculation > ,,,,,12.5,% tma_frontend_bound > ,,,,,14.6,% tma_retiring > ,225849327050,100.00,,,,TopdownL2 > ,,,,,2.3,% tma_branch_mispredicts > ,,,,,19.6,% tma_core_bound > ,,,,,4.6,% tma_fetch_bandwidth > ,,,,,7.9,% tma_fetch_latency > ,,,,,2.9,% tma_heavy_operations > ,,,,,11.7,% tma_light_operations > ,,,,,0.5,% tma_machine_clears > ,,,,,50.5,% tma_memory_bound > > On Hybrid > > perf stat -x, -a sleep 1 > 32139.34,msec,cpu-clock,32139351409,100.00,32.001,CPUs utilized > 225,,context-switches,32139342672,100.00,7.001,/sec > 32,,cpu-migrations,32139337772,100.00,0.996,/sec > 72,,page-faults,32139328384,100.00,2.240,/sec > 6766433,,cpu_core/cycles/,16067551558,100.00,0.000,GHz > 256500230,,cpu_atom/cycles/,8695757391,54.00,0.008,GHz > 4688595,,cpu_core/instructions/,16067558976,100.00,0.69,insn per cycle > 37487490,,cpu_atom/instructions/,10165193856,63.00,5.54,insn per cycle > 845211,,cpu_core/branches/,16067540225,100.00,26.298,K/sec > 6571193,,cpu_atom/branches/,10155940853,63.00,204.459,K/sec > 41359,,cpu_core/branch-misses/,16067516493,100.00,4.89,of all branches > 1020231,,cpu_atom/branch-misses/,10159363620,63.00,120.71,of all branches > ,16067494476,100.00,,,,TopdownL1 (cpu_core) > ,,,,,,% tma_backend_bound > ,,,,,0.0,% tma_bad_speculation > ,,,,,,% tma_frontend_bound > ,,,,,,% tma_retiring > ,10160989992,63.00,,,,TopdownL1 (cpu_atom) > ,,,,,13.8,% tma_bad_speculation > ,10188319019,63.00,,,41.3,% tma_frontend_bound > ,10258326591,63.00,,,38.6,% tma_backend_bound > ,,,,,38.6,% tma_backend_bound_aux > ,10282689488,64.00,,,5.4,% tma_retiring > > Kan Liang (5): > perf metrics: Sort the Default metricgroup > perf stat: New metricgroup output for the default mode > perf test: Move all the check functions of stat csv output to lib > perf test: Add test case for the standard perf stat output > perf vendor events arm64: Add default tags for Hisi hip08 L1 metrics Just to be clear, I'm happy with this to be submitted having put reviewed/acked-by on it. Thanks, Ian > tools/perf/builtin-stat.c | 1 + > .../arch/arm64/hisilicon/hip08/metrics.json | 12 +- > tools/perf/tests/shell/lib/stat_output.sh | 169 ++++++++++++++++ > tools/perf/tests/shell/stat+csv_output.sh | 188 ++---------------- > tools/perf/tests/shell/stat+std_output.sh | 108 ++++++++++ > tools/perf/util/evsel.h | 1 + > tools/perf/util/metricgroup.c | 26 +++ > tools/perf/util/metricgroup.h | 3 + > tools/perf/util/stat-display.c | 108 +++++++++- > tools/perf/util/stat-shadow.c | 131 ++++++++++-- > tools/perf/util/stat.h | 15 ++ > 11 files changed, 563 insertions(+), 199 deletions(-) > create mode 100755 tools/perf/tests/shell/lib/stat_output.sh > create mode 100755 tools/perf/tests/shell/stat+std_output.sh > > -- > 2.35.1 >
On 2023-06-16 1:59 a.m., Ian Rogers wrote: > On Thu, Jun 15, 2023 at 8:14 PM <kan.liang@linux.intel.com> wrote: >> >> From: Kan Liang <kan.liang@linux.intel.com> >> >> Changes since V3: >> - Move the full name (PMU + metricgroup name) generation from the metric >> code to the output code. (Ian) >> - Add default tags for Hisi hip08 L1 metrics (John) >> - Some patches have been merged. Drop them from the V4. >> >> Changes since V2: >> - Fixes memory leak (Ian) >> (Ian, I cannot reproduce the memory leak on all my machines. Please >> check whether the fix works on your side. Thanks.) >> - Add Reviewed-by tags for several patches. >> >> Changes since V1: >> - Remove EVSEL_EVENT_MASK and use the __evsel__match which is suggested >> by Ian. >> - Support TopdownL1 on both e-core and p-core of ADL in the default >> mode. (Ian) >> - Have separate patches for the modifications of metricgroup and output. >> (Ian) >> - Does 2nd sort for the Default metricgroup. Remove the logic of >> changing the associated metric event. (Ian) >> - Move all the metric related code to stat-shadow (Ian) >> - Move the commong functions between stat+csv_output and stat+std_output >> to the lib directory (Ian) >> >> In the default mode, the current output of the metricgroup include both >> events and metrics, which is not necessary and makes the output hard to >> read. Also, different ARCHs (even different generations of the ARCH) may >> have a different output format because of the different events in a >> metrics. >> >> The patch proposes a new output format which only outputting the value >> of each metric and the metricgroup name. It can brings a clean and >> consistent output format among ARCHs and generations. >> >> The patches 1-2 introduce the new metricgroup output. >> >> The patches 3-4 improve the tests to cover the default mode. >> >> The patch 5 update the event list for Hisi hip08. >> >> Here are some examples for the new output. >> >> STD output: >> >> On SPR >> >> perf stat -a sleep 1 >> >> Performance counter stats for 'system wide': >> >> 226,054.13 msec cpu-clock # 224.588 CPUs utilized >> 932 context-switches # 4.123 /sec >> 224 cpu-migrations # 0.991 /sec >> 76 page-faults # 0.336 /sec >> 45,940,682 cycles # 0.000 GHz >> 36,676,047 instructions # 0.80 insn per cycle >> 7,044,516 branches # 31.163 K/sec >> 62,169 branch-misses # 0.88% of all branches >> TopdownL1 # 68.7 % tma_backend_bound >> # 3.1 % tma_bad_speculation >> # 13.0 % tma_frontend_bound >> # 15.2 % tma_retiring >> TopdownL2 # 2.7 % tma_branch_mispredicts >> # 19.6 % tma_core_bound >> # 4.8 % tma_fetch_bandwidth >> # 8.3 % tma_fetch_latency >> # 2.9 % tma_heavy_operations >> # 12.3 % tma_light_operations >> # 0.4 % tma_machine_clears >> # 49.1 % tma_memory_bound >> >> 1.006529767 seconds time elapsed >> >> perf stat -a sleep 1 >> >> Performance counter stats for 'system wide': >> >> 32,127.99 msec cpu-clock # 31.992 CPUs utilized >> 240 context-switches # 7.470 /sec >> 32 cpu-migrations # 0.996 /sec >> 74 page-faults # 2.303 /sec >> 6,313,960 cpu_core/cycles/ # 0.000 GHz >> 257,711,907 cpu_atom/cycles/ # 0.008 GHz (54.18%) >> 4,477,162 cpu_core/instructions/ # 0.71 insn per cycle >> 37,721,481 cpu_atom/instructions/ # 5.97 insn per cycle (63.33%) >> 809,747 cpu_core/branches/ # 25.204 K/sec >> 6,621,226 cpu_atom/branches/ # 206.089 K/sec (63.32%) >> 39,667 cpu_core/branch-misses/ # 4.90% of all branches >> 1,032,146 cpu_atom/branch-misses/ # 127.47% of all branches (63.33%) >> TopdownL1 (cpu_core) # nan % tma_backend_bound >> # 0.0 % tma_bad_speculation >> # nan % tma_frontend_bound >> # nan % tma_retiring >> TopdownL1 (cpu_atom) # 13.6 % tma_bad_speculation (63.36%) >> # 41.1 % tma_frontend_bound (63.54%) >> # 39.2 % tma_backend_bound >> # 39.2 % tma_backend_bound_aux (63.93%) >> # 5.4 % tma_retiring (64.15%) >> >> 1.004244114 seconds time elapsed >> >> JSON output >> >> on SPR >> >> perf stat --json -a sleep 1 >> {"counter-value" : "225904.823297", "unit" : "msec", "event" : "cpu-clock", "event-runtime" : 225904323425, "pcnt-running" : 100.00, "metric-value" : "224.456872", "metric-unit" : "CPUs utilized"} >> {"counter-value" : "986.000000", "unit" : "", "event" : "context-switches", "event-runtime" : 225904108985, "pcnt-running" : 100.00, "metric-value" : "4.364670", "metric-unit" : "/sec"} >> {"counter-value" : "224.000000", "unit" : "", "event" : "cpu-migrations", "event-runtime" : 225904016141, "pcnt-running" : 100.00, "metric-value" : "0.991568", "metric-unit" : "/sec"} >> {"counter-value" : "76.000000", "unit" : "", "event" : "page-faults", "event-runtime" : 225903913270, "pcnt-running" : 100.00, "metric-value" : "0.336425", "metric-unit" : "/sec"} >> {"counter-value" : "48433482.000000", "unit" : "", "event" : "cycles", "event-runtime" : 225903792732, "pcnt-running" : 100.00, "metric-value" : "0.000214", "metric-unit" : "GHz"} >> {"counter-value" : "38620409.000000", "unit" : "", "event" : "instructions", "event-runtime" : 225903657830, "pcnt-running" : 100.00, "metric-value" : "0.797391", "metric-unit" : "insn per cycle"} >> {"counter-value" : "7369473.000000", "unit" : "", "event" : "branches", "event-runtime" : 225903464328, "pcnt-running" : 100.00, "metric-value" : "32.622026", "metric-unit" : "K/sec"} >> {"counter-value" : "54747.000000", "unit" : "", "event" : "branch-misses", "event-runtime" : 225903234523, "pcnt-running" : 100.00, "metric-value" : "0.742889", "metric-unit" : "of all branches"} >> {"event-runtime" : 225902840555, "pcnt-running" : 100.00, "metricgroup" : "TopdownL1"} >> {"metric-value" : "69.950631", "metric-unit" : "% tma_backend_bound"} >> {"metric-value" : "2.771783", "metric-unit" : "% tma_bad_speculation"} >> {"metric-value" : "12.026074", "metric-unit" : "% tma_frontend_bound"} >> {"metric-value" : "15.251513", "metric-unit" : "% tma_retiring"} >> {"event-runtime" : 225902840555, "pcnt-running" : 100.00, "metricgroup" : "TopdownL2"} >> {"metric-value" : "2.351757", "metric-unit" : "% tma_branch_mispredicts"} >> {"metric-value" : "19.729771", "metric-unit" : "% tma_core_bound"} >> {"metric-value" : "4.555207", "metric-unit" : "% tma_fetch_bandwidth"} >> {"metric-value" : "7.470867", "metric-unit" : "% tma_fetch_latency"} >> {"metric-value" : "2.938808", "metric-unit" : "% tma_heavy_operations"} >> {"metric-value" : "12.312705", "metric-unit" : "% tma_light_operations"} >> {"metric-value" : "0.420026", "metric-unit" : "% tma_machine_clears"} >> {"metric-value" : "50.220860", "metric-unit" : "% tma_memory_bound"} >> >> On hybrid >> >> perf stat --json -a sleep 1 >> {"counter-value" : "32131.530625", "unit" : "msec", "event" : "cpu-clock", "event-runtime" : 32131536951, "pcnt-running" : 100.00, "metric-value" : "31.992642", "metric-unit" : "CPUs utilized"} >> {"counter-value" : "328.000000", "unit" : "", "event" : "context-switches", "event-runtime" : 32131525778, "pcnt-running" : 100.00, "metric-value" : "10.208042", "metric-unit" : "/sec"} >> {"counter-value" : "32.000000", "unit" : "", "event" : "cpu-migrations", "event-runtime" : 32131515104, "pcnt-running" : 100.00, "metric-value" : "0.995906", "metric-unit" : "/sec"} >> {"counter-value" : "353.000000", "unit" : "", "event" : "page-faults", "event-runtime" : 32131501396, "pcnt-running" : 100.00, "metric-value" : "10.986094", "metric-unit" : "/sec"} >> {"counter-value" : "18685492.000000", "unit" : "", "event" : "cpu_core/cycles/", "event-runtime" : 16061585292, "pcnt-running" : 100.00, "metric-value" : "0.000582", "metric-unit" : "GHz"} >> {"counter-value" : "255620352.000000", "unit" : "", "event" : "cpu_atom/cycles/", "event-runtime" : 8690268422, "pcnt-running" : 54.00, "metric-value" : "0.007955", "metric-unit" : "GHz"} >> {"counter-value" : "15489913.000000", "unit" : "", "event" : "cpu_core/instructions/", "event-runtime" : 16061582200, "pcnt-running" : 100.00, "metric-value" : "0.828981", "metric-unit" : "insn per cycle"} >> {"counter-value" : "38790161.000000", "unit" : "", "event" : "cpu_atom/instructions/", "event-runtime" : 10163133324, "pcnt-running" : 63.00, "metric-value" : "2.075951", "metric-unit" : "insn per cycle"} >> {"counter-value" : "2908031.000000", "unit" : "", "event" : "cpu_core/branches/", "event-runtime" : 16061563416, "pcnt-running" : 100.00, "metric-value" : "90.503967", "metric-unit" : "K/sec"} >> {"counter-value" : "6814948.000000", "unit" : "", "event" : "cpu_atom/branches/", "event-runtime" : 10161711336, "pcnt-running" : 63.00, "metric-value" : "212.095343", "metric-unit" : "K/sec"} >> {"counter-value" : "97638.000000", "unit" : "", "event" : "cpu_core/branch-misses/", "event-runtime" : 16061535261, "pcnt-running" : 100.00, "metric-value" : "3.357530", "metric-unit" : "of all branches"} >> {"counter-value" : "1017066.000000", "unit" : "", "event" : "cpu_atom/branch-misses/", "event-runtime" : 10159971797, "pcnt-running" : 63.00, "metric-value" : "34.974386", "metric-unit" : "of all branches"} >> {"event-runtime" : 16061513607, "pcnt-running" : 100.00, "metricgroup" : "TopdownL1 (cpu_core)"} >> {"metric-value" : "nan", "metric-unit" : "% tma_backend_bound"} >> {"metric-value" : "0.000000", "metric-unit" : "% tma_bad_speculation"} >> {"metric-value" : "nan", "metric-unit" : "% tma_frontend_bound"} >> {"metric-value" : "nan", "metric-unit" : "% tma_retiring"} >> {"event-runtime" : 10157398501, "pcnt-running" : 63.00, "metricgroup" : "TopdownL1 (cpu_atom)"} >> {"metric-value" : "13.719821", "metric-unit" : "% tma_bad_speculation"} >> {"event-runtime" : 10178698656, "pcnt-running" : 63.00, "metric-value" : "41.016738", "metric-unit" : "% tma_frontend_bound"} >> {"event-runtime" : 10240582902, "pcnt-running" : 63.00, "metric-value" : "39.327764", "metric-unit" : "% tma_backend_bound"} >> {"metric-value" : "39.327764", "metric-unit" : "% tma_backend_bound_aux"} >> {"event-runtime" : 10284284920, "pcnt-running" : 64.00, "metric-value" : "5.374638", "metric-unit" : "% tma_retiring"} >> >> CSV output >> >> On SPR >> >> perf stat -x, -a sleep 1 >> 225851.20,msec,cpu-clock,225850700108,100.00,224.431,CPUs utilized >> 976,,context-switches,225850504803,100.00,4.321,/sec >> 224,,cpu-migrations,225850410336,100.00,0.992,/sec >> 76,,page-faults,225850304155,100.00,0.337,/sec >> 52288305,,cycles,225850188531,100.00,0.000,GHz >> 37977214,,instructions,225850071251,100.00,0.73,insn per cycle >> 7299859,,branches,225849890722,100.00,32.322,K/sec >> 51102,,branch-misses,225849672536,100.00,0.70,of all branches >> ,225849327050,100.00,,,,TopdownL1 >> ,,,,,70.1,% tma_backend_bound >> ,,,,,2.7,% tma_bad_speculation >> ,,,,,12.5,% tma_frontend_bound >> ,,,,,14.6,% tma_retiring >> ,225849327050,100.00,,,,TopdownL2 >> ,,,,,2.3,% tma_branch_mispredicts >> ,,,,,19.6,% tma_core_bound >> ,,,,,4.6,% tma_fetch_bandwidth >> ,,,,,7.9,% tma_fetch_latency >> ,,,,,2.9,% tma_heavy_operations >> ,,,,,11.7,% tma_light_operations >> ,,,,,0.5,% tma_machine_clears >> ,,,,,50.5,% tma_memory_bound >> >> On Hybrid >> >> perf stat -x, -a sleep 1 >> 32139.34,msec,cpu-clock,32139351409,100.00,32.001,CPUs utilized >> 225,,context-switches,32139342672,100.00,7.001,/sec >> 32,,cpu-migrations,32139337772,100.00,0.996,/sec >> 72,,page-faults,32139328384,100.00,2.240,/sec >> 6766433,,cpu_core/cycles/,16067551558,100.00,0.000,GHz >> 256500230,,cpu_atom/cycles/,8695757391,54.00,0.008,GHz >> 4688595,,cpu_core/instructions/,16067558976,100.00,0.69,insn per cycle >> 37487490,,cpu_atom/instructions/,10165193856,63.00,5.54,insn per cycle >> 845211,,cpu_core/branches/,16067540225,100.00,26.298,K/sec >> 6571193,,cpu_atom/branches/,10155940853,63.00,204.459,K/sec >> 41359,,cpu_core/branch-misses/,16067516493,100.00,4.89,of all branches >> 1020231,,cpu_atom/branch-misses/,10159363620,63.00,120.71,of all branches >> ,16067494476,100.00,,,,TopdownL1 (cpu_core) >> ,,,,,,% tma_backend_bound >> ,,,,,0.0,% tma_bad_speculation >> ,,,,,,% tma_frontend_bound >> ,,,,,,% tma_retiring >> ,10160989992,63.00,,,,TopdownL1 (cpu_atom) >> ,,,,,13.8,% tma_bad_speculation >> ,10188319019,63.00,,,41.3,% tma_frontend_bound >> ,10258326591,63.00,,,38.6,% tma_backend_bound >> ,,,,,38.6,% tma_backend_bound_aux >> ,10282689488,64.00,,,5.4,% tma_retiring >> >> Kan Liang (5): >> perf metrics: Sort the Default metricgroup >> perf stat: New metricgroup output for the default mode >> perf test: Move all the check functions of stat csv output to lib >> perf test: Add test case for the standard perf stat output >> perf vendor events arm64: Add default tags for Hisi hip08 L1 metrics > > Just to be clear, I'm happy with this to be submitted having put > reviewed/acked-by on it. > Thanks Ian. Appreciate all your feedback and comments. Thanks, Kan > Thanks, > Ian > >> tools/perf/builtin-stat.c | 1 + >> .../arch/arm64/hisilicon/hip08/metrics.json | 12 +- >> tools/perf/tests/shell/lib/stat_output.sh | 169 ++++++++++++++++ >> tools/perf/tests/shell/stat+csv_output.sh | 188 ++---------------- >> tools/perf/tests/shell/stat+std_output.sh | 108 ++++++++++ >> tools/perf/util/evsel.h | 1 + >> tools/perf/util/metricgroup.c | 26 +++ >> tools/perf/util/metricgroup.h | 3 + >> tools/perf/util/stat-display.c | 108 +++++++++- >> tools/perf/util/stat-shadow.c | 131 ++++++++++-- >> tools/perf/util/stat.h | 15 ++ >> 11 files changed, 563 insertions(+), 199 deletions(-) >> create mode 100755 tools/perf/tests/shell/lib/stat_output.sh >> create mode 100755 tools/perf/tests/shell/stat+std_output.sh >> >> -- >> 2.35.1 >>
Em Fri, Jun 16, 2023 at 09:26:26AM -0400, Liang, Kan escreveu: > > > On 2023-06-16 1:59 a.m., Ian Rogers wrote: > > On Thu, Jun 15, 2023 at 8:14 PM <kan.liang@linux.intel.com> wrote: > >> > >> From: Kan Liang <kan.liang@linux.intel.com> > >> > >> Changes since V3: > >> - Move the full name (PMU + metricgroup name) generation from the metric > >> code to the output code. (Ian) > >> - Add default tags for Hisi hip08 L1 metrics (John) > >> - Some patches have been merged. Drop them from the V4. > >> > >> Changes since V2: > >> - Fixes memory leak (Ian) > >> (Ian, I cannot reproduce the memory leak on all my machines. Please > >> check whether the fix works on your side. Thanks.) > >> - Add Reviewed-by tags for several patches. > >> > >> Changes since V1: > >> - Remove EVSEL_EVENT_MASK and use the __evsel__match which is suggested > >> by Ian. > >> - Support TopdownL1 on both e-core and p-core of ADL in the default > >> mode. (Ian) > >> - Have separate patches for the modifications of metricgroup and output. > >> (Ian) > >> - Does 2nd sort for the Default metricgroup. Remove the logic of > >> changing the associated metric event. (Ian) > >> - Move all the metric related code to stat-shadow (Ian) > >> - Move the commong functions between stat+csv_output and stat+std_output > >> to the lib directory (Ian) > >> > >> In the default mode, the current output of the metricgroup include both > >> events and metrics, which is not necessary and makes the output hard to > >> read. Also, different ARCHs (even different generations of the ARCH) may > >> have a different output format because of the different events in a > >> metrics. > >> > >> The patch proposes a new output format which only outputting the value > >> of each metric and the metricgroup name. It can brings a clean and > >> consistent output format among ARCHs and generations. > >> > >> The patches 1-2 introduce the new metricgroup output. > >> > >> The patches 3-4 improve the tests to cover the default mode. > >> > >> The patch 5 update the event list for Hisi hip08. > >> > >> Here are some examples for the new output. > >> > >> STD output: > >> > >> On SPR > >> > >> perf stat -a sleep 1 > >> > >> Performance counter stats for 'system wide': > >> > >> 226,054.13 msec cpu-clock # 224.588 CPUs utilized > >> 932 context-switches # 4.123 /sec > >> 224 cpu-migrations # 0.991 /sec > >> 76 page-faults # 0.336 /sec > >> 45,940,682 cycles # 0.000 GHz > >> 36,676,047 instructions # 0.80 insn per cycle > >> 7,044,516 branches # 31.163 K/sec > >> 62,169 branch-misses # 0.88% of all branches > >> TopdownL1 # 68.7 % tma_backend_bound > >> # 3.1 % tma_bad_speculation > >> # 13.0 % tma_frontend_bound > >> # 15.2 % tma_retiring > >> TopdownL2 # 2.7 % tma_branch_mispredicts > >> # 19.6 % tma_core_bound > >> # 4.8 % tma_fetch_bandwidth > >> # 8.3 % tma_fetch_latency > >> # 2.9 % tma_heavy_operations > >> # 12.3 % tma_light_operations > >> # 0.4 % tma_machine_clears > >> # 49.1 % tma_memory_bound > >> > >> 1.006529767 seconds time elapsed > >> > >> perf stat -a sleep 1 > >> > >> Performance counter stats for 'system wide': > >> > >> 32,127.99 msec cpu-clock # 31.992 CPUs utilized > >> 240 context-switches # 7.470 /sec > >> 32 cpu-migrations # 0.996 /sec > >> 74 page-faults # 2.303 /sec > >> 6,313,960 cpu_core/cycles/ # 0.000 GHz > >> 257,711,907 cpu_atom/cycles/ # 0.008 GHz (54.18%) > >> 4,477,162 cpu_core/instructions/ # 0.71 insn per cycle > >> 37,721,481 cpu_atom/instructions/ # 5.97 insn per cycle (63.33%) > >> 809,747 cpu_core/branches/ # 25.204 K/sec > >> 6,621,226 cpu_atom/branches/ # 206.089 K/sec (63.32%) > >> 39,667 cpu_core/branch-misses/ # 4.90% of all branches > >> 1,032,146 cpu_atom/branch-misses/ # 127.47% of all branches (63.33%) > >> TopdownL1 (cpu_core) # nan % tma_backend_bound > >> # 0.0 % tma_bad_speculation > >> # nan % tma_frontend_bound > >> # nan % tma_retiring > >> TopdownL1 (cpu_atom) # 13.6 % tma_bad_speculation (63.36%) > >> # 41.1 % tma_frontend_bound (63.54%) > >> # 39.2 % tma_backend_bound > >> # 39.2 % tma_backend_bound_aux (63.93%) > >> # 5.4 % tma_retiring (64.15%) > >> > >> 1.004244114 seconds time elapsed > >> > >> JSON output > >> > >> on SPR > >> > >> perf stat --json -a sleep 1 > >> {"counter-value" : "225904.823297", "unit" : "msec", "event" : "cpu-clock", "event-runtime" : 225904323425, "pcnt-running" : 100.00, "metric-value" : "224.456872", "metric-unit" : "CPUs utilized"} > >> {"counter-value" : "986.000000", "unit" : "", "event" : "context-switches", "event-runtime" : 225904108985, "pcnt-running" : 100.00, "metric-value" : "4.364670", "metric-unit" : "/sec"} > >> {"counter-value" : "224.000000", "unit" : "", "event" : "cpu-migrations", "event-runtime" : 225904016141, "pcnt-running" : 100.00, "metric-value" : "0.991568", "metric-unit" : "/sec"} > >> {"counter-value" : "76.000000", "unit" : "", "event" : "page-faults", "event-runtime" : 225903913270, "pcnt-running" : 100.00, "metric-value" : "0.336425", "metric-unit" : "/sec"} > >> {"counter-value" : "48433482.000000", "unit" : "", "event" : "cycles", "event-runtime" : 225903792732, "pcnt-running" : 100.00, "metric-value" : "0.000214", "metric-unit" : "GHz"} > >> {"counter-value" : "38620409.000000", "unit" : "", "event" : "instructions", "event-runtime" : 225903657830, "pcnt-running" : 100.00, "metric-value" : "0.797391", "metric-unit" : "insn per cycle"} > >> {"counter-value" : "7369473.000000", "unit" : "", "event" : "branches", "event-runtime" : 225903464328, "pcnt-running" : 100.00, "metric-value" : "32.622026", "metric-unit" : "K/sec"} > >> {"counter-value" : "54747.000000", "unit" : "", "event" : "branch-misses", "event-runtime" : 225903234523, "pcnt-running" : 100.00, "metric-value" : "0.742889", "metric-unit" : "of all branches"} > >> {"event-runtime" : 225902840555, "pcnt-running" : 100.00, "metricgroup" : "TopdownL1"} > >> {"metric-value" : "69.950631", "metric-unit" : "% tma_backend_bound"} > >> {"metric-value" : "2.771783", "metric-unit" : "% tma_bad_speculation"} > >> {"metric-value" : "12.026074", "metric-unit" : "% tma_frontend_bound"} > >> {"metric-value" : "15.251513", "metric-unit" : "% tma_retiring"} > >> {"event-runtime" : 225902840555, "pcnt-running" : 100.00, "metricgroup" : "TopdownL2"} > >> {"metric-value" : "2.351757", "metric-unit" : "% tma_branch_mispredicts"} > >> {"metric-value" : "19.729771", "metric-unit" : "% tma_core_bound"} > >> {"metric-value" : "4.555207", "metric-unit" : "% tma_fetch_bandwidth"} > >> {"metric-value" : "7.470867", "metric-unit" : "% tma_fetch_latency"} > >> {"metric-value" : "2.938808", "metric-unit" : "% tma_heavy_operations"} > >> {"metric-value" : "12.312705", "metric-unit" : "% tma_light_operations"} > >> {"metric-value" : "0.420026", "metric-unit" : "% tma_machine_clears"} > >> {"metric-value" : "50.220860", "metric-unit" : "% tma_memory_bound"} > >> > >> On hybrid > >> > >> perf stat --json -a sleep 1 > >> {"counter-value" : "32131.530625", "unit" : "msec", "event" : "cpu-clock", "event-runtime" : 32131536951, "pcnt-running" : 100.00, "metric-value" : "31.992642", "metric-unit" : "CPUs utilized"} > >> {"counter-value" : "328.000000", "unit" : "", "event" : "context-switches", "event-runtime" : 32131525778, "pcnt-running" : 100.00, "metric-value" : "10.208042", "metric-unit" : "/sec"} > >> {"counter-value" : "32.000000", "unit" : "", "event" : "cpu-migrations", "event-runtime" : 32131515104, "pcnt-running" : 100.00, "metric-value" : "0.995906", "metric-unit" : "/sec"} > >> {"counter-value" : "353.000000", "unit" : "", "event" : "page-faults", "event-runtime" : 32131501396, "pcnt-running" : 100.00, "metric-value" : "10.986094", "metric-unit" : "/sec"} > >> {"counter-value" : "18685492.000000", "unit" : "", "event" : "cpu_core/cycles/", "event-runtime" : 16061585292, "pcnt-running" : 100.00, "metric-value" : "0.000582", "metric-unit" : "GHz"} > >> {"counter-value" : "255620352.000000", "unit" : "", "event" : "cpu_atom/cycles/", "event-runtime" : 8690268422, "pcnt-running" : 54.00, "metric-value" : "0.007955", "metric-unit" : "GHz"} > >> {"counter-value" : "15489913.000000", "unit" : "", "event" : "cpu_core/instructions/", "event-runtime" : 16061582200, "pcnt-running" : 100.00, "metric-value" : "0.828981", "metric-unit" : "insn per cycle"} > >> {"counter-value" : "38790161.000000", "unit" : "", "event" : "cpu_atom/instructions/", "event-runtime" : 10163133324, "pcnt-running" : 63.00, "metric-value" : "2.075951", "metric-unit" : "insn per cycle"} > >> {"counter-value" : "2908031.000000", "unit" : "", "event" : "cpu_core/branches/", "event-runtime" : 16061563416, "pcnt-running" : 100.00, "metric-value" : "90.503967", "metric-unit" : "K/sec"} > >> {"counter-value" : "6814948.000000", "unit" : "", "event" : "cpu_atom/branches/", "event-runtime" : 10161711336, "pcnt-running" : 63.00, "metric-value" : "212.095343", "metric-unit" : "K/sec"} > >> {"counter-value" : "97638.000000", "unit" : "", "event" : "cpu_core/branch-misses/", "event-runtime" : 16061535261, "pcnt-running" : 100.00, "metric-value" : "3.357530", "metric-unit" : "of all branches"} > >> {"counter-value" : "1017066.000000", "unit" : "", "event" : "cpu_atom/branch-misses/", "event-runtime" : 10159971797, "pcnt-running" : 63.00, "metric-value" : "34.974386", "metric-unit" : "of all branches"} > >> {"event-runtime" : 16061513607, "pcnt-running" : 100.00, "metricgroup" : "TopdownL1 (cpu_core)"} > >> {"metric-value" : "nan", "metric-unit" : "% tma_backend_bound"} > >> {"metric-value" : "0.000000", "metric-unit" : "% tma_bad_speculation"} > >> {"metric-value" : "nan", "metric-unit" : "% tma_frontend_bound"} > >> {"metric-value" : "nan", "metric-unit" : "% tma_retiring"} > >> {"event-runtime" : 10157398501, "pcnt-running" : 63.00, "metricgroup" : "TopdownL1 (cpu_atom)"} > >> {"metric-value" : "13.719821", "metric-unit" : "% tma_bad_speculation"} > >> {"event-runtime" : 10178698656, "pcnt-running" : 63.00, "metric-value" : "41.016738", "metric-unit" : "% tma_frontend_bound"} > >> {"event-runtime" : 10240582902, "pcnt-running" : 63.00, "metric-value" : "39.327764", "metric-unit" : "% tma_backend_bound"} > >> {"metric-value" : "39.327764", "metric-unit" : "% tma_backend_bound_aux"} > >> {"event-runtime" : 10284284920, "pcnt-running" : 64.00, "metric-value" : "5.374638", "metric-unit" : "% tma_retiring"} > >> > >> CSV output > >> > >> On SPR > >> > >> perf stat -x, -a sleep 1 > >> 225851.20,msec,cpu-clock,225850700108,100.00,224.431,CPUs utilized > >> 976,,context-switches,225850504803,100.00,4.321,/sec > >> 224,,cpu-migrations,225850410336,100.00,0.992,/sec > >> 76,,page-faults,225850304155,100.00,0.337,/sec > >> 52288305,,cycles,225850188531,100.00,0.000,GHz > >> 37977214,,instructions,225850071251,100.00,0.73,insn per cycle > >> 7299859,,branches,225849890722,100.00,32.322,K/sec > >> 51102,,branch-misses,225849672536,100.00,0.70,of all branches > >> ,225849327050,100.00,,,,TopdownL1 > >> ,,,,,70.1,% tma_backend_bound > >> ,,,,,2.7,% tma_bad_speculation > >> ,,,,,12.5,% tma_frontend_bound > >> ,,,,,14.6,% tma_retiring > >> ,225849327050,100.00,,,,TopdownL2 > >> ,,,,,2.3,% tma_branch_mispredicts > >> ,,,,,19.6,% tma_core_bound > >> ,,,,,4.6,% tma_fetch_bandwidth > >> ,,,,,7.9,% tma_fetch_latency > >> ,,,,,2.9,% tma_heavy_operations > >> ,,,,,11.7,% tma_light_operations > >> ,,,,,0.5,% tma_machine_clears > >> ,,,,,50.5,% tma_memory_bound > >> > >> On Hybrid > >> > >> perf stat -x, -a sleep 1 > >> 32139.34,msec,cpu-clock,32139351409,100.00,32.001,CPUs utilized > >> 225,,context-switches,32139342672,100.00,7.001,/sec > >> 32,,cpu-migrations,32139337772,100.00,0.996,/sec > >> 72,,page-faults,32139328384,100.00,2.240,/sec > >> 6766433,,cpu_core/cycles/,16067551558,100.00,0.000,GHz > >> 256500230,,cpu_atom/cycles/,8695757391,54.00,0.008,GHz > >> 4688595,,cpu_core/instructions/,16067558976,100.00,0.69,insn per cycle > >> 37487490,,cpu_atom/instructions/,10165193856,63.00,5.54,insn per cycle > >> 845211,,cpu_core/branches/,16067540225,100.00,26.298,K/sec > >> 6571193,,cpu_atom/branches/,10155940853,63.00,204.459,K/sec > >> 41359,,cpu_core/branch-misses/,16067516493,100.00,4.89,of all branches > >> 1020231,,cpu_atom/branch-misses/,10159363620,63.00,120.71,of all branches > >> ,16067494476,100.00,,,,TopdownL1 (cpu_core) > >> ,,,,,,% tma_backend_bound > >> ,,,,,0.0,% tma_bad_speculation > >> ,,,,,,% tma_frontend_bound > >> ,,,,,,% tma_retiring > >> ,10160989992,63.00,,,,TopdownL1 (cpu_atom) > >> ,,,,,13.8,% tma_bad_speculation > >> ,10188319019,63.00,,,41.3,% tma_frontend_bound > >> ,10258326591,63.00,,,38.6,% tma_backend_bound > >> ,,,,,38.6,% tma_backend_bound_aux > >> ,10282689488,64.00,,,5.4,% tma_retiring > >> > >> Kan Liang (5): > >> perf metrics: Sort the Default metricgroup > >> perf stat: New metricgroup output for the default mode > >> perf test: Move all the check functions of stat csv output to lib > >> perf test: Add test case for the standard perf stat output > >> perf vendor events arm64: Add default tags for Hisi hip08 L1 metrics > > > > Just to be clear, I'm happy with this to be submitted having put > > reviewed/acked-by on it. > > > > Thanks Ian. Appreciate all your feedback and comments. Applied, - Arnaldo