Message ID | 20240202022512.467636-2-irogers@google.com |
---|---|
State | New |
Headers |
Date: Thu, 1 Feb 2024 18:25:11 -0800
In-Reply-To: <20240202022512.467636-1-irogers@google.com>
Message-Id: <20240202022512.467636-2-irogers@google.com>
References: <20240202022512.467636-1-irogers@google.com>
X-Mailer: git-send-email 2.43.0.594.gd9cf4e227d-goog
Subject: [PATCH v1 2/3] perf metrics: Compute unmerged uncore metrics individually
From: Ian Rogers <irogers@google.com>
To: Peter Zijlstra <peterz@infradead.org>, Ingo Molnar <mingo@redhat.com>,
 Arnaldo Carvalho de Melo <acme@kernel.org>, Namhyung Kim <namhyung@kernel.org>,
 Mark Rutland <mark.rutland@arm.com>,
 Alexander Shishkin <alexander.shishkin@linux.intel.com>,
 Jiri Olsa <jolsa@kernel.org>, Ian Rogers <irogers@google.com>,
 Adrian Hunter <adrian.hunter@intel.com>, Kan Liang <kan.liang@linux.intel.com>,
 Kajol Jain <kjain@linux.ibm.com>, John Garry <john.g.garry@oracle.com>,
 Kaige Ye <ye@kaige.org>, K Prateek Nayak <kprateek.nayak@amd.com>,
 linux-perf-users@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Stephane Eranian <eranian@google.com>
Content-Type: text/plain; charset="UTF-8" |
Series |
[v1,1/3] perf stat: Pass fewer metric arguments
|
|
Commit Message
Ian Rogers
Feb. 2, 2024, 2:25 a.m. UTC
When counts from multiple uncore PMUs are merged, the metric is computed
only for the metric leader. Prior to this patch, only the leader's
metric was computed even when merging/aggregation was disabled. Fix
this by computing the metric for each PMU.
On a SkylakeX:
Before:
```
$ perf stat -A -M memory_bandwidth_total -a sleep 1
Performance counter stats for 'system wide':
CPU0 82,217 UNC_M_CAS_COUNT.RD [uncore_imc_0] # 9.2 MB/s memory_bandwidth_total
CPU18 0 UNC_M_CAS_COUNT.RD [uncore_imc_0] # 0.0 MB/s memory_bandwidth_total
CPU0 61,395 UNC_M_CAS_COUNT.WR [uncore_imc_0]
CPU18 0 UNC_M_CAS_COUNT.WR [uncore_imc_0]
CPU0 0 UNC_M_CAS_COUNT.RD [uncore_imc_1]
CPU18 0 UNC_M_CAS_COUNT.RD [uncore_imc_1]
CPU0 0 UNC_M_CAS_COUNT.WR [uncore_imc_1]
CPU18 0 UNC_M_CAS_COUNT.WR [uncore_imc_1]
CPU0 81,570 UNC_M_CAS_COUNT.RD [uncore_imc_2]
CPU18 113,886 UNC_M_CAS_COUNT.RD [uncore_imc_2]
CPU0 62,330 UNC_M_CAS_COUNT.WR [uncore_imc_2]
CPU18 66,942 UNC_M_CAS_COUNT.WR [uncore_imc_2]
CPU0 75,489 UNC_M_CAS_COUNT.RD [uncore_imc_3]
CPU18 27,958 UNC_M_CAS_COUNT.RD [uncore_imc_3]
CPU0 55,864 UNC_M_CAS_COUNT.WR [uncore_imc_3]
CPU18 38,727 UNC_M_CAS_COUNT.WR [uncore_imc_3]
CPU0 0 UNC_M_CAS_COUNT.RD [uncore_imc_4]
CPU18 0 UNC_M_CAS_COUNT.RD [uncore_imc_4]
CPU0 0 UNC_M_CAS_COUNT.WR [uncore_imc_4]
CPU18 0 UNC_M_CAS_COUNT.WR [uncore_imc_4]
CPU0 75,423 UNC_M_CAS_COUNT.RD [uncore_imc_5]
CPU18 104,527 UNC_M_CAS_COUNT.RD [uncore_imc_5]
CPU0 57,596 UNC_M_CAS_COUNT.WR [uncore_imc_5]
CPU18 56,777 UNC_M_CAS_COUNT.WR [uncore_imc_5]
CPU0 1,003,440,851 ns duration_time
1.003440851 seconds time elapsed
```
After:
```
$ perf stat -A -M memory_bandwidth_total -a sleep 1
Performance counter stats for 'system wide':
CPU0 88,968 UNC_M_CAS_COUNT.RD [uncore_imc_0] # 9.5 MB/s memory_bandwidth_total
CPU18 0 UNC_M_CAS_COUNT.RD [uncore_imc_0] # 0.0 MB/s memory_bandwidth_total
CPU0 59,498 UNC_M_CAS_COUNT.WR [uncore_imc_0]
CPU18 0 UNC_M_CAS_COUNT.WR [uncore_imc_0]
CPU0 0 UNC_M_CAS_COUNT.RD [uncore_imc_1] # 0.0 MB/s memory_bandwidth_total
CPU18 0 UNC_M_CAS_COUNT.RD [uncore_imc_1] # 0.0 MB/s memory_bandwidth_total
CPU0 0 UNC_M_CAS_COUNT.WR [uncore_imc_1]
CPU18 0 UNC_M_CAS_COUNT.WR [uncore_imc_1]
CPU0 88,635 UNC_M_CAS_COUNT.RD [uncore_imc_2] # 9.5 MB/s memory_bandwidth_total
CPU18 117,975 UNC_M_CAS_COUNT.RD [uncore_imc_2] # 11.5 MB/s memory_bandwidth_total
CPU0 60,829 UNC_M_CAS_COUNT.WR [uncore_imc_2]
CPU18 62,105 UNC_M_CAS_COUNT.WR [uncore_imc_2]
CPU0 82,238 UNC_M_CAS_COUNT.RD [uncore_imc_3] # 8.7 MB/s memory_bandwidth_total
CPU18 22,906 UNC_M_CAS_COUNT.RD [uncore_imc_3] # 3.6 MB/s memory_bandwidth_total
CPU0 53,959 UNC_M_CAS_COUNT.WR [uncore_imc_3]
CPU18 32,990 UNC_M_CAS_COUNT.WR [uncore_imc_3]
CPU0 0 UNC_M_CAS_COUNT.RD [uncore_imc_4] # 0.0 MB/s memory_bandwidth_total
CPU18 0 UNC_M_CAS_COUNT.RD [uncore_imc_4] # 0.0 MB/s memory_bandwidth_total
CPU0 0 UNC_M_CAS_COUNT.WR [uncore_imc_4]
CPU18 0 UNC_M_CAS_COUNT.WR [uncore_imc_4]
CPU0 83,595 UNC_M_CAS_COUNT.RD [uncore_imc_5] # 8.9 MB/s memory_bandwidth_total
CPU18 110,151 UNC_M_CAS_COUNT.RD [uncore_imc_5] # 10.5 MB/s memory_bandwidth_total
CPU0 56,540 UNC_M_CAS_COUNT.WR [uncore_imc_5]
CPU18 53,816 UNC_M_CAS_COUNT.WR [uncore_imc_5]
CPU0 1,003,353,416 ns duration_time
```
Signed-off-by: Ian Rogers <irogers@google.com>
---
tools/perf/util/metricgroup.c | 2 ++
tools/perf/util/stat-shadow.c | 31 +++++++++++++++++++++++++++----
2 files changed, 29 insertions(+), 4 deletions(-)
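As a sanity check on the output above, the per-PMU values can be reproduced by hand. The sketch below assumes the metric expression is `(UNC_M_CAS_COUNT.RD + UNC_M_CAS_COUNT.WR) * 64 / 1e6 / duration_time`, with 64 bytes per CAS and `duration_time` in nanoseconds. That expression is not stated in the patch, although it is consistent with the figures shown. The function name is illustrative only.

```c
#include <stdint.h>

/*
 * Assumed metric expression (not stated in the patch):
 *   (UNC_M_CAS_COUNT.RD + UNC_M_CAS_COUNT.WR) * 64 / 1e6 / duration_time
 * Each CAS is taken to transfer one 64-byte cache line, and the duration
 * is passed in nanoseconds, as perf prints it.
 */
static double memory_bandwidth_mb_per_s(uint64_t cas_rd, uint64_t cas_wr,
					uint64_t duration_ns)
{
	double bytes = (double)(cas_rd + cas_wr) * 64.0;
	double seconds = (double)duration_ns / 1e9;

	return bytes / 1e6 / seconds;
}
```

With the CPU18 `uncore_imc_2` counts from the After output (117,975 reads, 62,105 writes, 1,003,353,416 ns) this gives roughly 11.5 MB/s, which matches the value printed on that line.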
Comments
On Thu, Feb 1, 2024 at 6:25 PM Ian Rogers <irogers@google.com> wrote:
> [...]
> diff --git a/tools/perf/util/stat-shadow.c b/tools/perf/util/stat-shadow.c
> index f6c9d2f98835..1be23b0eee2f 100644
> --- a/tools/perf/util/stat-shadow.c
> +++ b/tools/perf/util/stat-shadow.c
> [...]
> @@ -398,8 +399,29 @@ static int prepare_metric(const struct metric_expr *mexp,
>  			source_count = 1;
>  		} else {
>  			struct perf_stat_evsel *ps = mexp->metric_events[i]->stats;
> -			struct perf_stat_aggr *aggr = &ps->aggr[aggr_idx];
> +			struct perf_stat_aggr *aggr;
>
> +			/*
> +			 * If there are multiple uncore PMUs and we're not
> +			 * reading the leader's stats, determine the stats for
> +			 * the appropriate uncore PMU.
> +			 */
> +			if (evsel && evsel->metric_leader &&
> +			    evsel->pmu != evsel->metric_leader->pmu &&
> +			    mexp->metric_events[i]->pmu == evsel->metric_leader->pmu) {

Is it just to check we're in --no-aggr (or --no-merge)?
Then it'd be simpler to use stat_config->aggr_mode.

Thanks,
Namhyung

> +				struct evsel *pos;
> +
> +				evlist__for_each_entry(evsel->evlist, pos) {
> +					if (pos->pmu != evsel->pmu)
> +						continue;
> +					if (pos->metric_leader != mexp->metric_events[i])
> +						continue;
> +					ps = pos->stats;
> +					source_count = 1;
> +					break;
> +				}
> +			}
> +			aggr = &ps->aggr[aggr_idx];
>  			if (!aggr)
>  				break;
>
> [...]
On Mon, Feb 5, 2024 at 6:02 PM Namhyung Kim <namhyung@kernel.org> wrote:
>
> On Thu, Feb 1, 2024 at 6:25 PM Ian Rogers <irogers@google.com> wrote:
> > [...]
> > +			/*
> > +			 * If there are multiple uncore PMUs and we're not
> > +			 * reading the leader's stats, determine the stats for
> > +			 * the appropriate uncore PMU.
> > +			 */
> > +			if (evsel && evsel->metric_leader &&
> > +			    evsel->pmu != evsel->metric_leader->pmu &&
> > +			    mexp->metric_events[i]->pmu == evsel->metric_leader->pmu) {
>
> Is it just to check we're in --no-aggr (or --no-merge)?
> Then it'd be simpler to use stat_config->aggr_mode.

For most metrics the events will be on the same PMU, but there is
nothing stopping mixing events from different PMUs (grouping can be
disabled). There may also be software and tool evsels.

Thanks,
Ian

> Thanks,
> Namhyung
>
> > [...]
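The three-part condition discussed in this exchange can be modelled outside perf roughly as follows. This is a minimal sketch, not perf code: `struct pmu` and `struct evsel` are hypothetical, heavily simplified stand-ins, a plain array replaces the evlist, and `stats_source()` is an illustrative name. The sketch also assumes that a leader's `metric_leader` points at itself.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for perf's structures. */
struct pmu {
	const char *name;
};

struct evsel {
	const char *name;
	const struct pmu *pmu;
	const struct evsel *metric_leader; /* assumed: self for the leader */
};

/*
 * Return the evsel whose stats should feed the metric while printing
 * `evsel`. `metric_event` is the event named by the metric expression,
 * i.e. the instance on the leader's PMU.
 */
static const struct evsel *stats_source(const struct evsel *evlist, size_t nr,
					const struct evsel *evsel,
					const struct evsel *metric_event)
{
	if (evsel && evsel->metric_leader &&
	    evsel->pmu != evsel->metric_leader->pmu &&
	    metric_event->pmu == evsel->metric_leader->pmu) {
		for (size_t i = 0; i < nr; i++) {
			const struct evsel *pos = &evlist[i];

			if (pos->pmu != evsel->pmu)
				continue;
			if (pos->metric_leader != metric_event)
				continue;
			return pos; /* sibling on the same PMU as evsel */
		}
	}
	return metric_event; /* otherwise keep the leader's own stats */
}
```

In this model, a metric event that lives on some other PMU than the leader's (for instance a core event mixed into an uncore metric) fails the third clause and keeps its own stats, which is the case a check on the aggregation mode alone would not distinguish.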
diff --git a/tools/perf/util/metricgroup.c b/tools/perf/util/metricgroup.c
index ca3e0404f187..c33ffee837ca 100644
--- a/tools/perf/util/metricgroup.c
+++ b/tools/perf/util/metricgroup.c
@@ -44,6 +44,8 @@ struct metric_event *metricgroup__lookup(struct rblist *metric_events,
 	if (!metric_events)
 		return NULL;
 
+	if (evsel->metric_leader)
+		me.evsel = evsel->metric_leader;
 	nd = rblist__find(metric_events, &me);
 	if (nd)
 		return container_of(nd, struct metric_event, nd);
diff --git a/tools/perf/util/stat-shadow.c b/tools/perf/util/stat-shadow.c
index f6c9d2f98835..1be23b0eee2f 100644
--- a/tools/perf/util/stat-shadow.c
+++ b/tools/perf/util/stat-shadow.c
@@ -356,6 +356,7 @@ static void print_nsecs(struct perf_stat_config *config,
 }
 
 static int prepare_metric(const struct metric_expr *mexp,
+			  const struct evsel *evsel,
 			  struct expr_parse_ctx *pctx,
 			  int aggr_idx)
 {
@@ -398,8 +399,29 @@ static int prepare_metric(const struct metric_expr *mexp,
 			source_count = 1;
 		} else {
 			struct perf_stat_evsel *ps = mexp->metric_events[i]->stats;
-			struct perf_stat_aggr *aggr = &ps->aggr[aggr_idx];
+			struct perf_stat_aggr *aggr;
 
+			/*
+			 * If there are multiple uncore PMUs and we're not
+			 * reading the leader's stats, determine the stats for
+			 * the appropriate uncore PMU.
+			 */
+			if (evsel && evsel->metric_leader &&
+			    evsel->pmu != evsel->metric_leader->pmu &&
+			    mexp->metric_events[i]->pmu == evsel->metric_leader->pmu) {
+				struct evsel *pos;
+
+				evlist__for_each_entry(evsel->evlist, pos) {
+					if (pos->pmu != evsel->pmu)
+						continue;
+					if (pos->metric_leader != mexp->metric_events[i])
+						continue;
+					ps = pos->stats;
+					source_count = 1;
+					break;
+				}
+			}
+			aggr = &ps->aggr[aggr_idx];
 			if (!aggr)
 				break;
 
@@ -420,7 +442,8 @@ static int prepare_metric(const struct metric_expr *mexp,
 				 * metric.
 				 */
 				val = aggr->counts.val * (1.0 / mexp->metric_events[i]->scale);
-				source_count = evsel__source_count(mexp->metric_events[i]);
+				if (!source_count)
+					source_count = evsel__source_count(mexp->metric_events[i]);
 			}
 		}
 		n = strdup(evsel__metric_id(mexp->metric_events[i]));
@@ -461,7 +484,7 @@ static void generic_metric(struct perf_stat_config *config,
 		pctx->sctx.user_requested_cpu_list = strdup(config->user_requested_cpu_list);
 	pctx->sctx.runtime = mexp->runtime;
 	pctx->sctx.system_wide = config->system_wide;
-	i = prepare_metric(mexp, pctx, aggr_idx);
+	i = prepare_metric(mexp, evsel, pctx, aggr_idx);
 	if (i < 0) {
 		expr__ctx_free(pctx);
 		return;
@@ -522,7 +545,7 @@ double test_generic_metric(struct metric_expr *mexp, int aggr_idx)
 	if (!pctx)
 		return NAN;
 
-	if (prepare_metric(mexp, pctx, aggr_idx) < 0)
+	if (prepare_metric(mexp, /*evsel=*/NULL, pctx, aggr_idx) < 0)
 		goto out;
 
 	if (expr__parse(&ratio, pctx, mexp->metric_expr))
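The `metricgroup__lookup()` change redirects the lookup key to the metric leader, so an evsel on a non-leader PMU resolves to the same metric entry as its leader. A minimal sketch of that idea follows, with a linear search standing in for perf's `rblist__find()`. The struct layouts and the name `lookup_metric_event()` are hypothetical simplifications, not perf's definitions.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for perf's structures. */
struct evsel {
	const char *name;
	const struct evsel *metric_leader; /* NULL when no leader is recorded */
};

struct metric_event {
	const struct evsel *evsel; /* key: the leader evsel */
	const char *metric_name;
};

/* Look up the metric entry for evsel, keyed by its leader when it has one. */
static const struct metric_event *
lookup_metric_event(const struct metric_event *table, size_t nr,
		    const struct evsel *evsel)
{
	const struct evsel *key = evsel->metric_leader ? evsel->metric_leader
						       : evsel;

	for (size_t i = 0; i < nr; i++) {
		if (table[i].evsel == key)
			return &table[i];
	}
	return NULL;
}
```

Without the redirect, only the leader itself would match a table entry, which corresponds to the Before output where the metric is printed for `uncore_imc_0` only.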