From: Ian Rogers
Date: Sun, 19 Feb 2023 01:28:39 -0800
Subject: [PATCH v1 42/51] perf doc: Refresh topdown documentation
Message-Id: <20230219092848.639226-43-irogers@google.com>
In-Reply-To: <20230219092848.639226-1-irogers@google.com>
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland,
 Alexander Shishkin, Jiri Olsa, Namhyung Kim, Maxime Coquelin,
 Alexandre Torgue, Zhengjun Xing, Sandipan Das, James Clark, Kajol Jain,
 John Garry, Kan Liang, Adrian Hunter, Andrii Nakryiko, Eduard Zingerman,
 Suzuki Poulouse, Leo Yan, Florian Fischer, Ravi Bangoria, Jing Zhang,
 Sean Christopherson, Athira Rajeev, linux-kernel@vger.kernel.org,
 linux-perf-users@vger.kernel.org, linux-stm32@st-md-mailman.stormreply.com,
 linux-arm-kernel@lists.infradead.org, Perry Taylor, Caleb Biggers
Cc: Stephane Eranian, Ian Rogers
perf stat now supports --topdown for any platform with the TopdownL1
metric group, including Intel CPUs before Ice Lake. Tweak the
documentation to reflect this.

Signed-off-by: Ian Rogers
---
 tools/perf/Documentation/perf-stat.txt | 27 +++++-----
 tools/perf/Documentation/topdown.txt   | 70 +++++++++++---------------
 2 files changed, 44 insertions(+), 53 deletions(-)

diff --git a/tools/perf/Documentation/perf-stat.txt b/tools/perf/Documentation/perf-stat.txt
index 18abdc1dce05..29bdcfa93f04 100644
--- a/tools/perf/Documentation/perf-stat.txt
+++ b/tools/perf/Documentation/perf-stat.txt
@@ -394,10 +394,10 @@ See perf list output for the possible metrics and metricgroups.
 Do not aggregate counts across all monitored CPUs.
 
 --topdown::
-Print complete top-down metrics supported by the CPU. This allows to
-determine bottle necks in the CPU pipeline for CPU bound workloads,
-by breaking the cycles consumed down into frontend bound, backend bound,
-bad speculation and retiring.
+Print top-down metrics supported by the CPU. This helps determine
+bottlenecks in the CPU pipeline for CPU-bound workloads, by breaking
+the cycles consumed down into frontend bound, backend bound, bad
+speculation and retiring.
 
 Frontend bound means that the CPU cannot fetch and decode instructions fast
 enough. Backend bound means that computation or memory access is the bottle
@@ -430,15 +430,18 @@ CPUs the workload runs on. If needed the CPUs can be forced using
 taskset.
 
 --td-level::
-Print the top-down statistics that equal to or lower than the input level.
-It allows users to print the interested top-down metrics level instead of
-the complete top-down metrics.
+Print the top-down statistics for the given input level. This lets
+users print the top-down metrics level of interest instead of the
+level 1 top-down metrics.
+
+As the higher levels gather more metrics and use more counters, they
+will be less accurate. By convention a metric can be examined by
+appending '_group' to it, which increases accuracy compared to
+gathering all metrics for a level. For example, level 1 analysis may
+highlight 'tma_frontend_bound'. This metric may be drilled into with
+the 'tma_frontend_bound_group' metric group, using
+'perf stat -M tma_frontend_bound_group ...'.
 
-The availability of the top-down metrics level depends on the hardware. For
-example, Ice Lake only supports L1 top-down metrics. The Sapphire Rapids
-supports both L1 and L2 top-down metrics.
-
-Default: 0 means the max level that the current hardware support.
 Error out if the input is higher than the supported max level.
 
 --no-merge::
diff --git a/tools/perf/Documentation/topdown.txt b/tools/perf/Documentation/topdown.txt
index a15b93fdcf50..ae0aee86844f 100644
--- a/tools/perf/Documentation/topdown.txt
+++ b/tools/perf/Documentation/topdown.txt
@@ -1,46 +1,35 @@
-Using TopDown metrics in user space
------------------------------------
+Using TopDown metrics
+---------------------
 
-Intel CPUs (since Sandy Bridge and Silvermont) support a TopDown
-methodology to break down CPU pipeline execution into 4 bottlenecks:
-frontend bound, backend bound, bad speculation, retiring.
+TopDown metrics break down performance bottlenecks. Starting at level
+1, it is typical to get metrics on retiring, bad speculation, frontend
+bound, and backend bound. Higher levels provide more detail into the
+level 1 bottlenecks, such as at level 2: core bound, memory bound,
+heavy operations, light operations, branch mispredicts, machine
+clears, fetch latency and fetch bandwidth. For more details see [1][2][3].
 
-For more details on Topdown see [1][5]
+perf stat --topdown implements this using the available metrics, which
+vary per architecture.
 
-Traditionally this was implemented by events in generic counters
-and specific formulas to compute the bottlenecks.
-
-perf stat --topdown implements this.
-
-Full Top Down includes more levels that can break down the
-bottlenecks further. This is not directly implemented in perf,
-but available in other tools that can run on top of perf,
-such as toplev[2] or vtune[3]
+% perf stat -a --topdown -I1000
+#           time  % tma_retiring  % tma_backend_bound  % tma_frontend_bound  % tma_bad_speculation
+     1.001141351            11.5                 34.9                  46.9                    6.7
+     2.006141972            13.4                 28.1                  50.4                    8.1
+     3.010162040            12.9                 28.1                  51.1                    8.0
+     4.014009311            12.5                 28.6                  51.8                    7.2
+     5.017838554            11.8                 33.0                  48.0                    7.2
+     5.704818971            14.0                 27.5                  51.3                    7.3
+...
 
-New Topdown features in Ice Lake
-===============================
+New Topdown features in Intel Ice Lake
+======================================
 
 With Ice Lake CPUs the TopDown metrics are directly available as
 fixed counters and do not require generic counters. This allows
 to collect TopDown always in addition to other events.
 
-% perf stat -a --topdown -I1000
-#           time             retiring      bad speculation       frontend bound        backend bound
-     1.001281330                23.0%                15.3%                29.6%                32.1%
-     2.003009005                 5.0%                 6.8%                46.6%                41.6%
-     3.004646182                 6.7%                 6.7%                46.0%                40.6%
-     4.006326375                 5.0%                 6.4%                47.6%                41.0%
-     5.007991804                 5.1%                 6.3%                46.3%                42.3%
-     6.009626773                 6.2%                 7.1%                47.3%                39.3%
-     7.011296356                 4.7%                 6.7%                46.2%                42.4%
-     8.012951831                 4.7%                 6.7%                47.5%                41.1%
-...
-
-This also enables measuring TopDown per thread/process instead
-of only per core.
-
-Using TopDown through RDPMC in applications on Ice Lake
-======================================================
+Using TopDown through RDPMC in applications on Intel Ice Lake
+=============================================================
 
 For more fine grained measurements it can be useful to
 access the new directly from user space. This is more complicated,
@@ -301,8 +290,8 @@ This "opens" a new measurement period.
 A program using RDPMC for TopDown should schedule such a reset
 regularly, as in every few seconds.
 
-Limits on Ice Lake
-==================
+Limits on Intel Ice Lake
+========================
 
 Four pseudo TopDown metric events are exposed for the end-users,
 topdown-retiring, topdown-bad-spec, topdown-fe-bound and topdown-be-bound.
@@ -318,8 +307,8 @@ a sampling read group. Since the SLOTS event must be the leader of a TopDown
 group, the second event of the group is the sampling event.
 For example, perf record -e '{slots, $sampling_event, topdown-retiring}:S'
 
-Extension on Sapphire Rapids Server
-===================================
+Extension on Intel Sapphire Rapids Server
+=========================================
 The metrics counter is extended to support TMA method level 2 metrics.
 The lower half of the register is the TMA level 1 metrics (legacy).
 The upper half is also divided into four 8-bit fields for the new level 2
@@ -338,7 +327,6 @@ other four level 2 metrics by subtracting corresponding metrics as below.
 
 
 [1] https://software.intel.com/en-us/top-down-microarchitecture-analysis-method-win
-[2] https://github.com/andikleen/pmu-tools/wiki/toplev-manual
-[3] https://software.intel.com/en-us/intel-vtune-amplifier-xe
+[2] https://sites.google.com/site/analysismethods/yasin-pubs
+[3] https://perf.wiki.kernel.org/index.php/Top-Down_Analysis
 [4] https://github.com/andikleen/pmu-tools/tree/master/jevents
-[5] https://sites.google.com/site/analysismethods/yasin-pubs
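The Sapphire Rapids section of topdown.txt describes the metrics register
only in prose: level 1 in the lower half, four new 8-bit level 2 fields in
the upper half, and the other four level 2 metrics deduced by subtraction.
What follows is a minimal C sketch of that arithmetic, not the kernel's or
perf's implementation. It assumes the byte order given for the PERF_METRICS
MSR in the Intel SDM (bytes 0-3: retiring, bad speculation, frontend bound,
backend bound; bytes 4-7: heavy operations, branch mispredicts, fetch
latency, memory bound), which this patch does not show, and that each byte
is a fraction of 0xff. The names metric_field(), struct topdown and
decode_topdown() are hypothetical.

```c
#include <stdint.h>

/*
 * Return byte 'byte' (0 = least significant) of a raw metrics value as a
 * fraction of all slots. Assumption: each 8-bit field is scaled to 0xff.
 */
static double metric_field(uint64_t metrics, unsigned int byte)
{
	return (double)((metrics >> (byte * 8)) & 0xff) / 0xff;
}

/* Hypothetical container for the decoded level 1 and level 2 metrics. */
struct topdown {
	/* Level 1, lower half of the register (assumed bytes 0-3). */
	double retiring, bad_spec, fe_bound, be_bound;
	/* Level 2 provided by hardware, upper half (assumed bytes 4-7). */
	double heavy_ops, br_mispredict, fetch_lat, mem_bound;
	/* Level 2 deduced by subtracting from the level 1 parent. */
	double light_ops, machine_clears, fetch_bw, core_bound;
};

static struct topdown decode_topdown(uint64_t metrics)
{
	struct topdown td;

	td.retiring = metric_field(metrics, 0);
	td.bad_spec = metric_field(metrics, 1);
	td.fe_bound = metric_field(metrics, 2);
	td.be_bound = metric_field(metrics, 3);

	td.heavy_ops = metric_field(metrics, 4);
	td.br_mispredict = metric_field(metrics, 5);
	td.fetch_lat = metric_field(metrics, 6);
	td.mem_bound = metric_field(metrics, 7);

	/* Each hardware level 2 metric is a subset of its level 1 parent. */
	td.light_ops = td.retiring - td.heavy_ops;
	td.machine_clears = td.bad_spec - td.br_mispredict;
	td.fetch_bw = td.fe_bound - td.fetch_lat;
	td.core_bound = td.be_bound - td.mem_bound;

	return td;
}
```

For a made-up raw value such as 0x26330d114c661a33, the retiring field is
0x33/0xff = 0.2 and the deduced light operations fraction is
(0x33 - 0x11)/0xff, about 0.133. Real values come from RDPMC as described
in topdown.txt.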