From patchwork Mon Dec 19 16:12:58 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: James Clark
X-Patchwork-Id: 34643
From: James Clark
To: linux-perf-users@vger.kernel.org
Cc: robh@kernel.org, German Gomez, James Clark, Peter Zijlstra,
 Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland,
 Alexander Shishkin, Jiri Olsa, Namhyung Kim, John Garry, Will Deacon,
 Mike Leach, Leo Yan, linux-kernel@vger.kernel.org,
 linux-arm-kernel@lists.infradead.org
Subject: [PATCH 4/4] perf report: Add 'simd' sort field
Date: Mon, 19 Dec 2022 16:12:58 +0000
Message-Id: <20221219161259.3097213-5-james.clark@arm.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20221219161259.3097213-1-james.clark@arm.com>
References: <20221219161259.3097213-1-james.clark@arm.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org

From: German Gomez

Add a 'simd' sort field to visualize SIMD operations in perf-report.
Rows are labeled with the SIMD ISA and the type of predicate (if any):

  - [p] partial predicate
  - [e] empty predicate (no elements in the vector being used)

Example with Arm SPE and SVE (Scalable Vector Extension):

  #include <arm_sve.h>

  double src[1025], dst[1025];

  int main(void) {
    svfloat64_t vc = svdup_f64(1);
    for(;;)
      for(int i = 0; i < 1025; i += svcntd()) {
        svbool_t pg = svwhilelt_b64(i, 1025);
        svfloat64_t vsrc = svld1(pg, &src[i]);
        svfloat64_t vdst = svadd_x(pg, vsrc, vc);
        svst1(pg, &dst[i], vdst);
      }
    return 0;
  }

... compiled using "gcc-11 -march=armv8-a+sve -O3"

Profiling on a platform that implements FEAT_SVE and FEAT_SPEv1p1:

  $ perf record -e arm_spe_0// -- ./a.out
  $ perf report --itrace=i1i -s overhead,pid,simd,sym

  Overhead  Pid:Command       Simd     Symbol
  ........  ................  .......  ......................

    53.76%    10758:program            [.] main
    46.14%    10758:program   [.] SVE  [.] main
     0.09%    10758:program   [p] SVE  [.] main

The report shows that 0.09% of the samples are SVE operations using a
partial predicate, because the lengths of the src and dst arrays are
not a multiple of the vector register length.
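As a rough illustration of where the partial predicates in this example come from, the loop can be modelled in plain C without any SVE hardware. This is only a sketch: count_predicates() and struct pred_counts are hypothetical names that are not part of perf or of this patch, and the lane count is passed in as a parameter instead of being read with svcntd().

```c
#include <stddef.h>

/* Hypothetical helper type, not part of perf or of this patch. */
struct pred_counts {
	size_t full;    /* iterations with every lane active */
	size_t partial; /* iterations with some, but not all, lanes active */
};

/*
 * Model "for (i = 0; i < n; i += lanes)" with a whilelt-style predicate,
 * where lane k of the iteration starting at i is active iff i + k < n.
 */
static struct pred_counts count_predicates(size_t n, size_t lanes)
{
	struct pred_counts c = { 0, 0 };

	if (lanes == 0)
		return c;

	for (size_t i = 0; i < n; i += lanes) {
		/* Number of active lanes in this iteration. */
		size_t active = (n - i < lanes) ? n - i : lanes;

		if (active == lanes)
			c.full++;
		else
			c.partial++;
	}
	return c;
}
```

With n = 1025 and an assumed 4 lanes (256-bit vectors of doubles), this gives 256 full iterations and 1 partial one per pass over the arrays. With n = 1024 no partial predicate would be expected. The sampled percentage reported by perf depends on SPE sampling and will not match this ratio exactly.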
Signed-off-by: German Gomez
Signed-off-by: James Clark
---
 tools/perf/Documentation/perf-report.txt |  1 +
 tools/perf/util/hist.c                   |  1 +
 tools/perf/util/hist.h                   |  1 +
 tools/perf/util/sort.c                   | 47 ++++++++++++++++++++++++
 tools/perf/util/sort.h                   |  2 +
 5 files changed, 52 insertions(+)

diff --git a/tools/perf/Documentation/perf-report.txt b/tools/perf/Documentation/perf-report.txt
index 4fa509b15948..ff524d83a4a7 100644
--- a/tools/perf/Documentation/perf-report.txt
+++ b/tools/perf/Documentation/perf-report.txt
@@ -115,6 +115,7 @@ OPTIONS
 	- p_stage_cyc: On powerpc, this presents the number of cycles spent in a
 	  pipeline stage. And currently supported only on powerpc.
 	- addr: (Full) virtual address of the sampled instruction
+	- simd: Flags describing a SIMD operation. "e" for empty Arm SVE predicate. "p" for partial Arm SVE predicate
 
 	By default, comm, dso and symbol keys are used.
 	(i.e. --sort comm,dso,symbol)
diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index 17a05e943b44..e2390114c495 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -742,6 +742,7 @@ __hists__add_entry(struct hists *hists,
 		.weight = sample->weight,
 		.ins_lat = sample->ins_lat,
 		.p_stage_cyc = sample->p_stage_cyc,
+		.simd_flags = sample->simd_flags,
 	}, *he = hists__findnew_entry(hists, &entry, al, sample_self);
 
 	if (!hists->has_callchains && he && he->callchain_size != 0)
diff --git a/tools/perf/util/hist.h b/tools/perf/util/hist.h
index ebd8a8f783ee..e6ecb4453053 100644
--- a/tools/perf/util/hist.h
+++ b/tools/perf/util/hist.h
@@ -80,6 +80,7 @@ enum hist_column {
 	HISTC_ADDR_FROM,
 	HISTC_ADDR_TO,
 	HISTC_ADDR,
+	HISTC_SIMD,
 	HISTC_NR_COLS, /* Last entry */
 };
 
diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c
index 0ecc2cb13792..5c8bfea2ce34 100644
--- a/tools/perf/util/sort.c
+++ b/tools/perf/util/sort.c
@@ -131,6 +131,52 @@ struct sort_entry sort_thread = {
 	.se_width_idx	= HISTC_THREAD,
 };
 
+/* --sort simd */
+
+static int64_t
+sort__simd_cmp(struct hist_entry *left, struct hist_entry *right)
+{
+	if (left->simd_flags.arch != right->simd_flags.arch)
+		return (int64_t) left->simd_flags.arch - right->simd_flags.arch;
+
+	return (int64_t) left->simd_flags.pred - right->simd_flags.pred;
+}
+
+static const char *hist_entry__get_simd_name(struct simd_flags *simd_flags)
+{
+	u64 arch = simd_flags->arch;
+
+	if (arch & SIMD_OP_FLAGS_ARCH_SVE)
+		return "SVE";
+	else
+		return "n/a";
+}
+
+static int hist_entry__simd_snprintf(struct hist_entry *he, char *bf,
+				     size_t size, unsigned int width __maybe_unused)
+{
+	const char *name;
+
+	if (!he->simd_flags.arch)
+		return repsep_snprintf(bf, size, "");
+
+	name = hist_entry__get_simd_name(&he->simd_flags);
+
+	if (he->simd_flags.pred & SIMD_OP_FLAGS_PRED_EMPTY)
+		return repsep_snprintf(bf, size, "[e] %s", name);
+	else if (he->simd_flags.pred & SIMD_OP_FLAGS_PRED_PARTIAL)
+		return repsep_snprintf(bf, size, "[p] %s", name);
+
+	return repsep_snprintf(bf, size, "[.] %s", name);
+}
+
+struct sort_entry sort_simd = {
+	.se_header	= "Simd ",
+	.se_cmp		= sort__simd_cmp,
+	.se_snprintf	= hist_entry__simd_snprintf,
+	.se_width_idx	= HISTC_SIMD,
+};
+
 /* --sort comm */
 
 /*
@@ -2042,6 +2088,7 @@ static struct sort_dimension common_sort_dimensions[] = {
 	DIM(SORT_LOCAL_PIPELINE_STAGE_CYC, "local_p_stage_cyc", sort_local_p_stage_cyc),
 	DIM(SORT_GLOBAL_PIPELINE_STAGE_CYC, "p_stage_cyc", sort_global_p_stage_cyc),
 	DIM(SORT_ADDR, "addr", sort_addr),
+	DIM(SORT_SIMD, "simd", sort_simd)
 };
 
 #undef DIM
diff --git a/tools/perf/util/sort.h b/tools/perf/util/sort.h
index 04ff8b61a2a7..8e69a2a53dc1 100644
--- a/tools/perf/util/sort.h
+++ b/tools/perf/util/sort.h
@@ -110,6 +110,7 @@ struct hist_entry {
 	u64			p_stage_cyc;
 	u8			cpumode;
 	u8			depth;
+	struct simd_flags	simd_flags;
 
 	/* We are added by hists__add_dummy_entry. */
 	bool			dummy;
@@ -237,6 +238,7 @@ enum sort_type {
 	SORT_LOCAL_PIPELINE_STAGE_CYC,
 	SORT_GLOBAL_PIPELINE_STAGE_CYC,
 	SORT_ADDR,
+	SORT_SIMD,
 
 	/* branch stack specific sort keys */
 	__SORT_BRANCH_STACK,
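
The comparison and labelling logic added to sort.c can be tried outside the perf tree with a small standalone sketch. Everything here is a stand-in made up for illustration: struct simd_flags and the SIMD_OP_FLAGS_* values are not defined in this patch, so their layout and values below are assumptions, and plain snprintf() replaces perf's repsep_snprintf().

```c
#include <stdint.h>
#include <stdio.h>

/* Assumed values: the real definitions live outside this patch. */
#define SIMD_OP_FLAGS_ARCH_SVE		0x01
#define SIMD_OP_FLAGS_PRED_PARTIAL	0x01
#define SIMD_OP_FLAGS_PRED_EMPTY	0x02

/* Assumed layout, simplified to two plain fields. */
struct simd_flags {
	uint64_t arch;
	uint64_t pred;
};

/* Order by architecture first, then by predicate type. */
static int64_t simd_cmp(const struct simd_flags *l, const struct simd_flags *r)
{
	if (l->arch != r->arch)
		return (int64_t)l->arch - (int64_t)r->arch;

	return (int64_t)l->pred - (int64_t)r->pred;
}

/* Write the column label for one entry, mirroring the "[x] NAME" format. */
static int simd_label(const struct simd_flags *f, char *bf, size_t size)
{
	const char *name = (f->arch & SIMD_OP_FLAGS_ARCH_SVE) ? "SVE" : "n/a";

	/* Non-SIMD samples get an empty column. */
	if (!f->arch)
		return snprintf(bf, size, "%s", "");

	if (f->pred & SIMD_OP_FLAGS_PRED_EMPTY)
		return snprintf(bf, size, "[e] %s", name);
	if (f->pred & SIMD_OP_FLAGS_PRED_PARTIAL)
		return snprintf(bf, size, "[p] %s", name);

	return snprintf(bf, size, "[.] %s", name);
}
```

Under these assumptions, an entry with no arch bit set produces an empty label and sorts before any SVE entry. SVE entries are then ordered by their predicate bits, which is what yields the separate "[.] SVE" and "[p] SVE" rows in the report above.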