From patchwork Tue Mar 28 22:27:35 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Liang, Kan"
X-Patchwork-Id: 76296
From: kan.liang@linux.intel.com
To: peterz@infradead.org, mingo@redhat.com, linux-kernel@vger.kernel.org
Cc: ak@linux.intel.com, eranian@google.com, Kan Liang
Subject: [PATCH 2/2] perf/x86/intel/ds: Use the size from each PEBS record
Date: Tue, 28 Mar 2023 15:27:35 -0700
Message-Id: <20230328222735.1367829-2-kan.liang@linux.intel.com>
X-Mailer: git-send-email 2.35.1
In-Reply-To: <20230328222735.1367829-1-kan.liang@linux.intel.com>
References: <20230328222735.1367829-1-kan.liang@linux.intel.com>
X-Mailing-List: linux-kernel@vger.kernel.org

From: Kan Liang

The kernel warning for the unexpected PEBS record can also be observed
during a context switch, when the below commands are running in parallel
for a while on SPR.

  while true; do perf record --no-buildid -a --intr-regs=AX \
    -e cpu/event=0xd0,umask=0x81/pp -c 10003 -o /dev/null ./triad; done &

  while true; do perf record -o /tmp/out -W -d \
    -e '{ld_blocks.store_forward:period=1000000, MEM_TRANS_RETIRED.LOAD_LATENCY:u:precise=2:ldlat=4}' \
    -c 1037 ./triad; done

  *The triad program just generates loads and stores.

The current PEBS code assumes that all the PEBS records in the DS buffer
have the same size, namely cpuc->pebs_record_size. That is true in most
cases, since the DS buffer is always flushed on every context switch.

However, there is a corner case that breaks the assumption. A system-wide
PEBS event with the large PEBS config may be enabled during a context
switch. Some PEBS records for the system-wide PEBS event may be generated
after the old task is scheduled out but before the new one is scheduled
in. When the new task is scheduled in, cpuc->pebs_record_size may be
updated for the per-task PEBS events. The existing system-wide PEBS
records then have a different size from the later PEBS records.

Two methods were considered to fix the issue.

One is to flush the DS buffer for the system-wide PEBS event right before
the new task is scheduled in. That has to be done in the generic code via
the sched_task() callback. However, sched_task() is shared among
different architectures, and moving the call may impact them; e.g., AMD
BRS requires that sched_task() be called after the PMU has started on a
ctxswin. This method was dropped.

The other method is implemented here. It no longer assumes that all the
PEBS records have the same size. Instead, the size stored in each PEBS
record is used to parse that record. For previous platforms (PEBS
format < 4), which don't support adaptive PEBS, nothing changes.

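To illustrate the second method outside the kernel, here is a minimal
user-space sketch; it is not the kernel code. It assumes, as adaptive PEBS
does, that each record stores its own size in bits 63:48 of its first
64-bit field. The names record_header, record_size() and count_records()
are hypothetical, and the bounds check is an addition for the sketch that
the patch itself does not make.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption: mirrors INTEL_ADAPTIVE_PEBS_SIZE_OFF from the patch. */
#define RECORD_SIZE_SHIFT 48

/* Hypothetical stand-in for the first field of struct pebs_basic. */
struct record_header {
	uint64_t format_size;	/* record size in bytes lives in bits 63:48 */
};

/* Read the size that the record itself reports. */
static size_t record_size(const void *rec)
{
	struct record_header hdr;

	memcpy(&hdr, rec, sizeof(hdr));
	return (size_t)(hdr.format_size >> RECORD_SIZE_SHIFT);
}

/* Count the records in [base, top), stepping by each record's own size. */
static int count_records(const unsigned char *base, const unsigned char *top)
{
	const unsigned char *at = base;
	int n = 0;

	while (at < top) {
		size_t size = record_size(at);

		/* Sketch-only guard: stop on a zero or overlong size. */
		if (size == 0 || size > (size_t)(top - at))
			break;
		at += size;
		n++;
	}
	return n;
}
```

With a 32-byte record followed by a 64-byte record in the same buffer, a
walk with a fixed stride of either size would land in the middle of a
record, while the per-record stride visits both.
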
Reported-by: Stephane Eranian
Signed-off-by: Kan Liang
---
 arch/x86/events/intel/ds.c        | 31 ++++++++++++++++++++++++++-----
 arch/x86/include/asm/perf_event.h |  6 ++++++
 2 files changed, 32 insertions(+), 5 deletions(-)

diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
index a2e566e53076..905135a8b99f 100644
--- a/arch/x86/events/intel/ds.c
+++ b/arch/x86/events/intel/ds.c
@@ -1546,6 +1546,15 @@ static inline u64 get_pebs_status(void *n)
 	return ((struct pebs_basic *)n)->applicable_counters;
 }
 
+static inline u64 get_pebs_size(void *n)
+{
+	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
+
+	if (x86_pmu.intel_cap.pebs_format < 4)
+		return cpuc->pebs_record_size;
+	return intel_adaptive_pebs_size(((struct pebs_basic *)n)->format_size);
+}
+
 #define PERF_X86_EVENT_PEBS_HSW_PREC \
 		(PERF_X86_EVENT_PEBS_ST_HSW | \
 		 PERF_X86_EVENT_PEBS_LD_HSW | \
@@ -1903,9 +1912,9 @@ static void setup_pebs_adaptive_sample_data(struct perf_event *event,
 		}
 	}
 
-	WARN_ONCE(next_record != __pebs + (format_size >> 48),
+	WARN_ONCE(next_record != __pebs + intel_adaptive_pebs_size(format_size),
 			"PEBS record size %llu, expected %llu, config %llx\n",
-			format_size >> 48,
+			intel_adaptive_pebs_size(format_size),
 			(u64)(next_record - __pebs),
 			basic->format_size);
 }
@@ -1927,7 +1936,7 @@ get_next_pebs_record_by_bit(void *base, void *top, int bit)
 	if (base == NULL)
 		return NULL;
 
-	for (at = base; at < top; at += cpuc->pebs_record_size) {
+	for (at = base; at < top; at += get_pebs_size(at)) {
 		unsigned long status = get_pebs_status(at);
 
 		if (test_bit(bit, (unsigned long *)&status)) {
@@ -2054,7 +2063,7 @@ __intel_pmu_pebs_event(struct perf_event *event,
 	while (count > 1) {
 		setup_sample(event, iregs, at, data, regs);
 		perf_event_output(event, data, regs);
-		at += cpuc->pebs_record_size;
+		at += get_pebs_size(at);
 		at = get_next_pebs_record_by_bit(at, top, bit);
 		count--;
 	}
@@ -2278,7 +2287,19 @@ static void intel_pmu_drain_pebs_icl(struct pt_regs *iregs, struct perf_sample_d
 		return;
 	}
 
-	for (at = base; at < top; at += cpuc->pebs_record_size) {
+	/*
+	 * The cpuc->pebs_record_size may be different from the
+	 * size of each PEBS record. For example, a system-wide
+	 * PEBS event with the large PEBS config may be enabled
+	 * during a context switch. Some PEBS records for the
+	 * system-wide PEBS may be generated while the old task
+	 * is sched out but the new one isn't sched in. When the
+	 * new task is sched in, the cpuc->pebs_record_size may
+	 * be updated for the per-task PEBS events. So the
+	 * existing system-wide PEBS records have a different
+	 * size from the later PEBS records.
+	 */
+	for (at = base; at < top; at += get_pebs_size(at)) {
 		u64 pebs_status;
 
 		pebs_status = get_pebs_status(at) & cpuc->pebs_enabled;
diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
index 8fc15ed5e60b..ad5655bb90f6 100644
--- a/arch/x86/include/asm/perf_event.h
+++ b/arch/x86/include/asm/perf_event.h
@@ -386,6 +386,12 @@ static inline bool is_topdown_idx(int idx)
 /*
  * Adaptive PEBS v4
  */
+#define INTEL_ADAPTIVE_PEBS_SIZE_OFF	48
+
+static inline u64 intel_adaptive_pebs_size(u64 format_size)
+{
+	return (format_size >> INTEL_ADAPTIVE_PEBS_SIZE_OFF);
+}
 
 struct pebs_basic {
 	u64 format_size;