Message ID | 20230105031039.207972-6-anshuman.khandual@arm.com |
---|---|
State | New |
Headers |
Series | arm64/perf: Enable branch stack sampling |
Commit Message
Anshuman Khandual
Jan. 5, 2023, 3:10 a.m. UTC
This enables support for branch stack sampling events in the ARMV8 PMU by
checking has_branch_stack() on the event inside the 'struct arm_pmu' callbacks,
although these branch stack helpers armv8pmu_branch_XXXXX() are just dummy
functions for now. While here, this also wires up arm_pmu's sched_task()
callback with armv8pmu_sched_task(), which resets the branch record buffer on
a sched_in.
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
---
arch/arm64/include/asm/perf_event.h | 10 +++++++++
arch/arm64/kernel/perf_event.c | 35 +++++++++++++++++++++++++++++
2 files changed, 45 insertions(+)
Comments
On Thu, Jan 05, 2023 at 08:40:38AM +0530, Anshuman Khandual wrote: > This enables support for branch stack sampling event in ARMV8 PMU, checking > has_branch_stack() on the event inside 'struct arm_pmu' callbacks. Although > these branch stack helpers armv8pmu_branch_XXXXX() are just dummy functions > for now. While here, this also defines arm_pmu's sched_task() callback with > armv8pmu_sched_task(), which resets the branch record buffer on a sched_in. > > Cc: Catalin Marinas <catalin.marinas@arm.com> > Cc: Will Deacon <will@kernel.org> > Cc: Mark Rutland <mark.rutland@arm.com> > Cc: linux-arm-kernel@lists.infradead.org > Cc: linux-kernel@vger.kernel.org > Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> > --- > arch/arm64/include/asm/perf_event.h | 10 +++++++++ > arch/arm64/kernel/perf_event.c | 35 +++++++++++++++++++++++++++++ > 2 files changed, 45 insertions(+) > > diff --git a/arch/arm64/include/asm/perf_event.h b/arch/arm64/include/asm/perf_event.h > index 3eaf462f5752..a038902d6874 100644 > --- a/arch/arm64/include/asm/perf_event.h > +++ b/arch/arm64/include/asm/perf_event.h > @@ -273,4 +273,14 @@ extern unsigned long perf_misc_flags(struct pt_regs *regs); > (regs)->pstate = PSR_MODE_EL1h; \ > } > > +struct pmu_hw_events; > +struct arm_pmu; > +struct perf_event; > + > +static inline void armv8pmu_branch_read(struct pmu_hw_events *cpuc, struct perf_event *event) { } > +static inline bool armv8pmu_branch_valid(struct perf_event *event) { return false; } > +static inline void armv8pmu_branch_enable(struct perf_event *event) { } > +static inline void armv8pmu_branch_disable(struct perf_event *event) { } > +static inline void armv8pmu_branch_probe(struct arm_pmu *arm_pmu) { } > +static inline void armv8pmu_branch_reset(void) { } As far as I can tell, these are not supposed to be called when !has_branch_stack(), so it would be good if these had a WARN() or similar to spot buggy usage. > #endif > diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c > index a5193f2146a6..8805b4516088 100644 > --- a/arch/arm64/kernel/perf_event.c > +++ b/arch/arm64/kernel/perf_event.c > @@ -789,10 +789,22 @@ static void armv8pmu_enable_event(struct perf_event *event) > * Enable counter > */ > armv8pmu_enable_event_counter(event); > + > + /* > + * Enable BRBE > + */ > + if (has_branch_stack(event)) > + armv8pmu_branch_enable(event); > } This looks fine, but tbh I think we should delete the existing comments above each function call as they're blindingly obvious and just waste space. > static void armv8pmu_disable_event(struct perf_event *event) > { > + /* > + * Disable BRBE > + */ > + if (has_branch_stack(event)) > + armv8pmu_branch_disable(event); > + Likewise here. > /* > * Disable counter > */ > @@ -878,6 +890,13 @@ static irqreturn_t armv8pmu_handle_irq(struct arm_pmu *cpu_pmu) > if (!armpmu_event_set_period(event)) > continue; > > + if (has_branch_stack(event)) { > + WARN_ON(!cpuc->branches); > + armv8pmu_branch_read(cpuc, event); > + data.br_stack = &cpuc->branches->branch_stack; > + data.sample_flags |= PERF_SAMPLE_BRANCH_STACK; > + } How do we ensure the data we're getting isn't changed under our feet? Is BRBE disabled at this point? Is this going to have branches after taking the exception, or does BRBE stop automatically at that point? If so we presumably need to take special care as to when we read this relative to enabling/disabling and/or manipulating the overflow bits. 
> + > /* > * Perf event overflow will queue the processing of the event as > * an irq_work which will be taken care of in the handling of > @@ -976,6 +995,14 @@ static int armv8pmu_user_event_idx(struct perf_event *event) > return event->hw.idx; > } > > +static void armv8pmu_sched_task(struct perf_event_pmu_context *pmu_ctx, bool sched_in) > +{ > + struct arm_pmu *armpmu = to_arm_pmu(pmu_ctx->pmu); > + > + if (sched_in && arm_pmu_branch_stack_supported(armpmu)) > + armv8pmu_branch_reset(); > +} When scheduling out, shouldn't we save what we have so far? It seems odd that we just throw that away rather than placing it into a FIFO. Thanks, Mark.
On 1/12/23 19:59, Mark Rutland wrote: > On Thu, Jan 05, 2023 at 08:40:38AM +0530, Anshuman Khandual wrote: >> This enables support for branch stack sampling event in ARMV8 PMU, checking >> has_branch_stack() on the event inside 'struct arm_pmu' callbacks. Although >> these branch stack helpers armv8pmu_branch_XXXXX() are just dummy functions >> for now. While here, this also defines arm_pmu's sched_task() callback with >> armv8pmu_sched_task(), which resets the branch record buffer on a sched_in. >> >> Cc: Catalin Marinas <catalin.marinas@arm.com> >> Cc: Will Deacon <will@kernel.org> >> Cc: Mark Rutland <mark.rutland@arm.com> >> Cc: linux-arm-kernel@lists.infradead.org >> Cc: linux-kernel@vger.kernel.org >> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> >> --- >> arch/arm64/include/asm/perf_event.h | 10 +++++++++ >> arch/arm64/kernel/perf_event.c | 35 +++++++++++++++++++++++++++++ >> 2 files changed, 45 insertions(+) >> >> diff --git a/arch/arm64/include/asm/perf_event.h b/arch/arm64/include/asm/perf_event.h >> index 3eaf462f5752..a038902d6874 100644 >> --- a/arch/arm64/include/asm/perf_event.h >> +++ b/arch/arm64/include/asm/perf_event.h >> @@ -273,4 +273,14 @@ extern unsigned long perf_misc_flags(struct pt_regs *regs); >> (regs)->pstate = PSR_MODE_EL1h; \ >> } >> >> +struct pmu_hw_events; >> +struct arm_pmu; >> +struct perf_event; >> + >> +static inline void armv8pmu_branch_read(struct pmu_hw_events *cpuc, struct perf_event *event) { } >> +static inline bool armv8pmu_branch_valid(struct perf_event *event) { return false; } >> +static inline void armv8pmu_branch_enable(struct perf_event *event) { } >> +static inline void armv8pmu_branch_disable(struct perf_event *event) { } >> +static inline void armv8pmu_branch_probe(struct arm_pmu *arm_pmu) { } >> +static inline void armv8pmu_branch_reset(void) { } > > As far as I can tell, these are not supposed to be called when > !has_branch_stack(), so it would be good if these had a WARN() or similar to > spot buggy usage. This is actually true except the last two helper functions, which get called in the generic PMU context i.e while probing or resetting the PMU. While probing it is not yet known whether the PMU supports branch stack or not, but while resetting the PMU arm_pmu_branch_stack_supported() is checked to ensure there is a buffer to be reset via a special instruction. Will change the first four functions to add warnings in case the event is not a branch stack one. 
diff --git a/arch/arm64/include/asm/perf_event.h b/arch/arm64/include/asm/perf_event.h index cf2e88c7b707..ab1d180e17a6 100644 --- a/arch/arm64/include/asm/perf_event.h +++ b/arch/arm64/include/asm/perf_event.h @@ -285,10 +285,27 @@ void armv8pmu_branch_disable(struct perf_event *event); void armv8pmu_branch_probe(struct arm_pmu *arm_pmu); void armv8pmu_branch_reset(void); #else -static inline void armv8pmu_branch_read(struct pmu_hw_events *cpuc, struct perf_event *event) { } -static inline bool armv8pmu_branch_valid(struct perf_event *event) { return false; } -static inline void armv8pmu_branch_enable(struct perf_event *event) { } -static inline void armv8pmu_branch_disable(struct perf_event *event) { } +static inline void armv8pmu_branch_read(struct pmu_hw_events *cpuc, struct perf_event *event) +{ + WARN_ON_ONCE(!has_branch_stack(event)); +} + +static inline bool armv8pmu_branch_valid(struct perf_event *event) +{ + WARN_ON_ONCE(!has_branch_stack(event)); + return false; +} + +static inline void armv8pmu_branch_enable(struct perf_event *event) +{ + WARN_ON_ONCE(!has_branch_stack(event)); +} + +static inline void armv8pmu_branch_disable(struct perf_event *event) +{ + WARN_ON_ONCE(!has_branch_stack(event)); +} + static inline void armv8pmu_branch_probe(struct arm_pmu *arm_pmu) { } static inline void armv8pmu_branch_reset(void) { } #endif > >> #endif >> diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c >> index a5193f2146a6..8805b4516088 100644 >> --- a/arch/arm64/kernel/perf_event.c >> +++ b/arch/arm64/kernel/perf_event.c >> @@ -789,10 +789,22 @@ static void armv8pmu_enable_event(struct perf_event *event) >> * Enable counter >> */ >> armv8pmu_enable_event_counter(event); >> + >> + /* >> + * Enable BRBE >> + */ >> + if (has_branch_stack(event)) >> + armv8pmu_branch_enable(event); >> } > > This looks fine, but tbh I think we should delete the existing comments above > each function call as they're blindingly obvious and just waste space. > >> static void armv8pmu_disable_event(struct perf_event *event) >> { >> + /* >> + * Disable BRBE >> + */ >> + if (has_branch_stack(event)) >> + armv8pmu_branch_disable(event); >> + > > Likewise here. Dropped all the comments in armv8pmu_enable_event() and in armv8pmu_disable_event() and removed the redundant interleaving lines as well. > >> /* >> * Disable counter >> */ >> @@ -878,6 +890,13 @@ static irqreturn_t armv8pmu_handle_irq(struct arm_pmu *cpu_pmu) >> if (!armpmu_event_set_period(event)) >> continue; >> >> + if (has_branch_stack(event)) { >> + WARN_ON(!cpuc->branches); >> + armv8pmu_branch_read(cpuc, event); >> + data.br_stack = &cpuc->branches->branch_stack; >> + data.sample_flags |= PERF_SAMPLE_BRANCH_STACK; >> + } > > How do we ensure the data we're getting isn't changed under our feet? Is BRBE > disabled at this point? Right, BRBE is paused after a PMU IRQ. We also ensure the buffer is disabled for all exception levels, i.e removing BRBCR_EL1_E0BRE/E1BRE from the configuration, before initiating the actual read, which eventually populates the data.br_stack. > > Is this going to have branches after taking the exception, or does BRBE stop > automatically at that point? If so we presumably need to take special care as > to when we read this relative to enabling/disabling and/or manipulating the > overflow bits. The default BRBE configuration includes setting BRBCR_EL1.FZP, enabling BRBE to be paused automatically, right after a PMU IRQ. 
Regardless, before reading the buffer, BRBE is paused (BRBFCR_EL1.PAUSED) and disabled for all privilege levels ~(BRBCR_EL1.E0BRE/E1BRE) which ensures that no new branch record is getting into the buffer, while it is being read for perf right buffer. > >> + >> /* >> * Perf event overflow will queue the processing of the event as >> * an irq_work which will be taken care of in the handling of >> @@ -976,6 +995,14 @@ static int armv8pmu_user_event_idx(struct perf_event *event) >> return event->hw.idx; >> } >> >> +static void armv8pmu_sched_task(struct perf_event_pmu_context *pmu_ctx, bool sched_in) >> +{ >> + struct arm_pmu *armpmu = to_arm_pmu(pmu_ctx->pmu); >> + >> + if (sched_in && arm_pmu_branch_stack_supported(armpmu)) >> + armv8pmu_branch_reset(); >> +} > > When scheduling out, shouldn't we save what we have so far? > > It seems odd that we just throw that away rather than placing it into a FIFO. IIRC we had discussed this earlier, save and restore mechanism will be added later, not during this enablement patch series. For now resetting the buffer ensures that branch records from one session does not get into another. Note that these branches cannot be pushed into perf ring buffer either, as there was no corresponding PMU interrupt to be associated with.
On Fri, Jan 13, 2023 at 10:41:51AM +0530, Anshuman Khandual wrote: > > > On 1/12/23 19:59, Mark Rutland wrote: > > On Thu, Jan 05, 2023 at 08:40:38AM +0530, Anshuman Khandual wrote: > >> @@ -878,6 +890,13 @@ static irqreturn_t armv8pmu_handle_irq(struct arm_pmu *cpu_pmu) > >> if (!armpmu_event_set_period(event)) > >> continue; > >> > >> + if (has_branch_stack(event)) { > >> + WARN_ON(!cpuc->branches); > >> + armv8pmu_branch_read(cpuc, event); > >> + data.br_stack = &cpuc->branches->branch_stack; > >> + data.sample_flags |= PERF_SAMPLE_BRANCH_STACK; > >> + } > > > > How do we ensure the data we're getting isn't changed under our feet? Is BRBE > > disabled at this point? > > Right, BRBE is paused after a PMU IRQ. We also ensure the buffer is disabled for > all exception levels, i.e removing BRBCR_EL1_E0BRE/E1BRE from the configuration, > before initiating the actual read, which eventually populates the data.br_stack. Ok; just to confirm, what exactly is the condition that enforces that BRBE is disabled? Is that *while* there's an overflow asserted, or does something else get set at the instant the overflow occurs? What exactly is necessary for it to start again? > > Is this going to have branches after taking the exception, or does BRBE stop > > automatically at that point? If so we presumably need to take special care as > > to when we read this relative to enabling/disabling and/or manipulating the > > overflow bits. > > The default BRBE configuration includes setting BRBCR_EL1.FZP, enabling BRBE to > be paused automatically, right after a PMU IRQ. Regardless, before reading the > buffer, BRBE is paused (BRBFCR_EL1.PAUSED) and disabled for all privilege levels > ~(BRBCR_EL1.E0BRE/E1BRE) which ensures that no new branch record is getting into > the buffer, while it is being read for perf right buffer. Ok; I think we could do with some comments as to this. > > > > >> + > >> /* > >> * Perf event overflow will queue the processing of the event as > >> * an irq_work which will be taken care of in the handling of > >> @@ -976,6 +995,14 @@ static int armv8pmu_user_event_idx(struct perf_event *event) > >> return event->hw.idx; > >> } > >> > >> +static void armv8pmu_sched_task(struct perf_event_pmu_context *pmu_ctx, bool sched_in) > >> +{ > >> + struct arm_pmu *armpmu = to_arm_pmu(pmu_ctx->pmu); > >> + > >> + if (sched_in && arm_pmu_branch_stack_supported(armpmu)) > >> + armv8pmu_branch_reset(); > >> +} > > > > When scheduling out, shouldn't we save what we have so far? > > > > It seems odd that we just throw that away rather than placing it into a FIFO. > > IIRC we had discussed this earlier, save and restore mechanism will be added > later, not during this enablement patch series. Sorry, but why? I don't understand why it's acceptable to non-deterministically throw away data for now. At the least that's going to confuse users, especially as the observable behaviour may change if and when that's added later. I assume that there's some reason that it's painful to do that? Could you please elaborate on that? > For now resetting the buffer ensures that branch records from one session > does not get into another. I agree that it's necessary to do that, but as above I don't believe it's sufficient. > Note that these branches cannot be pushed into perf ring buffer either, as > there was no corresponding PMU interrupt to be associated with. I'm not suggesting we put it in the perf ring buffer; I'm suggesting that we snapshot it into *some* kernel-internal storage, then later reconcile that. 
Maybe that's far more painful than I expect? Thanks, Mark.
On 2/9/23 01:06, Mark Rutland wrote: > On Fri, Jan 13, 2023 at 10:41:51AM +0530, Anshuman Khandual wrote: >> >> >> On 1/12/23 19:59, Mark Rutland wrote: >>> On Thu, Jan 05, 2023 at 08:40:38AM +0530, Anshuman Khandual wrote: >>>> @@ -878,6 +890,13 @@ static irqreturn_t armv8pmu_handle_irq(struct arm_pmu *cpu_pmu) >>>> if (!armpmu_event_set_period(event)) >>>> continue; >>>> >>>> + if (has_branch_stack(event)) { >>>> + WARN_ON(!cpuc->branches); >>>> + armv8pmu_branch_read(cpuc, event); >>>> + data.br_stack = &cpuc->branches->branch_stack; >>>> + data.sample_flags |= PERF_SAMPLE_BRANCH_STACK; >>>> + } >>> >>> How do we ensure the data we're getting isn't changed under our feet? Is BRBE >>> disabled at this point? >> >> Right, BRBE is paused after a PMU IRQ. We also ensure the buffer is disabled for >> all exception levels, i.e removing BRBCR_EL1_E0BRE/E1BRE from the configuration, >> before initiating the actual read, which eventually populates the data.br_stack. > > Ok; just to confirm, what exactly is the condition that enforces that BRBE is > disabled? Is that *while* there's an overflow asserted, or does something else > get set at the instant the overflow occurs? - BRBE can be disabled completely via BRBCR_EL1_E0BRE/E1BRE irrespective of PMU interrupt - But with PMU interrupt, it just pauses if BRBCR_EL1_FZP is enabled > > What exactly is necessary for it to start again? - Unpause via clearing BRBFCR_EL1_PAUSED - Enable for applicable privilege levels via setting BRBCR_EL1_E0BRE/E1BRE > >>> Is this going to have branches after taking the exception, or does BRBE stop >>> automatically at that point? If so we presumably need to take special care as >>> to when we read this relative to enabling/disabling and/or manipulating the >>> overflow bits. >> >> The default BRBE configuration includes setting BRBCR_EL1.FZP, enabling BRBE to >> be paused automatically, right after a PMU IRQ. Regardless, before reading the >> buffer, BRBE is paused (BRBFCR_EL1.PAUSED) and disabled for all privilege levels >> ~(BRBCR_EL1.E0BRE/E1BRE) which ensures that no new branch record is getting into >> the buffer, while it is being read for perf right buffer. > > Ok; I think we could do with some comments as to this. Sure, will add a comment, something like this. diff --git a/arch/arm64/kernel/brbe.c b/arch/arm64/kernel/brbe.c index 17562d3fb33d..0d6e54e92ee2 100644 --- a/arch/arm64/kernel/brbe.c +++ b/arch/arm64/kernel/brbe.c @@ -480,6 +480,13 @@ static bool capture_branch_entry(struct pmu_hw_events *cpuc, return true; } +/* + * This gets called in PMU IRQ handler context. BRBE is configured (BRBCR_EL1.FZP) + * to be paused, right after a PMU IRQ. Regardless, before reading branch records, + * BRBE explicitly paused (BRBFCR_EL1.PAUSED) and also disabled for all applicable + * privilege levels (BRBCR_EL1.E0/E1BRBE) ensuring that no branch record could get + * in the BRBE buffer while it is being read. 
+ */ void armv8pmu_branch_read(struct pmu_hw_events *cpuc, struct perf_event *event) { struct brbe_hw_attr *brbe_attr = (struct brbe_hw_attr *)cpuc->percpu_pmu->private; > >> >>> >>>> + >>>> /* >>>> * Perf event overflow will queue the processing of the event as >>>> * an irq_work which will be taken care of in the handling of >>>> @@ -976,6 +995,14 @@ static int armv8pmu_user_event_idx(struct perf_event *event) >>>> return event->hw.idx; >>>> } >>>> >>>> +static void armv8pmu_sched_task(struct perf_event_pmu_context *pmu_ctx, bool sched_in) >>>> +{ >>>> + struct arm_pmu *armpmu = to_arm_pmu(pmu_ctx->pmu); >>>> + >>>> + if (sched_in && arm_pmu_branch_stack_supported(armpmu)) >>>> + armv8pmu_branch_reset(); >>>> +} >>> >>> When scheduling out, shouldn't we save what we have so far? >>> >>> It seems odd that we just throw that away rather than placing it into a FIFO. >> >> IIRC we had discussed this earlier, save and restore mechanism will be added >> later, not during this enablement patch series. > > Sorry, but why? > > I don't understand why it's acceptable to non-deterministically throw away data > for now. At the least that's going to confuse users, especially as the > observable behaviour may change if and when that's added later. Each set of branch records captured in BRBE is part of broader statistical sample which goes into the perf ring buffer. So in absolute terms throwing away some branch records should not be a problem for the end result. Would the relative instances of possible task switches be more than all those PMU interrupts that can be generated between those switches ? I am not sure if that will create much loss of samples for the overall perf session. For implementation, we could follow x86 intel_pmu_lbr_sched_task(), which saves and restores branch records via perf_event_pmu_context->task_ctx_data with some more changes to CPU specific structure. But restoration involves writing back the saved branch records into the HW (via BRB INJ instructions in BRBE case) recreating the buffer state before task switch out happened. This save/restore mechanism will increase switch out and switch in latency for tasks being traced for branch stack samples. Just wondering are those potential lost branch samples worth the increase in task switch latency ? Should this save/restore be auto enabled for all tasks ? Should this be done part of the base enablement series itself ? > > I assume that there's some reason that it's painful to do that? Could you > please elaborate on that? > >> For now resetting the buffer ensures that branch records from one session >> does not get into another. > > I agree that it's necessary to do that, but as above I don't believe it's > sufficient. > >> Note that these branches cannot be pushed into perf ring buffer either, as >> there was no corresponding PMU interrupt to be associated with. > > I'm not suggesting we put it in the perf ring buffer; I'm suggesting that we > snapshot it into *some* kernel-internal storage, then later reconcile that. Only feasible reconciliation would be restore them into the BRBE HW buffer ensuring their continuity after the task starts back again on the CPU, and continue capturing records for perf ring buffer during future PMU interrupts. Saved branch records cannot be pushed into the perf ring buffer, along side the newer ones, because continuity of branch branch records leading upto the PMU interrupt will be broken. 
It might also happen that all restored branch records might just get overridden by newer branch records, before the next PMU interrupt, wasting the entire restoration process. > > Maybe that's far more painful than I expect? I could try and implement save/restore mechanism for BRBE as explained above.
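[Editor's note: as an aside, below is a minimal sketch of what such a sched_task() based save/restore could look like, loosely following the x86 intel_pmu_lbr_sched_task() pattern discussed above. It is purely illustrative and not part of this series: struct brbe_task_ctx, MAX_BRANCH_RECORDS, armv8pmu_branch_save() and armv8pmu_branch_inject() are assumed names, with the saved records living in perf_event_pmu_context->task_ctx_data as suggested in the thread.]

/*
 * Illustrative sketch only (not part of this series): stash the live BRBE
 * records into task_ctx_data on sched out, and re-inject them (e.g. via
 * BRB INJ) on sched in, preserving record continuity across task switches.
 */
struct brbe_task_ctx {
	struct perf_branch_entry	entries[MAX_BRANCH_RECORDS];	/* assumed capacity macro */
	int				nr;
};

static void armv8pmu_sched_task(struct perf_event_pmu_context *pmu_ctx, bool sched_in)
{
	struct arm_pmu *armpmu = to_arm_pmu(pmu_ctx->pmu);
	struct brbe_task_ctx *ctx = pmu_ctx->task_ctx_data;

	if (!arm_pmu_branch_stack_supported(armpmu))
		return;

	if (sched_in) {
		armv8pmu_branch_reset();
		if (ctx && ctx->nr)
			armv8pmu_branch_inject(ctx);	/* assumed helper: BRB INJ based restore */
	} else if (ctx) {
		ctx->nr = armv8pmu_branch_save(ctx);	/* assumed helper: snapshot live records */
	}
}

The trade-off flagged in the thread still applies: every switch of a traced task would pay for the save and the injected restore, and the restored records may simply be overwritten before the next PMU interrupt.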
On Mon, Feb 13, 2023 at 01:53:56PM +0530, Anshuman Khandual wrote: > > > On 2/9/23 01:06, Mark Rutland wrote: > > On Fri, Jan 13, 2023 at 10:41:51AM +0530, Anshuman Khandual wrote: > >> > >> > >> On 1/12/23 19:59, Mark Rutland wrote: > >>> On Thu, Jan 05, 2023 at 08:40:38AM +0530, Anshuman Khandual wrote: > >>>> @@ -878,6 +890,13 @@ static irqreturn_t armv8pmu_handle_irq(struct arm_pmu *cpu_pmu) > >>>> if (!armpmu_event_set_period(event)) > >>>> continue; > >>>> > >>>> + if (has_branch_stack(event)) { > >>>> + WARN_ON(!cpuc->branches); > >>>> + armv8pmu_branch_read(cpuc, event); > >>>> + data.br_stack = &cpuc->branches->branch_stack; > >>>> + data.sample_flags |= PERF_SAMPLE_BRANCH_STACK; > >>>> + } > >>> > >>> How do we ensure the data we're getting isn't changed under our feet? Is BRBE > >>> disabled at this point? > >> > >> Right, BRBE is paused after a PMU IRQ. We also ensure the buffer is disabled for > >> all exception levels, i.e removing BRBCR_EL1_E0BRE/E1BRE from the configuration, > >> before initiating the actual read, which eventually populates the data.br_stack. > > > > Ok; just to confirm, what exactly is the condition that enforces that BRBE is > > disabled? Is that *while* there's an overflow asserted, or does something else > > get set at the instant the overflow occurs? > > - BRBE can be disabled completely via BRBCR_EL1_E0BRE/E1BRE irrespective of PMU interrupt > - But with PMU interrupt, it just pauses if BRBCR_EL1_FZP is enabled IIUC the distinction between "disabled completely" and "just pauses" doesn't really matter to us, and a pause is sufficient for use to be able to read and manipulate the records. I also note that we always set BRBCR_EL1.FZP. Am I missing something? Thanks, Mark.
On 2/23/23 19:17, Mark Rutland wrote: > On Mon, Feb 13, 2023 at 01:53:56PM +0530, Anshuman Khandual wrote: >> >> >> On 2/9/23 01:06, Mark Rutland wrote: >>> On Fri, Jan 13, 2023 at 10:41:51AM +0530, Anshuman Khandual wrote: >>>> >>>> >>>> On 1/12/23 19:59, Mark Rutland wrote: >>>>> On Thu, Jan 05, 2023 at 08:40:38AM +0530, Anshuman Khandual wrote: >>>>>> @@ -878,6 +890,13 @@ static irqreturn_t armv8pmu_handle_irq(struct arm_pmu *cpu_pmu) >>>>>> if (!armpmu_event_set_period(event)) >>>>>> continue; >>>>>> >>>>>> + if (has_branch_stack(event)) { >>>>>> + WARN_ON(!cpuc->branches); >>>>>> + armv8pmu_branch_read(cpuc, event); >>>>>> + data.br_stack = &cpuc->branches->branch_stack; >>>>>> + data.sample_flags |= PERF_SAMPLE_BRANCH_STACK; >>>>>> + } >>>>> >>>>> How do we ensure the data we're getting isn't changed under our feet? Is BRBE >>>>> disabled at this point? >>>> >>>> Right, BRBE is paused after a PMU IRQ. We also ensure the buffer is disabled for >>>> all exception levels, i.e removing BRBCR_EL1_E0BRE/E1BRE from the configuration, >>>> before initiating the actual read, which eventually populates the data.br_stack. >>> >>> Ok; just to confirm, what exactly is the condition that enforces that BRBE is >>> disabled? Is that *while* there's an overflow asserted, or does something else >>> get set at the instant the overflow occurs? >> >> - BRBE can be disabled completely via BRBCR_EL1_E0BRE/E1BRE irrespective of PMU interrupt >> - But with PMU interrupt, it just pauses if BRBCR_EL1_FZP is enabled > > IIUC the distinction between "disabled completely" and "just pauses" doesn't > really matter to us, and a pause is sufficient for use to be able to read and > manipulate the records. As I learned from the HW folks earlier, but seems like we might have to revisit this understanding once again. 'Pause' state ensures that no new branch records could not get into the buffer which is necessary, but not sufficient enough condition before all the branch records could be processed in software. BRBE "disabled completely" via putting in prohibited region (implicitly during PMU interrupt while tracing user only sessions, explicitly while tracing user/kernel/hv sessions) is still necessary. > > I also note that we always set BRBCR_EL1.FZP. > > Am I missing something? We always set BRBCR_EL1.FZP, but during PMU interrupt while processing branch records, there are certain distinctions. user only traces: - Ensuring BRBFCR_EL1_PAUSED being set is not necessary - BRBE is already in prohibited region (IRQ handler in EL1) - Exception transition serve as a context synchronization event - Branch records can be read and processed right away - Return after clearing BRBFCR_EL1_PAUSED followed by BRB_IALL - isb() is not even necessary before returning - ERET from EL1 will ensure a context a synchronization event privilege traces: - Ensuring BRBFCR_EL1_PAUSED is necessary - Ensuring BRBE is in prohibited state - SW clears BRBCR_EL1_E1BR - isb() is required to ensure BRBE is prohibited state before reading - Return after clearing BRBFCR_EL1_PAUSED followed by BRB_IALL - isb() is required while returning from IRQ handler I had suggested differentiating user only sessions, because it can save multiple isb() instances and register write accesses which is not possible for privilege trace sessions. - Anshuman
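[Editor's note: as an aside, below is a rough sketch of the quiescing sequence described in the two lists above, for the privileged-trace case. It is illustrative only: the helper name is made up, while the register accessors and field names (BRBFCR_EL1.PAUSED, BRBCR_EL1.E0BRE/E1BRE) are taken from the wider BRBE series rather than this patch.]

/*
 * Illustrative only: before reading branch records for a kernel/hv trace
 * session, pause BRBE and disable it for all privilege levels, then isb()
 * so the buffer is architecturally quiesced before the records are consumed.
 * For user-only sessions the exception entry into EL1 already serves as the
 * prohibiting, context-synchronising event, so this can be skipped.
 */
static void armv8pmu_branch_quiesce(void)	/* assumed helper name */
{
	u64 brbfcr = read_sysreg_s(SYS_BRBFCR_EL1);
	u64 brbcr = read_sysreg_s(SYS_BRBCR_EL1);

	write_sysreg_s(brbfcr | BRBFCR_EL1_PAUSED, SYS_BRBFCR_EL1);
	write_sysreg_s(brbcr & ~(BRBCR_EL1_E0BRE | BRBCR_EL1_E1BRE), SYS_BRBCR_EL1);
	isb();
}

/*
 * On the way out of the IRQ handler the records would be invalidated (BRB IALL),
 * BRBFCR_EL1.PAUSED cleared and the privilege-level enables set again, with an
 * isb() only where the lists above call for one.
 */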
diff --git a/arch/arm64/include/asm/perf_event.h b/arch/arm64/include/asm/perf_event.h
index 3eaf462f5752..a038902d6874 100644
--- a/arch/arm64/include/asm/perf_event.h
+++ b/arch/arm64/include/asm/perf_event.h
@@ -273,4 +273,14 @@ extern unsigned long perf_misc_flags(struct pt_regs *regs);
 	(regs)->pstate = PSR_MODE_EL1h;	\
 }
 
+struct pmu_hw_events;
+struct arm_pmu;
+struct perf_event;
+
+static inline void armv8pmu_branch_read(struct pmu_hw_events *cpuc, struct perf_event *event) { }
+static inline bool armv8pmu_branch_valid(struct perf_event *event) { return false; }
+static inline void armv8pmu_branch_enable(struct perf_event *event) { }
+static inline void armv8pmu_branch_disable(struct perf_event *event) { }
+static inline void armv8pmu_branch_probe(struct arm_pmu *arm_pmu) { }
+static inline void armv8pmu_branch_reset(void) { }
 #endif
diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
index a5193f2146a6..8805b4516088 100644
--- a/arch/arm64/kernel/perf_event.c
+++ b/arch/arm64/kernel/perf_event.c
@@ -789,10 +789,22 @@ static void armv8pmu_enable_event(struct perf_event *event)
 	 * Enable counter
 	 */
 	armv8pmu_enable_event_counter(event);
+
+	/*
+	 * Enable BRBE
+	 */
+	if (has_branch_stack(event))
+		armv8pmu_branch_enable(event);
 }
 
 static void armv8pmu_disable_event(struct perf_event *event)
 {
+	/*
+	 * Disable BRBE
+	 */
+	if (has_branch_stack(event))
+		armv8pmu_branch_disable(event);
+
 	/*
 	 * Disable counter
 	 */
@@ -878,6 +890,13 @@ static irqreturn_t armv8pmu_handle_irq(struct arm_pmu *cpu_pmu)
 		if (!armpmu_event_set_period(event))
 			continue;
 
+		if (has_branch_stack(event)) {
+			WARN_ON(!cpuc->branches);
+			armv8pmu_branch_read(cpuc, event);
+			data.br_stack = &cpuc->branches->branch_stack;
+			data.sample_flags |= PERF_SAMPLE_BRANCH_STACK;
+		}
+
 		/*
 		 * Perf event overflow will queue the processing of the event as
 		 * an irq_work which will be taken care of in the handling of
@@ -976,6 +995,14 @@ static int armv8pmu_user_event_idx(struct perf_event *event)
 	return event->hw.idx;
 }
 
+static void armv8pmu_sched_task(struct perf_event_pmu_context *pmu_ctx, bool sched_in)
+{
+	struct arm_pmu *armpmu = to_arm_pmu(pmu_ctx->pmu);
+
+	if (sched_in && arm_pmu_branch_stack_supported(armpmu))
+		armv8pmu_branch_reset();
+}
+
 /*
  * Add an event filter to a given event.
  */
@@ -1052,6 +1079,9 @@ static void armv8pmu_reset(void *info)
 		pmcr |= ARMV8_PMU_PMCR_LP;
 
 	armv8pmu_pmcr_write(pmcr);
+
+	if (arm_pmu_branch_stack_supported(cpu_pmu))
+		armv8pmu_branch_reset();
 }
 
 static int __armv8_pmuv3_map_event(struct perf_event *event,
@@ -1069,6 +1099,9 @@ static int __armv8_pmuv3_map_event(struct perf_event *event,
 				       &armv8_pmuv3_perf_cache_map,
 				       ARMV8_PMU_EVTYPE_EVENT);
 
+	if (has_branch_stack(event) && !armv8pmu_branch_valid(event))
+		return -EOPNOTSUPP;
+
 	if (armv8pmu_event_is_64bit(event))
 		event->hw.flags |= ARMPMU_EVT_64BIT;
 
@@ -1181,6 +1214,7 @@ static void __armv8pmu_probe_pmu(void *info)
 		cpu_pmu->reg_pmmir = read_cpuid(PMMIR_EL1);
 	else
 		cpu_pmu->reg_pmmir = 0;
+	armv8pmu_branch_probe(cpu_pmu);
 }
 
 static int armv8pmu_probe_pmu(struct arm_pmu *cpu_pmu)
@@ -1261,6 +1295,7 @@ static int armv8_pmu_init(struct arm_pmu *cpu_pmu, char *name,
 	cpu_pmu->filter = armv8pmu_filter;
 
 	cpu_pmu->pmu.event_idx = armv8pmu_user_event_idx;
+	cpu_pmu->sched_task = armv8pmu_sched_task;
 
 	cpu_pmu->name = name;
 	cpu_pmu->map_event = map_event;
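[Editor's note: for context, not part of the patch. The branch-stack path added above only triggers for events that actually request branch sampling, i.e. when has_branch_stack() is true. A minimal user-space illustration of such an event, using only standard perf_event_open() attributes, follows; the equivalent from the perf tool would be something like 'perf record -j any,u -- <workload>'.]

#include <linux/perf_event.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Open a cycles event with branch stack sampling for the current task.
 * With this attr, has_branch_stack(event) is true, so the kernel paths
 * added above (armv8pmu_branch_valid()/enable()/read()) come into play
 * once a real BRBE driver backs the stub helpers.
 */
static int open_branch_sampling_event(void)
{
	struct perf_event_attr attr = {
		.type			= PERF_TYPE_HARDWARE,
		.config			= PERF_COUNT_HW_CPU_CYCLES,
		.size			= sizeof(attr),
		.sample_period		= 100000,
		.sample_type		= PERF_SAMPLE_IP | PERF_SAMPLE_BRANCH_STACK,
		.branch_sample_type	= PERF_SAMPLE_BRANCH_ANY | PERF_SAMPLE_BRANCH_USER,
		.disabled		= 1,
	};

	/* current task, any CPU, no group leader, no flags */
	return syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);
}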