Message ID: 20230531040428.501523-9-anshuman.khandual@arm.com
State: New
Headers:
    From: Anshuman Khandual <anshuman.khandual@arm.com>
    To: linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, will@kernel.org, catalin.marinas@arm.com, mark.rutland@arm.com
    Cc: Anshuman Khandual <anshuman.khandual@arm.com>, Mark Brown <broonie@kernel.org>, James Clark <james.clark@arm.com>, Rob Herring <robh@kernel.org>, Marc Zyngier <maz@kernel.org>, Suzuki Poulose <suzuki.poulose@arm.com>, Peter Zijlstra <peterz@infradead.org>, Ingo Molnar <mingo@redhat.com>, Arnaldo Carvalho de Melo <acme@kernel.org>, linux-perf-users@vger.kernel.org
    Subject: [PATCH V11 08/10] arm64/perf: Add struct brbe_regset helper functions
    Date: Wed, 31 May 2023 09:34:26 +0530
    Message-Id: <20230531040428.501523-9-anshuman.khandual@arm.com>
    In-Reply-To: <20230531040428.501523-1-anshuman.khandual@arm.com>
Series: arm64/perf: Enable branch stack sampling
Commit Message
Anshuman Khandual
May 31, 2023, 4:04 a.m. UTC
The primary abstraction level for fetching branch records from BRBE HW has been
changed as 'struct brbe_regset', which contains storage for all three BRBE
registers i.e BRBSRC, BRBTGT, BRBINF. Whether branch record processing happens
in the task sched out path, or in the PMU IRQ handling path, these registers
need to be extracted from the HW. Afterwards both live and stored sets need to
be stitched together to create final branch records set. This adds required
helper functions for such operations.

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Tested-by: James Clark <james.clark@arm.com>
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
---
 drivers/perf/arm_brbe.c | 163 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 163 insertions(+)
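The abstraction the commit message describes can be sketched as a plain C struct: one slot per captured branch record, mirroring the three BRBE registers. This is an illustrative user-space mock-up; the field types (u64 in kernel terms) and ordering here are assumptions, and the authoritative definition lives in the patch itself.

```c
#include <assert.h>
#include <stdint.h>

/*
 * One captured BRBE branch record, mirroring the three registers the
 * commit message names: BRBSRC, BRBTGT, BRBINF. Field types/order are
 * assumed for illustration only.
 */
struct brbe_regset {
	uint64_t brbsrc;	/* branch source address  (BRBSRC)  */
	uint64_t brbtgt;	/* branch target address  (BRBTGT)  */
	uint64_t brbinf;	/* branch information     (BRBINF)  */
};
```

Because the struct is plain data, whole records can be moved with struct assignment or memmove(), which is what the review discussion below turns on.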
Comments
On Tue, May 30, 2023 at 9:15 PM Anshuman Khandual
<anshuman.khandual@arm.com> wrote:
>
> The primary abstraction level for fetching branch records from BRBE HW has
> been changed as 'struct brbe_regset', which contains storage for all three
> BRBE registers i.e BRBSRC, BRBTGT, BRBINF. Whether branch record processing
> happens in the task sched out path, or in the PMU IRQ handling path, these
> registers need to be extracted from the HW. Afterwards both live and stored
> sets need to be stitched together to create final branch records set. This
> adds required helper functions for such operations.
>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Will Deacon <will@kernel.org>
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: linux-arm-kernel@lists.infradead.org
> Cc: linux-kernel@vger.kernel.org
> Tested-by: James Clark <james.clark@arm.com>
> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
> ---

[SNIP]

> +
> +static inline void copy_brbe_regset(struct brbe_regset *src, int src_idx,
> +                                    struct brbe_regset *dst, int dst_idx)
> +{
> +        dst[dst_idx].brbinf = src[src_idx].brbinf;
> +        dst[dst_idx].brbsrc = src[src_idx].brbsrc;
> +        dst[dst_idx].brbtgt = src[src_idx].brbtgt;
> +}
> +
> +/*
> + * This function concatenates branch records from stored and live buffer
> + * up to maximum nr_max records and the stored buffer holds the resultant
> + * buffer. The concatenated buffer contains all the branch records from
> + * the live buffer but might contain some from stored buffer considering
> + * the maximum combined length does not exceed 'nr_max'.
> + *
> + *    Stored records                Live records
> + * ------------------------------------------------^
> + * |  S0  |              |  L0  |            Newest |
> + * ---------------------------------                |
> + * |  S1  |              |  L1  |                   |
> + * ---------------------------------                |
> + * |  S2  |              |  L2  |                   |
> + * ---------------------------------                |
> + * |  S3  |              |  L3  |                   |
> + * ---------------------------------                |
> + * |  S4  |              |  L4  |            nr_max
> + * ---------------------------------                |
> + * |      |              |  L5  |                   |
> + * ---------------------------------                |
> + * |      |              |  L6  |                   |
> + * ---------------------------------                |
> + * |      |              |  L7  |                   |
> + * ---------------------------------                |
> + * |      |              |      |                   |
> + * ---------------------------------                |
> + * |      |              |      |            Oldest |
> + * ------------------------------------------------V
> + *
> + *
> + * S0 is the newest in the stored records, where as L7 is the oldest in
> + * the live reocords. Unless the live buffer is detetcted as being full
> + * thus potentially dropping off some older records, L7 and S0 records
> + * are contiguous in time for a user task context. The stitched buffer
> + * here represents maximum possible branch records, contiguous in time.
> + *
> + *    Stored records                Live records
> + * ------------------------------------------------^
> + * |  L0  |              |  L0  |            Newest |
> + * ---------------------------------                |
> + * |  L0  |              |  L1  |                   |
> + * ---------------------------------                |
> + * |  L2  |              |  L2  |                   |
> + * ---------------------------------                |
> + * |  L3  |              |  L3  |                   |
> + * ---------------------------------                |
> + * |  L4  |              |  L4  |            nr_max
> + * ---------------------------------                |
> + * |  L5  |              |  L5  |                   |
> + * ---------------------------------                |
> + * |  L6  |              |  L6  |                   |
> + * ---------------------------------                |
> + * |  L7  |              |  L7  |                   |
> + * ---------------------------------                |
> + * |  S0  |              |      |                   |
> + * ---------------------------------                |
> + * |  S1  |              |      |            Oldest |
> + * ------------------------------------------------V
> + * |  S2  |   <----|
> + * -----------------    |
> + * |  S3  |   <----|  Dropped off after nr_max
> + * -----------------    |
> + * |  S4  |   <----|
> + * -----------------
> + */
> +static int stitch_stored_live_entries(struct brbe_regset *stored,
> +                                      struct brbe_regset *live,
> +                                      int nr_stored, int nr_live,
> +                                      int nr_max)
> +{
> +        int nr_total, nr_excess, nr_last, i;
> +
> +        nr_total = nr_stored + nr_live;
> +        nr_excess = nr_total - nr_max;
> +
> +        /* Stored branch records in stitched buffer */
> +        if (nr_live == nr_max)
> +                nr_stored = 0;
> +        else if (nr_excess > 0)
> +                nr_stored -= nr_excess;
> +
> +        /* Stitched buffer branch records length */
> +        if (nr_total > nr_max)
> +                nr_last = nr_max;
> +        else
> +                nr_last = nr_total;
> +
> +        /* Move stored branch records */
> +        for (i = 0; i < nr_stored; i++)
> +                copy_brbe_regset(stored, i, stored, nr_last - nr_stored - 1 + i);

I'm afraid it can overwrite some entries if nr_live is small
and nr_stored is big. Why not use memmove()?

Also I think it'd be simpler if you copy store to live.
It'll save copying live in the IRQ but it will copy the
whole content to store again for the sched switch.

Thanks,
Namhyung

> +
> +        /* Copy live branch records */
> +        for (i = 0; i < nr_live; i++)
> +                copy_brbe_regset(live, i, stored, i);
> +
> +        return nr_last;
> +}
> +
>  /*
>   * Generic perf branch filters supported on BRBE
>   *
> --
> 2.25.1
>
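The semantics the quoted comment block describes can be captured in a small user-space reference model: all live records come first (newest), and stored records fill the remainder up to nr_max. This sketch uses memmove()/memcpy() as Namhyung suggests; the struct layout and names are simplified and it is not the kernel code.

```c
#include <assert.h>
#include <string.h>

struct regset { unsigned long long brbinf, brbsrc, brbtgt; };

/*
 * Reference model of the intended stitch: the result (left in 'stored')
 * holds all nr_live live records first, then as many of the previously
 * stored records as fit, capped at nr_max. Illustrative sketch only.
 */
static int stitch_model(struct regset *stored, struct regset *live,
			int nr_stored, int nr_live, int nr_max)
{
	int nr_keep = nr_max - nr_live;		/* stored records that survive */

	if (nr_keep > nr_stored)
		nr_keep = nr_stored;
	if (nr_keep < 0)
		nr_keep = 0;

	/* memmove() is safe for the overlapping src/dst inside 'stored' */
	memmove(&stored[nr_live], &stored[0], nr_keep * sizeof(*stored));
	memcpy(&stored[0], &live[0], nr_live * sizeof(*stored));

	return nr_live + nr_keep;
}
```

With 5 stored and 4 live records and nr_max = 8, the model keeps all 4 live records plus the 4 newest stored records, matching the diagrams above.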
On 6/2/23 08:10, Namhyung Kim wrote:
> On Tue, May 30, 2023 at 9:15 PM Anshuman Khandual
> <anshuman.khandual@arm.com> wrote:
>>
>> The primary abstraction level for fetching branch records from BRBE HW has
>> been changed as 'struct brbe_regset', which contains storage for all three
>> BRBE registers i.e BRBSRC, BRBTGT, BRBINF.
>> [...]
>
> [SNIP]
>> +static int stitch_stored_live_entries(struct brbe_regset *stored,
>> +                                      struct brbe_regset *live,
>> +                                      int nr_stored, int nr_live,
>> +                                      int nr_max)
>> +{
>> +        int nr_total, nr_excess, nr_last, i;
>> +
>> +        nr_total = nr_stored + nr_live;
>> +        nr_excess = nr_total - nr_max;
>> +
>> +        /* Stored branch records in stitched buffer */
>> +        if (nr_live == nr_max)
>> +                nr_stored = 0;
>> +        else if (nr_excess > 0)
>> +                nr_stored -= nr_excess;
>> +
>> +        /* Stitched buffer branch records length */
>> +        if (nr_total > nr_max)
>> +                nr_last = nr_max;
>> +        else
>> +                nr_last = nr_total;
>> +
>> +        /* Move stored branch records */
>> +        for (i = 0; i < nr_stored; i++)
>> +                copy_brbe_regset(stored, i, stored, nr_last - nr_stored - 1 + i);
>
> I'm afraid it can overwrite some entries if nr_live is small
> and nr_stored is big. Why not use memmove()?

nr_stored is first adjusted with nr_excess if both live and stored entries
combined exceed the maximum branch records in the HW. I am wondering how it
can override ?

>
> Also I think it'd be simpler if you copy store to live.
> It'll save copying live in the IRQ but it will copy the
> whole content to store again for the sched switch.

But how that is better than the current scheme ?
On Sun, Jun 4, 2023 at 8:15 PM Anshuman Khandual
<anshuman.khandual@arm.com> wrote:
>
> On 6/2/23 08:10, Namhyung Kim wrote:
> > On Tue, May 30, 2023 at 9:15 PM Anshuman Khandual
> > <anshuman.khandual@arm.com> wrote:
> >>
> >> [...]
> >
> > [SNIP]
> >> +        /* Move stored branch records */
> >> +        for (i = 0; i < nr_stored; i++)
> >> +                copy_brbe_regset(stored, i, stored, nr_last - nr_stored - 1 + i);
> >
> > I'm afraid it can overwrite some entries if nr_live is small
> > and nr_stored is big. Why not use memmove()?
>
> nr_stored is first adjusted with nr_excess if both live and stored entries
> combined exceed the maximum branch records in the HW. I am wondering how it
> can override ?

Say nr_stored = 40 and nr_live = 20, wouldn't it copy stored[0] to
stored[20]?  Then stored[20:39] will be lost.  Also I'm not sure "-1"
is correct.

> >
> > Also I think it'd be simpler if you copy store to live.
> > It'll save copying live in the IRQ but it will copy the
> > whole content to store again for the sched switch.
>
> But how that is better than the current scheme ?

I guess normally the live buffer is full, then it can skip
the copy and use the buffer directly for IRQ, right?

Thanks,
Namhyung
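Namhyung's scenario can be checked with a quick user-space model of just the "Move stored branch records" loop. Assuming nr_max = 64 (so nr_total = 60 and nr_stored stays 40 after the nr_excess adjustment, making nr_last = 60), the forward element-by-element copy reads slots it has already clobbered; the "-1" also shifts the whole target window down by one, on top of the overlap problem.

```c
#include <assert.h>

#define NR_STORED 40
#define NR_LIVE   20
#define NR_LAST   60	/* nr_stored + nr_live, under the assumed nr_max = 64 */

/*
 * Model of the per-entry forward copy from the patch: destination
 * index is (nr_last - nr_stored - 1 + i) = 19 + i. Since source and
 * destination overlap and the loop runs low-to-high, stored[19] is
 * overwritten at i = 0 and then read (already clobbered) at i = 19.
 * Returns how many copied slots no longer hold the expected record.
 */
static int forward_copy_clobbers(void)
{
	int buf[128], i, lost = 0;

	for (i = 0; i < NR_STORED; i++)
		buf[i] = 1000 + i;		/* original stored records */

	for (i = 0; i < NR_STORED; i++)	/* the loop under discussion */
		buf[NR_LAST - NR_STORED - 1 + i] = buf[i];

	/* a correct move would leave buf[19 + k] == 1000 + k for all k */
	for (i = 0; i < NR_STORED; i++)
		if (buf[NR_LAST - NR_STORED - 1 + i] != 1000 + i)
			lost++;
	return lost;
}
```

Tracing it through: iterations i = 0..18 land before any source slot is reused, but every iteration from i = 19 onward reads a slot that an earlier iteration already overwrote, so 21 of the 40 records come out wrong. A memmove(), which handles overlap, avoids this entirely.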
On Wed, May 31, 2023 at 09:34:26AM +0530, Anshuman Khandual wrote:
> The primary abstraction level for fetching branch records from BRBE HW has
> been changed as 'struct brbe_regset', which contains storage for all three
> BRBE registers i.e BRBSRC, BRBTGT, BRBINF. Whether branch record processing
> happens in the task sched out path, or in the PMU IRQ handling path, these
> registers need to be extracted from the HW. Afterwards both live and stored
> sets need to be stitched together to create final branch records set. This
> adds required helper functions for such operations.
>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Will Deacon <will@kernel.org>
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: linux-arm-kernel@lists.infradead.org
> Cc: linux-kernel@vger.kernel.org
> Tested-by: James Clark <james.clark@arm.com>
> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
> ---
>  drivers/perf/arm_brbe.c | 163 ++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 163 insertions(+)
>
> diff --git a/drivers/perf/arm_brbe.c b/drivers/perf/arm_brbe.c
> index 484842d8cf3e..759db681d673 100644
> --- a/drivers/perf/arm_brbe.c
> +++ b/drivers/perf/arm_brbe.c
> @@ -44,6 +44,169 @@ static void select_brbe_bank(int bank)
>  	isb();
>  }
>
> +/*
> + * This scans over BRBE register banks and captures individual branch reocrds
> + * [BRBSRC, BRBTGT, BRBINF] into a pre-allocated 'struct brbe_regset' buffer,
> + * until an invalid one gets encountered. The caller for this function needs
> + * to ensure BRBE is an appropriate state before the records can be captured.
> + */
> +static int capture_brbe_regset(struct brbe_hw_attr *brbe_attr, struct brbe_regset *buf)
> +{
> +        int loop1_idx1, loop1_idx2, loop2_idx1, loop2_idx2;
> +        int idx, count;
> +
> +        loop1_idx1 = BRBE_BANK0_IDX_MIN;
> +        if (brbe_attr->brbe_nr <= BRBE_BANK_MAX_ENTRIES) {
> +                loop1_idx2 = brbe_attr->brbe_nr - 1;
> +                loop2_idx1 = BRBE_BANK1_IDX_MIN;
> +                loop2_idx2 = BRBE_BANK0_IDX_MAX;
> +        } else {
> +                loop1_idx2 = BRBE_BANK0_IDX_MAX;
> +                loop2_idx1 = BRBE_BANK1_IDX_MIN;
> +                loop2_idx2 = brbe_attr->brbe_nr - 1;
> +        }
> +
> +        select_brbe_bank(BRBE_BANK_IDX_0);
> +        for (idx = 0, count = loop1_idx1; count <= loop1_idx2; idx++, count++) {
> +                buf[idx].brbinf = get_brbinf_reg(idx);
> +                /*
> +                 * There are no valid entries anymore on the buffer.
> +                 * Abort the branch record processing to save some
> +                 * cycles and also reduce the capture/process load
> +                 * for the user space as well.
> +                 */
> +                if (brbe_invalid(buf[idx].brbinf))
> +                        return idx;
> +
> +                buf[idx].brbsrc = get_brbsrc_reg(idx);
> +                buf[idx].brbtgt = get_brbtgt_reg(idx);
> +        }
> +
> +        select_brbe_bank(BRBE_BANK_IDX_1);
> +        for (count = loop2_idx1; count <= loop2_idx2; idx++, count++) {
> +                buf[idx].brbinf = get_brbinf_reg(idx);
> +                /*
> +                 * There are no valid entries anymore on the buffer.
> +                 * Abort the branch record processing to save some
> +                 * cycles and also reduce the capture/process load
> +                 * for the user space as well.
> +                 */
> +                if (brbe_invalid(buf[idx].brbinf))
> +                        return idx;
> +
> +                buf[idx].brbsrc = get_brbsrc_reg(idx);
> +                buf[idx].brbtgt = get_brbtgt_reg(idx);
> +        }
> +        return idx;
> +}

As with __armv8pmu_branch_read(), the loop conditions are a bit hard to follow,
and I believe that can be rewritten along the lines of the suggestion there.

Looking at this, we now have a couple of places that will try to read the
registers for an individual record, so it probably makes sense to factor that
into a helper, e.g.

| static bool __read_brbe_regset(struct brbe_regset *entry, int idx)
| {
|         u64 brbinf = get_brbinf_reg(idx);
|
|         if (brbe_invalid(brbinf))
|                 return false;
|
|         entry->brbinf = brbinf;
|         entry->brbsrc = get_brbsrc_reg(idx);
|         entry->brbtgt = get_brbtgt_reg(idx);
|
|         return true;
| }

... which can be used here, e.g.

| /*
|  * Capture all records before the first invalid record, and return the number
|  * of records captured.
|  */
| static int capture_brbe_regset(struct brbe_hw_attr *brbe_attr, struct brbe_regset *buf)
| {
|         int nr_entries = brbe_attr->brbe_nr;
|         int idx = 0;
|
|         select_brbe_bank(BRBE_BANK_IDX_0);
|         while (idx < nr_entries && IDX < BRBE_BANK0_IDX_MAX) {
|                 if (__read_brbe_regset(&buf[idx], idx))
|                         return idx;
|         }
|
|         select_brbe_bank(BRBE_BANK_IDX_1);
|         while (idx < nr_entries && IDX < BRBE_BANK1_IDX_MAX) {
|                 if (__read_brbe_regset(&buf[idx], idx))
|                         return idx;
|         }
|
|         return idx;
| }

... and could be used to implement capture_branch_entry() in the patch before
this.

> +static inline void copy_brbe_regset(struct brbe_regset *src, int src_idx,
> +                                    struct brbe_regset *dst, int dst_idx)
> +{
> +        dst[dst_idx].brbinf = src[src_idx].brbinf;
> +        dst[dst_idx].brbsrc = src[src_idx].brbsrc;
> +        dst[dst_idx].brbtgt = src[src_idx].brbtgt;
> +}

C can do struct assignment, so this is the same as:

| static inline void copy_brbe_regset(struct brbe_regset *src, int src_idx,
|                                     struct brbe_regset *dst, int dst_idx)
| {
|         dst[dst_idx] = src[src_idx];
| }

... and given that, it would be simpler and clearer to have that directly in
the caller, so I don't think we need this helper function.

> +/*
> + * This function concatenates branch records from stored and live buffer
> + * up to maximum nr_max records and the stored buffer holds the resultant
> + * buffer. The concatenated buffer contains all the branch records from
> + * the live buffer but might contain some from stored buffer considering
> + * the maximum combined length does not exceed 'nr_max'.
> + *
> [...]
> + */
> +static int stitch_stored_live_entries(struct brbe_regset *stored,
> +                                      struct brbe_regset *live,
> +                                      int nr_stored, int nr_live,
> +                                      int nr_max)
> +{
> +        int nr_total, nr_excess, nr_last, i;
> +
> +        nr_total = nr_stored + nr_live;
> +        nr_excess = nr_total - nr_max;
> +
> +        /* Stored branch records in stitched buffer */
> +        if (nr_live == nr_max)
> +                nr_stored = 0;
> +        else if (nr_excess > 0)
> +                nr_stored -= nr_excess;
> +
> +        /* Stitched buffer branch records length */
> +        if (nr_total > nr_max)
> +                nr_last = nr_max;
> +        else
> +                nr_last = nr_total;
> +
> +        /* Move stored branch records */
> +        for (i = 0; i < nr_stored; i++)
> +                copy_brbe_regset(stored, i, stored, nr_last - nr_stored - 1 + i);
> +
> +        /* Copy live branch records */
> +        for (i = 0; i < nr_live; i++)
> +                copy_brbe_regset(live, i, stored, i);
> +
> +        return nr_last;
> +}

I think this can be written more simply as something like:

	static int stitch_stored_live_entries(struct brbe_regset *stored,
					      struct brbe_regset *live,
					      int nr_stored, int nr_live,
					      int nr_max)
	{
		int nr_move = max(nr_stored, nr_max - nr_live);

		/* Move the tail of the buffer to make room for the new entries */
		memmove(&stored[nr_live], &stored[0], nr_move * sizeof(*stored));

		/* Copy the new entries into the head of the buffer */
		memcpy(stored[0], &live[0], nr_live * sizeof(*stored));

		/* Return the number of entries in the stitched buffer */
		return min(nr_live + nr_stored, nr_max);
	}

... or if we could save this oldest-first, we could make it a circular buffer
and avoid moving older entries.

Thanks,
Mark.
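Mark's memmove()-based shape can be exercised directly in user space. Two details differ here from the sketch in the mail, which reads like it was typed from memory: the count of stored entries that survive is min(nr_stored, nr_max - nr_live) rather than max(), and the memcpy() destination needs &stored[0]. This is an illustrative adaptation under those assumptions, not the final kernel code.

```c
#include <assert.h>
#include <string.h>

struct brbe_regset { unsigned long long brbinf, brbsrc, brbtgt; };

#define min(a, b) ((a) < (b) ? (a) : (b))

/*
 * memmove()-based stitch, per the review suggestion, with the two
 * details above adjusted: 'nr_move' is the number of old stored
 * entries that still fit after the live entries take the head.
 */
static int stitch_stored_live_entries(struct brbe_regset *stored,
				      struct brbe_regset *live,
				      int nr_stored, int nr_live,
				      int nr_max)
{
	int nr_move = min(nr_stored, nr_max - nr_live);

	/* Move the tail of the buffer to make room for the new entries */
	memmove(&stored[nr_live], &stored[0], nr_move * sizeof(*stored));

	/* Copy the new entries into the head of the buffer */
	memcpy(&stored[0], &live[0], nr_live * sizeof(*stored));

	/* Return the number of entries in the stitched buffer */
	return min(nr_live + nr_stored, nr_max);
}
```

With 5 stored and 4 live records and nr_max = 6, only the 2 newest stored records survive behind the 4 live ones, and the function reports 6 stitched entries.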
On 6/13/23 22:47, Mark Rutland wrote: > On Wed, May 31, 2023 at 09:34:26AM +0530, Anshuman Khandual wrote: >> The primary abstraction level for fetching branch records from BRBE HW has >> been changed as 'struct brbe_regset', which contains storage for all three >> BRBE registers i.e BRBSRC, BRBTGT, BRBINF. Whether branch record processing >> happens in the task sched out path, or in the PMU IRQ handling path, these >> registers need to be extracted from the HW. Afterwards both live and stored >> sets need to be stitched together to create final branch records set. This >> adds required helper functions for such operations. >> >> Cc: Catalin Marinas <catalin.marinas@arm.com> >> Cc: Will Deacon <will@kernel.org> >> Cc: Mark Rutland <mark.rutland@arm.com> >> Cc: linux-arm-kernel@lists.infradead.org >> Cc: linux-kernel@vger.kernel.org >> Tested-by: James Clark <james.clark@arm.com> >> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> >> --- >> drivers/perf/arm_brbe.c | 163 ++++++++++++++++++++++++++++++++++++++++ >> 1 file changed, 163 insertions(+) >> >> diff --git a/drivers/perf/arm_brbe.c b/drivers/perf/arm_brbe.c >> index 484842d8cf3e..759db681d673 100644 >> --- a/drivers/perf/arm_brbe.c >> +++ b/drivers/perf/arm_brbe.c >> @@ -44,6 +44,169 @@ static void select_brbe_bank(int bank) >> isb(); >> } >> >> +/* >> + * This scans over BRBE register banks and captures individual branch reocrds >> + * [BRBSRC, BRBTGT, BRBINF] into a pre-allocated 'struct brbe_regset' buffer, >> + * until an invalid one gets encountered. The caller for this function needs >> + * to ensure BRBE is an appropriate state before the records can be captured. 
>> + */ >> +static int capture_brbe_regset(struct brbe_hw_attr *brbe_attr, struct brbe_regset *buf) >> +{ >> + int loop1_idx1, loop1_idx2, loop2_idx1, loop2_idx2; >> + int idx, count; >> + >> + loop1_idx1 = BRBE_BANK0_IDX_MIN; >> + if (brbe_attr->brbe_nr <= BRBE_BANK_MAX_ENTRIES) { >> + loop1_idx2 = brbe_attr->brbe_nr - 1; >> + loop2_idx1 = BRBE_BANK1_IDX_MIN; >> + loop2_idx2 = BRBE_BANK0_IDX_MAX; >> + } else { >> + loop1_idx2 = BRBE_BANK0_IDX_MAX; >> + loop2_idx1 = BRBE_BANK1_IDX_MIN; >> + loop2_idx2 = brbe_attr->brbe_nr - 1; >> + } >> + >> + select_brbe_bank(BRBE_BANK_IDX_0); >> + for (idx = 0, count = loop1_idx1; count <= loop1_idx2; idx++, count++) { >> + buf[idx].brbinf = get_brbinf_reg(idx); >> + /* >> + * There are no valid entries anymore on the buffer. >> + * Abort the branch record processing to save some >> + * cycles and also reduce the capture/process load >> + * for the user space as well. >> + */ >> + if (brbe_invalid(buf[idx].brbinf)) >> + return idx; >> + >> + buf[idx].brbsrc = get_brbsrc_reg(idx); >> + buf[idx].brbtgt = get_brbtgt_reg(idx); >> + } >> + >> + select_brbe_bank(BRBE_BANK_IDX_1); >> + for (count = loop2_idx1; count <= loop2_idx2; idx++, count++) { >> + buf[idx].brbinf = get_brbinf_reg(idx); >> + /* >> + * There are no valid entries anymore on the buffer. >> + * Abort the branch record processing to save some >> + * cycles and also reduce the capture/process load >> + * for the user space as well. >> + */ >> + if (brbe_invalid(buf[idx].brbinf)) >> + return idx; >> + >> + buf[idx].brbsrc = get_brbsrc_reg(idx); >> + buf[idx].brbtgt = get_brbtgt_reg(idx); >> + } >> + return idx; >> +} > > As with __armv8pmu_branch_read(), the loop conditions are a bit hard to follow, > and I believe that can be rewritten along the lines of the suggestion there. I have changed both the places (in separate patches) with suggested loop structure. 
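The bank-splitting logic discussed above can be checked numerically. The sketch below reimplements just the loop-bound selection, with assumed constants (32 records per bank, so hardware record indices 0-31 and 32-63); the real BRBE_BANK* values come from the driver headers and are not shown in the thread:

```c
#include <assert.h>

/*
 * Assumed constants: 32 records per bank, hardware record indices 0-31
 * (bank 0) and 32-63 (bank 1). These mirror what the quoted code implies
 * via BRBE_BANK_MAX_ENTRIES but are not taken from the driver itself.
 */
#define BRBE_BANK_MAX_ENTRIES	32
#define BRBE_BANK0_IDX_MIN	0
#define BRBE_BANK0_IDX_MAX	31
#define BRBE_BANK1_IDX_MIN	32

/*
 * How many records each of the two bank loops visits for a given number
 * of implemented records 'brbe_nr', following the quoted bound selection.
 */
static void bank_loop_counts(int brbe_nr, int *bank0, int *bank1)
{
	int loop1_idx2, loop2_idx1, loop2_idx2;

	if (brbe_nr <= BRBE_BANK_MAX_ENTRIES) {
		loop1_idx2 = brbe_nr - 1;
		loop2_idx1 = BRBE_BANK1_IDX_MIN;
		/* loop2_idx2 < loop2_idx1 here, so the bank 1 loop is empty */
		loop2_idx2 = BRBE_BANK0_IDX_MAX;
	} else {
		loop1_idx2 = BRBE_BANK0_IDX_MAX;
		loop2_idx1 = BRBE_BANK1_IDX_MIN;
		loop2_idx2 = brbe_nr - 1;
	}

	*bank0 = loop1_idx2 - BRBE_BANK0_IDX_MIN + 1;
	*bank1 = (loop2_idx2 >= loop2_idx1) ? loop2_idx2 - loop2_idx1 + 1 : 0;
}
```

For brbe_nr = 8 only bank 0 is scanned; for brbe_nr = 64 both banks cover 32 records each, matching the two for-loops in the quoted code.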
> > Looking at this, we now have a couple of places that will try to read the > registers for an individual record, so it probably makes sense to factor that > into a helper, e.g. There are indeed two places inside capture_brbe_regset() - one for each bank. > > | static bool __read_brbe_regset(struct brbe_regset *entry, int idx) > | { > | u64 brbinf = get_brbinf_reg(idx); > | > | if (brbe_invalid(brbinf)) > | return false; > | > | entry->brbinf = brbinf; > | entry->brbsrc = get_brbsrc_reg(idx); > | entry->brbtgt = get_brbtgt_reg(idx); > | > | return true; > | } > > ... which can be used here, e.g. > > | /* > | * Capture all records before the first invalid record, and return the number > | * of records captured. > | */ > | static int capture_brbe_regset(struct brbe_hw_attr *brbe_attr, struct brbe_regset *buf) > | { > | > | int nr_entries = brbe_attr->brbe_nr; > | int idx = 0; > | > | select_brbe_bank(BRBE_BANK_IDX_0); > | while (idx < nr_entries && IDX < BRBE_BANK0_IDX_MAX) { > | if (__read_brbe_regset(&buf[idx], idx)) It should test !__read_brbe_regset(&buf[idx], idx)) instead as the error case returns false. > | return idx; > | } > | > | select_brbe_bank(BRBE_BANK_IDX_1); > | while (idx < nr_entries && IDX < BRBE_BANK1_IDX_MAX) { > | if (__read_brbe_regset(&buf[idx], idx)) > | return idx; Ditto. > | } > | > | return idx; > | } Will factor out a new helper __read_brbe_regset() from capture_brbe_regset(). > > ... and could be used to implement capture_branch_entry() in the patch before > this.
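For reference, here is a compile-checked version of the sketch with the fixes raised in the thread folded in: the inverted __read_brbe_regset() test, the stray upper-case IDX, and a missing idx++ in both loops. The register accessors, the bank bounds (assumed 32 records per bank), and the fake record array are stand-ins, since the real ones read BRBE system registers:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define NR_FAKE_RECORDS 64

struct brbe_regset {
	uint64_t brbsrc;
	uint64_t brbtgt;
	uint64_t brbinf;
};

/* Simulated record file: a zero brbinf marks an invalid record. */
static struct brbe_regset fake_hw[NR_FAKE_RECORDS];

static bool brbe_invalid(uint64_t brbinf) { return brbinf == 0; }
static uint64_t get_brbinf_reg(int idx) { return fake_hw[idx].brbinf; }
static uint64_t get_brbsrc_reg(int idx) { return fake_hw[idx].brbsrc; }
static uint64_t get_brbtgt_reg(int idx) { return fake_hw[idx].brbtgt; }
static void select_brbe_bank(int bank) { (void)bank; /* no-op in simulation */ }

static bool __read_brbe_regset(struct brbe_regset *entry, int idx)
{
	uint64_t brbinf = get_brbinf_reg(idx);

	if (brbe_invalid(brbinf))
		return false;

	entry->brbinf = brbinf;
	entry->brbsrc = get_brbsrc_reg(idx);
	entry->brbtgt = get_brbtgt_reg(idx);
	return true;
}

/*
 * The thread's sketch with the fixes applied. The flat index doubles as
 * the register index here; the real driver indexes within the selected bank.
 */
static int capture_brbe_regset(int nr_entries, struct brbe_regset *buf)
{
	int idx = 0;

	select_brbe_bank(0);
	while (idx < nr_entries && idx < 32) {
		if (!__read_brbe_regset(&buf[idx], idx))
			return idx;
		idx++;
	}

	select_brbe_bank(1);
	while (idx < nr_entries && idx < 64) {
		if (!__read_brbe_regset(&buf[idx], idx))
			return idx;
		idx++;
	}

	return idx;
}

/* Populate 10 valid fake records, capture, and sanity-check the result. */
static int demo_capture(void)
{
	struct brbe_regset buf[NR_FAKE_RECORDS];
	int i, n;

	memset(fake_hw, 0, sizeof(fake_hw));
	for (i = 0; i < 10; i++) {
		fake_hw[i].brbinf = 1;
		fake_hw[i].brbsrc = i;
		fake_hw[i].brbtgt = i + 100;
	}

	n = capture_brbe_regset(NR_FAKE_RECORDS, buf);
	return (n == 10 && buf[3].brbsrc == 3 && buf[3].brbtgt == 103) ? n : -1;
}
```

The capture stops at the first invalid record, so only the 10 populated entries are returned even though 64 are implemented.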
> >> +static inline void copy_brbe_regset(struct brbe_regset *src, int src_idx, >> + struct brbe_regset *dst, int dst_idx) >> +{ >> + dst[dst_idx].brbinf = src[src_idx].brbinf; >> + dst[dst_idx].brbsrc = src[src_idx].brbsrc; >> + dst[dst_idx].brbtgt = src[src_idx].brbtgt; >> +} > > C can do struct assignment, so this is the same as: > > | static inline void copy_brbe_regset(struct brbe_regset *src, int src_idx, > | struct brbe_regset *dst, int dst_idx) > | { > | dst[dst_idx] = src[src_idx]; > | } Agreed. > > ... and given that, it would be simpler and clearer to have that directly in > the caller, so I don't think we need this helper function. Agreed, will drop copy_brbe_regset(). > >> +/* >> + * This function concatenates branch records from stored and live buffer >> + * up to maximum nr_max records and the stored buffer holds the resultant >> + * buffer. The concatenated buffer contains all the branch records from >> + * the live buffer but might contain some from stored buffer considering >> + * the maximum combined length does not exceed 'nr_max'. >> + * >> + * Stored records Live records >> + * ------------------------------------------------^ >> + * | S0 | L0 | Newest | >> + * --------------------------------- | >> + * | S1 | L1 | | >> + * --------------------------------- | >> + * | S2 | L2 | | >> + * --------------------------------- | >> + * | S3 | L3 | | >> + * --------------------------------- | >> + * | S4 | L4 | nr_max >> + * --------------------------------- | >> + * | | L5 | | >> + * --------------------------------- | >> + * | | L6 | | >> + * --------------------------------- | >> + * | | L7 | | >> + * --------------------------------- | >> + * | | | | >> + * --------------------------------- | >> + * | | | Oldest | >> + * ------------------------------------------------V >> + * >> + * >> + * S0 is the newest in the stored records, where as L7 is the oldest in >> + * the live reocords. 
Unless the live buffer is detetcted as being full Fixed these typos ("reocords" -> "records", "detetcted" -> "detected"). >> + * thus potentially dropping off some older records, L7 and S0 records >> + * are contiguous in time for a user task context. The stitched buffer >> + * here represents maximum possible branch records, contiguous in time. >> + * >> + * Stored records Live records >> + * ------------------------------------------------^ >> + * | L0 | L0 | Newest | >> + * --------------------------------- | >> + * | L0 | L1 | | >> + * --------------------------------- | >> + * | L2 | L2 | | >> + * --------------------------------- | >> + * | L3 | L3 | | >> + * --------------------------------- | >> + * | L4 | L4 | nr_max >> + * --------------------------------- | >> + * | L5 | L5 | | >> + * --------------------------------- | >> + * | L6 | L6 | | >> + * --------------------------------- | >> + * | L7 | L7 | | >> + * --------------------------------- | >> + * | S0 | | | >> + * --------------------------------- | >> + * | S1 | | Oldest | >> + * ------------------------------------------------V >> + * | S2 | <----| >> + * ----------------- | >> + * | S3 | <----| Dropped off after nr_max >> + * ----------------- | >> + * | S4 | <----| >> + * ----------------- >> + */ >> +static int stitch_stored_live_entries(struct brbe_regset *stored, >> + struct brbe_regset *live, >> + int nr_stored, int nr_live, >> + int nr_max) >> +{ >> + int nr_total, nr_excess, nr_last, i; >> + >> + nr_total = nr_stored + nr_live; >> + nr_excess = nr_total - nr_max; >> + >> + /* Stored branch records in stitched buffer */ >> + if (nr_live == nr_max) >> + nr_stored = 0; >> + else if (nr_excess > 0) >> + nr_stored -= nr_excess; >> + >> + /* Stitched buffer branch records length */ >> + if (nr_total > nr_max) >> + nr_last = nr_max; >> + else >> + nr_last = nr_total; >> + >> + /* Move stored branch records */ >> + for (i = 0; i < nr_stored; i++) >> + copy_brbe_regset(stored, i, stored, nr_last - nr_stored - 1 + i); >> + >> + /* Copy live
branch records */ >> + for (i = 0; i < nr_live; i++) >> + copy_brbe_regset(live, i, stored, i); >> + >> + return nr_last; >> +} > > I think this can be written more simply as something like: > > static int stitch_stored_live_entries(struct brbe_regset *stored, > struct brbe_regset *live, > int nr_stored, int nr_live, > int nr_max) > { > int nr_move = max(nr_stored, nr_max - nr_live); Should this compare be min() instead ? As all nr_live entries need to be moved starting store[0], there will be (nr_max - nr_live) entries left for initial stored entries movement, irrespective of how many of stored entries are actually present. Hence (nr_max - nr_live) acts as a cap on nr_stored value for this initial movement. But if nr_stored is smaller than nr_max - nr_live, it gets picked up. > > /* Move the tail of the buffer to make room for the new entries */ > memmove(&stored[nr_live], &stored[0], nr_move * sizeof(*stored)); > > /* Copy the new entries into the head of the buffer */ > memcpy(stored[0], &live[0], nr_live * sizeof(*stored)); > > /* Return the number of entries in the stitched buffer */ > return min(nr_live + nr_stored, nr_max); > } Otherwise this makes sense and simpler, will rework. > > ... or if we could save this oldest-first, we could make it a circular buffer > and avoid moving older entries. Storing the youngest entries first is aligned with how perf branch stack sampling stores the entries in struct perf_sample_data which gets copied 'as is' from cpuc->branches->branch_stack. Hence, just keeping all these buffer in the same age order (youngest first in index 0) really makes sense. Although the above rework seems fine.
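The min()-versus-max() point above is easy to verify with the numbers from the commit's own diagram (nr_max = 10, 8 live entries, 5 stored entries). A minimal sketch of just the cap computation, using the min() form agreed in the thread:

```c
#include <assert.h>

#define min(a, b) ((a) < (b) ? (a) : (b))

/*
 * Number of stored entries that survive the stitch: capped both by how
 * many exist (nr_stored) and by the room left after the live entries
 * (nr_max - nr_live).
 */
static int nr_stored_kept(int nr_stored, int nr_live, int nr_max)
{
	return min(nr_stored, nr_max - nr_live);
}
```

With max() instead, the diagram's case would try to keep all 5 stored entries and overrun nr_max; min() keeps only S0 and S1, matching the picture.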
On Wed, Jun 14, 2023 at 10:44:38AM +0530, Anshuman Khandual wrote: > On 6/13/23 22:47, Mark Rutland wrote: > >> +/* > >> + * This scans over BRBE register banks and captures individual branch reocrds > >> + * [BRBSRC, BRBTGT, BRBINF] into a pre-allocated 'struct brbe_regset' buffer, > >> + * until an invalid one gets encountered. The caller for this function needs > >> + * to ensure BRBE is an appropriate state before the records can be captured. > >> + */ > >> +static int capture_brbe_regset(struct brbe_hw_attr *brbe_attr, struct brbe_regset *buf) > >> +{ > >> + int loop1_idx1, loop1_idx2, loop2_idx1, loop2_idx2; > >> + int idx, count; > >> + > >> + loop1_idx1 = BRBE_BANK0_IDX_MIN; > >> + if (brbe_attr->brbe_nr <= BRBE_BANK_MAX_ENTRIES) { > >> + loop1_idx2 = brbe_attr->brbe_nr - 1; > >> + loop2_idx1 = BRBE_BANK1_IDX_MIN; > >> + loop2_idx2 = BRBE_BANK0_IDX_MAX; > >> + } else { > >> + loop1_idx2 = BRBE_BANK0_IDX_MAX; > >> + loop2_idx1 = BRBE_BANK1_IDX_MIN; > >> + loop2_idx2 = brbe_attr->brbe_nr - 1; > >> + } > >> + > >> + select_brbe_bank(BRBE_BANK_IDX_0); > >> + for (idx = 0, count = loop1_idx1; count <= loop1_idx2; idx++, count++) { > >> + buf[idx].brbinf = get_brbinf_reg(idx); > >> + /* > >> + * There are no valid entries anymore on the buffer. > >> + * Abort the branch record processing to save some > >> + * cycles and also reduce the capture/process load > >> + * for the user space as well. > >> + */ > >> + if (brbe_invalid(buf[idx].brbinf)) > >> + return idx; > >> + > >> + buf[idx].brbsrc = get_brbsrc_reg(idx); > >> + buf[idx].brbtgt = get_brbtgt_reg(idx); > >> + } > >> + > >> + select_brbe_bank(BRBE_BANK_IDX_1); > >> + for (count = loop2_idx1; count <= loop2_idx2; idx++, count++) { > >> + buf[idx].brbinf = get_brbinf_reg(idx); > >> + /* > >> + * There are no valid entries anymore on the buffer. > >> + * Abort the branch record processing to save some > >> + * cycles and also reduce the capture/process load > >> + * for the user space as well. 
> >> + */ > >> + if (brbe_invalid(buf[idx].brbinf)) > >> + return idx; > >> + > >> + buf[idx].brbsrc = get_brbsrc_reg(idx); > >> + buf[idx].brbtgt = get_brbtgt_reg(idx); > >> + } > >> + return idx; > >> +} > > > > As with __armv8pmu_branch_read(), the loop conditions are a bit hard to follow, > > and I believe that can be rewritten along the lines of the suggestion there. > > I have changed both the places (in separate patches) with suggested loop structure. > > > > > Looking at this, we now have a couple of places that will try to read the > > registers for an individual record, so it probably makes sense to facotr that > > into a helper, e.g. > > There are indeed two places inside capture_brbe_regset() - one for each bank. > > > > > | static bool __read_brbe_regset(struct brbe_regset *entry, int idx) > > | { > > | u64 brbinf = get_brbinf_reg(idx); > > | > > | if (brbe_invalid(brbinf)) > > | return false; > > | > > | entry->brbinf = brbinf; > > | entry->brbsrc = get_brbsrc_reg(idx); > > | entry->brbtgt = get_brbtgt_reg(idx); > > | > > | return true; > > | } > > > > ... which can be used here, e.g. > > > > | /* > > | * Capture all records before the first invalid record, and return the number > > | * of records captured. > > | */ > > | static int capture_brbe_regset(struct brbe_hw_attr *brbe_attr, struct brbe_regset *buf) > > | { > > | > > | int nr_entries = brbe_attr->brbe_nr; > > | int idx = 0; > > | > > | select_brbe_bank(BRBE_BANK_IDX_0); > > | while (idx < nr_entries && IDX < BRBE_BANK0_IDX_MAX) { > > | if (__read_brbe_regset(&buf[idx], idx)) > > It should test !_read_brbe_regset(&buf[idx], idx)) instead as the error > case returns false. Yes, my bad. 
> >> +static int stitch_stored_live_entries(struct brbe_regset *stored, > >> + struct brbe_regset *live, > >> + int nr_stored, int nr_live, > >> + int nr_max) > >> +{ > >> + int nr_total, nr_excess, nr_last, i; > >> + > >> + nr_total = nr_stored + nr_live; > >> + nr_excess = nr_total - nr_max; > >> + > >> + /* Stored branch records in stitched buffer */ > >> + if (nr_live == nr_max) > >> + nr_stored = 0; > >> + else if (nr_excess > 0) > >> + nr_stored -= nr_excess; > >> + > >> + /* Stitched buffer branch records length */ > >> + if (nr_total > nr_max) > >> + nr_last = nr_max; > >> + else > >> + nr_last = nr_total; > >> + > >> + /* Move stored branch records */ > >> + for (i = 0; i < nr_stored; i++) > >> + copy_brbe_regset(stored, i, stored, nr_last - nr_stored - 1 + i); > >> + > >> + /* Copy live branch records */ > >> + for (i = 0; i < nr_live; i++) > >> + copy_brbe_regset(live, i, stored, i); > >> + > >> + return nr_last; > >> +} > > > > I think this can be written more simply as something like: > > > > static int stitch_stored_live_entries(struct brbe_regset *stored, > > struct brbe_regset *live, > > int nr_stored, int nr_live, > > int nr_max) > > { > > int nr_move = max(nr_stored, nr_max - nr_live); > > Should this compare be min() instead ? Yup, my bad again. That should be min(). > > /* Move the tail of the buffer to make room for the new entries */ > > memmove(&stored[nr_live], &stored[0], nr_move * sizeof(*stored)); > > > > /* Copy the new entries into the head of the buffer */ > > memcpy(stored[0], &live[0], nr_live * sizeof(*stored)); > > > > /* Return the number of entries in the stitched buffer */ > > return min(nr_live + nr_stored, nr_max); > > } > > Otherwise this makes sense and simpler, will rework. Great! Thanks, Mark.
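Putting the agreed corrections together, here is a compile-checked version of Mark's simplified stitch with min() rather than max() for nr_move, and &stored[0] where the sketch's memcpy(stored[0], ...) would not compile. Newest entries stay at index 0, as the thread agrees to keep. The struct layout and the diagram-based demo values are stand-ins:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define min(a, b) ((a) < (b) ? (a) : (b))

struct brbe_regset {
	uint64_t brbsrc;
	uint64_t brbtgt;
	uint64_t brbinf;
};

static int stitch_stored_live_entries(struct brbe_regset *stored,
				      struct brbe_regset *live,
				      int nr_stored, int nr_live,
				      int nr_max)
{
	int nr_move = min(nr_stored, nr_max - nr_live);

	/* Move the tail of the buffer to make room for the new entries */
	memmove(&stored[nr_live], &stored[0], nr_move * sizeof(*stored));

	/* Copy the new entries into the head of the buffer */
	memcpy(&stored[0], &live[0], nr_live * sizeof(*stored));

	/* Return the number of entries in the stitched buffer */
	return min(nr_live + nr_stored, nr_max);
}

/*
 * Reproduce the commit's diagram: 5 stored (S0..S4), 8 live (L0..L7),
 * nr_max = 10 -> stitched buffer is L0..L7, S0, S1.
 */
static int demo_stitch(void)
{
	struct brbe_regset stored[16], live[16];
	int i, n;

	memset(stored, 0, sizeof(stored));
	memset(live, 0, sizeof(live));
	for (i = 0; i < 5; i++)
		stored[i].brbsrc = 100 + i;	/* S0..S4 */
	for (i = 0; i < 8; i++)
		live[i].brbsrc = 200 + i;	/* L0..L7 */

	n = stitch_stored_live_entries(stored, live, 5, 8, 10);
	if (n != 10)
		return -1;
	if (stored[0].brbsrc != 200 || stored[7].brbsrc != 207)
		return -2;	/* live entries at the head */
	if (stored[8].brbsrc != 100 || stored[9].brbsrc != 101)
		return -3;	/* S0, S1 survive after the live entries */
	return n;
}
```

memmove() is the right call for the in-place shift since source and destination live in the same buffer and may overlap; memcpy() suffices for the live-to-stored copy across distinct buffers.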
diff --git a/drivers/perf/arm_brbe.c b/drivers/perf/arm_brbe.c index 484842d8cf3e..759db681d673 100644 --- a/drivers/perf/arm_brbe.c +++ b/drivers/perf/arm_brbe.c @@ -44,6 +44,169 @@ static void select_brbe_bank(int bank) isb(); } +/* + * This scans over BRBE register banks and captures individual branch records + * [BRBSRC, BRBTGT, BRBINF] into a pre-allocated 'struct brbe_regset' buffer, + * until an invalid one gets encountered. The caller for this function needs + * to ensure BRBE is in an appropriate state before the records can be captured. + */ +static int capture_brbe_regset(struct brbe_hw_attr *brbe_attr, struct brbe_regset *buf) +{ + int loop1_idx1, loop1_idx2, loop2_idx1, loop2_idx2; + int idx, count; + + loop1_idx1 = BRBE_BANK0_IDX_MIN; + if (brbe_attr->brbe_nr <= BRBE_BANK_MAX_ENTRIES) { + loop1_idx2 = brbe_attr->brbe_nr - 1; + loop2_idx1 = BRBE_BANK1_IDX_MIN; + loop2_idx2 = BRBE_BANK0_IDX_MAX; + } else { + loop1_idx2 = BRBE_BANK0_IDX_MAX; + loop2_idx1 = BRBE_BANK1_IDX_MIN; + loop2_idx2 = brbe_attr->brbe_nr - 1; + } + + select_brbe_bank(BRBE_BANK_IDX_0); + for (idx = 0, count = loop1_idx1; count <= loop1_idx2; idx++, count++) { + buf[idx].brbinf = get_brbinf_reg(idx); + /* + * There are no valid entries anymore in the buffer. + * Abort the branch record processing to save some + * cycles and also reduce the capture/process load + * for the user space as well. + */ + if (brbe_invalid(buf[idx].brbinf)) + return idx; + + buf[idx].brbsrc = get_brbsrc_reg(idx); + buf[idx].brbtgt = get_brbtgt_reg(idx); + } + + select_brbe_bank(BRBE_BANK_IDX_1); + for (count = loop2_idx1; count <= loop2_idx2; idx++, count++) { + buf[idx].brbinf = get_brbinf_reg(idx); + /* + * There are no valid entries anymore in the buffer. + * Abort the branch record processing to save some + * cycles and also reduce the capture/process load + * for the user space as well. 
+ */ + if (brbe_invalid(buf[idx].brbinf)) + return idx; + + buf[idx].brbsrc = get_brbsrc_reg(idx); + buf[idx].brbtgt = get_brbtgt_reg(idx); + } + return idx; +} + +static inline void copy_brbe_regset(struct brbe_regset *src, int src_idx, + struct brbe_regset *dst, int dst_idx) +{ + dst[dst_idx].brbinf = src[src_idx].brbinf; + dst[dst_idx].brbsrc = src[src_idx].brbsrc; + dst[dst_idx].brbtgt = src[src_idx].brbtgt; +} + +/* + * This function concatenates branch records from stored and live buffer + * up to maximum nr_max records and the stored buffer holds the resultant + * buffer. The concatenated buffer contains all the branch records from + * the live buffer but might contain some from stored buffer considering + * the maximum combined length does not exceed 'nr_max'. + * + * Stored records Live records + * ------------------------------------------------^ + * | S0 | L0 | Newest | + * --------------------------------- | + * | S1 | L1 | | + * --------------------------------- | + * | S2 | L2 | | + * --------------------------------- | + * | S3 | L3 | | + * --------------------------------- | + * | S4 | L4 | nr_max + * --------------------------------- | + * | | L5 | | + * --------------------------------- | + * | | L6 | | + * --------------------------------- | + * | | L7 | | + * --------------------------------- | + * | | | | + * --------------------------------- | + * | | | Oldest | + * ------------------------------------------------V + * + * + * S0 is the newest in the stored records, whereas L7 is the oldest in + * the live records. Unless the live buffer is detected as being full + * thus potentially dropping off some older records, L7 and S0 records + * are contiguous in time for a user task context. The stitched buffer + * here represents maximum possible branch records, contiguous in time. 
+ * + * Stored records Live records + * ------------------------------------------------^ + * | L0 | L0 | Newest | + * --------------------------------- | + * | L1 | L1 | | + * --------------------------------- | + * | L2 | L2 | | + * --------------------------------- | + * | L3 | L3 | | + * --------------------------------- | + * | L4 | L4 | nr_max + * --------------------------------- | + * | L5 | L5 | | + * --------------------------------- | + * | L6 | L6 | | + * --------------------------------- | + * | L7 | L7 | | + * --------------------------------- | + * | S0 | | | + * --------------------------------- | + * | S1 | | Oldest | + * ------------------------------------------------V + * | S2 | <----| + * ----------------- | + * | S3 | <----| Dropped off after nr_max + * ----------------- | + * | S4 | <----| + * ----------------- + */ +static int stitch_stored_live_entries(struct brbe_regset *stored, + struct brbe_regset *live, + int nr_stored, int nr_live, + int nr_max) +{ + int nr_total, nr_excess, nr_last, i; + + nr_total = nr_stored + nr_live; + nr_excess = nr_total - nr_max; + + /* Stored branch records in stitched buffer */ + if (nr_live == nr_max) + nr_stored = 0; + else if (nr_excess > 0) + nr_stored -= nr_excess; + + /* Stitched buffer branch records length */ + if (nr_total > nr_max) + nr_last = nr_max; + else + nr_last = nr_total; + + /* Move stored branch records */ + for (i = 0; i < nr_stored; i++) + copy_brbe_regset(stored, i, stored, nr_last - nr_stored - 1 + i); + + /* Copy live branch records */ + for (i = 0; i < nr_live; i++) + copy_brbe_regset(live, i, stored, i); + + return nr_last; +} + /* * Generic perf branch filters supported on BRBE *