Message ID | 20230711082455.215983-1-anshuman.khandual@arm.com |
---|---|
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:a6b2:0:b0:3e4:2afc:c1 with SMTP id c18csp330410vqm; Tue, 11 Jul 2023 01:43:30 -0700 (PDT) X-Google-Smtp-Source: APBJJlHeFDnE6Xw4HB8ZAdgSh+SyZI5BLrO4AJLpRCKO/mw8Vv233UFQCKtmYjrYT2I3qgb1xdO0 X-Received: by 2002:a05:6358:4408:b0:134:e422:c500 with SMTP id z8-20020a056358440800b00134e422c500mr12974049rwc.27.1689065010480; Tue, 11 Jul 2023 01:43:30 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1689065010; cv=none; d=google.com; s=arc-20160816; b=mgOZQBs4CyfIALre5sxRzOGBMG8hwugF82l7u2uPm8fx6gun4j4uGyWPOec+cvJQ2Q 2YZuGvPl/zSPORQMYOVP1JrmSbtWFKRImFA48UnRDMRKc3xEbXMkD8TnOS3UD7+MFS9L 89OGU8TUobTeMk0v2DkyvLqCcSTgmflRB08A8tNkAMwFp+D0s8WmqCfduF9DWB+yaJLa owhjWd+0okfpPbacXLF4YJvc9AbEtgRy/QZzhxPyuQma1jETuEmR9lBQNFnbCISJryUA 3VFUHa/K+HuJoqu6JJeNI+09Td/B93oRFUj1mlaD4zzMlCDGUH3ZieV8XzfAv81+bord 38IA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :message-id:date:subject:cc:to:from; bh=ceeAE4jw7n2RN7fNymkAHXaQdFOARZukm8hIvs7sJvQ=; fh=uKDf5R1eKBEpFB2sP5rwFdsisg1sKNqEQFf4Rrsmuig=; b=TFBbnRsrNC6h4aZKEwpy6B/HPeamifoOts+lQ/SrkU23S+5QZ0qhyXt07UTm/xpaIi ZeUFv6sbFkBQlxFKujKUJJ6aVaxjkDkc0EbhQ2kJtfCStOEwYYOvPHIY/0lRbj6Lwkbz 2X/mCI0D84PfnZU1MV0Ej2YE/JieFxDXPHRRwnNuM5h4+IBsmNdvfpXrdduGwkHpl2gm WPqecrh9XkFY1b7AVPWWGkpTHF/MsqwAffCz6nFFRI0IGEShn6NsPTzWAksIzHO2/FRa Xhay/sH0mv7JhVyeaXKhoTpLRivGBN9F5QigbEAqy2vdPrEnmVlVXnjY1v4fOU4PO85m 5l9g== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=arm.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id m13-20020a65438d000000b0054fda551fecsi1060319pgp.426.2023.07.11.01.43.18; Tue, 11 Jul 2023 01:43:30 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=arm.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231266AbjGKIZT (ORCPT <rfc822;gnulinuxfreebsd@gmail.com> + 99 others); Tue, 11 Jul 2023 04:25:19 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:43822 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230455AbjGKIZQ (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Tue, 11 Jul 2023 04:25:16 -0400 Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id A800FE7E; Tue, 11 Jul 2023 01:25:10 -0700 (PDT) Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 26CF62B; Tue, 11 Jul 2023 01:25:52 -0700 (PDT) Received: from a077893.arm.com (unknown [10.163.47.87]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPA id 4D2053F740; Tue, 11 Jul 2023 01:25:05 -0700 (PDT) From: Anshuman Khandual <anshuman.khandual@arm.com> To: linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, will@kernel.org, catalin.marinas@arm.com, mark.rutland@arm.com Cc: Anshuman Khandual <anshuman.khandual@arm.com>, Mark Brown <broonie@kernel.org>, James Clark <james.clark@arm.com>, Rob Herring <robh@kernel.org>, Marc Zyngier <maz@kernel.org>, Suzuki Poulose <suzuki.poulose@arm.com>, Peter Zijlstra <peterz@infradead.org>, Ingo Molnar <mingo@redhat.com>, Arnaldo Carvalho de Melo <acme@kernel.org>, linux-perf-users@vger.kernel.org Subject: [PATCH V13 - RESEND 00/10] arm64/perf: Enable branch stack sampling Date: Tue, 11 Jul 2023 13:54:45 +0530 Message-Id: <20230711082455.215983-1-anshuman.khandual@arm.com> X-Mailer: git-send-email 2.25.1 MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00, RCVD_IN_DNSWL_BLOCKED,SPF_HELO_NONE,SPF_NONE,T_SCC_BODY_TEXT_LINE, URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1771113032562553573 X-GMAIL-MSGID: 1771113032562553573 |
Series |
arm64/perf: Enable branch stack sampling
|
|
Message
Anshuman Khandual
July 11, 2023, 8:24 a.m. UTC
This series enables perf branch stack sampling support on arm64 platform via a new arch feature called Branch Record Buffer Extension (BRBE). All relevant register definitions could be accessed here. https://developer.arm.com/documentation/ddi0601/2021-12/AArch64-Registers This series applies on 6.5-rc1. Changes in V13: https://lore.kernel.org/all/20230622065351.1092893-1-anshuman.khandual@arm.com/ - Added branch callback stubs for aarch32 pmuv3 based platforms - Updated the comments for capture_brbe_regset() - Deleted the comments in __read_brbe_regset() - Reversed the arguments order in capture_brbe_regset() and brbe_branch_save() - Fixed BRBE_BANK[0|1]_IDX_MAX indices comparison in armv8pmu_branch_read() - Fixed BRBE_BANK[0|1]_IDX_MAX indices comparison in capture_brbe_regset() Changes in V12: https://lore.kernel.org/all/20230615133239.442736-1-anshuman.khandual@arm.com/ - Replaced branch types with complete DIRECT/INDIRECT prefixes/suffixes - Replaced branch types with complete INSN/ALIGN prefixes/suffixes - Replaced return branch types as simple RET/ERET - Replaced time field GST_PHYSICAL as GUEST_PHYSICAL - Added 0 padding for BRBIDR0_EL1.NUMREC enum values - Dropped helper arm_pmu_branch_stack_supported() - Renamed armv8pmu_branch_valid() as armv8pmu_branch_attr_valid() - Separated perf_task_ctx_cache setup from arm_pmu private allocation - Collected changes to branch_records_alloc() in a single patch [5/10] - Reworked and cleaned up branch_records_alloc() - Reworked armv8pmu_branch_read() with new loop iterations in patch [6/10] - Reworked capture_brbe_regset() with new loop iterations in patch [8/10] - Updated the comment in branch_type_to_brbcr() - Fixed the comment before stitch_stored_live_entries() - Fixed BRBINFINJ_EL1 definition for VALID_FULL enum field - Factored out helper __read_brbe_regset() from capture_brbe_regset() - Dropped the helper copy_brbe_regset() - Simplified stitch_stored_live_entries() with memcpy(), memmove() - Reworked armv8pmu_probe_pmu() to bail out early with !probe.present - Rework brbe_attributes_probe() without 'struct brbe_hw_attr' - Dropped 'struct brbe_hw_attr' argument from capture_brbe_regset() - Dropped 'struct brbe_hw_attr' argument from brbe_branch_save() - Dropped arm_pmu->private and added arm_pmu->reg_trbidr instead Changes in V11: https://lore.kernel.org/all/20230531040428.501523-1-anshuman.khandual@arm.com/ - Fixed the crash for per-cpu events without event->pmu_ctx->task_ctx_data Changes in V10: https://lore.kernel.org/all/20230517022410.722287-1-anshuman.khandual@arm.com/ - Rebased the series on v6.4-rc2 - Moved ARMV8 PMUV3 changes inside drivers/perf/arm_pmuv3.c - Moved BRBE driver changes inside drivers/perf/arm_brbe.[c|h] - Moved the WARN_ON() inside the if condition in armv8pmu_handle_irq() Changes in V9: https://lore.kernel.org/all/20230315051444.1683170-1-anshuman.khandual@arm.com/ - Fixed build problem with has_branch_stack() in arm64 header - BRBINF_EL1 definition has been changed from 'Sysreg' to 'SysregFields' - Renamed all BRBINF_EL1 call sites as BRBINFx_EL1 - Dropped static const char branch_filter_error_msg[] - Implemented a positive list check for BRBE supported perf branch filters - Added a comment in armv8pmu_handle_irq() - Implemented per-cpu allocation for struct branch_record records - Skipped looping through bank 1 if an invalid record is detected in bank 0 - Added comment in armv8pmu_branch_read() explaining prohibited region etc - Added comment warning about erroneously marking transactions as aborted - Replaced the first argument (perf_branch_entry) in capture_brbe_flags() - Dropped the last argument (idx) in capture_brbe_flags() - Dropped the brbcr argument from capture_brbe_flags() - Used perf_sample_save_brstack() to capture branch records for perf_sample_data - Added comment explaining rationale for setting BRBCR_EL1_FZP for user only traces - Dropped BRBE prohibited state mechanism while in armv8pmu_branch_read() - Implemented event task context based branch records save mechanism Changes in V8: https://lore.kernel.org/all/20230123125956.1350336-1-anshuman.khandual@arm.com/ - Replaced arm_pmu->features as arm_pmu->has_branch_stack, updated its helper - Added a comment and line break before arm_pmu->private element - Added WARN_ON_ONCE() in helpers i.e armv8pmu_branch_[read|valid|enable|disable]() - Dropped comments in armv8pmu_enable_event() and armv8pmu_disable_event() - Replaced open bank encoding in BRBFCR_EL1 with SYS_FIELD_PREP() - Changed brbe_hw_attr->brbe_version from 'bool' to 'int' - Updated pr_warn() as pr_warn_once() with values in brbe_get_perf_[type|priv]() - Replaced all pr_warn_once() as pr_debug_once() in armv8pmu_branch_valid() - Added a comment in branch_type_to_brbcr() for the BRBCR_EL1 privilege settings - Modified the comment related to BRBINFx_EL1.LASTFAILED in capture_brbe_flags() - Modified brbe_get_perf_entry_type() as brbe_set_perf_entry_type() - Renamed brbe_valid() as brbe_record_is_complete() - Renamed brbe_source() as brbe_record_is_source_only() - Renamed brbe_target() as brbe_record_is_target_only() - Inverted checks for !brbe_record_is_[target|source]_only() for info capture - Replaced 'fetch' with 'get' in all helpers that extract field value - Dropped 'static int brbe_current_bank' optimization in select_brbe_bank() - Dropped select_brbe_bank_index() completely, added capture_branch_entry() - Process captured branch entries in two separate loops one for each BRBE bank - Moved branch_records_alloc() inside armv8pmu_probe_pmu() - Added a forward declaration for the helper has_branch_stack() - Added new callbacks armv8pmu_private_alloc() and armv8pmu_private_free() - Updated armv8pmu_probe_pmu() to allocate the private structure before SMP call Changes in V7: https://lore.kernel.org/all/20230105031039.207972-1-anshuman.khandual@arm.com/ - Folded [PATCH 7/7] into [PATCH 3/7] which enables branch stack sampling event - Defined BRBFCR_EL1_BRANCH_FILTERS, BRBCR_EL1_DEFAULT_CONFIG in the header - Defined BRBFCR_EL1_DEFAULT_CONFIG in the header - Updated BRBCR_EL1_DEFAULT_CONFIG with BRBCR_EL1_FZP - Defined BRBCR_EL1_DEFAULT_TS in the header - Updated BRBCR_EL1_DEFAULT_CONFIG with BRBCR_EL1_DEFAULT_TS - Moved BRBCR_EL1_DEFAULT_CONFIG check inside branch_type_to_brbcr() - Moved down BRBCR_EL1_CC, BRBCR_EL1_MPRED later in branch_type_to_brbcr() - Also set BRBE in paused state in armv8pmu_branch_disable() - Dropped brbe_paused(), set_brbe_paused() helpers - Extracted error string via branch_filter_error_msg[] for armv8pmu_branch_valid() - Replaced brbe_v1p1 with brbe_version in struct brbe_hw_attr - Added valid_brbe_[cc, format, version]() helpers - Split a separate brbe_attributes_probe() from armv8pmu_branch_probe() - Capture event->attr.branch_sample_type earlier in armv8pmu_branch_valid() - Defined enum brbe_bank_idx with possible values for BRBE bank indices - Changed armpmu->hw_attr into armpmu->private - Added missing space in stub definition for armv8pmu_branch_valid() - Replaced both kmalloc() with kzalloc() - Added BRBE_BANK_MAX_ENTRIES - Updated comment for capture_brbe_flags() - Updated comment for struct brbe_hw_attr - Dropped space after type cast in couple of places - Replaced inverse with negation for testing BRBCR_EL1_FZP in armv8pmu_branch_read() - Captured cpuc->branches->branch_entries[idx] in a local variable - Dropped saved_priv from armv8pmu_branch_read() - Reorganize PERF_SAMPLE_BRANCH_NO_[CYCLES|NO_FLAGS] related configuration - Replaced with FIELD_GET() and FIELD_PREP() wherever applicable - Replaced BRBCR_EL1_TS_PHYSICAL with BRBCR_EL1_TS_VIRTUAL - Moved valid_brbe_nr(), valid_brbe_cc(), valid_brbe_format(), valid_brbe_version() select_brbe_bank(), select_brbe_bank_index() helpers inside the C implementation - Reorganized brbe_valid_nr() and dropped the pr_warn() message - Changed probe sequence in brbe_attributes_probe() - Added 'brbcr' argument into capture_brbe_flags() to ascertain correct state - Disable BRBE before disabling the PMU event counter - Enable PERF_SAMPLE_BRANCH_HV filters when is_kernel_in_hyp_mode() - Guard armv8pmu_reset() & armv8pmu_sched_task() with arm_pmu_branch_stack_supported() Changes in V6: https://lore.kernel.org/linux-arm-kernel/20221208084402.863310-1-anshuman.khandual@arm.com/ - Restore the exception level privilege after reading the branch records - Unpause the buffer after reading the branch records - Decouple BRBCR_EL1_EXCEPTION/ERTN from perf event privilege level - Reworked BRBE implementation and branch stack sampling support on arm pmu - BRBE implementation is now part of overall ARMV8 PMU implementation - BRBE implementation moved from drivers/perf/ to inside arch/arm64/kernel/ - CONFIG_ARM_BRBE_PMU renamed as CONFIG_ARM64_BRBE in arch/arm64/Kconfig - File moved - drivers/perf/arm_pmu_brbe.c -> arch/arm64/kernel/brbe.c - File moved - drivers/perf/arm_pmu_brbe.h -> arch/arm64/kernel/brbe.h - BRBE name has been dropped from struct arm_pmu and struct hw_pmu_events - BRBE name has been abstracted out as 'branches' in arm_pmu and hw_pmu_events - BRBE name has been abstracted out as 'branches' in ARMV8 PMU implementation - Added sched_task() callback into struct arm_pmu - Added 'hw_attr' into struct arm_pmu encapsulating possible PMU HW attributes - Dropped explicit attributes brbe_(v1p1, nr, cc, format) from struct arm_pmu - Dropped brbfcr, brbcr, registers scratch area from struct hw_pmu_events - Dropped brbe_users, brbe_context tracking in struct hw_pmu_events - Added 'features' tracking into struct arm_pmu with ARM_PMU_BRANCH_STACK flag - armpmu->hw_attr maps into 'struct brbe_hw_attr' inside BRBE implementation - Set ARM_PMU_BRANCH_STACK in 'arm_pmu->features' after successful BRBE probe - Added armv8pmu_branch_reset() inside armv8pmu_branch_enable() - Dropped brbe_supported() as events will be rejected via ARM_PMU_BRANCH_STACK - Dropped set_brbe_disabled() as well - Reformated armv8pmu_branch_valid() warnings while rejecting unsupported events Changes in V5: https://lore.kernel.org/linux-arm-kernel/20221107062514.2851047-1-anshuman.khandual@arm.com/ - Changed BRBCR_EL1.VIRTUAL from 0b1 to 0b01 - Changed BRBFCR_EL1.EnL into BRBFCR_EL1.EnI - Changed config ARM_BRBE_PMU from 'tristate' to 'bool' Changes in V4: https://lore.kernel.org/all/20221017055713.451092-1-anshuman.khandual@arm.com/ - Changed ../tools/sysreg declarations as suggested - Set PERF_SAMPLE_BRANCH_STACK in data.sample_flags - Dropped perfmon_capable() check in armpmu_event_init() - s/pr_warn_once/pr_info in armpmu_event_init() - Added brbe_format element into struct pmu_hw_events - Changed v1p1 as brbe_v1p1 in struct pmu_hw_events - Dropped pr_info() from arm64_pmu_brbe_probe(), solved LOCKDEP warning Changes in V3: https://lore.kernel.org/all/20220929075857.158358-1-anshuman.khandual@arm.com/ - Moved brbe_stack from the stack and now dynamically allocated - Return PERF_BR_PRIV_UNKNOWN instead of -1 in brbe_fetch_perf_priv() - Moved BRBIDR0, BRBCR, BRBFCR registers and fields into tools/sysreg - Created dummy BRBINF_EL1 field definitions in tools/sysreg - Dropped ARMPMU_EVT_PRIV framework which cached perfmon_capable() - Both exception and exception return branche records are now captured only if the event has PERF_SAMPLE_BRANCH_KERNEL which would already been checked in generic perf via perf_allow_kernel() Changes in V2: https://lore.kernel.org/all/20220908051046.465307-1-anshuman.khandual@arm.com/ - Dropped branch sample filter helpers consolidation patch from this series - Added new hw_perf_event.flags element ARMPMU_EVT_PRIV to cache perfmon_capable() - Use cached perfmon_capable() while configuring BRBE branch record filters Changes in V1: https://lore.kernel.org/linux-arm-kernel/20220613100119.684673-1-anshuman.khandual@arm.com/ - Added CONFIG_PERF_EVENTS wrapper for all branch sample filter helpers - Process new perf branch types via PERF_BR_EXTEND_ABI Changes in RFC V2: https://lore.kernel.org/linux-arm-kernel/20220412115455.293119-1-anshuman.khandual@arm.com/ - Added branch_sample_priv() while consolidating other branch sample filter helpers - Changed all SYS_BRBXXXN_EL1 register definition encodings per Marc - Changed the BRBE driver as per proposed BRBE related perf ABI changes (V5) - Added documentation for struct arm_pmu changes, updated commit message - Updated commit message for BRBE detection infrastructure patch - PERF_SAMPLE_BRANCH_KERNEL gets checked during arm event init (outside the driver) - Branch privilege state capture mechanism has now moved inside the driver Changes in RFC V1: https://lore.kernel.org/all/1642998653-21377-1-git-send-email-anshuman.khandual@arm.com/ Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mark Brown <broonie@kernel.org> Cc: James Clark <james.clark@arm.com> Cc: Rob Herring <robh@kernel.org> Cc: Marc Zyngier <maz@kernel.org> Cc: Suzuki Poulose <suzuki.poulose@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-perf-users@vger.kernel.org Cc: linux-kernel@vger.kernel.org Anshuman Khandual (10): drivers: perf: arm_pmu: Add new sched_task() callback arm64/perf: Add BRBE registers and fields arm64/perf: Add branch stack support in struct arm_pmu arm64/perf: Add branch stack support in struct pmu_hw_events arm64/perf: Add branch stack support in ARMV8 PMU arm64/perf: Enable branch stack events via FEAT_BRBE arm64/perf: Add PERF_ATTACH_TASK_DATA to events with has_branch_stack() arm64/perf: Add struct brbe_regset helper functions arm64/perf: Implement branch records save on task sched out arm64/perf: Implement branch records save on PMU IRQ arch/arm/include/asm/arm_pmuv3.h | 12 + arch/arm64/include/asm/perf_event.h | 46 ++ arch/arm64/include/asm/sysreg.h | 103 ++++ arch/arm64/tools/sysreg | 158 ++++++ drivers/perf/Kconfig | 11 + drivers/perf/Makefile | 1 + drivers/perf/arm_brbe.c | 716 ++++++++++++++++++++++++++++ drivers/perf/arm_brbe.h | 270 +++++++++++ drivers/perf/arm_pmu.c | 12 +- drivers/perf/arm_pmuv3.c | 110 ++++- include/linux/perf/arm_pmu.h | 19 +- 11 files changed, 1431 insertions(+), 27 deletions(-) create mode 100644 drivers/perf/arm_brbe.c create mode 100644 drivers/perf/arm_brbe.h
Comments
Hi Anshuman, On Tue, Jul 11, 2023 at 01:54:45PM +0530, Anshuman Khandual wrote: > This series enables perf branch stack sampling support on arm64 platform > via a new arch feature called Branch Record Buffer Extension (BRBE). All > relevant register definitions could be accessed here. > > https://developer.arm.com/documentation/ddi0601/2021-12/AArch64-Registers > > This series applies on 6.5-rc1. > > Changes in V13: I had a go at reviewing this series and, aside from the macro issue I've already pointed out, I really struggled with the way that you've put the series together: - You incrementally introduce dead code, forcing the reviewer to keep previous patches in their head awaiting for a caller to come along later. Example: Patch 4 literally just adds a new struct to the kernel. - You change arch/arm/, where this driver shouldn't even be _compiled_ despite adding CONFIG_ARM64_BRBE. Example: Patch 5 adds some BRBE stubs to arch/arm/include/asm/arm_pmuv3.h - You undo/rework code that was introduced earlier in the series Example: armv8pmu_branch_read() is introduced as a useless stub in patch 5, rewritten in patch 6 and then rewritten again in patch 10. Why should I waste time reviewing three versions of this function? - You make unrelated cosmetic changes to the existing code inside patches adding new features. Example: Patch 5 randomly removes some comments from the existing code. - The commit messages are, at best, useless and err more on the side of nonsensical. Example: Look at patch 3: | This updates 'struct arm_pmu' for branch stack sampling support being added | later. This adds an element 'reg_trbidr' to capture BRBE attribute details. | These updates here will help in tracking any branch stack sampling support. | | This also enables perf branch stack sampling event on all 'struct arm pmu', | supporting the feature but after removing the current gate that blocks such | events unconditionally in armpmu_event_init(). Instead a quick probe can be | initiated via arm_pmu->has_branch_stack to ascertain the support. If I remove everything that isn't just describing the code, I'm left with: - 'reg_trbidr' captures BRBE attribute details - These updates here will help in tracking any branch stack sampling support. - perf branch stack sampling event is now enabled when it is supported - Probing is quick But crucial information is missing: * What is BRBE? * What is a BRBE attribute? * How are the details of an attribute captured? * How do these "updates" (which ones?) help in tracking branch stack sampling? * What is being tracked and why? * How quick is the probing and why do we care? * What is the perf branch stack sampling event and what does it mean to enable it? Does it offer something useful to the user? * Why do we want any of this? (these examples are not intended to be an exhaustive list of things that need fixing) Overall, this makes the code needlessly difficult to review. However, I don't reckon it's too much effort on your side to fix the things above. You've been doing this for long enough (on the author and reviewer side) that I hope you see what I'm getting at. If not, try reviewing your own patches right before you hit 'git send-email'; I pretty much always find a problem with my own code that way. So, please, can you post a v14 which: 1. Fixes the broken register access macros 2. Adds some meaningful tests at the end of the series 3. Squashes the new driver code (i.e. at least everything in arm_brbe.c and possibly just everything under drivers/perf/) down into a single patch 4. Does any _necessary_ cleanup or refactoring at the start of the series, leaving out cosmetic stuff for now 5. Rewrites the commit messages following the guidelines in submitting-patches.rst. You don't need to talk about specific C expressions; we have the code for that already and if it's doing something subtle then you can add a comment. 6. Resolves the open CYCLES_COUNT issue from Yang and Suzuki Cheers, Will
On Fri, Aug 18, 2023 at 08:42:56AM +0530, Anshuman Khandual wrote: > On 7/31/23 18:35, Will Deacon wrote: > > I had a go at reviewing this series and, aside from the macro issue I've > > already pointed out, I really struggled with the way that you've put the > > series together: > > > > - You incrementally introduce dead code, forcing the reviewer to keep > > previous patches in their head awaiting for a caller to come along > > later. > > My apologies for the delayed response on this particular thread. It's ok, I'm in no rush to merge it. > Another way of looking at this would be to prepare existing infrastructure > ready for an upcoming implementation which requires them later. This helps > avoid doing all the changes together in a single code block/patch, making > the entire function addition a giant monolith ? I'm not after another way of looking at it, I'm telling you that the way you have structured this is needlessly difficult to review. Yes, we often split large patches into incremental changes, but that doesn't mean that breaking a driver up into pieces of dead code and dummy type definitions is useful. Look at how I structured the original SPE driver for example: https://lore.kernel.org/all/1507811438-2267-1-git-send-email-will.deacon@arm.com/ The driver code itself lives entirely in patch 7, with the earlier patches adding self-contained core functionality on which the driver depends. I didn't add an empty 'struct arm_spe_pmu' did I? This isn't rocket science. > Don't we allow patch series to build on functions and structures, which are > dead at the beginning, before enabling the entire function with a new config > selectable right at the end ? The only difference here is, these are changes > to various structures without their corresponding utilization some where in > the patch but the real question here would be - is this way of organizing the > patch series not allowed as per the kernel coding standards or guidelines ? > > Regardless, if you prefer the suggested style for the series, I could switch > to it. Please don't imply that I'm being difficult with an unusual reviewer preference for how this should be structured. Ask any of your colleagues in the Arm kernel team and I suspect you'll hear similar concerns to mine. As I said before, try putting yourself in the shoes of the reviewer and you'll soon see how needlessly difficult you're making things. > > Example: Patch 4 literally just adds a new struct to the kernel. > > Which again follows the same principle as explained above, and Mark had acked > that patch earlier. I had assumed that validated the principle about prior > infrastructure building based patch series organization. I don't think an ack validates any principles about anything. > > - You change arch/arm/, where this driver shouldn't even be _compiled_ > > despite adding CONFIG_ARM64_BRBE. > > These stubs are necessary to protect PMU drivers built on arm32 which share > basic branch record processing abstraction at ARMV8 PMU level. It compiles > successfully both on arm32 and arm64 platforms with these changes. Subject > line for this patch clearly mentions that as well. But they shouldn't be needed. When CONFIG_ARM64_BRBE is not selected, no branch record processing functions should be needed, empty or otherwise. The callers should not exist in the first place (i.e. the empty definitions should live in the core driver code, not in the arch header, or the calling code should not be compiled at all). > arm64/perf: Add branch stack support in ARMV8 PMU > > This is adding abstraction for branch stack support, not the implementation. > This actual implementation will come in the driver and gets enabled via the > new config CONFIG_ARM64_BRBE. > > > > > Example: Patch 5 adds some BRBE stubs to > > arch/arm/include/asm/arm_pmuv3.h > > Those are not BRBE stubs. Oh, come off it. You literally have a useless comment: /* BRBE stubs */ which is apparently even more useless than I thought. Now, can we stop wasting time arguing about which level of abstraction you think these are at and start focussing on removing them, please? > Instead those are branch records processing stubs > which is a higher abstraction layer at ARMV8 PMU level, before getting down > into arm64 specific branch records implementation via FEAT_BRBE enabled in > a subsequent driver patch. > > armv8_pmu driver is shared for arm32 and arm64. The patch is adding generic > branch record stubs for both architectures, ensuring the armv8_pmu continues > to be built successfully, and finally adds the arm64 implementation with > FEAT_BRBE in the following patches. > > > > > - You undo/rework code that was introduced earlier in the series > > > > Example: armv8pmu_branch_read() is introduced as a useless stub in > > patch 5, rewritten in patch 6 and then rewritten again in > > patch 10. Why should I waste time reviewing three versions > > of this function? > > [PATCH 5/10] adds callback for the branch records processing abstraction at > ARMV8 PMU level, without an implementation any where. The ARMV8 PMUV3 based > arm64 real implementation gets added later on via the BRBE driver, enabled > with new CONFIG_ARM64_BRBE. > > armv8pmu_branch_read() is present at all these places. > > 1. arch/arm/include/asm/arm_pmuv3.h: Stub to protect the arm32 build > 2. arch/arm64/include/asm/perf_event.h: Default stub without CONFIG_ARM64_BRBE > 3. arch/arm64/include/asm/perf_event.h: Declaration with CONFIG_ARM64_BRBE > 4. drivers/perf/arm_brbe.c: Definition with CONFIG_ARM64_BRBE > 5. drivers/perf/arm_pmuv3.c: Actual call site during PMU IRQ > > armv8pmu_branch_read() > > - Gets added in [PATCH 5] as branch records stub without an implementation > - Gets defined in [PATCH 6] in the BRBE driver (via CONFIG_ARM64_BRBE) > - Gets re-defined in [PATCH 10] where the new stitching gets implemented > > Patch 6 added basic BRBE functionality. But patch 10 adds the necessary logic > to stitch cached records to provide the maximum records possible without loss > of continuity. I kept it separate to keep the complexity away. But I understand > we could fold it in to single patch 6. I don't know what you're trying to say here, but you seem to be reiterating my point that you're adding armv8pmu_branch_read() three times during the patch series. Cutting to the main part of your email: > > So, please, can you post a v14 which: > > > > 1. Fixes the broken register access macros> > > Already fixed. Thank you, and what a horrifying assembler/compiler divergence that turned out to be! > > > 2. Adds some meaningful tests at the end of the series > > We already have some branch stack sampling tests for perf subsystem. > > tools/perf/tests/shell/test_brstack.sh > > James Clark is also trying to get some BRBE specific tests added there. > > https://gitlab.arm.com/linux-arm/linux-jc/-/commit/3a7ddce70c2daadb63fcc511de0a89055ca48b32 > > Are you looking for some thing more ? Any particular suggestions ? That looks great, but afaict it's 8 months old and hasn't been posted to the mailing list. Did I miss it? Why isn't it part of this series? > > 3. Squashes the new driver code (i.e. at least everything in > > arm_brbe.c and possibly just everything under drivers/perf/) down > > into a single patch > > Please find the code change stats in drivers/perf/ directory for your reference. > > drivers/perf/arm_brbe.c | 716 ++++++++++++++++++++++++++++ > drivers/perf/arm_brbe.h | 270 +++++++++++ > drivers/perf/arm_pmu.c | 12 +- > drivers/perf/arm_pmuv3.c | 110 ++++- > > Squashing drivers/perf/arm_brbe.c and drivers/perf/arm_brbe.h will blur these > functions and fold them back into the base driver addition itself. It's hard to know what to do with a diffstat, so I'll wait for v14 (but please note that this is too late for 6.6 now). > - Base BRBE implementation enabling perf branch stack sampling support > - sched_task() saving off older branch records when task schedules out > - PMU IRQ branch records management accommodating saved older branch records Why would we want a driver that doesn't handle IRQs or context switching? I don't think you gain anything by splitting the driver like this. > In order to summerize the patch series re-organization again, V14 will contain > the patches in order. > > 1. drivers: perf: arm_pmu: Add new sched_task() callback > 2. arm64/perf: Add BRBE registers and fields > 3. arm64/perf: Add branch stack support in struct arm_pmu > 4. arm64/perf: Add branch stack support in struct pmu_hw_events > 5. arm64/perf: Add branch stack support in ARMV8 PMU > 6. arm64/perf: Enable branch stack events via FEAT_BRBE > > I am just wondering if both PATCH 3 and PATCH 4 should be folded into PATCH 5 > which adds the perf branch stack sampling abstraction at ARMV8 PMU level. I can't tell from just the subjects, but patches 3-5 (and possibly 6) sound like they should be the same patch? But again, subject lines and diffstats aren't especially helpful. Will
Hello Will, Separated this part out just to understand it better. On 8/18/23 23:26, Will Deacon wrote: >> These stubs are necessary to protect PMU drivers built on arm32 which share >> basic branch record processing abstraction at ARMV8 PMU level. It compiles >> successfully both on arm32 and arm64 platforms with these changes. Subject >> line for this patch clearly mentions that as well. > But they shouldn't be needed. When CONFIG_ARM64_BRBE is not selected, no > branch record processing functions should be needed, empty or otherwise. But that is not how the code is organized currently. CONFIG_ARM64_BRBE enables the driver to bring in BRBE HW specific branch stack callback implementation details. But these callbacks are always present at ARMV8 PMU level even without BRBE implementation as well. Hence these call sites are present, regardless of CONFIG_ARM64_BRBE. > The callers should not exist in the first place (i.e. the empty definitions But how to achieve that ? Branch stack needs to be driven along side normal PMU events, which in turn get driven from 'struct arm_pmu' callbacks. Hence branch stack callbacks too needs to be at ARMV8 PMU level, and closely tied to normal PMU event handling callbacks. Let's examine from where all these new callbacks are called from. * armv8pmu_disable_event() ---> armv8pmu_branch_disable() * armv8pmu_handle_irq() ---> armv8pmu_branch_read() * armv8pmu_sched_task() ---> armv8pmu_branch_save() * armv8pmu_sched_task() ---> armv8pmu_branch_reset() * armv8pmu_reset() ---> armv8pmu_branch_reset() * __armv8_pmuv3_map_event() ---> armv8pmu_branch_attr_valid() * __armv8pmu_probe_pmu() ---> armv8pmu_branch_probe() * armv8pmu_probe_pmu() ---> armv8pmu_task_ctx_cache_alloc() * armv8pmu_probe_pmu() ---> branch_records_alloc() Please note that, branch_records_alloc() being deferred allocation is only one that is platform agnostic. I did not intend to make any of the BRBE details visible at ARMV8 PMU level i.e right inside armv8pmu_XXXX() definitions, as ARMV8 PMU is shared between arm32 and arm64 platforms. Hence these new branch stack callbacks are added along with required fallback stubs for build protection, so that the HW implementations can be hidden inside a new driver wrapped in CONFIG_ARM64_BRBE. Please note that these new branch stack functions are not 'struct arm_pmu' callbacks but instead normal helpers. > should live in the core driver code, not in the arch header, or the calling Driver code is compiled with CONFIG_ARM64_BRBE, hence the real definitions are there. Instead default stubs are required only when armv8pmu_branch_XXX() call backs are not defined. But are you suggesting that these stubs be moved inside drivers/perf/arm_pmuv3.c (where all call sites reside), so that there will be just one stub copy both for arm32 and arm64 platforms removing their duplicate definitions from arch headers i.e arch/arm64/include/asm/perf_event.h and arch/arm/include/asm/arm_pmuv3.h ? > code should not be compiled at all). Branch stack sampling is always enabled from core perf perspective without any config option to wrap around, hence calling code cannot be selectively enabled or disabled. - Anshuman
On 7/11/23 13:54, Anshuman Khandual wrote: > This series enables perf branch stack sampling support on arm64 platform > via a new arch feature called Branch Record Buffer Extension (BRBE). All > relevant register definitions could be accessed here. > > https://developer.arm.com/documentation/ddi0601/2021-12/AArch64-Registers > > This series applies on 6.5-rc1. > > Changes in V13: > > https://lore.kernel.org/all/20230622065351.1092893-1-anshuman.khandual@arm.com/ > > - Added branch callback stubs for aarch32 pmuv3 based platforms > - Updated the comments for capture_brbe_regset() > - Deleted the comments in __read_brbe_regset() > - Reversed the arguments order in capture_brbe_regset() and brbe_branch_save() > - Fixed BRBE_BANK[0|1]_IDX_MAX indices comparison in armv8pmu_branch_read() > - Fixed BRBE_BANK[0|1]_IDX_MAX indices comparison in capture_brbe_regset() Hello All, Upcoming V14 series is still being worked out, and might take some more time. But meanwhile please find its corresponding development branch for experimentation purpose, if required. Thank you. https://git.gitlab.arm.com/linux-arm/linux-anshuman.git (brbe_v14_rc3) - Anshuman