[V7,3/6] arm64/perf: Add branch stack support in struct arm_pmu

Message ID 20230105031039.207972-4-anshuman.khandual@arm.com
State New
Headers
Series arm64/perf: Enable branch stack sampling |

Commit Message

Anshuman Khandual Jan. 5, 2023, 3:10 a.m. UTC
  This updates 'struct arm_pmu' for branch stack sampling support later. This
adds a new 'features' element in the structure to track supported features,
and another 'private' element to encapsulate implementation attributes on a
given 'struct arm_pmu'. These updates here will help in tracking any branch
stack sampling support, which is being added later. This also adds a helper
arm_pmu_branch_stack_supported().

This also enables perf branch stack sampling event on all 'struct arm pmu',
supporting the feature but after removing the current gate that blocks such
events unconditionally in armpmu_event_init(). Instead a quick probe can be
initiated via arm_pmu_branch_stack_supported() to ascertain the support.

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
---
 drivers/perf/arm_pmu.c       | 3 +--
 include/linux/perf/arm_pmu.h | 9 +++++++++
 2 files changed, 10 insertions(+), 2 deletions(-)
  

Comments

Mark Rutland Jan. 12, 2023, 1:54 p.m. UTC | #1
On Thu, Jan 05, 2023 at 08:40:36AM +0530, Anshuman Khandual wrote:
> This updates 'struct arm_pmu' for branch stack sampling support later. This
> adds a new 'features' element in the structure to track supported features,
> and another 'private' element to encapsulate implementation attributes on a
> given 'struct arm_pmu'. These updates here will help in tracking any branch
> stack sampling support, which is being added later. This also adds a helper
> arm_pmu_branch_stack_supported().
> 
> This also enables perf branch stack sampling event on all 'struct arm pmu',
> supporting the feature but after removing the current gate that blocks such
> events unconditionally in armpmu_event_init(). Instead a quick probe can be
> initiated via arm_pmu_branch_stack_supported() to ascertain the support.
> 
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Will Deacon <will@kernel.org>
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: linux-arm-kernel@lists.infradead.org
> Cc: linux-kernel@vger.kernel.org
> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
> ---
>  drivers/perf/arm_pmu.c       | 3 +--
>  include/linux/perf/arm_pmu.h | 9 +++++++++
>  2 files changed, 10 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/perf/arm_pmu.c b/drivers/perf/arm_pmu.c
> index 14a3ed3bdb0b..a85b2d67022e 100644
> --- a/drivers/perf/arm_pmu.c
> +++ b/drivers/perf/arm_pmu.c
> @@ -510,8 +510,7 @@ static int armpmu_event_init(struct perf_event *event)
>  		!cpumask_test_cpu(event->cpu, &armpmu->supported_cpus))
>  		return -ENOENT;
>  
> -	/* does not support taken branch sampling */
> -	if (has_branch_stack(event))
> +	if (has_branch_stack(event) && !arm_pmu_branch_stack_supported(armpmu))
>  		return -EOPNOTSUPP;
>  
>  	return __hw_perf_event_init(event);
> diff --git a/include/linux/perf/arm_pmu.h b/include/linux/perf/arm_pmu.h
> index 2a9d07cee927..64e1b2594025 100644
> --- a/include/linux/perf/arm_pmu.h
> +++ b/include/linux/perf/arm_pmu.h
> @@ -80,11 +80,14 @@ enum armpmu_attr_groups {
>  	ARMPMU_NR_ATTR_GROUPS
>  };
>  
> +#define ARM_PMU_BRANCH_STACK	BIT(0)
> +
>  struct arm_pmu {
>  	struct pmu	pmu;
>  	cpumask_t	supported_cpus;
>  	char		*name;
>  	int		pmuver;
> +	int		features;
>  	irqreturn_t	(*handle_irq)(struct arm_pmu *pmu);
>  	void		(*enable)(struct perf_event *event);
>  	void		(*disable)(struct perf_event *event);

Hmm, we already have the secure_access field separately. How about we fold that
in and go with:

	unsigned int	secure_access    : 1,
			has_branch_stack : 1;

... that way we have one way to manage flags, we don't need to allocate the
bits, and the bulk of the existing code for secure_access can stay as-is.

> @@ -119,8 +122,14 @@ struct arm_pmu {
>  
>  	/* Only to be used by ACPI probing code */
>  	unsigned long acpi_cpuid;
> +	void		*private;

Does this need to be on the end of struct arm_pmu, or can it be placed earlier?

The line spacing makes it look like the ACPI comment applies to 'private',
which isn't the case.

>  };
>  
> +static inline bool arm_pmu_branch_stack_supported(struct arm_pmu *armpmu)
> +{
> +	return armpmu->features & ARM_PMU_BRANCH_STACK;
> +}

With the above, this would become:

static inline bool arm_pmu_branch_stack_supported(struct arm_pmu *armpmu)
{
	return armpmu->has_branch_stack;
}

Thanks,
Mark.

> +
>  #define to_arm_pmu(p) (container_of(p, struct arm_pmu, pmu))
>  
>  u64 armpmu_event_update(struct perf_event *event);
> -- 
> 2.25.1
>
  
Anshuman Khandual Jan. 13, 2023, 4:15 a.m. UTC | #2
On 1/12/23 19:24, Mark Rutland wrote:
> On Thu, Jan 05, 2023 at 08:40:36AM +0530, Anshuman Khandual wrote:
>> This updates 'struct arm_pmu' for branch stack sampling support later. This
>> adds a new 'features' element in the structure to track supported features,
>> and another 'private' element to encapsulate implementation attributes on a
>> given 'struct arm_pmu'. These updates here will help in tracking any branch
>> stack sampling support, which is being added later. This also adds a helper
>> arm_pmu_branch_stack_supported().
>>
>> This also enables perf branch stack sampling event on all 'struct arm pmu',
>> supporting the feature but after removing the current gate that blocks such
>> events unconditionally in armpmu_event_init(). Instead a quick probe can be
>> initiated via arm_pmu_branch_stack_supported() to ascertain the support.
>>
>> Cc: Catalin Marinas <catalin.marinas@arm.com>
>> Cc: Will Deacon <will@kernel.org>
>> Cc: Mark Rutland <mark.rutland@arm.com>
>> Cc: linux-arm-kernel@lists.infradead.org
>> Cc: linux-kernel@vger.kernel.org
>> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
>> ---
>>  drivers/perf/arm_pmu.c       | 3 +--
>>  include/linux/perf/arm_pmu.h | 9 +++++++++
>>  2 files changed, 10 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/perf/arm_pmu.c b/drivers/perf/arm_pmu.c
>> index 14a3ed3bdb0b..a85b2d67022e 100644
>> --- a/drivers/perf/arm_pmu.c
>> +++ b/drivers/perf/arm_pmu.c
>> @@ -510,8 +510,7 @@ static int armpmu_event_init(struct perf_event *event)
>>  		!cpumask_test_cpu(event->cpu, &armpmu->supported_cpus))
>>  		return -ENOENT;
>>  
>> -	/* does not support taken branch sampling */
>> -	if (has_branch_stack(event))
>> +	if (has_branch_stack(event) && !arm_pmu_branch_stack_supported(armpmu))
>>  		return -EOPNOTSUPP;
>>  
>>  	return __hw_perf_event_init(event);
>> diff --git a/include/linux/perf/arm_pmu.h b/include/linux/perf/arm_pmu.h
>> index 2a9d07cee927..64e1b2594025 100644
>> --- a/include/linux/perf/arm_pmu.h
>> +++ b/include/linux/perf/arm_pmu.h
>> @@ -80,11 +80,14 @@ enum armpmu_attr_groups {
>>  	ARMPMU_NR_ATTR_GROUPS
>>  };
>>  
>> +#define ARM_PMU_BRANCH_STACK	BIT(0)
>> +
>>  struct arm_pmu {
>>  	struct pmu	pmu;
>>  	cpumask_t	supported_cpus;
>>  	char		*name;
>>  	int		pmuver;
>> +	int		features;
>>  	irqreturn_t	(*handle_irq)(struct arm_pmu *pmu);
>>  	void		(*enable)(struct perf_event *event);
>>  	void		(*disable)(struct perf_event *event);
> 
> Hmm, we already have the secure_access field separately. How about we fold that
> in and go with:
> 
> 	unsigned int	secure_access    : 1,
> 			has_branch_stack : 1;

Something like this would work, but should we use __u32 instead of unsigned int
to ensure 32 bit width ?

-       bool            secure_access; /* 32-bit ARM only */
+       unsigned int    secure_access   : 1, /* 32-bit ARM only */
+                       has_branch_stack: 1,
+                       reserved        : 31;

> 
> ... that way we have one way to manage flags, we don't need to allocate the
> bits, and the bulk of the existing code for secure_access can stay as-is.

Right, the changed code also builds on arm32 without any code change.

> 
>> @@ -119,8 +122,14 @@ struct arm_pmu {
>>  
>>  	/* Only to be used by ACPI probing code */
>>  	unsigned long acpi_cpuid;
>> +	void		*private;
> 
> Does this need to be on the end of struct arm_pmu, or can it be placed earlier?

This additional 'private' attribute structure sticking out from struct arm_pmu
should be at the end. But is there any benefit moving this earlier ?

> 
> The line spacing makes it look like the ACPI comment applies to 'private',
> which isn't the case.

Sure, will add the following comment, and a space in between.

diff --git a/include/linux/perf/arm_pmu.h b/include/linux/perf/arm_pmu.h
index f60f7e01acae..c0a090ff7991 100644
--- a/include/linux/perf/arm_pmu.h
+++ b/include/linux/perf/arm_pmu.h
@@ -130,6 +130,8 @@ struct arm_pmu {
 
        /* Only to be used by ACPI probing code */
        unsigned long acpi_cpuid;
+
+       /* Implementation specific attributes */
        void            *private;
 };

> 
>>  };
>>  
>> +static inline bool arm_pmu_branch_stack_supported(struct arm_pmu *armpmu)
>> +{
>> +	return armpmu->features & ARM_PMU_BRANCH_STACK;
>> +}
> 
> With the above, this would become:
> 
> static inline bool arm_pmu_branch_stack_supported(struct arm_pmu *armpmu)
> {
> 	return armpmu->has_branch_stack;
> }

Right, will change this helper as required.
  
Mark Rutland Feb. 8, 2023, 7:26 p.m. UTC | #3
On Fri, Jan 13, 2023 at 09:45:22AM +0530, Anshuman Khandual wrote:
> 
> On 1/12/23 19:24, Mark Rutland wrote:
> > On Thu, Jan 05, 2023 at 08:40:36AM +0530, Anshuman Khandual wrote:
> >>  struct arm_pmu {
> >>  	struct pmu	pmu;
> >>  	cpumask_t	supported_cpus;
> >>  	char		*name;
> >>  	int		pmuver;
> >> +	int		features;
> >>  	irqreturn_t	(*handle_irq)(struct arm_pmu *pmu);
> >>  	void		(*enable)(struct perf_event *event);
> >>  	void		(*disable)(struct perf_event *event);
> > 
> > Hmm, we already have the secure_access field separately. How about we fold that
> > in and go with:
> > 
> > 	unsigned int	secure_access    : 1,
> > 			has_branch_stack : 1;
> 
> Something like this would work, but should we use __u32 instead of unsigned int
> to ensure 32 bit width ?

I don't think that's necessary; the exact size doesn't really matter, and
unsigned int is 32-bits on all targets suppropted by Linux, not just arm and
arm64.

I do agree that if this were a userspace ABI detail, it might be preferable to
use __u32. However, I think using it here gives the misleading impression that
there is an ABI concern when there is not, and as above it's not necessary, so
I'd prefer unsigned int here.

Thanks,
Mark.
  
Anshuman Khandual Feb. 9, 2023, 3:40 a.m. UTC | #4
On 2/9/23 00:56, Mark Rutland wrote:
> On Fri, Jan 13, 2023 at 09:45:22AM +0530, Anshuman Khandual wrote:
>>
>> On 1/12/23 19:24, Mark Rutland wrote:
>>> On Thu, Jan 05, 2023 at 08:40:36AM +0530, Anshuman Khandual wrote:
>>>>  struct arm_pmu {
>>>>  	struct pmu	pmu;
>>>>  	cpumask_t	supported_cpus;
>>>>  	char		*name;
>>>>  	int		pmuver;
>>>> +	int		features;
>>>>  	irqreturn_t	(*handle_irq)(struct arm_pmu *pmu);
>>>>  	void		(*enable)(struct perf_event *event);
>>>>  	void		(*disable)(struct perf_event *event);
>>>
>>> Hmm, we already have the secure_access field separately. How about we fold that
>>> in and go with:
>>>
>>> 	unsigned int	secure_access    : 1,
>>> 			has_branch_stack : 1;
>>
>> Something like this would work, but should we use __u32 instead of unsigned int
>> to ensure 32 bit width ?
> 
> I don't think that's necessary; the exact size doesn't really matter, and
> unsigned int is 32-bits on all targets suppropted by Linux, not just arm and
> arm64.
> 
> I do agree that if this were a userspace ABI detail, it might be preferable to
> use __u32. However, I think using it here gives the misleading impression that
> there is an ABI concern when there is not, and as above it's not necessary, so
> I'd prefer unsigned int here.

Makes sense, will this as unsigned int.
  

Patch

diff --git a/drivers/perf/arm_pmu.c b/drivers/perf/arm_pmu.c
index 14a3ed3bdb0b..a85b2d67022e 100644
--- a/drivers/perf/arm_pmu.c
+++ b/drivers/perf/arm_pmu.c
@@ -510,8 +510,7 @@  static int armpmu_event_init(struct perf_event *event)
 		!cpumask_test_cpu(event->cpu, &armpmu->supported_cpus))
 		return -ENOENT;
 
-	/* does not support taken branch sampling */
-	if (has_branch_stack(event))
+	if (has_branch_stack(event) && !arm_pmu_branch_stack_supported(armpmu))
 		return -EOPNOTSUPP;
 
 	return __hw_perf_event_init(event);
diff --git a/include/linux/perf/arm_pmu.h b/include/linux/perf/arm_pmu.h
index 2a9d07cee927..64e1b2594025 100644
--- a/include/linux/perf/arm_pmu.h
+++ b/include/linux/perf/arm_pmu.h
@@ -80,11 +80,14 @@  enum armpmu_attr_groups {
 	ARMPMU_NR_ATTR_GROUPS
 };
 
+#define ARM_PMU_BRANCH_STACK	BIT(0)
+
 struct arm_pmu {
 	struct pmu	pmu;
 	cpumask_t	supported_cpus;
 	char		*name;
 	int		pmuver;
+	int		features;
 	irqreturn_t	(*handle_irq)(struct arm_pmu *pmu);
 	void		(*enable)(struct perf_event *event);
 	void		(*disable)(struct perf_event *event);
@@ -119,8 +122,14 @@  struct arm_pmu {
 
 	/* Only to be used by ACPI probing code */
 	unsigned long acpi_cpuid;
+	void		*private;
 };
 
+static inline bool arm_pmu_branch_stack_supported(struct arm_pmu *armpmu)
+{
+	return armpmu->features & ARM_PMU_BRANCH_STACK;
+}
+
 #define to_arm_pmu(p) (container_of(p, struct arm_pmu, pmu))
 
 u64 armpmu_event_update(struct perf_event *event);