[V2,2/6] perf: Add branch stack extension
Commit Message
From: Kan Liang <kan.liang@linux.intel.com>
Currently, the extra information of a branch entry is stored in a u64
space. With more and more information added, the space is running out.
For example, the information of occurrences of events will be added for
each branch.
Add an extension space to record the new information for each branch
entry. The space is appended after the struct perf_branch_stack.
Add a bit in struct perf_branch_entry to indicate whether the extra
information is included.
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
---
New patch
- Introduce a generic extension space which can be used to
store the LBR event information for Intel. It can also be used by
other ARCHs for the other purpose.
- Add a new bit in struct perf_branch_entry to indicate whether the
extra information is included.
arch/powerpc/perf/core-book3s.c | 2 +-
arch/x86/events/amd/core.c | 2 +-
arch/x86/events/intel/core.c | 2 +-
arch/x86/events/intel/ds.c | 4 ++--
include/linux/perf_event.h | 18 +++++++++++++++++-
include/uapi/linux/perf_event.h | 4 +++-
kernel/events/core.c | 5 +++++
7 files changed, 30 insertions(+), 7 deletions(-)
Comments
Hi Kan,
On 5/22/2023 5:00 PM, kan.liang@linux.intel.com wrote:
> From: Kan Liang <kan.liang@linux.intel.com>
>
> Currently, the extra information of a branch entry is stored in a u64
> space. With more and more information added, the space is running out.
> For example, the information of occurrences of events will be added for
> each branch.
>
> Add an extension space to record the new information for each branch
> entry. The space is appended after the struct perf_branch_stack.
>
> Add a bit in struct perf_branch_entry to indicate whether the extra
> information is included.
>
> Reviewed-by: Andi Kleen <ak@linux.intel.com>
> Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
> Cc: Sandipan Das <sandipan.das@amd.com>
> Cc: Ravi Bangoria <ravi.bangoria@amd.com>
> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
> ---
>
> New patch
> - Introduce a generic extension space which can be used to
> store the LBR event information for Intel. It can also be used by
> other ARCHs for the other purpose.
> - Add a new bit in struct perf_branch_entry to indicate whether the
> extra information is included.
>
> arch/powerpc/perf/core-book3s.c | 2 +-
> arch/x86/events/amd/core.c | 2 +-
> arch/x86/events/intel/core.c | 2 +-
> arch/x86/events/intel/ds.c | 4 ++--
> include/linux/perf_event.h | 18 +++++++++++++++++-
> include/uapi/linux/perf_event.h | 4 +++-
> kernel/events/core.c | 5 +++++
> 7 files changed, 30 insertions(+), 7 deletions(-)
>
This seems to be missing the following:
diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 6310fc5c9f52..b6739f63dc34 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -1704,7 +1704,7 @@ int x86_pmu_handle_irq(struct pt_regs *regs)
perf_sample_data_init(&data, 0, event->hw.last_period);
if (has_branch_stack(event))
- perf_sample_save_brstack(&data, event, &cpuc->lbr_stack);
+ perf_sample_save_brstack(&data, event, &cpuc->lbr_stack, NULL);
if (perf_event_overflow(event, &data, regs))
x86_pmu_stop(event, 0);
Otherwise, the changes look good to me.
Reviewed-by: Sandipan Das <sandipan.das@amd.com>
On 2023-05-23 2:03 a.m., Sandipan Das wrote:
> Hi Kan,
>
> On 5/22/2023 5:00 PM, kan.liang@linux.intel.com wrote:
>> From: Kan Liang <kan.liang@linux.intel.com>
>>
>> Currently, the extra information of a branch entry is stored in a u64
>> space. With more and more information added, the space is running out.
>> For example, the information of occurrences of events will be added for
>> each branch.
>>
>> Add an extension space to record the new information for each branch
>> entry. The space is appended after the struct perf_branch_stack.
>>
>> Add a bit in struct perf_branch_entry to indicate whether the extra
>> information is included.
>>
>> Reviewed-by: Andi Kleen <ak@linux.intel.com>
>> Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
>> Cc: Sandipan Das <sandipan.das@amd.com>
>> Cc: Ravi Bangoria <ravi.bangoria@amd.com>
>> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
>> ---
>>
>> New patch
>> - Introduce a generic extension space which can be used to
>> store the LBR event information for Intel. It can also be used by
>> other ARCHs for the other purpose.
>> - Add a new bit in struct perf_branch_entry to indicate whether the
>> extra information is included.
>>
>> arch/powerpc/perf/core-book3s.c | 2 +-
>> arch/x86/events/amd/core.c | 2 +-
>> arch/x86/events/intel/core.c | 2 +-
>> arch/x86/events/intel/ds.c | 4 ++--
>> include/linux/perf_event.h | 18 +++++++++++++++++-
>> include/uapi/linux/perf_event.h | 4 +++-
>> kernel/events/core.c | 5 +++++
>> 7 files changed, 30 insertions(+), 7 deletions(-)
>>
>
> This seems to be missing the following:
Oh, right, the patch set based on the perf/core branch which doesn't
include the latest fix, 90befef5a9e8 ("perf/x86: Fix missing sample size
update on AMD BRS").
Peter,
Could you please backport the 90befef5a9e8 to your perf/core branch?
Then I will fold the below change into V3.
Or should I rebase the patch set on top of perf/urgent?
>
> diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
> index 6310fc5c9f52..b6739f63dc34 100644
> --- a/arch/x86/events/core.c
> +++ b/arch/x86/events/core.c
> @@ -1704,7 +1704,7 @@ int x86_pmu_handle_irq(struct pt_regs *regs)
> perf_sample_data_init(&data, 0, event->hw.last_period);
>
> if (has_branch_stack(event))
> - perf_sample_save_brstack(&data, event, &cpuc->lbr_stack);
> + perf_sample_save_brstack(&data, event, &cpuc->lbr_stack, NULL);
>
> if (perf_event_overflow(event, &data, regs))
> x86_pmu_stop(event, 0);
>
>
> Otherwise, the changes look good to me.
>
> Reviewed-by: Sandipan Das <sandipan.das@amd.com>
Thanks!
Kan
On Mon, May 22, 2023 at 04:30:36AM -0700, kan.liang@linux.intel.com wrote:
> From: Kan Liang <kan.liang@linux.intel.com>
>
> Currently, the extra information of a branch entry is stored in a u64
> space. With more and more information added, the space is running out.
> For example, the information of occurrences of events will be added for
> each branch.
>
> Add an extension space to record the new information for each branch
> entry. The space is appended after the struct perf_branch_stack.
>
> Add a bit in struct perf_branch_entry to indicate whether the extra
> information is included.
>
> Reviewed-by: Andi Kleen <ak@linux.intel.com>
> Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
> Cc: Sandipan Das <sandipan.das@amd.com>
> Cc: Ravi Bangoria <ravi.bangoria@amd.com>
> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
> ---
>
> New patch
> - Introduce a generic extension space which can be used to
> store the LBR event information for Intel. It can also be used by
> other ARCHs for the other purpose.
> - Add a new bit in struct perf_branch_entry to indicate whether the
> extra information is included.
Bah.. I don't like this, also the actual format isn't clear to me.
The uapi part is severely lacking, it just adds the ext:1 thing, but
doesn't describe what if anything happens when it's set.
The internal perf_branch_stack_ext thing is just that, internal.
Additionally it contains a nr member, which seems to suggest it can be
different from the number of entries in the branch-stack itself -- which
would be odd indeed.
So we have an 'ext' bit per branch entry to indicate the existance of
this extra data, this again suggests no 1:1 correspondence, but at most
one extra entry per set bit.
Parsing this will be pretty horrible, no?
So what we have now is:
{ u64 nr;
{ u64 hw_idx; } && PERF_SAMPLE_BRANCH_HW_INDEX
{ u64 from, to, flags; } lbr[nr];
} && PERF_SAMPLE_BRANCH_STACK
and AFAICT you're doing:
{ u64 nr;
{ u64 hw_idx; } && PERF_SAMPLE_BRANCH_HW_INDEX
{ u64 from, to, flags; } lbr[nr];
+ { u64 nr2;
+ { u64 extra; } extra[nr2];
+ } && OR_i{lbr[i].flags.ext}
} && PERF_SAMPLE_BRANCH_STACK
Which is pretty horrific, no? The straight forward:
{ u64 nr;
{ u64 hw_idx; } && PERF_SAMPLE_BRANCH_HW_INDEX
{ u64 from, to, flags; } lbr[nr];
+ { u64 extra; } ext[nr] && SOMETHING
} && PERF_SAMPLE_BRANCH_STACK
Or perhaps even:
{ u64 nr;
{ u64 hw_idx; } && PERF_SAMPLE_BRANCH_HW_INDEX
{ u64 from, to, flags;
+ u64 extra; && SOMETHING
} lbr[nr];
} && PERF_SAMPLE_BRANCH_STACK
With the obvious question what 'SOMETHING' should be. I suppose
PERF_SAMPLE_BRANCH_EXTRA was considered and discarded?
Implementing the last suggestion wouldn't even be too bad, since having
PERF_SAMPLE_BRANCH_EXTRA set, we know to allocate and cast the existing
perf_sample_data::br_stack to a convenient new type, something like:
struct perf_branch_entry_ext {
__u64 from;
__u64 to;
__u64 mispred:1, /* target mispredicted */
predicted:1,/* target predicted */
in_tx:1, /* in transaction */
abort:1, /* transaction abort */
cycles:16, /* cycle count to last branch */
type:4, /* branch type */
spec:2, /* branch speculation info */
new_type:4, /* additional branch type */
priv:3, /* privilege level */
reserved:31;
__u64 extra;
};
Except at that point I think I would suggest doing s/EXTRA/COUNTERS/g
and making it something like:
union {
__u64 counters;
__u8 c[8];
};
Or something daft like that.
Wouldn't all that make *MUCH* more sense?
On 2023-08-02 5:58 p.m., Peter Zijlstra wrote:
> On Mon, May 22, 2023 at 04:30:36AM -0700, kan.liang@linux.intel.com wrote:
>> From: Kan Liang <kan.liang@linux.intel.com>
>>
>> Currently, the extra information of a branch entry is stored in a u64
>> space. With more and more information added, the space is running out.
>> For example, the information of occurrences of events will be added for
>> each branch.
>>
>> Add an extension space to record the new information for each branch
>> entry. The space is appended after the struct perf_branch_stack.
>>
>> Add a bit in struct perf_branch_entry to indicate whether the extra
>> information is included.
>>
>> Reviewed-by: Andi Kleen <ak@linux.intel.com>
>> Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
>> Cc: Sandipan Das <sandipan.das@amd.com>
>> Cc: Ravi Bangoria <ravi.bangoria@amd.com>
>> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
>> ---
>>
>> New patch
>> - Introduce a generic extension space which can be used to
>> store the LBR event information for Intel. It can also be used by
>> other ARCHs for the other purpose.
>> - Add a new bit in struct perf_branch_entry to indicate whether the
>> extra information is included.
>
> Bah.. I don't like this, also the actual format isn't clear to me.
>
> The uapi part is severely lacking, it just adds the ext:1 thing, but
> doesn't describe what if anything happens when it's set.
>
> The internal perf_branch_stack_ext thing is just that, internal.
> Additionally it contains a nr member, which seems to suggest it can be
> different from the number of entries in the branch-stack itself -- which
> would be odd indeed.
>
> So we have an 'ext' bit per branch entry to indicate the existance of
> this extra data, this again suggests no 1:1 correspondence, but at most
> one extra entry per set bit.
>
> Parsing this will be pretty horrible, no?
>
> So what we have now is:
>
> { u64 nr;
> { u64 hw_idx; } && PERF_SAMPLE_BRANCH_HW_INDEX
> { u64 from, to, flags; } lbr[nr];
> } && PERF_SAMPLE_BRANCH_STACK
>
> and AFAICT you're doing:
>
> { u64 nr;
> { u64 hw_idx; } && PERF_SAMPLE_BRANCH_HW_INDEX
> { u64 from, to, flags; } lbr[nr];
> + { u64 nr2;
> + { u64 extra; } extra[nr2];
> + } && OR_i{lbr[i].flags.ext}
> } && PERF_SAMPLE_BRANCH_STACK
>
> Which is pretty horrific, no? The straight forward:
I just tried to make the interface more flexible, since I had no idea
how other ARCHs would use the extra space. But it seems such flexibility
is not necessary. It is indeed not easy to be parsed.
>
> { u64 nr;
> { u64 hw_idx; } && PERF_SAMPLE_BRANCH_HW_INDEX
> { u64 from, to, flags; } lbr[nr];
> + { u64 extra; } ext[nr] && SOMETHING
> } && PERF_SAMPLE_BRANCH_STACK
>
> Or perhaps even:
>
> { u64 nr;
> { u64 hw_idx; } && PERF_SAMPLE_BRANCH_HW_INDEX
> { u64 from, to, flags;
> + u64 extra; && SOMETHING
> } lbr[nr];
> } && PERF_SAMPLE_BRANCH_STACK
>
> With the obvious question what 'SOMETHING' should be. I suppose
> PERF_SAMPLE_BRANCH_EXTRA was considered and discarded?
Yes, it's considered. I once tried to reuse the existing space/structure
as much as possible. So it's dropped.
Other than that, using a new sample type as an indicator should be a
better way and much straight forward. I will use it in V3.
>
> Implementing the last suggestion wouldn't even be too bad, since having
> PERF_SAMPLE_BRANCH_EXTRA set, we know to allocate and cast the existing
> perf_sample_data::br_stack to a convenient new type, something like:
>
> struct perf_branch_entry_ext {
> __u64 from;
> __u64 to;
> __u64 mispred:1, /* target mispredicted */
> predicted:1,/* target predicted */
> in_tx:1, /* in transaction */
> abort:1, /* transaction abort */
> cycles:16, /* cycle count to last branch */
> type:4, /* branch type */
> spec:2, /* branch speculation info */
> new_type:4, /* additional branch type */
> priv:3, /* privilege level */
> reserved:31;
> __u64 extra;
> };
>
> Except at that point I think I would suggest doing s/EXTRA/COUNTERS/g
> and making it something like:
>
> union {
> __u64 counters;
> __u8 c[8];
> };
>
It's good enough for this feature and Intel LBR.
My only concern is that it's only a 64 bit extra space. If we need more
space later, we have to keep adding perf_branch_entry_ext2 and
PERF_SAMPLE_BRANCH_EXTRA2. But I don't have such use case now. Maybe I'm
just too paranoid. :)
I will use the suggested structure in V3. If anyone has other concerns,
we can discuss them from there.
Thanks,
Kan
> Or something daft like that.
>
> Wouldn't all that make *MUCH* more sense?
@@ -2313,7 +2313,7 @@ static void record_and_restart(struct perf_event *event, unsigned long val,
struct cpu_hw_events *cpuhw;
cpuhw = this_cpu_ptr(&cpu_hw_events);
power_pmu_bhrb_read(event, cpuhw);
- perf_sample_save_brstack(&data, event, &cpuhw->bhrb_stack);
+ perf_sample_save_brstack(&data, event, &cpuhw->bhrb_stack, NULL);
}
if (event->attr.sample_type & PERF_SAMPLE_DATA_SRC &&
@@ -930,7 +930,7 @@ static int amd_pmu_v2_handle_irq(struct pt_regs *regs)
continue;
if (has_branch_stack(event))
- perf_sample_save_brstack(&data, event, &cpuc->lbr_stack);
+ perf_sample_save_brstack(&data, event, &cpuc->lbr_stack, NULL);
if (perf_event_overflow(event, &data, regs))
x86_pmu_stop(event, 0);
@@ -3048,7 +3048,7 @@ static int handle_pmi_common(struct pt_regs *regs, u64 status)
perf_sample_data_init(&data, 0, event->hw.last_period);
if (has_branch_stack(event))
- perf_sample_save_brstack(&data, event, &cpuc->lbr_stack);
+ perf_sample_save_brstack(&data, event, &cpuc->lbr_stack, NULL);
if (perf_event_overflow(event, &data, regs))
x86_pmu_stop(event, 0);
@@ -1747,7 +1747,7 @@ static void setup_pebs_fixed_sample_data(struct perf_event *event,
setup_pebs_time(event, data, pebs->tsc);
if (has_branch_stack(event))
- perf_sample_save_brstack(data, event, &cpuc->lbr_stack);
+ perf_sample_save_brstack(data, event, &cpuc->lbr_stack, NULL);
}
static void adaptive_pebs_save_regs(struct pt_regs *regs,
@@ -1904,7 +1904,7 @@ static void setup_pebs_adaptive_sample_data(struct perf_event *event,
if (has_branch_stack(event)) {
intel_pmu_store_pebs_lbrs(lbr);
- perf_sample_save_brstack(data, event, &cpuc->lbr_stack);
+ perf_sample_save_brstack(data, event, &cpuc->lbr_stack, NULL);
}
}
@@ -126,6 +126,16 @@ struct perf_branch_stack {
struct perf_branch_entry entries[];
};
+/*
+ * The extension space is appended after the struct perf_branch_stack.
+ * It is used to store the extra data of each branch, e.g.,
+ * the occurrences of events since the last branch entry for Intel LBR.
+ */
+struct perf_branch_stack_ext {
+ __u64 nr;
+ __u64 data[];
+};
+
struct task_struct;
/*
@@ -1161,6 +1171,7 @@ struct perf_sample_data {
struct perf_callchain_entry *callchain;
struct perf_raw_record *raw;
struct perf_branch_stack *br_stack;
+ struct perf_branch_stack_ext *br_stack_ext;
union perf_sample_weight weight;
union perf_mem_data_src data_src;
u64 txn;
@@ -1237,7 +1248,8 @@ static inline void perf_sample_save_raw_data(struct perf_sample_data *data,
static inline void perf_sample_save_brstack(struct perf_sample_data *data,
struct perf_event *event,
- struct perf_branch_stack *brs)
+ struct perf_branch_stack *brs,
+ struct perf_branch_stack_ext *brs_ext)
{
int size = sizeof(u64); /* nr */
@@ -1245,7 +1257,11 @@ static inline void perf_sample_save_brstack(struct perf_sample_data *data,
size += sizeof(u64);
size += brs->nr * sizeof(struct perf_branch_entry);
+ if (brs_ext)
+ size += (1 + brs_ext->nr) * sizeof(u64);
+
data->br_stack = brs;
+ data->br_stack_ext = brs_ext;
data->dyn_size += size;
data->sample_flags |= PERF_SAMPLE_BRANCH_STACK;
}
@@ -1410,6 +1410,7 @@ union perf_mem_data_src {
* cycles: cycles from last branch (or 0 if not supported)
* type: branch type
* spec: branch speculation info (or 0 if not supported)
+ * ext: has extension space for extra info (or 0 if not supported)
*/
struct perf_branch_entry {
__u64 from;
@@ -1423,7 +1424,8 @@ struct perf_branch_entry {
spec:2, /* branch speculation info */
new_type:4, /* additional branch type */
priv:3, /* privilege level */
- reserved:31;
+ ext:1, /* has extension */
+ reserved:30;
};
union perf_sample_weight {
@@ -7324,6 +7324,11 @@ void perf_output_sample(struct perf_output_handle *handle,
if (branch_sample_hw_index(event))
perf_output_put(handle, data->br_stack->hw_idx);
perf_output_copy(handle, data->br_stack->entries, size);
+ if (data->br_stack_ext) {
+ size = data->br_stack_ext->nr * sizeof(u64);
+ perf_output_put(handle, data->br_stack_ext->nr);
+ perf_output_copy(handle, data->br_stack_ext->data, size);
+ }
} else {
/*
* we always store at least the value of nr