[v2,1/1] perf arm64: Send pointer auth masks to ring buffer

Message ID: 20221020101921.1219533-2-james.clark@arm.com
State: New
Series: Perf stack unwinding with pointer authentication

Commit Message

James Clark Oct. 20, 2022, 10:19 a.m. UTC
  From: Andrew Kilroy <andrew.kilroy@arm.com>

Perf report cannot produce callgraphs using dwarf on arm64 where pointer
authentication is enabled.  This is because libunwind and libdw cannot
unmangle instruction pointers that have a pointer authentication code
(PAC) embedded in them.

libunwind and libdw need to be given an instruction mask which they can
use to arrive at the correct return address that does not contain the
PAC.

The bits in the return address that contain the PAC can differ by
process, so this patch adds a new sample field PERF_SAMPLE_ARCH_1
to allow the kernel to send the masks up to userspace perf.

This field can be used in an architecture-specific fashion, but on
arm64, it contains the ptrauth mask information. The event will
currently fail to open on architectures other than arm64 if
PERF_SAMPLE_ARCH_1 is set. It will also fail to open on arm64 if
CONFIG_ARM64_PTR_AUTH isn't set, as the data would always be zeros.

Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Kilroy <andrew.kilroy@arm.com>
Signed-off-by: James Clark <james.clark@arm.com>
---
 arch/arm64/include/asm/perf_event.h | 32 +++++++++++++++++++++++++++++
 arch/arm64/kernel/perf_event.c      | 32 +++++++++++++++++++++++++++++
 include/linux/perf_event.h          | 12 +++++++++++
 include/uapi/linux/perf_event.h     |  4 +++-
 kernel/events/core.c                | 31 ++++++++++++++++++++++++++++
 5 files changed, 110 insertions(+), 1 deletion(-)
  

Comments

Peter Zijlstra Oct. 20, 2022, 4:49 p.m. UTC | #1
On Thu, Oct 20, 2022 at 11:19:20AM +0100, James Clark wrote:
> From: Andrew Kilroy <andrew.kilroy@arm.com>
> 
> Perf report cannot produce callgraphs using dwarf on arm64 where pointer
> authentication is enabled.  This is because libunwind and libdw cannot
> unmangle instruction pointers that have a pointer authentication code
> (PAC) embedded in them.
> 
> libunwind and libdw need to be given an instruction mask which they can
> use to arrive at the correct return address that does not contain the
> PAC.
> 
> The bits in the return address that contain the PAC can differ by
> process, so this patch adds a new sample field PERF_SAMPLE_ARCH_1
> to allow the kernel to send the masks up to userspace perf.
> 
> This field can be used in a architecture specific fashion, but on
> arm64, it contains the ptrauth mask information. The event will
> currently fail to open on architectures other than arm64 if
> PERF_SAMPLE_ARCH_1 is set. It will also fail to open on arm64 if
> CONFIG_ARM64_PTR_AUTH isn't set, as the data would always be zeros.

A little more information please; wth is pointer authentication? Are we
going to be having the same thing with x86 LAM where only a subset of
the available bits have meaning to the hardware?

Why do we want the same mask repeated over and over with each sample;
should this not be part of the address space (side-band) data?
  
Peter Zijlstra Oct. 20, 2022, 5 p.m. UTC | #2
On Thu, Oct 20, 2022 at 06:49:17PM +0200, Peter Zijlstra wrote:
> On Thu, Oct 20, 2022 at 11:19:20AM +0100, James Clark wrote:
> > From: Andrew Kilroy <andrew.kilroy@arm.com>
> > 
> > Perf report cannot produce callgraphs using dwarf on arm64 where pointer
> > authentication is enabled.  This is because libunwind and libdw cannot
> > unmangle instruction pointers that have a pointer authentication code
> > (PAC) embedded in them.
> > 
> > libunwind and libdw need to be given an instruction mask which they can
> > use to arrive at the correct return address that does not contain the
> > PAC.
> > 
> > The bits in the return address that contain the PAC can differ by
> > process, so this patch adds a new sample field PERF_SAMPLE_ARCH_1
> > to allow the kernel to send the masks up to userspace perf.
> > 
> > This field can be used in a architecture specific fashion, but on
> > arm64, it contains the ptrauth mask information. The event will
> > currently fail to open on architectures other than arm64 if
> > PERF_SAMPLE_ARCH_1 is set. It will also fail to open on arm64 if
> > CONFIG_ARM64_PTR_AUTH isn't set, as the data would always be zeros.
> 
> A little more information please; wth is pointer authentication? Are we

Mark got me: https://events.static.linuxfound.org/sites/events/files/slides/slides_23.pdf

> going to be having the same thing with x86 LAM where only a subset of
> the available bits have meaning to the hardware?
> 
> Why do we want the same mask repeated over and over with each sample;
> should this not be part of the address space (side-band) data?
  
James Clark Oct. 25, 2022, 3:28 p.m. UTC | #3
On 20/10/2022 17:49, Peter Zijlstra wrote:
> On Thu, Oct 20, 2022 at 11:19:20AM +0100, James Clark wrote:
>> From: Andrew Kilroy <andrew.kilroy@arm.com>
>>
>> Perf report cannot produce callgraphs using dwarf on arm64 where pointer
>> authentication is enabled.  This is because libunwind and libdw cannot
>> unmangle instruction pointers that have a pointer authentication code
>> (PAC) embedded in them.
>>
>> libunwind and libdw need to be given an instruction mask which they can
>> use to arrive at the correct return address that does not contain the
>> PAC.
>>
>> The bits in the return address that contain the PAC can differ by
>> process, so this patch adds a new sample field PERF_SAMPLE_ARCH_1
>> to allow the kernel to send the masks up to userspace perf.
>>
>> This field can be used in a architecture specific fashion, but on
>> arm64, it contains the ptrauth mask information. The event will
>> currently fail to open on architectures other than arm64 if
>> PERF_SAMPLE_ARCH_1 is set. It will also fail to open on arm64 if
>> CONFIG_ARM64_PTR_AUTH isn't set, as the data would always be zeros.
> 
> A little more information please; wth is pointer authentication? Are we

Yes, the commit message could do with some links and details. I
will add more background about the feature if I resend it.


> going to be having the same thing with x86 LAM where only a subset of
> the available bits have meaning to the hardware?

If LAM is used with things like return addresses on the stack, then
yes. For data addresses it wouldn't cause this exact issue.

The problem is that libunwind and perf do pointer arithmetic when
unwinding, mixing values that don't have these bits set. For example,
computing an offset into an mmap where the base address doesn't have an
authentication code but a return address does gives an incorrect
result: x - y = offset is only correct if the PAC bits are masked out
of x first.

> 
> Why do we want the same mask repeated over and over with each sample;
> should this not be part of the address space (side-band) data?

You are probably right that it could be done that way. The reason that
we did it this way was to be consistent with ptrace feature [1] where it
is delivered to userspace on a per-process basis. And there is also a
prctl for the enabled key types [2] which can be changed dynamically.
The last reason in particular is why it was done per sample.

Having said that, the enabled keys field is not used by perf, only the
mask is used, so I can drop the per sample data until enabled keys is
needed, which may be never.

I'm going to assume that perf shouldn't use ptrace because of
permissions and conflicts with debuggers, so I could put the mask
somewhere like PERF_RECORD_FORK instead of per sample.


[1]:
https://lore.kernel.org/all/20181207183931.4285-9-kristina.martsenko@arm.com/
[2]:
https://lore.kernel.org/all/d6609065f8f40397a4124654eb68c9f490b4d477.1616123271.git.pcc@google.com/
  
James Clark Oct. 27, 2022, 2:11 p.m. UTC | #4
On 25/10/2022 16:28, James Clark wrote:
> 
> 
> On 20/10/2022 17:49, Peter Zijlstra wrote:
>> On Thu, Oct 20, 2022 at 11:19:20AM +0100, James Clark wrote:
>>> From: Andrew Kilroy <andrew.kilroy@arm.com>
>>>
>>> Perf report cannot produce callgraphs using dwarf on arm64 where pointer
>>> authentication is enabled.  This is because libunwind and libdw cannot
>>> unmangle instruction pointers that have a pointer authentication code
>>> (PAC) embedded in them.
>>>
>>> libunwind and libdw need to be given an instruction mask which they can
>>> use to arrive at the correct return address that does not contain the
>>> PAC.
>>>
>>> The bits in the return address that contain the PAC can differ by
>>> process, so this patch adds a new sample field PERF_SAMPLE_ARCH_1
>>> to allow the kernel to send the masks up to userspace perf.
>>>
>>> This field can be used in a architecture specific fashion, but on
>>> arm64, it contains the ptrauth mask information. The event will
>>> currently fail to open on architectures other than arm64 if
>>> PERF_SAMPLE_ARCH_1 is set. It will also fail to open on arm64 if
>>> CONFIG_ARM64_PTR_AUTH isn't set, as the data would always be zeros.
>>
>> A little more information please; wth is pointer authentication? Are we
> 
> Yes the commit message could do with some links and details. I
> will add more background about the feature if I resend it.
> 
> 
>> going to be having the same thing with x86 LAM where only a subset of
>> the available bits have meaning to the hardware?
> 
> If LAM is used with things like return addresses on the stack then yes.
> For data addresses then it wouldn't cause this exact issue.
> 
> The problem is that libunwind and perf do pointer arithmetic when
> unwinding with other values that don't have these bits set. For example
> computing offset in an mmap where the base address doesn't have an
> authentication code but a return value does results in an incorrect
> value. Like x-y=offset is only correct if the PAC bits are masked out first.
> 
>>
>> Why do we want the same mask repeated over and over with each sample;
>> should this not be part of the address space (side-band) data?
> 
> You are probably right that it could be done that way. The reason that
> we did it this way was to be consistent with ptrace feature [1] where it
> is delivered to userspace on a per-process basis. And there is also a
> prctl for the enabled key types [2] which can be changed dynamically.
> Particularly for the last reason is why it was done per sample.
> 
> Having said that, the enabled keys field is not used by perf, only the
> mask is used, so I can drop the per sample data until enabled keys is
> needed, which may be never.
> 
> I'm going to assume that perf shouldn't use ptrace because of
> permissions and conflicts with debuggers, so I could put the mask
> somewhere like PERF_RECORD_FORK instead of per sample.
> 

Hi Peter,

Sorry for flip-flopping, but I've read the threads that I linked and
spoken with Kristina, and we would like to stick with the per-sample
implementation after all.

The reason is that in the future there may also be a prctl for 48/52
bit userspace addressing which would change the mask dynamically in the
same way as the enabled keys. Although this isn't possible now, it
makes sense to do it this way in anticipation of that, and also for
consistency with the ptrace feature.

I also think that repeating the mask in this case has a very low impact
because if you are doing dwarf unwinding, then the whole stack is saved
anyway, so a few extra u64s wouldn't be noticeable.

Are you ok with that and for me to resubmit with the expanded commit
message?

Thanks
James

> 
> [1]:
> https://lore.kernel.org/all/20181207183931.4285-9-kristina.martsenko@arm.com/
> [2]:
> https://lore.kernel.org/all/d6609065f8f40397a4124654eb68c9f490b4d477.1616123271.git.pcc@google.com/
  
Peter Zijlstra Oct. 27, 2022, 5:20 p.m. UTC | #5
On Tue, Oct 25, 2022 at 04:28:12PM +0100, James Clark wrote:

> > Why do we want the same mask repeated over and over with each sample;
> > should this not be part of the address space (side-band) data?
> 
> You are probably right that it could be done that way. The reason that
> we did it this way was to be consistent with ptrace feature [1] where it
> is delivered to userspace on a per-process basis. And there is also a
> prctl for the enabled key types [2] which can be changed dynamically.
> Particularly for the last reason is why it was done per sample.
> 
> Having said that, the enabled keys field is not used by perf, only the
> mask is used, so I can drop the per sample data until enabled keys is
> needed, which may be never.
> 
> I'm going to assume that perf shouldn't use ptrace because of
> permissions and conflicts with debuggers, so I could put the mask
> somewhere like PERF_RECORD_FORK instead of per sample.

Yeah, or create a new RECORD type which you can also send around at
prctl() time.

The only thing that's needed on top of that is exposing the mask
somewhere in /proc for existing tasks, which is what perf also uses to
synthesize RECORD_MMAP events on startup etc.
  
Peter Zijlstra Oct. 27, 2022, 5:25 p.m. UTC | #6
On Thu, Oct 27, 2022 at 03:11:47PM +0100, James Clark wrote:

> Sorry for flip flopping, but I've read those threads that I linked and
> spoke with Kristina and we would like to stick with the per sample
> implementation after all.
> 
> The reason is that in the future there may also be a prctrl for 48/52
> bit userspace addressing which would change the mask dynamically in the
> same way as the enabled keys. Although this isn't possible now it makes
> sense to do it this way in case of that, and also for consistency with
> the ptrace feature.
> 
> I also think that repeating the mask in this case has a very low impact
> because if you are doing dwarf unwinding, then the whole stack is saved
> anyway, so a few extra u64s wouldn't be noticeable.
> 
> Are you ok with that and for me to resubmit with the expanded commit
> message?

But you can send a side-band thing around if/when it changes. Same as
all the other stuff. We don't include MMAP data with each event either.

Yes, the DWARF thing is slow as anything, but imagine needing this for
something else as well, and then you're stuck with it.
  
James Clark Oct. 31, 2022, 1:48 p.m. UTC | #7
On 27/10/2022 18:20, Peter Zijlstra wrote:
> On Tue, Oct 25, 2022 at 04:28:12PM +0100, James Clark wrote:
> 
>>> Why do we want the same mask repeated over and over with each sample;
>>> should this not be part of the address space (side-band) data?
>>
>> You are probably right that it could be done that way. The reason that
>> we did it this way was to be consistent with ptrace feature [1] where it
>> is delivered to userspace on a per-process basis. And there is also a
>> prctl for the enabled key types [2] which can be changed dynamically.
>> Particularly for the last reason is why it was done per sample.
>>
>> Having said that, the enabled keys field is not used by perf, only the
>> mask is used, so I can drop the per sample data until enabled keys is
>> needed, which may be never.
>>
>> I'm going to assume that perf shouldn't use ptrace because of
>> permissions and conflicts with debuggers, so I could put the mask
>> somewhere like PERF_RECORD_FORK instead of per sample.
> 
> Yeah, or create an new RECORD type which you can also send around at
> prctl() time.
> 
> The only thing that's needed on top of that is exposing the mask
> somewhere in /proc for existing tasks; which is what perf also uses to
> syntesize RECORD_MMAP events on startup etc..
> 

Hmm, ok, in that case I can just add the /proc interface for now,
because the mask won't change, and we can add the new record type when
it's needed in the future.

Thanks for the feedback.
  

Patch

diff --git a/arch/arm64/include/asm/perf_event.h b/arch/arm64/include/asm/perf_event.h
index 3eaf462f5752..160fdb8fca1c 100644
--- a/arch/arm64/include/asm/perf_event.h
+++ b/arch/arm64/include/asm/perf_event.h
@@ -273,4 +273,36 @@  extern unsigned long perf_misc_flags(struct pt_regs *regs);
 	(regs)->pstate = PSR_MODE_EL1h;	\
 }
 
+#ifdef CONFIG_ARM64_PTR_AUTH
+#define HAS_ARCH_SAMPLE_DATA
+/*
+ * Structure holding masks to help userspace stack unwinding
+ * in the presence of arm64 pointer authentication.
+ */
+struct ptrauth_info {
+	/*
+	 * Bits 0-4 may be set to indicate which keys are in use:
+	 * APIAKEY, APIBKEY, APDAKEY, APDBKEY and APGAKEY respectively.
+	 * When all bits are clear, pointer authentication is not in use
+	 * for the process.
+	 */
+	u64 enabled_keys;
+
+	/*
+	 * The on bits represent which bits in an instruction pointer
+	 * constitute the pointer authentication code.
+	 */
+	u64 insn_mask;
+
+	/*
+	 * The on bits represent which bits in a data pointer constitute the
+	 * pointer authentication code.
+	 */
+	u64 data_mask;
+};
+
+struct arch_sample_data {
+	struct ptrauth_info ptrauth;
+};
+#endif /* CONFIG_ARM64_PTR_AUTH */
 #endif
diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
index cb69ff1e6138..9c209168e055 100644
--- a/arch/arm64/kernel/perf_event.c
+++ b/arch/arm64/kernel/perf_event.c
@@ -1459,3 +1459,35 @@  void arch_perf_update_userpage(struct perf_event *event,
 	userpg->cap_user_time_zero = 1;
 	userpg->cap_user_time_short = 1;
 }
+
+#ifdef CONFIG_ARM64_PTR_AUTH
+void perf_output_sample_arch_1(struct perf_output_handle *handle,
+			       struct perf_event_header *header,
+			       struct perf_sample_data *data,
+			       struct perf_event *event)
+{
+	perf_output_put(handle, data->arch.ptrauth.enabled_keys);
+	perf_output_put(handle, data->arch.ptrauth.insn_mask);
+	perf_output_put(handle, data->arch.ptrauth.data_mask);
+}
+
+void perf_prepare_sample_arch_1(struct perf_event_header *header,
+				struct perf_sample_data *data,
+				struct perf_event *event,
+				struct pt_regs *regs)
+{
+	int keys_result = ptrauth_get_enabled_keys(current);
+	u64 user_pac_mask = keys_result > 0 ? ptrauth_user_pac_mask() : 0;
+
+	data->arch.ptrauth.enabled_keys = keys_result > 0 ? keys_result : 0;
+	data->arch.ptrauth.insn_mask = user_pac_mask;
+	data->arch.ptrauth.data_mask = user_pac_mask;
+
+	header->size += (3 * sizeof(u64));
+}
+
+int perf_event_open_request_arch_1(void)
+{
+	return 0;
+}
+#endif /* CONFIG_ARM64_PTR_AUTH */
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 853f64b6c8c2..f6b0cc93faae 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -1065,6 +1065,9 @@  struct perf_sample_data {
 	u64				cgroup;
 	u64				data_page_size;
 	u64				code_page_size;
+#ifdef HAS_ARCH_SAMPLE_DATA
+	struct arch_sample_data		arch;
+#endif
 } ____cacheline_aligned;
 
 /* default value for data source */
@@ -1674,6 +1677,15 @@  int perf_event_exit_cpu(unsigned int cpu);
 extern void __weak arch_perf_update_userpage(struct perf_event *event,
 					     struct perf_event_mmap_page *userpg,
 					     u64 now);
+extern void perf_output_sample_arch_1(struct perf_output_handle *handle,
+				      struct perf_event_header *header,
+				      struct perf_sample_data *data,
+				      struct perf_event *event);
+extern void perf_prepare_sample_arch_1(struct perf_event_header *header,
+				       struct perf_sample_data *data,
+				       struct perf_event *event,
+				       struct pt_regs *regs);
+extern int perf_event_open_request_arch_1(void);
 
 #ifdef CONFIG_MMU
 extern __weak u64 arch_perf_get_page_size(struct mm_struct *mm, unsigned long addr);
diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index 85be78e0e7f6..3c8349111422 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -162,10 +162,12 @@  enum perf_event_sample_format {
 	PERF_SAMPLE_DATA_PAGE_SIZE		= 1U << 22,
 	PERF_SAMPLE_CODE_PAGE_SIZE		= 1U << 23,
 	PERF_SAMPLE_WEIGHT_STRUCT		= 1U << 24,
+	PERF_SAMPLE_ARCH_1			= 1U << 25,
 
-	PERF_SAMPLE_MAX = 1U << 25,		/* non-ABI */
+	PERF_SAMPLE_MAX = 1U << 26,		/* non-ABI */
 };
 
+#define PERF_SAMPLE_ARM64_PTRAUTH PERF_SAMPLE_ARCH_1
 #define PERF_SAMPLE_WEIGHT_TYPE	(PERF_SAMPLE_WEIGHT | PERF_SAMPLE_WEIGHT_STRUCT)
 /*
  * values to program into branch_sample_type when PERF_SAMPLE_BRANCH is set
diff --git a/kernel/events/core.c b/kernel/events/core.c
index b981b879bcd8..8ca0501d608d 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -5822,6 +5822,25 @@  void __weak arch_perf_update_userpage(
 {
 }
 
+void __weak perf_output_sample_arch_1(struct perf_output_handle *handle,
+			       struct perf_event_header *header,
+			       struct perf_sample_data *data,
+			       struct perf_event *event)
+{
+}
+
+void __weak perf_prepare_sample_arch_1(struct perf_event_header *header,
+				struct perf_sample_data *data,
+				struct perf_event *event,
+				struct pt_regs *regs)
+{
+}
+
+int __weak perf_event_open_request_arch_1(void)
+{
+	return -EINVAL;
+}
+
 /*
  * Callers need to ensure there can be no nesting of this function, otherwise
  * the seqlock logic goes bad. We can not serialize this because the arch
@@ -7142,6 +7161,9 @@  void perf_output_sample(struct perf_output_handle *handle,
 			perf_aux_sample_output(event, handle, data);
 	}
 
+	if (sample_type & PERF_SAMPLE_ARCH_1)
+		perf_output_sample_arch_1(handle, header, data, event);
+
 	if (!event->attr.watermark) {
 		int wakeup_events = event->attr.wakeup_events;
 
@@ -7466,6 +7488,9 @@  void perf_prepare_sample(struct perf_event_header *header,
 	if (sample_type & PERF_SAMPLE_CODE_PAGE_SIZE)
 		data->code_page_size = perf_get_page_size(data->ip);
 
+	if (sample_type & PERF_SAMPLE_ARCH_1)
+		perf_prepare_sample_arch_1(header, data, event, regs);
+
 	if (sample_type & PERF_SAMPLE_AUX) {
 		u64 size;
 
@@ -12140,6 +12165,12 @@  SYSCALL_DEFINE5(perf_event_open,
 			return err;
 	}
 
+	if (attr.sample_type & PERF_SAMPLE_ARCH_1) {
+		err = perf_event_open_request_arch_1();
+		if (err)
+			return err;
+	}
+
 	/*
 	 * In cgroup mode, the pid argument is used to pass the fd
 	 * opened to the cgroup directory in cgroupfs. The cpu argument