[05/30] selftests/rseq: Use ELF auxiliary vector for extensible rseq

Message ID 20221122203932.231377-6-mathieu.desnoyers@efficios.com
State New
Headers
Series RSEQ node id and mm concurrency id extensions |

Commit Message

Mathieu Desnoyers Nov. 22, 2022, 8:39 p.m. UTC
  Use the ELF auxiliary vector AT_RSEQ_FEATURE_SIZE to detect the RSEQ
features supported by the kernel.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
---
 tools/testing/selftests/rseq/rseq-abi.h |  5 ++
 tools/testing/selftests/rseq/rseq.c     | 67 ++++++++++++++++++++++---
 tools/testing/selftests/rseq/rseq.h     | 18 +++++--
 3 files changed, 78 insertions(+), 12 deletions(-)
  

Comments

Florian Weimer Jan. 4, 2023, 7:14 p.m. UTC | #1
* Mathieu Desnoyers:

> +static
> +unsigned int get_rseq_feature_size(void)
> +{
> +	unsigned long auxv_rseq_feature_size, auxv_rseq_align;
> +
> +	auxv_rseq_align = getauxval(AT_RSEQ_ALIGN);
> +	assert(!auxv_rseq_align || auxv_rseq_align <= RSEQ_THREAD_AREA_ALLOC_SIZE);
> +
> +	auxv_rseq_feature_size = getauxval(AT_RSEQ_FEATURE_SIZE);
> +	assert(!auxv_rseq_feature_size || auxv_rseq_feature_size <= RSEQ_THREAD_AREA_ALLOC_SIZE);
> +	if (auxv_rseq_feature_size)
> +		return auxv_rseq_feature_size;
> +	else
> +		return ORIG_RSEQ_FEATURE_SIZE;
> +}

Do you intend to use the auxiliary vector as the userspace handshake
for glibc-managed rseq, too?  I don't think it works if the kernel
overtakes glibc.  Or is there some other approach shown in the series
that I missed?

Maybe we should just skip the existing padding and use it only for
some vaguely kernel-internal purpose (say through a vDSO helper), so
that it is less of an issue how to communicate the presence of these
fields to userspace.
  
Mathieu Desnoyers Jan. 4, 2023, 7:51 p.m. UTC | #2
On 2023-01-04 14:14, Florian Weimer wrote:
> * Mathieu Desnoyers:
> 
>> +static
>> +unsigned int get_rseq_feature_size(void)
>> +{
>> +	unsigned long auxv_rseq_feature_size, auxv_rseq_align;
>> +
>> +	auxv_rseq_align = getauxval(AT_RSEQ_ALIGN);
>> +	assert(!auxv_rseq_align || auxv_rseq_align <= RSEQ_THREAD_AREA_ALLOC_SIZE);
>> +
>> +	auxv_rseq_feature_size = getauxval(AT_RSEQ_FEATURE_SIZE);
>> +	assert(!auxv_rseq_feature_size || auxv_rseq_feature_size <= RSEQ_THREAD_AREA_ALLOC_SIZE);
>> +	if (auxv_rseq_feature_size)
>> +		return auxv_rseq_feature_size;
>> +	else
>> +		return ORIG_RSEQ_FEATURE_SIZE;
>> +}
> 
> Do you intend to use the auxiliary vector as the userspace handshake
> for glibc-managed rseq, too?

Yes.

I don't think it works if the kernel
> overtakes glibc.  Or is there some other approach shown in the series
> that I missed?

The handshake I am proposing is as follows:

1- libc init:

issues getauxval(AT_RSEQ_FEATURE_SIZE) to learn the rseq feature size supported
by the Linux kernel. It can be either:

a) getauxval(AT_RSEQ_FEATURE_SIZE) == 0, errno=ENOENT: pre-6.3 kernel or CONFIG_RSEQ=n.

Need to issue the rseq system call to figure out if rseq is implemented/available or not.
If rseq is indeed implemented, use a __rseq_size=32.

b) getauxval(AT_RSEQ_FEATURE_SIZE) > 0:

The kernel exposes its supported rseq feature size. libc either needs to register rseq with a rseq_len
of 32-byte (original size), or with a rseq_len larger than 32 bytes with enough space to hold all
features.

2- Now about applications (and libc) use of rseq fields:

Using both __rseq_size (from libc) and the result of getauxval(AT_RSEQ_FEATURE_SIZE),
a rseq user can figure which rseq fields can indeed be used. The important part is
how get_rseq_feature_size() is called in the rseq selftests:


                 rseq_feature_size = get_rseq_feature_size();
                 if (rseq_feature_size > rseq_size)
                         rseq_feature_size = rseq_size;

which basically sets rseq_feature_size to the feature size exposed by the kernel, except
if libc's __rseq_size is smaller than the feature size exposed by the kernel, in which case
it will truncate the rseq_feature_size to __rseq_size.

This allows rseq users to know which feature set is supported by the kernel and for which
libc has allocated enough space.

The only thing here is that rseq users cannot rely on libc's __rseq_size symbol to get the
feature size. But considering that this is a contract between the kernel and the rseq user
(libc is mostly just there to allocate per-thread memory), I don't think it's a concern to
request users to query getauxval(AT_RSEQ_FEATURE_SIZE) in addition to load __rseq_size.

> 
> Maybe we should just skip the existing padding and use it only for
> some vaguely kernel-internal purpose (say through a vDSO helper), so
> that it is less of an issue how to communicate the presence of these
> fields to userspace.

The fact that libc's __rseq_size is included the original struct rseq padding is unfortunate,
but I really see this as a purely userspace ABI concern, which should not dictate the layout
of the kernel ABI exposed to user-space, especially given that all the information required to
allow rseq users to know which fields can be used is readily available by combining the value
loaded from __rseq_size and the result of getauxval(AT_RSEQ_FEATURE_SIZE).

Thoughts ?

Thanks,

Mathieu
  
Florian Weimer Jan. 5, 2023, 4:19 p.m. UTC | #3
* Mathieu Desnoyers:

> 2- Now about applications (and libc) use of rseq fields:
>
> Using both __rseq_size (from libc) and the result of
> getauxval(AT_RSEQ_FEATURE_SIZE), a rseq user can figure which rseq
> fields can indeed be used. The important part is how
> get_rseq_feature_size() is called in the rseq selftests:
>
>
>                  rseq_feature_size = get_rseq_feature_size();
>                  if (rseq_feature_size > rseq_size)
>                          rseq_feature_size = rseq_size;
>
> which basically sets rseq_feature_size to the feature size exposed
> by the kernel, except if libc's __rseq_size is smaller than the
> feature size exposed by the kernel, in which case it will truncate
> the rseq_feature_size to __rseq_size.

Ahh, this happens to work because we pass 32 today from glibc, and
there is nothing left to do in glibc to enable these new fields.

If true, that really argues in favor of this approach.

>> Maybe we should just skip the existing padding and use it only for
>> some vaguely kernel-internal purpose (say through a vDSO helper), so
>> that it is less of an issue how to communicate the presence of these
>> fields to userspace.
>
> The fact that libc's __rseq_size is included the original struct
> rseq padding is unfortunate, but I really see this as a purely
> userspace ABI concern, which should not dictate the layout of the
> kernel ABI exposed to user-space, especially given that all the
> information required to allow rseq users to know which fields can be
> used is readily available by combining the value loaded from
> __rseq_size and the result of getauxval(AT_RSEQ_FEATURE_SIZE).

But we must pass size 32 to the kernel today, otherwise rseq
registration fails.  It's a kernel-mandated value, not something
that's purely a userspace concern.
  
Mathieu Desnoyers Jan. 5, 2023, 4:28 p.m. UTC | #4
On 2023-01-05 11:19, Florian Weimer wrote:
> * Mathieu Desnoyers:
> 
>> 2- Now about applications (and libc) use of rseq fields:
>>
>> Using both __rseq_size (from libc) and the result of
>> getauxval(AT_RSEQ_FEATURE_SIZE), a rseq user can figure which rseq
>> fields can indeed be used. The important part is how
>> get_rseq_feature_size() is called in the rseq selftests:
>>
>>
>>                   rseq_feature_size = get_rseq_feature_size();
>>                   if (rseq_feature_size > rseq_size)
>>                           rseq_feature_size = rseq_size;
>>
>> which basically sets rseq_feature_size to the feature size exposed
>> by the kernel, except if libc's __rseq_size is smaller than the
>> feature size exposed by the kernel, in which case it will truncate
>> the rseq_feature_size to __rseq_size.
> 
> Ahh, this happens to work because we pass 32 today from glibc, and
> there is nothing left to do in glibc to enable these new fields.
> 
> If true, that really argues in favor of this approach.

Yes, you are correct..

> 
>>> Maybe we should just skip the existing padding and use it only for
>>> some vaguely kernel-internal purpose (say through a vDSO helper), so
>>> that it is less of an issue how to communicate the presence of these
>>> fields to userspace.
>>
>> The fact that libc's __rseq_size is included the original struct
>> rseq padding is unfortunate, but I really see this as a purely
>> userspace ABI concern, which should not dictate the layout of the
>> kernel ABI exposed to user-space, especially given that all the
>> information required to allow rseq users to know which fields can be
>> used is readily available by combining the value loaded from
>> __rseq_size and the result of getauxval(AT_RSEQ_FEATURE_SIZE).
> 
> But we must pass size 32 to the kernel today, otherwise rseq
> registration fails.  It's a kernel-mandated value, not something
> that's purely a userspace concern.

What I mean when stating the "userspace concern" is the semantic of the libc's
__rseq_size symbol: whether it means allocated space (including padding)
or actively populated feature fields.

In terms of rseq registration rseq_len argument, here is the updated check
in the system call:

         /*
          * If there was no rseq previously registered, ensure the provided rseq
          * is properly aligned, as communcated to user-space through the ELF
          * auxiliary vector AT_RSEQ_ALIGN. If rseq_len is the original rseq
          * size, the required alignment is the original struct rseq alignment.
          *
          * In order to be valid, rseq_len is either the original rseq size, or
          * large enough to contain all supported fields, as communicated to
          * user-space through the ELF auxiliary vector AT_RSEQ_FEATURE_SIZE.
          */
         if (rseq_len < ORIG_RSEQ_SIZE ||
             (rseq_len == ORIG_RSEQ_SIZE && !IS_ALIGNED((unsigned long)rseq, ORIG_RSEQ_SIZE)) ||
             (rseq_len != ORIG_RSEQ_SIZE && (!IS_ALIGNED((unsigned long)rseq, __alignof__(*rseq)) ||
                                             rseq_len < offsetof(struct rseq, end))))
                 return -EINVAL;

Which keeps accepting rseq_len=32 (original ABI), else requires that enough space
is available to hold all supported feature fields (but never less than 32 bytes).

Do my explanations take care of your concerns, or are there still aspects that
you are uneasy with ?

Thanks,

Mathieu
  

Patch

diff --git a/tools/testing/selftests/rseq/rseq-abi.h b/tools/testing/selftests/rseq/rseq-abi.h
index a8c44d9af71f..00ac846d85b0 100644
--- a/tools/testing/selftests/rseq/rseq-abi.h
+++ b/tools/testing/selftests/rseq/rseq-abi.h
@@ -146,6 +146,11 @@  struct rseq_abi {
 	 *     this thread.
 	 */
 	__u32 flags;
+
+	/*
+	 * Flexible array member at end of structure, after last feature field.
+	 */
+	char end[];
 } __attribute__((aligned(4 * sizeof(__u64))));
 
 #endif /* _RSEQ_ABI_H */
diff --git a/tools/testing/selftests/rseq/rseq.c b/tools/testing/selftests/rseq/rseq.c
index 376a73f1ac41..1e8e3265bdbf 100644
--- a/tools/testing/selftests/rseq/rseq.c
+++ b/tools/testing/selftests/rseq/rseq.c
@@ -28,6 +28,8 @@ 
 #include <limits.h>
 #include <dlfcn.h>
 #include <stddef.h>
+#include <sys/auxv.h>
+#include <linux/auxvec.h>
 
 #include "../kselftest.h"
 #include "rseq.h"
@@ -36,20 +38,38 @@  static const ptrdiff_t *libc_rseq_offset_p;
 static const unsigned int *libc_rseq_size_p;
 static const unsigned int *libc_rseq_flags_p;
 
-/* Offset from the thread pointer to the rseq area.  */
+/* Offset from the thread pointer to the rseq area. */
 ptrdiff_t rseq_offset;
 
-/* Size of the registered rseq area.  0 if the registration was
-   unsuccessful.  */
+/*
+ * Size of the registered rseq area. 0 if the registration was
+ * unsuccessful.
+ */
 unsigned int rseq_size = -1U;
 
 /* Flags used during rseq registration.  */
 unsigned int rseq_flags;
 
+/*
+ * rseq feature size supported by the kernel. 0 if the registration was
+ * unsuccessful.
+ */
+unsigned int rseq_feature_size = -1U;
+
 static int rseq_ownership;
+static int rseq_reg_success;	/* At least one rseq registration has succeded. */
+
+/* Allocate a large area for the TLS. */
+#define RSEQ_THREAD_AREA_ALLOC_SIZE	1024
+
+/* Original struct rseq feature size is 20 bytes. */
+#define ORIG_RSEQ_FEATURE_SIZE		20
+
+/* Original struct rseq allocation size is 32 bytes. */
+#define ORIG_RSEQ_ALLOC_SIZE		32
 
 static
-__thread struct rseq_abi __rseq_abi __attribute__((tls_model("initial-exec"))) = {
+__thread struct rseq_abi __rseq_abi __attribute__((tls_model("initial-exec"), aligned(RSEQ_THREAD_AREA_ALLOC_SIZE))) = {
 	.cpu_id = RSEQ_ABI_CPU_ID_UNINITIALIZED,
 };
 
@@ -84,10 +104,16 @@  int rseq_register_current_thread(void)
 		/* Treat libc's ownership as a successful registration. */
 		return 0;
 	}
-	rc = sys_rseq(&__rseq_abi, sizeof(struct rseq_abi), 0, RSEQ_SIG);
-	if (rc)
+	rc = sys_rseq(&__rseq_abi, rseq_size, 0, RSEQ_SIG);
+	if (rc) {
+		if (RSEQ_READ_ONCE(rseq_reg_success)) {
+			/* Incoherent success/failure within process. */
+			abort();
+		}
 		return -1;
+	}
 	assert(rseq_current_cpu_raw() >= 0);
+	RSEQ_WRITE_ONCE(rseq_reg_success, 1);
 	return 0;
 }
 
@@ -99,12 +125,28 @@  int rseq_unregister_current_thread(void)
 		/* Treat libc's ownership as a successful unregistration. */
 		return 0;
 	}
-	rc = sys_rseq(&__rseq_abi, sizeof(struct rseq_abi), RSEQ_ABI_FLAG_UNREGISTER, RSEQ_SIG);
+	rc = sys_rseq(&__rseq_abi, rseq_size, RSEQ_ABI_FLAG_UNREGISTER, RSEQ_SIG);
 	if (rc)
 		return -1;
 	return 0;
 }
 
+static
+unsigned int get_rseq_feature_size(void)
+{
+	unsigned long auxv_rseq_feature_size, auxv_rseq_align;
+
+	auxv_rseq_align = getauxval(AT_RSEQ_ALIGN);
+	assert(!auxv_rseq_align || auxv_rseq_align <= RSEQ_THREAD_AREA_ALLOC_SIZE);
+
+	auxv_rseq_feature_size = getauxval(AT_RSEQ_FEATURE_SIZE);
+	assert(!auxv_rseq_feature_size || auxv_rseq_feature_size <= RSEQ_THREAD_AREA_ALLOC_SIZE);
+	if (auxv_rseq_feature_size)
+		return auxv_rseq_feature_size;
+	else
+		return ORIG_RSEQ_FEATURE_SIZE;
+}
+
 static __attribute__((constructor))
 void rseq_init(void)
 {
@@ -117,16 +159,24 @@  void rseq_init(void)
 		rseq_offset = *libc_rseq_offset_p;
 		rseq_size = *libc_rseq_size_p;
 		rseq_flags = *libc_rseq_flags_p;
+		rseq_feature_size = get_rseq_feature_size();
+		if (rseq_feature_size > rseq_size)
+			rseq_feature_size = rseq_size;
 		return;
 	}
 	rseq_ownership = 1;
 	if (!rseq_available()) {
 		rseq_size = 0;
+		rseq_feature_size = 0;
 		return;
 	}
 	rseq_offset = (void *)&__rseq_abi - rseq_thread_pointer();
-	rseq_size = sizeof(struct rseq_abi);
 	rseq_flags = 0;
+	rseq_feature_size = get_rseq_feature_size();
+	if (rseq_feature_size == ORIG_RSEQ_FEATURE_SIZE)
+		rseq_size = ORIG_RSEQ_ALLOC_SIZE;
+	else
+		rseq_size = RSEQ_THREAD_AREA_ALLOC_SIZE;
 }
 
 static __attribute__((destructor))
@@ -136,6 +186,7 @@  void rseq_exit(void)
 		return;
 	rseq_offset = 0;
 	rseq_size = -1U;
+	rseq_feature_size = -1U;
 	rseq_ownership = 0;
 }
 
diff --git a/tools/testing/selftests/rseq/rseq.h b/tools/testing/selftests/rseq/rseq.h
index 6f7513384bf5..95adc1e1b0db 100644
--- a/tools/testing/selftests/rseq/rseq.h
+++ b/tools/testing/selftests/rseq/rseq.h
@@ -47,14 +47,24 @@ 
 
 #include "rseq-thread-pointer.h"
 
-/* Offset from the thread pointer to the rseq area.  */
+/* Offset from the thread pointer to the rseq area. */
 extern ptrdiff_t rseq_offset;
-/* Size of the registered rseq area.  0 if the registration was
-   unsuccessful.  */
+
+/*
+ * Size of the registered rseq area. 0 if the registration was
+ * unsuccessful.
+ */
 extern unsigned int rseq_size;
-/* Flags used during rseq registration.  */
+
+/* Flags used during rseq registration. */
 extern unsigned int rseq_flags;
 
+/*
+ * rseq feature size supported by the kernel. 0 if the registration was
+ * unsuccessful.
+ */
+extern unsigned int rseq_feature_size;
+
 static inline struct rseq_abi *rseq_get_abi(void)
 {
 	return (struct rseq_abi *) ((uintptr_t) rseq_thread_pointer() + rseq_offset);