[v5,14/15] rust: sync: reduce stack usage of `UniqueArc::try_new_uninit`

Message ID 20230403160511.174894-7-y86-dev@protonmail.com
State New
Headers
Series Rust pin-init API for pinned initialization of structs |

Commit Message

y86-dev April 3, 2023, 4:05 p.m. UTC
  `UniqueArc::try_new_uninit` calls `Arc::try_new(MaybeUninit::uninit())`.
This results in the uninitialized memory being placed on the stack,
which may be arbitrarily large due to the generic `T` and thus could
cause a stack overflow for large types.

Change the implementation to use the pin-init API which enables in-place
initialization. In particular it avoids having to first construct and
then move the uninitialized memory from the stack into the final location.

Signed-off-by: Benno Lossin <y86-dev@protonmail.com>
Cc: Gary Guo <gary@garyguo.net>
Cc: Alice Ryhl <aliceryhl@google.com>
Cc: Andreas Hindborg <a.hindborg@samsung.com>
---
 rust/kernel/lib.rs      |  1 -
 rust/kernel/sync/arc.rs | 16 +++++++++++++---
 2 files changed, 13 insertions(+), 4 deletions(-)

--
2.39.2
  

Comments

Alice Ryhl April 3, 2023, 5:56 p.m. UTC | #1
On 4/3/23 18:05, Benno Lossin wrote:
> `UniqueArc::try_new_uninit` calls `Arc::try_new(MaybeUninit::uninit())`.
> This results in the uninitialized memory being placed on the stack,
> which may be arbitrarily large due to the generic `T` and thus could
> cause a stack overflow for large types.
> 
> Change the implementation to use the pin-init API which enables in-place
> initialization. In particular it avoids having to first construct and
> then move the uninitialized memory from the stack into the final location.
> 
> Signed-off-by: Benno Lossin <y86-dev@protonmail.com>
> Cc: Gary Guo <gary@garyguo.net>
> Cc: Alice Ryhl <aliceryhl@google.com>
> Cc: Andreas Hindborg <a.hindborg@samsung.com>

Reviewed-by: Alice Ryhl <aliceryhl@google.com>

> 
>       /// Tries to allocate a new [`UniqueArc`] instance whose contents are not initialised yet.
>       pub fn try_new_uninit() -> Result<UniqueArc<MaybeUninit<T>>, AllocError> {
> -        Ok(UniqueArc::<MaybeUninit<T>> {
> +        // INVARIANT: The refcount is initialised to a non-zero value.
> +        let inner = Box::try_init::<AllocError>(try_init!(ArcInner {
> +            // SAFETY: There are no safety requirements for this FFI call.
> +            refcount: Opaque::new(unsafe { bindings::REFCOUNT_INIT(1) }),
> +            data <- init::uninit::<T, AllocError>(),
> +        }? AllocError))?;
> +        Ok(UniqueArc {
>               // INVARIANT: The newly-created object has a ref-count of 1.
> -            inner: Arc::try_new(MaybeUninit::uninit())?,
> +            // SAFETY: The pointer from the `Box` is valid.
> +            inner: unsafe { Arc::from_inner(Box::leak(inner).into()) },
>           })
>       }
>   }

I'm curious - do you know whether this compiles to the same machine code 
as this?

pub fn try_new_uninit() -> Result<UniqueArc<MaybeUninit<T>>, AllocError> {
     let inner: Box<MaybeUninit<ArcInner<T>>> = Box::try_new_uninit()?;
     let ptr = Box::into_raw(inner) as *mut ArcInner<T>;
     addr_of_mut!((*ptr).refcount).write(bindings::REFCOUNT_INIT(1));
     Ok(UniqueArc {
         inner: Arc {
             ptr: unsafe { NonNull::new_unchecked(ptr) },
             _p: PhantomData,
         }
     })
}
  
y86-dev April 3, 2023, 9:06 p.m. UTC | #2
On 03.04.23 19:56, Alice Ryhl wrote:
> On 4/3/23 18:05, Benno Lossin wrote:
>>        /// Tries to allocate a new [`UniqueArc`] instance whose contents are not initialised yet.
>>        pub fn try_new_uninit() -> Result<UniqueArc<MaybeUninit<T>>, AllocError> {
>> -        Ok(UniqueArc::<MaybeUninit<T>> {
>> +        // INVARIANT: The refcount is initialised to a non-zero value.
>> +        let inner = Box::try_init::<AllocError>(try_init!(ArcInner {
>> +            // SAFETY: There are no safety requirements for this FFI call.
>> +            refcount: Opaque::new(unsafe { bindings::REFCOUNT_INIT(1) }),
>> +            data <- init::uninit::<T, AllocError>(),
>> +        }? AllocError))?;
>> +        Ok(UniqueArc {
>>                // INVARIANT: The newly-created object has a ref-count of 1.
>> -            inner: Arc::try_new(MaybeUninit::uninit())?,
>> +            // SAFETY: The pointer from the `Box` is valid.
>> +            inner: unsafe { Arc::from_inner(Box::leak(inner).into()) },
>>            })
>>        }
>>    }
> 
> I'm curious - do you know whether this compiles to the same machine code
> as this?
> 
> pub fn try_new_uninit() -> Result<UniqueArc<MaybeUninit<T>>, AllocError> {
>       let inner: Box<MaybeUninit<ArcInner<T>>> = Box::try_new_uninit()?;
>       let ptr = Box::into_raw(inner) as *mut ArcInner<T>;
>       addr_of_mut!((*ptr).refcount).write(bindings::REFCOUNT_INIT(1));
>       Ok(UniqueArc {
>           inner: Arc {
>               ptr: unsafe { NonNull::new_unchecked(ptr) },
>               _p: PhantomData,
>           }
>       })
> }

Yes it compiles to exactly the same assembly (byte for byte), because I was
not sure if I was compiling the right thing, I compiled this function:

    unsafe { core::arch::asm!("ud2") };
    let r1 = UniqueArc::try_new_uninit();
    unsafe { core::arch::asm!("ud2") };
    let r2 = UniqueArc::try_new_uninit2();
    unsafe { core::arch::asm!("ud2") };

The `ud2` instructions are for better visibility, as I have not read a lot
of assembly. The above disassembles to this:

    ffffffff8143bb80 <_RNvXCsdVu6umiBwhr_12rust_minimalNtB2_11RustMinimalNtCsfATHBUcknU9_6kernel6Module4init>:
    ffffffff8143bb80:	41 57                	push   r15
    ffffffff8143bb82:	41 56                	push   r14
    ffffffff8143bb84:	53                   	push   rbx
    ffffffff8143bb85:	49 89 ff             	mov    r15,rdi
    ffffffff8143bb88:	0f 0b                	ud2
    ffffffff8143bb8a:	bf 04 00 10 00       	mov    edi,0x100004
    ffffffff8143bb8f:	be 04 00 00 00       	mov    esi,0x4
    ffffffff8143bb94:	e8 a7 23 f5 ff       	call   ffffffff8138df40 <__rust_alloc>
    ffffffff8143bb99:	48 85 c0             	test   rax,rax
    ffffffff8143bb9c:	74 12                	je     ffffffff8143bbb0 <_RNvXCsdVu6umiBwhr_12rust_minimalNtB2_11RustMinimalNtCsfATHBUcknU9_6kernel6Module4init+0x30>
    ffffffff8143bb9e:	49 89 c6             	mov    r14,rax
    ffffffff8143bba1:	bf 01 00 00 00       	mov    edi,0x1
    ffffffff8143bba6:	e8 e5 fc f4 ff       	call   ffffffff8138b890 <rust_helper_REFCOUNT_INIT>
    ffffffff8143bbab:	41 89 06             	mov    DWORD PTR [r14],eax
    ffffffff8143bbae:	eb 03                	jmp    ffffffff8143bbb3 <_RNvXCsdVu6umiBwhr_12rust_minimalNtB2_11RustMinimalNtCsfATHBUcknU9_6kernel6Module4init+0x33>
    ffffffff8143bbb0:	45 31 f6             	xor    r14d,r14d
    ffffffff8143bbb3:	0f 0b                	ud2
    ffffffff8143bbb5:	bf 04 00 10 00       	mov    edi,0x100004
    ffffffff8143bbba:	be 04 00 00 00       	mov    esi,0x4
    ffffffff8143bbbf:	e8 7c 23 f5 ff       	call   ffffffff8138df40 <__rust_alloc>
    ffffffff8143bbc4:	48 85 c0             	test   rax,rax
    ffffffff8143bbc7:	74 11                	je     ffffffff8143bbda <_RNvXCsdVu6umiBwhr_12rust_minimalNtB2_11RustMinimalNtCsfATHBUcknU9_6kernel6Module4init+0x5a>
    ffffffff8143bbc9:	48 89 c3             	mov    rbx,rax
    ffffffff8143bbcc:	bf 01 00 00 00       	mov    edi,0x1
    ffffffff8143bbd1:	e8 ba fc f4 ff       	call   ffffffff8138b890 <rust_helper_REFCOUNT_INIT>
    ffffffff8143bbd6:	89 03                	mov    DWORD PTR [rbx],eax
    ffffffff8143bbd8:	eb 02                	jmp    ffffffff8143bbdc <_RNvXCsdVu6umiBwhr_12rust_minimalNtB2_11RustMinimalNtCsfATHBUcknU9_6kernel6Module4init+0x5c>
    ffffffff8143bbda:	31 db                	xor    ebx,ebx
    ffffffff8143bbdc:	0f 0b                	ud2
    ffffffff8143bbde:	4d 89 77 08          	mov    QWORD PTR [r15+0x8],r14
    ffffffff8143bbe2:	49 89 5f 10          	mov    QWORD PTR [r15+0x10],rbx
    ffffffff8143bbe6:	41 c7 07 00 00 00 00 	mov    DWORD PTR [r15],0x0
    ffffffff8143bbed:	4c 89 f8             	mov    rax,r15
    ffffffff8143bbf0:	5b                   	pop    rbx
    ffffffff8143bbf1:	41 5e                	pop    r14
    ffffffff8143bbf3:	41 5f                	pop    r15
    ffffffff8143bbf5:	c3                   	ret

I have not done extensive enough tests to be sure about this in general,
but most of the examples of the pin-init API that I looked at were
optimized to the same assembly as manual initialization.

The only examples were this was not the case was when I had triply nested
structs with `Box`es that all were initialized via `PinInit`. That was the
point were the initialization closure was not inlined any more.
I also verified that in this particular case the closure was again inlined
after adding `#[inline]` to it (which requires `stmt_expr_attributes`).

At some point I might do a more thorough analysis.
  
Gary Guo April 4, 2023, 1:25 p.m. UTC | #3
On Mon, 03 Apr 2023 16:05:56 +0000
Benno Lossin <y86-dev@protonmail.com> wrote:

> `UniqueArc::try_new_uninit` calls `Arc::try_new(MaybeUninit::uninit())`.
> This results in the uninitialized memory being placed on the stack,
> which may be arbitrarily large due to the generic `T` and thus could
> cause a stack overflow for large types.
> 
> Change the implementation to use the pin-init API which enables in-place
> initialization. In particular it avoids having to first construct and
> then move the uninitialized memory from the stack into the final location.
> 
> Signed-off-by: Benno Lossin <y86-dev@protonmail.com>
> Cc: Gary Guo <gary@garyguo.net>
> Cc: Alice Ryhl <aliceryhl@google.com>
> Cc: Andreas Hindborg <a.hindborg@samsung.com>

Reviewed-by: Gary Guo <gary@garyguo.net>

> ---
>  rust/kernel/lib.rs      |  1 -
>  rust/kernel/sync/arc.rs | 16 +++++++++++++---
>  2 files changed, 13 insertions(+), 4 deletions(-)
> 
> diff --git a/rust/kernel/lib.rs b/rust/kernel/lib.rs
> index 821bd067151c..2d7606135ef6 100644
> --- a/rust/kernel/lib.rs
> +++ b/rust/kernel/lib.rs
> @@ -28,7 +28,6 @@
>  #[cfg(not(CONFIG_RUST))]
>  compile_error!("Missing kernel configuration for conditional compilation");
> 
> -#[allow(unused_extern_crates)]
>  // Allow proc-macros to refer to `::kernel` inside the `kernel` crate (this crate).
>  extern crate self as kernel;
> 
> diff --git a/rust/kernel/sync/arc.rs b/rust/kernel/sync/arc.rs
> index 43a53fbe175d..d05caa723718 100644
> --- a/rust/kernel/sync/arc.rs
> +++ b/rust/kernel/sync/arc.rs
> @@ -18,7 +18,8 @@
>  use crate::{
>      bindings,
>      error::{self, Error},
> -    init::{InPlaceInit, Init, PinInit},
> +    init::{self, InPlaceInit, Init, PinInit},
> +    try_init,
>      types::{ForeignOwnable, Opaque},
>  };
>  use alloc::boxed::Box;
> @@ -30,6 +31,7 @@ use core::{
>      pin::Pin,
>      ptr::NonNull,
>  };
> +use macros::pin_data;
> 
>  /// A reference-counted pointer to an instance of `T`.
>  ///
> @@ -122,6 +124,7 @@ pub struct Arc<T: ?Sized> {
>      _p: PhantomData<ArcInner<T>>,
>  }
> 
> +#[pin_data]
>  #[repr(C)]
>  struct ArcInner<T: ?Sized> {
>      refcount: Opaque<bindings::refcount_t>,
> @@ -502,9 +505,16 @@ impl<T> UniqueArc<T> {
> 
>      /// Tries to allocate a new [`UniqueArc`] instance whose contents are not initialised yet.
>      pub fn try_new_uninit() -> Result<UniqueArc<MaybeUninit<T>>, AllocError> {
> -        Ok(UniqueArc::<MaybeUninit<T>> {
> +        // INVARIANT: The refcount is initialised to a non-zero value.
> +        let inner = Box::try_init::<AllocError>(try_init!(ArcInner {
> +            // SAFETY: There are no safety requirements for this FFI call.
> +            refcount: Opaque::new(unsafe { bindings::REFCOUNT_INIT(1) }),
> +            data <- init::uninit::<T, AllocError>(),
> +        }? AllocError))?;
> +        Ok(UniqueArc {
>              // INVARIANT: The newly-created object has a ref-count of 1.
> -            inner: Arc::try_new(MaybeUninit::uninit())?,
> +            // SAFETY: The pointer from the `Box` is valid.
> +            inner: unsafe { Arc::from_inner(Box::leak(inner).into()) },
>          })
>      }
>  }
> --
> 2.39.2
> 
>
  

Patch

diff --git a/rust/kernel/lib.rs b/rust/kernel/lib.rs
index 821bd067151c..2d7606135ef6 100644
--- a/rust/kernel/lib.rs
+++ b/rust/kernel/lib.rs
@@ -28,7 +28,6 @@ 
 #[cfg(not(CONFIG_RUST))]
 compile_error!("Missing kernel configuration for conditional compilation");

-#[allow(unused_extern_crates)]
 // Allow proc-macros to refer to `::kernel` inside the `kernel` crate (this crate).
 extern crate self as kernel;

diff --git a/rust/kernel/sync/arc.rs b/rust/kernel/sync/arc.rs
index 43a53fbe175d..d05caa723718 100644
--- a/rust/kernel/sync/arc.rs
+++ b/rust/kernel/sync/arc.rs
@@ -18,7 +18,8 @@ 
 use crate::{
     bindings,
     error::{self, Error},
-    init::{InPlaceInit, Init, PinInit},
+    init::{self, InPlaceInit, Init, PinInit},
+    try_init,
     types::{ForeignOwnable, Opaque},
 };
 use alloc::boxed::Box;
@@ -30,6 +31,7 @@  use core::{
     pin::Pin,
     ptr::NonNull,
 };
+use macros::pin_data;

 /// A reference-counted pointer to an instance of `T`.
 ///
@@ -122,6 +124,7 @@  pub struct Arc<T: ?Sized> {
     _p: PhantomData<ArcInner<T>>,
 }

+#[pin_data]
 #[repr(C)]
 struct ArcInner<T: ?Sized> {
     refcount: Opaque<bindings::refcount_t>,
@@ -502,9 +505,16 @@  impl<T> UniqueArc<T> {

     /// Tries to allocate a new [`UniqueArc`] instance whose contents are not initialised yet.
     pub fn try_new_uninit() -> Result<UniqueArc<MaybeUninit<T>>, AllocError> {
-        Ok(UniqueArc::<MaybeUninit<T>> {
+        // INVARIANT: The refcount is initialised to a non-zero value.
+        let inner = Box::try_init::<AllocError>(try_init!(ArcInner {
+            // SAFETY: There are no safety requirements for this FFI call.
+            refcount: Opaque::new(unsafe { bindings::REFCOUNT_INIT(1) }),
+            data <- init::uninit::<T, AllocError>(),
+        }? AllocError))?;
+        Ok(UniqueArc {
             // INVARIANT: The newly-created object has a ref-count of 1.
-            inner: Arc::try_new(MaybeUninit::uninit())?,
+            // SAFETY: The pointer from the `Box` is valid.
+            inner: unsafe { Arc::from_inner(Box::leak(inner).into()) },
         })
     }
 }