[v1,21/28] rust: str: add `CString` type

Message ID 20221110164152.26136-22-ojeda@kernel.org
State New
Series Rust core additions

Commit Message

Miguel Ojeda Nov. 10, 2022, 4:41 p.m. UTC
  From: Wedson Almeida Filho <wedsonaf@gmail.com>

Add the `CString` type, which is an owned string that is guaranteed
to have exactly one `NUL` byte at the end, i.e. the owned equivalent
to `CStr` introduced earlier.

It is used for interoperability with kernel APIs that take C strings.
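For comparison, the Rust standard library's `std::ffi::CString` (not the kernel type added here) upholds the same invariant, and its `as_ptr()` is what gets handed to a C API. The following is a minimal sketch that assumes a hosted `std` environment. The helper names `owned_c_string` and `borrow_c_str` are illustrative only.

```rust
use std::ffi::{CStr, CString};

/// Illustrative helper: builds an owned C string with exactly one `NUL` byte,
/// at the end. Returns `None` if `s` contains an interior `NUL` byte.
fn owned_c_string(s: &str) -> Option<CString> {
    // `CString::new` appends the terminator and rejects interior `NUL` bytes.
    CString::new(s).ok()
}

/// Illustrative helper: borrows the owned string as its borrowed counterpart,
/// much like the kernel's `Deref<Target = CStr>` impl for `CString`.
fn borrow_c_str(s: &CString) -> &CStr {
    s.as_c_str()
}
```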

In order to do so, implement the `RawFormatter::new()` constructor
and the `RawFormatter::bytes_written()` method as well.
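The size-measuring pass that `RawFormatter::new()` enables can be sketched in userspace Rust. This sketch assumes that a formatter with an empty buffer (`beg`, `pos` and `end` all 0) copies nothing and only advances its position, saturating on overflow, as the `saturating_*` field comment suggests. `CountingFormatter` and `c_string_size` are hypothetical names, and this is not the kernel's `RawFormatter`.

```rust
use std::fmt::{self, Write};

/// Hypothetical stand-in for `RawFormatter::new()`: the buffer is empty, so
/// nothing is copied and only the position advances.
struct CountingFormatter {
    beg: usize,
    pos: usize,
}

impl CountingFormatter {
    fn new() -> Self {
        Self { beg: 0, pos: 0 }
    }

    /// Mirrors `bytes_written()`: the distance from the start to the position.
    fn bytes_written(&self) -> usize {
        self.pos - self.beg
    }
}

impl Write for CountingFormatter {
    fn write_str(&mut self, s: &str) -> fmt::Result {
        // Track the length only, saturating rather than wrapping on overflow.
        self.pos = self.pos.saturating_add(s.len());
        Ok(())
    }
}

/// First pass of a two-pass `CString` build: bytes needed for the formatted
/// text plus the `NUL` terminator.
fn c_string_size(args: fmt::Arguments<'_>) -> Result<usize, fmt::Error> {
    let mut f = CountingFormatter::new();
    f.write_fmt(args)?;
    f.write_str("\0")?;
    Ok(f.bytes_written())
}
```

Measuring first lets the second pass allocate exactly once, which suits a fallible allocation such as `Vec::try_with_capacity`.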

Signed-off-by: Wedson Almeida Filho <wedsonaf@gmail.com>
[Reworded, adapted for upstream and applied latest changes]
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
---
 rust/kernel/str.rs | 91 +++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 89 insertions(+), 2 deletions(-)
  

Comments

Gary Guo Nov. 14, 2022, 2:53 p.m. UTC | #1
On Thu, 10 Nov 2022 17:41:33 +0100
Miguel Ojeda <ojeda@kernel.org> wrote:

> From: Wedson Almeida Filho <wedsonaf@gmail.com>
> 
> Add the `CString` type, which is an owned string that is guaranteed
> to have exactly one `NUL` byte at the end, i.e. the owned equivalent
> to `CStr` introduced earlier.
> 
> It is used for interoperability with kernel APIs that take C strings.
> 
> In order to do so, implement the `RawFormatter::new()` constructor
> and the `RawFormatter::bytes_written()` method as well.
> 
> Signed-off-by: Wedson Almeida Filho <wedsonaf@gmail.com>
> [Reworded, adapted for upstream and applied latest changes]
> Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
> ---
>  rust/kernel/str.rs | 91 +++++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 89 insertions(+), 2 deletions(-)
> 
> diff --git a/rust/kernel/str.rs b/rust/kernel/str.rs
> index db6473db31c6..877148b77e71 100644
> --- a/rust/kernel/str.rs
> +++ b/rust/kernel/str.rs
> @@ -2,6 +2,7 @@
>  
>  //! String representations.
>  
> +use alloc::vec::Vec;
>  use core::fmt::{self, Write};
>  use core::ops::{self, Deref, Index};
>  
> @@ -393,13 +394,22 @@ mod tests {
>  /// is less than `end`.
>  pub(crate) struct RawFormatter {
>      // Use `usize` to use `saturating_*` functions.
> -    #[allow(dead_code)]
>      beg: usize,
>      pos: usize,
>      end: usize,
>  }
>  
>  impl RawFormatter {
> +    /// Creates a new instance of [`RawFormatter`] with an empty buffer.
> +    fn new() -> Self {
> +        // INVARIANT: The buffer is empty, so the region that needs to be writable is empty.
> +        Self {
> +            beg: 0,
> +            pos: 0,
> +            end: 0,
> +        }
> +    }
> +
>      /// Creates a new instance of [`RawFormatter`] with the given buffer pointers.
>      ///
>      /// # Safety
> @@ -438,6 +448,11 @@ impl RawFormatter {
>      pub(crate) fn pos(&self) -> *mut u8 {
>          self.pos as _
>      }
> +
> +    /// Returns the number of bytes written to the formatter.
> +    pub(crate) fn bytes_written(&self) -> usize {
> +        self.pos - self.beg
> +    }
>  }
>  
>  impl fmt::Write for RawFormatter {
> @@ -478,7 +493,6 @@ impl Formatter {
>      ///
>      /// The memory region starting at `buf` and extending for `len` bytes must be valid for writes
>      /// for the lifetime of the returned [`Formatter`].
> -    #[allow(dead_code)]
>      pub(crate) unsafe fn from_buffer(buf: *mut u8, len: usize) -> Self {
>          // SAFETY: The safety requirements of this function satisfy those of the callee.
>          Self(unsafe { RawFormatter::from_buffer(buf, len) })
> @@ -505,3 +519,76 @@ impl fmt::Write for Formatter {
>          }
>      }
>  }
> +
> +/// An owned string that is guaranteed to have exactly one `NUL` byte, which is at the end.
> +///
> +/// Used for interoperability with kernel APIs that take C strings.
> +///
> +/// # Invariants
> +///
> +/// The string is always `NUL`-terminated and contains no other `NUL` bytes.
> +///
> +/// # Examples
> +///
> +/// ```
> +/// use kernel::str::CString;
> +///
> +/// let s = CString::try_from_fmt(fmt!("{}{}{}", "abc", 10, 20)).unwrap();
> +/// assert_eq!(s.as_bytes_with_nul(), "abc1020\0".as_bytes());
> +///
> +/// let tmp = "testing";
> +/// let s = CString::try_from_fmt(fmt!("{tmp}{}", 123)).unwrap();
> +/// assert_eq!(s.as_bytes_with_nul(), "testing123\0".as_bytes());
> +///
> +/// // This fails because it has an embedded `NUL` byte.
> +/// let s = CString::try_from_fmt(fmt!("a\0b{}", 123));
> +/// assert_eq!(s.is_ok(), false);
> +/// ```
> +pub struct CString {
> +    buf: Vec<u8>,
> +}
> +
> +impl CString {
> +    /// Creates an instance of [`CString`] from the given formatted arguments.
> +    pub fn try_from_fmt(args: fmt::Arguments<'_>) -> Result<Self, Error> {
> +        // Calculate the size needed (formatted string plus `NUL` terminator).
> +        let mut f = RawFormatter::new();
> +        f.write_fmt(args)?;
> +        f.write_str("\0")?;

I haven't checked the assembly, so this may already be optimized out,
but I feel that this line could be removed and we could just use
`f.bytes_written() + 1` on the following line instead.
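The suggestion amounts to measuring only the formatted text and adding one byte for the terminator. The following userspace sketch uses the hypothetical names `Measure` and `size_with_nul`. The `+ 1` is written as `checked_add` here, on the assumption that a saturating counter could in principle sit at `usize::MAX`.

```rust
use std::fmt::{self, Write};

/// Hypothetical byte counter standing in for `RawFormatter::new()`.
struct Measure(usize);

impl Write for Measure {
    fn write_str(&mut self, s: &str) -> fmt::Result {
        self.0 = self.0.saturating_add(s.len());
        Ok(())
    }
}

/// Size of the formatted text plus one byte for the `NUL` terminator, without
/// formatting the terminator itself.
fn size_with_nul(args: fmt::Arguments<'_>) -> Result<usize, fmt::Error> {
    let mut f = Measure(0);
    f.write_fmt(args)?;
    // Treat an (in practice impossible) overflow as a formatting error.
    f.0.checked_add(1).ok_or(fmt::Error)
}
```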

> +        let size = f.bytes_written();
> +
> +        // Allocate a vector with the required number of bytes, and write to it.
> +        let mut buf = Vec::try_with_capacity(size)?;
> +        // SAFETY: The buffer stored in `buf` is at least of size `size` and is valid for writes.
> +        let mut f = unsafe { Formatter::from_buffer(buf.as_mut_ptr(), size) };
> +        f.write_fmt(args)?;
> +        f.write_str("\0")?;
> +
> +        // SAFETY: The number of bytes that can be written to `f` is bounded by `size`, which is
> +        // `buf`'s capacity. The contents of the buffer have been initialised by writes to `f`.
> +        unsafe { buf.set_len(f.bytes_written()) };

`f.bytes_written() <= size` does not always hold. It holds here because
otherwise the `?` operator above would have returned an error early
(guaranteed by `impl Write for Formatter`). I feel that this fact is
not clearly stated in the safety comment, but I don't have a good way
to rephrase it either.
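The property relied on here is that a bounded formatter refuses to report success once the output no longer fits. After every successful `?`, the position is therefore at most the buffer length. The following safe-Rust sketch illustrates that contract, using the hypothetical names `BoundedWriter` and `format_into`. This version checks the remaining space before copying anything; the kernel's `Formatter` may differ in that detail.

```rust
use std::fmt::{self, Write};

/// Hypothetical safe analogue of `Formatter::from_buffer`: writes into a fixed
/// buffer and fails instead of going past its end.
struct BoundedWriter<'a> {
    buf: &'a mut [u8],
    pos: usize,
}

impl Write for BoundedWriter<'_> {
    fn write_str(&mut self, s: &str) -> fmt::Result {
        let bytes = s.as_bytes();
        let end = self.pos.checked_add(bytes.len()).ok_or(fmt::Error)?;
        // `get_mut` returns `None` if `end` exceeds the buffer, so the write fails
        // and `pos` is left unchanged.
        let dst = self.buf.get_mut(self.pos..end).ok_or(fmt::Error)?;
        dst.copy_from_slice(bytes);
        self.pos = end;
        Ok(())
    }
}

/// Formats `args` plus a `NUL` terminator into `buf`. On success the returned
/// count is guaranteed to be at most `buf.len()`.
fn format_into(buf: &mut [u8], args: fmt::Arguments<'_>) -> Result<usize, fmt::Error> {
    let mut f = BoundedWriter { buf, pos: 0 };
    f.write_fmt(args)?;
    f.write_str("\0")?;
    Ok(f.pos)
}
```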

> +
> +        // Check that there are no `NUL` bytes before the end.
> +        // SAFETY: The buffer is valid for reads because `f.bytes_written()` is bounded by `size`
> +        // (which is the minimum buffer size) and is non-zero (we wrote at least the `NUL`
> +        // terminator), so `f.bytes_written() - 1` doesn't underflow.
> +        let ptr = unsafe { bindings::memchr(buf.as_ptr().cast(), 0, (f.bytes_written() - 1) as _) };

How about just using `if buf[..(f.bytes_written() - 1)].contains(&0) {`
here? libcore specialises `<[u8]>::contains` to use a faster search
than `.iter().any(...)`.
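The following userspace sketch shows the suggested check, using the slice method `<[u8]>::contains` in place of `memchr`. The function name `has_single_trailing_nul` is hypothetical. `split_last` stands in for the `..(f.bytes_written() - 1)` slice so that an empty buffer cannot underflow.

```rust
/// Hypothetical helper: checks the `CString` invariant on a finished buffer,
/// i.e. the last byte is `NUL` and no earlier byte is.
fn has_single_trailing_nul(buf: &[u8]) -> bool {
    match buf.split_last() {
        // Terminator present: every byte before it must be non-`NUL`.
        Some((&0, rest)) => !rest.contains(&0),
        // Empty buffer, or the last byte is not a terminator.
        _ => false,
    }
}
```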

> +        if !ptr.is_null() {
> +            return Err(EINVAL);
> +        }
> +
> +        // INVARIANT: We wrote the `NUL` terminator and checked above that no other `NUL` bytes
> +        // exist in the buffer.
> +        Ok(Self { buf })
> +    }
> +}
> +
> +impl Deref for CString {
> +    type Target = CStr;
> +
> +    fn deref(&self) -> &Self::Target {
> +        // SAFETY: The type invariants guarantee that the string is `NUL`-terminated and that no
> +        // other `NUL` bytes exist.
> +        unsafe { CStr::from_bytes_with_nul_unchecked(self.buf.as_slice()) }
> +    }
> +}
  

Patch

diff --git a/rust/kernel/str.rs b/rust/kernel/str.rs
index db6473db31c6..877148b77e71 100644
--- a/rust/kernel/str.rs
+++ b/rust/kernel/str.rs
@@ -2,6 +2,7 @@ 
 
 //! String representations.
 
+use alloc::vec::Vec;
 use core::fmt::{self, Write};
 use core::ops::{self, Deref, Index};
 
@@ -393,13 +394,22 @@  mod tests {
 /// is less than `end`.
 pub(crate) struct RawFormatter {
     // Use `usize` to use `saturating_*` functions.
-    #[allow(dead_code)]
     beg: usize,
     pos: usize,
     end: usize,
 }
 
 impl RawFormatter {
+    /// Creates a new instance of [`RawFormatter`] with an empty buffer.
+    fn new() -> Self {
+        // INVARIANT: The buffer is empty, so the region that needs to be writable is empty.
+        Self {
+            beg: 0,
+            pos: 0,
+            end: 0,
+        }
+    }
+
     /// Creates a new instance of [`RawFormatter`] with the given buffer pointers.
     ///
     /// # Safety
@@ -438,6 +448,11 @@  impl RawFormatter {
     pub(crate) fn pos(&self) -> *mut u8 {
         self.pos as _
     }
+
+    /// Returns the number of bytes written to the formatter.
+    pub(crate) fn bytes_written(&self) -> usize {
+        self.pos - self.beg
+    }
 }
 
 impl fmt::Write for RawFormatter {
@@ -478,7 +493,6 @@  impl Formatter {
     ///
     /// The memory region starting at `buf` and extending for `len` bytes must be valid for writes
     /// for the lifetime of the returned [`Formatter`].
-    #[allow(dead_code)]
     pub(crate) unsafe fn from_buffer(buf: *mut u8, len: usize) -> Self {
         // SAFETY: The safety requirements of this function satisfy those of the callee.
         Self(unsafe { RawFormatter::from_buffer(buf, len) })
@@ -505,3 +519,76 @@  impl fmt::Write for Formatter {
         }
     }
 }
+
+/// An owned string that is guaranteed to have exactly one `NUL` byte, which is at the end.
+///
+/// Used for interoperability with kernel APIs that take C strings.
+///
+/// # Invariants
+///
+/// The string is always `NUL`-terminated and contains no other `NUL` bytes.
+///
+/// # Examples
+///
+/// ```
+/// use kernel::str::CString;
+///
+/// let s = CString::try_from_fmt(fmt!("{}{}{}", "abc", 10, 20)).unwrap();
+/// assert_eq!(s.as_bytes_with_nul(), "abc1020\0".as_bytes());
+///
+/// let tmp = "testing";
+/// let s = CString::try_from_fmt(fmt!("{tmp}{}", 123)).unwrap();
+/// assert_eq!(s.as_bytes_with_nul(), "testing123\0".as_bytes());
+///
+/// // This fails because it has an embedded `NUL` byte.
+/// let s = CString::try_from_fmt(fmt!("a\0b{}", 123));
+/// assert_eq!(s.is_ok(), false);
+/// ```
+pub struct CString {
+    buf: Vec<u8>,
+}
+
+impl CString {
+    /// Creates an instance of [`CString`] from the given formatted arguments.
+    pub fn try_from_fmt(args: fmt::Arguments<'_>) -> Result<Self, Error> {
+        // Calculate the size needed (formatted string plus `NUL` terminator).
+        let mut f = RawFormatter::new();
+        f.write_fmt(args)?;
+        f.write_str("\0")?;
+        let size = f.bytes_written();
+
+        // Allocate a vector with the required number of bytes, and write to it.
+        let mut buf = Vec::try_with_capacity(size)?;
+        // SAFETY: The buffer stored in `buf` is at least of size `size` and is valid for writes.
+        let mut f = unsafe { Formatter::from_buffer(buf.as_mut_ptr(), size) };
+        f.write_fmt(args)?;
+        f.write_str("\0")?;
+
+        // SAFETY: The number of bytes that can be written to `f` is bounded by `size`, which is
+        // `buf`'s capacity. The contents of the buffer have been initialised by writes to `f`.
+        unsafe { buf.set_len(f.bytes_written()) };
+
+        // Check that there are no `NUL` bytes before the end.
+        // SAFETY: The buffer is valid for reads because `f.bytes_written()` is bounded by `size`
+        // (which is the minimum buffer size) and is non-zero (we wrote at least the `NUL`
+        // terminator), so `f.bytes_written() - 1` doesn't underflow.
+        let ptr = unsafe { bindings::memchr(buf.as_ptr().cast(), 0, (f.bytes_written() - 1) as _) };
+        if !ptr.is_null() {
+            return Err(EINVAL);
+        }
+
+        // INVARIANT: We wrote the `NUL` terminator and checked above that no other `NUL` bytes
+        // exist in the buffer.
+        Ok(Self { buf })
+    }
+}
+
+impl Deref for CString {
+    type Target = CStr;
+
+    fn deref(&self) -> &Self::Target {
+        // SAFETY: The type invariants guarantee that the string is `NUL`-terminated and that no
+        // other `NUL` bytes exist.
+        unsafe { CStr::from_bytes_with_nul_unchecked(self.buf.as_slice()) }
+    }
+}