[12/13] rust: sync: introduce `CondVar`

Message ID 20230330043954.562237-12-wedsonaf@gmail.com
State New
Headers
Series [01/13] rust: sync: introduce `LockClassKey` |

Commit Message

Wedson Almeida Filho March 30, 2023, 4:39 a.m. UTC
  From: Wedson Almeida Filho <walmeida@microsoft.com>

This is the traditional condition variable or monitor synchronisation
primitive. It is implemented with C's `wait_queue_head_t`.

It allows users to release a lock and go to sleep while guaranteeing
that notifications won't be missed. This is achieved by enqueuing a wait
entry before releasing the lock.

Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Will Deacon <will@kernel.org>
Cc: Waiman Long <longman@redhat.com>
Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>
---
 rust/bindings/bindings_helper.h |   1 +
 rust/helpers.c                  |   7 ++
 rust/kernel/sync.rs             |   2 +
 rust/kernel/sync/condvar.rs     | 178 ++++++++++++++++++++++++++++++++
 rust/kernel/sync/lock.rs        |   1 -
 5 files changed, 188 insertions(+), 1 deletion(-)
 create mode 100644 rust/kernel/sync/condvar.rs
  

Comments

Peter Zijlstra March 30, 2023, 12:52 p.m. UTC | #1
On Thu, Mar 30, 2023 at 01:39:53AM -0300, Wedson Almeida Filho wrote:
> From: Wedson Almeida Filho <walmeida@microsoft.com>
> 
> This is the traditional condition variable or monitor synchronisation
> primitive. It is implemented with C's `wait_queue_head_t`.
> 
> It allows users to release a lock and go to sleep while guaranteeing
> that notifications won't be missed. This is achieved by enqueuing a wait
> entry before releasing the lock.
> 

> +/// A conditional variable.
> +///
> +/// Exposes the kernel's [`struct wait_queue_head`] as a condition variable. It allows the caller to
> +/// atomically release the given lock and go to sleep. It reacquires the lock when it wakes up. And
> +/// it wakes up when notified by another thread (via [`CondVar::notify_one`] or
> +/// [`CondVar::notify_all`]) or because the thread received a signal. It may also wake up
> +/// spuriously.

Urgh so wide :-/

But no, threads can *always* and for any reason, have spurious wakeups.

Also, is this hard tied to mutex? If so, you should probably use swait
instead of wait.
  
Peter Zijlstra March 30, 2023, 12:59 p.m. UTC | #2
On Thu, Mar 30, 2023 at 01:39:53AM -0300, Wedson Almeida Filho wrote:

> +impl CondVar {
> +    /// Constructs a new condvar initialiser.
> +    #[allow(clippy::new_ret_no_self)]
> +    pub fn new(name: &'static CStr, key: &'static LockClassKey) -> impl PinInit<Self> {
> +        pin_init!(Self {
> +            _pin: PhantomPinned,
> +            // SAFETY: `__init_waitqueue_head` initialises the waitqueue head, and both `name` and
> +            // `key` have static lifetimes so they live indefinitely.
> +            wait_list <- unsafe {
> +                Opaque::ffi_init2(
> +                    bindings::__init_waitqueue_head,
> +                    name.as_char_ptr(),
> +                    key.as_ptr(),
> +                )
> +            },
> +        })
> +    }
> +
> +    fn wait_internal<T: ?Sized, B: Backend>(&self, wait_state: u32, guard: &mut Guard<'_, T, B>) {
> +        let wait = Opaque::<bindings::wait_queue_entry>::uninit();
> +
> +        // SAFETY: `wait` points to valid memory.
> +        unsafe { bindings::init_wait(wait.get()) };
> +
> +        // SAFETY: Both `wait` and `wait_list` point to valid memory.
> +        unsafe {
> +            bindings::prepare_to_wait_exclusive(self.wait_list.get(), wait.get(), wait_state as _)
> +        };

I can't read this rust gunk, but where is the condition test gone?

Also, where is the loop gone to?

> +
> +        // SAFETY: No arguments, switches to another thread.
> +        guard.do_unlocked(|| unsafe { bindings::schedule() });
> +
> +        // SAFETY: Both `wait` and `wait_list` point to valid memory.
> +        unsafe { bindings::finish_wait(self.wait_list.get(), wait.get()) };
> +    }
> +
> +    /// Releases the lock and waits for a notification in interruptible mode.
> +    ///
> +    /// Atomically releases the given lock (whose ownership is proven by the guard) and puts the
> +    /// thread to sleep, reacquiring the lock on wake up. It wakes up when notified by
> +    /// [`CondVar::notify_one`] or [`CondVar::notify_all`], or when the thread receives a signal.
> +    /// It may also wake up spuriously.
> +    ///
> +    /// Returns whether there is a signal pending.
> +    #[must_use = "wait returns if a signal is pending, so the caller must check the return value"]
> +    pub fn wait<T: ?Sized, B: Backend>(&self, guard: &mut Guard<'_, T, B>) -> bool {
> +        self.wait_internal(bindings::TASK_INTERRUPTIBLE, guard);
> +        Task::current().signal_pending()
> +    }
> +
> +    /// Releases the lock and waits for a notification in uninterruptible mode.
> +    ///
> +    /// Similar to [`CondVar::wait`], except that the wait is not interruptible. That is, the
> +    /// thread won't wake up due to signals. It may, however, wake up supirously.
> +    pub fn wait_uninterruptible<T: ?Sized, B: Backend>(&self, guard: &mut Guard<'_, T, B>) {
> +        self.wait_internal(bindings::TASK_UNINTERRUPTIBLE, guard)
> +    }
> +
> +    /// Calls the kernel function to notify the appropriate number of threads with the given flags.
> +    fn notify(&self, count: i32, flags: u32) {
> +        // SAFETY: `wait_list` points to valid memory.
> +        unsafe {
> +            bindings::__wake_up(
> +                self.wait_list.get(),
> +                bindings::TASK_NORMAL,
> +                count,
> +                flags as _,
> +            )
> +        };
> +    }
> +
> +    /// Wakes a single waiter up, if any.
> +    ///
> +    /// This is not 'sticky' in the sense that if no thread is waiting, the notification is lost
> +    /// completely (as opposed to automatically waking up the next waiter).
> +    pub fn notify_one(&self) {
> +        self.notify(1, 0);
> +    }
> +
> +    /// Wakes all waiters up, if any.
> +    ///
> +    /// This is not 'sticky' in the sense that if no thread is waiting, the notification is lost
> +    /// completely (as opposed to automatically waking up the next waiter).
> +    pub fn notify_all(&self) {
> +        self.notify(0, 0);
> +    }
> +}
  
Wedson Almeida Filho March 30, 2023, 2:43 p.m. UTC | #3
On Thu, Mar 30, 2023 at 02:52:23PM +0200, Peter Zijlstra wrote:
> On Thu, Mar 30, 2023 at 01:39:53AM -0300, Wedson Almeida Filho wrote:
> > From: Wedson Almeida Filho <walmeida@microsoft.com>
> > 
> > This is the traditional condition variable or monitor synchronisation
> > primitive. It is implemented with C's `wait_queue_head_t`.
> > 
> > It allows users to release a lock and go to sleep while guaranteeing
> > that notifications won't be missed. This is achieved by enqueuing a wait
> > entry before releasing the lock.
> > 
> 
> > +/// A conditional variable.
> > +///
> > +/// Exposes the kernel's [`struct wait_queue_head`] as a condition variable. It allows the caller to
> > +/// atomically release the given lock and go to sleep. It reacquires the lock when it wakes up. And
> > +/// it wakes up when notified by another thread (via [`CondVar::notify_one`] or
> > +/// [`CondVar::notify_all`]) or because the thread received a signal. It may also wake up
> > +/// spuriously.
> 
> Urgh so wide :-/

Thanks for reviewing :)

> But no, threads can *always* and for any reason, have spurious wakeups.

I don't believe I said otherwise. Is there anything in the text above you'd like to see changed?

> Also, is this hard tied to mutex? If so, you should probably use swait
> instead of wait.

This is not tied to mutex, it works with any lock.

Cheers,
-Wedson
  
Wedson Almeida Filho March 30, 2023, 2:56 p.m. UTC | #4
On Thu, Mar 30, 2023 at 02:59:27PM +0200, Peter Zijlstra wrote:
> On Thu, Mar 30, 2023 at 01:39:53AM -0300, Wedson Almeida Filho wrote:
> 
> > +    fn wait_internal<T: ?Sized, B: Backend>(&self, wait_state: u32, guard: &mut Guard<'_, T, B>) {
> > +        let wait = Opaque::<bindings::wait_queue_entry>::uninit();
> > +
> > +        // SAFETY: `wait` points to valid memory.
> > +        unsafe { bindings::init_wait(wait.get()) };
> > +
> > +        // SAFETY: Both `wait` and `wait_list` point to valid memory.
> > +        unsafe {
> > +            bindings::prepare_to_wait_exclusive(self.wait_list.get(), wait.get(), wait_state as _)
> > +        };
> 
> I can't read this rust gunk, but where is the condition test gone?
> 
> Also, where is the loop gone to?

They're both at the caller. The usage of condition variables is something like:

while guard.value != v {
    condvar.wait_uninterruptible(&mut guard);
}

(Note that this is not specific to the kernel or to Rust: this is how condvars
work in general. You'll find this in any textbook on the topic.)

In the implementation of wait_internal(), we add the local wait entry to the
wait queue _before_ releasing the lock (i.e., before the test result can
change), so we guarantee that we don't miss wake up attempts.

Thanks,
-Wedson
  
Peter Zijlstra April 3, 2023, 8:59 a.m. UTC | #5
On Thu, Mar 30, 2023 at 11:56:33AM -0300, Wedson Almeida Filho wrote:
> On Thu, Mar 30, 2023 at 02:59:27PM +0200, Peter Zijlstra wrote:
> > On Thu, Mar 30, 2023 at 01:39:53AM -0300, Wedson Almeida Filho wrote:
> > 
> > > +    fn wait_internal<T: ?Sized, B: Backend>(&self, wait_state: u32, guard: &mut Guard<'_, T, B>) {
> > > +        let wait = Opaque::<bindings::wait_queue_entry>::uninit();
> > > +
> > > +        // SAFETY: `wait` points to valid memory.
> > > +        unsafe { bindings::init_wait(wait.get()) };
> > > +
> > > +        // SAFETY: Both `wait` and `wait_list` point to valid memory.
> > > +        unsafe {
> > > +            bindings::prepare_to_wait_exclusive(self.wait_list.get(), wait.get(), wait_state as _)
> > > +        };
> > 
> > I can't read this rust gunk, but where is the condition test gone?
> > 
> > Also, where is the loop gone to?
> 
> They're both at the caller. The usage of condition variables is something like:
> 
> while guard.value != v {
>     condvar.wait_uninterruptible(&mut guard);
> }
> 
> (Note that this is not specific to the kernel or to Rust: this is how condvars
> work in general. You'll find this in any textbook on the topic.)
> 
> In the implementation of wait_internal(), we add the local wait entry to the
> wait queue _before_ releasing the lock (i.e., before the test result can
> change), so we guarantee that we don't miss wake up attempts.

Ah, so you've not yet been exposed to the wonderful 'feature' where
pthread_cond_timedwait() gets called with .mutex=NULL and people expect
things to just work :/ (luckily not accepted by the majority of
implementations)

Or a little more devious, calling signal and not holding the same mutex.

But then yes, I suppose it should work. I just got alarm bells going off
because I see prepare_to_wait without an obvious loop around it.
  
Wedson Almeida Filho April 3, 2023, 1:35 p.m. UTC | #6
On Mon, Apr 03, 2023 at 10:59:59AM +0200, Peter Zijlstra wrote:
> On Thu, Mar 30, 2023 at 11:56:33AM -0300, Wedson Almeida Filho wrote:
> > On Thu, Mar 30, 2023 at 02:59:27PM +0200, Peter Zijlstra wrote:
> > > On Thu, Mar 30, 2023 at 01:39:53AM -0300, Wedson Almeida Filho wrote:
> > > 
> > > > +    fn wait_internal<T: ?Sized, B: Backend>(&self, wait_state: u32, guard: &mut Guard<'_, T, B>) {
> > > > +        let wait = Opaque::<bindings::wait_queue_entry>::uninit();
> > > > +
> > > > +        // SAFETY: `wait` points to valid memory.
> > > > +        unsafe { bindings::init_wait(wait.get()) };
> > > > +
> > > > +        // SAFETY: Both `wait` and `wait_list` point to valid memory.
> > > > +        unsafe {
> > > > +            bindings::prepare_to_wait_exclusive(self.wait_list.get(), wait.get(), wait_state as _)
> > > > +        };
> > > 
> > > I can't read this rust gunk, but where is the condition test gone?
> > > 
> > > Also, where is the loop gone to?
> > 
> > They're both at the caller. The usage of condition variables is something like:
> > 
> > while guard.value != v {
> >     condvar.wait_uninterruptible(&mut guard);
> > }
> > 
> > (Note that this is not specific to the kernel or to Rust: this is how condvars
> > work in general. You'll find this in any textbook on the topic.)
> > 
> > In the implementation of wait_internal(), we add the local wait entry to the
> > wait queue _before_ releasing the lock (i.e., before the test result can
> > change), so we guarantee that we don't miss wake up attempts.
> 
> Ah, so you've not yet been exposed to the wonderful 'feature' where
> pthread_cond_timedwait() gets called with .mutex=NULL and people expect
> things to just work :/ (luckily not accepted by the majority of
> implementations)

Rust doesn't have this problem: a `Guard` cannot exist without a lock, and one
cannot call `wait` or `wait_uninterruptible` without a `Guard`.

> Or a little more devious, calling signal and not holding the same mutex.

We don't require that callers hold the lock while singaling. If they signal when
the condition isn't satisfied (with or without the lock held, it doesn't
matter), it will just look like a spurious signal to the waiting thread.

> I just got alarm bells going off because I see prepare_to_wait without an
> obvious loop around it.

Fair enough, we do need a loop.
  

Patch

diff --git a/rust/bindings/bindings_helper.h b/rust/bindings/bindings_helper.h
index 03656a44a83f..50e7a76d5455 100644
--- a/rust/bindings/bindings_helper.h
+++ b/rust/bindings/bindings_helper.h
@@ -8,6 +8,7 @@ 
 
 #include <linux/slab.h>
 #include <linux/refcount.h>
+#include <linux/wait.h>
 #include <linux/sched.h>
 
 /* `bindgen` gets confused at certain things. */
diff --git a/rust/helpers.c b/rust/helpers.c
index 96441744030e..8ff2559c1572 100644
--- a/rust/helpers.c
+++ b/rust/helpers.c
@@ -24,6 +24,7 @@ 
 #include <linux/mutex.h>
 #include <linux/spinlock.h>
 #include <linux/sched/signal.h>
+#include <linux/wait.h>
 
 __noreturn void rust_helper_BUG(void)
 {
@@ -76,6 +77,12 @@  void rust_helper_spin_unlock_irqrestore(spinlock_t *lock, unsigned long flags)
 }
 EXPORT_SYMBOL_GPL(rust_helper_spin_unlock_irqrestore);
 
+void rust_helper_init_wait(struct wait_queue_entry *wq_entry)
+{
+	init_wait(wq_entry);
+}
+EXPORT_SYMBOL_GPL(rust_helper_init_wait);
+
 int rust_helper_signal_pending(struct task_struct *t)
 {
 	return signal_pending(t);
diff --git a/rust/kernel/sync.rs b/rust/kernel/sync.rs
index ed07437d7d55..d6dd0e2c1678 100644
--- a/rust/kernel/sync.rs
+++ b/rust/kernel/sync.rs
@@ -8,9 +8,11 @@ 
 use crate::types::Opaque;
 
 mod arc;
+mod condvar;
 pub mod lock;
 
 pub use arc::{Arc, ArcBorrow, UniqueArc};
+pub use condvar::CondVar;
 pub use lock::{mutex::Mutex, spinlock::SpinLock};
 
 /// Represents a lockdep class. It's a wrapper around C's `lock_class_key`.
diff --git a/rust/kernel/sync/condvar.rs b/rust/kernel/sync/condvar.rs
new file mode 100644
index 000000000000..3f528fc7fa48
--- /dev/null
+++ b/rust/kernel/sync/condvar.rs
@@ -0,0 +1,178 @@ 
+// SPDX-License-Identifier: GPL-2.0
+
+//! A condition variable.
+//!
+//! This module allows Rust code to use the kernel's [`struct wait_queue_head`] as a condition
+//! variable.
+
+use super::{lock::Backend, lock::Guard, LockClassKey};
+use crate::{bindings, init::PinInit, pin_init, str::CStr, task::Task, types::Opaque};
+use core::marker::PhantomPinned;
+use macros::pin_data;
+
+/// Creates a [`CondVar`] initialiser with the given name and a newly-created lock class.
+#[macro_export]
+macro_rules! new_condvar {
+    ($($name:literal)?) => {
+        $crate::sync::CondVar::new($crate::optional_name!($($name)?), $crate::static_lock_class!())
+    };
+}
+
+/// A conditional variable.
+///
+/// Exposes the kernel's [`struct wait_queue_head`] as a condition variable. It allows the caller to
+/// atomically release the given lock and go to sleep. It reacquires the lock when it wakes up. And
+/// it wakes up when notified by another thread (via [`CondVar::notify_one`] or
+/// [`CondVar::notify_all`]) or because the thread received a signal. It may also wake up
+/// spuriously.
+///
+/// Instances of [`CondVar`] need a lock class and to be pinned. The recommended way to create such
+/// instances is with the [`pin_init`](crate::pin_init) and [`new_condvar`] macros.
+///
+/// # Examples
+///
+/// The following is an example of using a condvar with a mutex:
+///
+/// ```
+/// use kernel::sync::{CondVar, Mutex};
+/// use kernel::{new_condvar, new_mutex};
+///
+/// #[pin_data]
+/// pub struct Example {
+///     #[pin]
+///     value: Mutex<u32>,
+///
+///     #[pin]
+///     value_changed: CondVar,
+/// }
+///
+/// /// Waits for `e.value` to become `v`.
+/// fn wait_for_vaue(e: &Example, v: u32) {
+///     let mut guard = e.value.lock();
+///     while *guard != v {
+///         e.value_changed.wait_uninterruptible(&mut guard);
+///     }
+/// }
+///
+/// /// Increments `e.value` and notifies all potential waiters.
+/// fn increment(e: &Example) {
+///     *e.value.lock() += 1;
+///     e.value_changed.notify_all();
+/// }
+///
+/// /// Allocates a new boxed `Example`.
+/// fn new_example() -> Result<Pin<Box<Example>>> {
+///     Box::pin_init(pin_init!(Example {
+///         value <- new_mutex!(0),
+///         value_changed <- new_condvar!(),
+///     }))
+/// }
+/// ```
+///
+/// [`struct wait_queue_head`]: ../../../include/linux/wait.h
+#[pin_data]
+pub struct CondVar {
+    #[pin]
+    pub(crate) wait_list: Opaque<bindings::wait_queue_head>,
+
+    /// A condvar needs to be pinned because it contains a [`struct list_head`] that is
+    /// self-referential, so it cannot be safely moved once it is initialised.
+    #[pin]
+    _pin: PhantomPinned,
+}
+
+// SAFETY: `CondVar` only uses a `struct wait_queue_head`, which is safe to use on any thread.
+#[allow(clippy::non_send_fields_in_send_ty)]
+unsafe impl Send for CondVar {}
+
+// SAFETY: `CondVar` only uses a `struct wait_queue_head`, which is safe to use on multiple threads
+// concurrently.
+unsafe impl Sync for CondVar {}
+
+impl CondVar {
+    /// Constructs a new condvar initialiser.
+    #[allow(clippy::new_ret_no_self)]
+    pub fn new(name: &'static CStr, key: &'static LockClassKey) -> impl PinInit<Self> {
+        pin_init!(Self {
+            _pin: PhantomPinned,
+            // SAFETY: `__init_waitqueue_head` initialises the waitqueue head, and both `name` and
+            // `key` have static lifetimes so they live indefinitely.
+            wait_list <- unsafe {
+                Opaque::ffi_init2(
+                    bindings::__init_waitqueue_head,
+                    name.as_char_ptr(),
+                    key.as_ptr(),
+                )
+            },
+        })
+    }
+
+    fn wait_internal<T: ?Sized, B: Backend>(&self, wait_state: u32, guard: &mut Guard<'_, T, B>) {
+        let wait = Opaque::<bindings::wait_queue_entry>::uninit();
+
+        // SAFETY: `wait` points to valid memory.
+        unsafe { bindings::init_wait(wait.get()) };
+
+        // SAFETY: Both `wait` and `wait_list` point to valid memory.
+        unsafe {
+            bindings::prepare_to_wait_exclusive(self.wait_list.get(), wait.get(), wait_state as _)
+        };
+
+        // SAFETY: No arguments, switches to another thread.
+        guard.do_unlocked(|| unsafe { bindings::schedule() });
+
+        // SAFETY: Both `wait` and `wait_list` point to valid memory.
+        unsafe { bindings::finish_wait(self.wait_list.get(), wait.get()) };
+    }
+
+    /// Releases the lock and waits for a notification in interruptible mode.
+    ///
+    /// Atomically releases the given lock (whose ownership is proven by the guard) and puts the
+    /// thread to sleep, reacquiring the lock on wake up. It wakes up when notified by
+    /// [`CondVar::notify_one`] or [`CondVar::notify_all`], or when the thread receives a signal.
+    /// It may also wake up spuriously.
+    ///
+    /// Returns whether there is a signal pending.
+    #[must_use = "wait returns if a signal is pending, so the caller must check the return value"]
+    pub fn wait<T: ?Sized, B: Backend>(&self, guard: &mut Guard<'_, T, B>) -> bool {
+        self.wait_internal(bindings::TASK_INTERRUPTIBLE, guard);
+        Task::current().signal_pending()
+    }
+
+    /// Releases the lock and waits for a notification in uninterruptible mode.
+    ///
+    /// Similar to [`CondVar::wait`], except that the wait is not interruptible. That is, the
+    /// thread won't wake up due to signals. It may, however, wake up supirously.
+    pub fn wait_uninterruptible<T: ?Sized, B: Backend>(&self, guard: &mut Guard<'_, T, B>) {
+        self.wait_internal(bindings::TASK_UNINTERRUPTIBLE, guard)
+    }
+
+    /// Calls the kernel function to notify the appropriate number of threads with the given flags.
+    fn notify(&self, count: i32, flags: u32) {
+        // SAFETY: `wait_list` points to valid memory.
+        unsafe {
+            bindings::__wake_up(
+                self.wait_list.get(),
+                bindings::TASK_NORMAL,
+                count,
+                flags as _,
+            )
+        };
+    }
+
+    /// Wakes a single waiter up, if any.
+    ///
+    /// This is not 'sticky' in the sense that if no thread is waiting, the notification is lost
+    /// completely (as opposed to automatically waking up the next waiter).
+    pub fn notify_one(&self) {
+        self.notify(1, 0);
+    }
+
+    /// Wakes all waiters up, if any.
+    ///
+    /// This is not 'sticky' in the sense that if no thread is waiting, the notification is lost
+    /// completely (as opposed to automatically waking up the next waiter).
+    pub fn notify_all(&self) {
+        self.notify(0, 0);
+    }
+}
diff --git a/rust/kernel/sync/lock.rs b/rust/kernel/sync/lock.rs
index ae20277c39c8..f52ba9ab1b70 100644
--- a/rust/kernel/sync/lock.rs
+++ b/rust/kernel/sync/lock.rs
@@ -177,7 +177,6 @@  pub struct Guard<'a, T: ?Sized, B: Backend> {
 unsafe impl<T: Sync + ?Sized, B: Backend> Sync for Guard<'_, T, B> {}
 
 impl<T: ?Sized, B: Backend> Guard<'_, T, B> {
-    #[allow(dead_code)]
     pub(crate) fn do_unlocked(&mut self, cb: impl FnOnce()) {
         // SAFETY: The caller owns the lock, so it is safe to unlock it.
         unsafe { B::unlock(self.lock.state.get(), &self.state) };