[v2,7/7] rust: file: add abstraction for `poll_table`

Message ID 20231206-alice-file-v2-7-af617c0d9d94@google.com
State New
Series File abstractions needed by Rust Binder

Commit Message

Alice Ryhl Dec. 6, 2023, 11:59 a.m. UTC
The existing `CondVar` abstraction is a wrapper around `wait_queue_head`,
but it does not support all use-cases of the C `wait_queue_head` type. To
be specific, a `CondVar` cannot be registered with a `struct poll_table`.
This limitation has the advantage that you do not need to call
`synchronize_rcu` when destroying a `CondVar`.

However, we need the ability to register a `poll_table` with a
`wait_queue_head` in Rust Binder. To enable this, introduce a type called
`PollCondVar`, which is like `CondVar` except that you can register a
`poll_table`. We also introduce `PollTable`, which is a safe wrapper
around `poll_table` that is intended to be used with `PollCondVar`.

The destructor of `PollCondVar` unconditionally calls `synchronize_rcu`
to ensure that the removal of epoll waiters has fully completed before
the `wait_queue_head` is destroyed.

That said, `synchronize_rcu` is rather expensive and is not needed in
all cases: if we have never registered a `poll_table` with the
`wait_queue_head`, then we don't need to call `synchronize_rcu`. (This is
a common case in Binder, since not all processes use Binder with epoll.)
The current implementation does not account for this, but if we find
that it is necessary to improve this, a future patch could store a
boolean next to the `wait_queue_head` to keep track of whether a
`poll_table` has ever been registered.
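A minimal userspace sketch of the boolean described above, assuming the flag is an atomic that `register_wait` sets and the destructor checks. `TrackedPollCondVar`, `fake_synchronize_rcu` and `SYNC_CALLS` are hypothetical names; the fake only counts calls instead of waiting for a grace period.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};

/// Counts how often the (simulated) grace-period wait ran.
static SYNC_CALLS: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical stand-in for `synchronize_rcu`: it only counts calls.
fn fake_synchronize_rcu() {
    SYNC_CALLS.fetch_add(1, Ordering::SeqCst);
}

/// Hypothetical model of a `PollCondVar` that remembers whether a
/// `poll_table` was ever registered with its wait queue.
pub struct TrackedPollCondVar {
    /// Set the first time something registers; never cleared.
    ever_registered: AtomicBool,
}

impl TrackedPollCondVar {
    pub fn new() -> Self {
        Self {
            ever_registered: AtomicBool::new(false),
        }
    }

    /// Models `PollTable::register_wait`. Callers only hold `&self`, so
    /// registrations may race with each other and the flag must be atomic.
    pub fn register_wait(&self) {
        self.ever_registered.store(true, Ordering::Relaxed);
        // The real code would invoke `qproc` with the wait queue here.
    }
}

impl Drop for TrackedPollCondVar {
    fn drop(&mut self) {
        // `&mut self` rules out concurrent registrations, so a plain read suffices.
        if *self.ever_registered.get_mut() {
            // The real destructor would call `__wake_up_pollfree` first.
            fake_synchronize_rcu();
        }
    }
}
```

Only condvars that were actually handed to a `poll_table` pay for the grace period; the common non-epoll case skips it entirely.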

Signed-off-by: Alice Ryhl <aliceryhl@google.com>
---
 rust/bindings/bindings_helper.h |   2 +
 rust/bindings/lib.rs            |   1 +
 rust/kernel/sync.rs             |   1 +
 rust/kernel/sync/poll.rs        | 103 ++++++++++++++++++++++++++++++++++++++++
 4 files changed, 107 insertions(+)
  

Comments

Benno Lossin Dec. 8, 2023, 5:53 p.m. UTC | #1
On 12/6/23 12:59, Alice Ryhl wrote:
> diff --git a/rust/bindings/lib.rs b/rust/bindings/lib.rs
> index 9bcbea04dac3..eeb291cc60db 100644
> --- a/rust/bindings/lib.rs
> +++ b/rust/bindings/lib.rs
> @@ -51,3 +51,4 @@ mod bindings_helper {
> 
>  pub const GFP_KERNEL: gfp_t = BINDINGS_GFP_KERNEL;
>  pub const __GFP_ZERO: gfp_t = BINDINGS___GFP_ZERO;
> +pub const POLLFREE: __poll_t = BINDINGS_POLLFREE;

You are no longer using this constant, should this still exist?

[...]

> +    fn get_qproc(&self) -> bindings::poll_queue_proc {
> +        let ptr = self.0.get();
> +        // SAFETY: The `ptr` is valid because it originates from a reference, and the `_qproc`
> +        // field is not modified concurrently with this call since we have an immutable reference.

This needs an invariant on `PollTable` (i.e. `self.0` is valid).

> +        unsafe { (*ptr)._qproc }
> +    }
> +
> +    /// Register this [`PollTable`] with the provided [`PollCondVar`], so that it can be notified
> +    /// using the condition variable.
> +    pub fn register_wait(&mut self, file: &File, cv: &PollCondVar) {
> +        if let Some(qproc) = self.get_qproc() {
> +            // SAFETY: The pointers to `self` and `file` are valid because they are references.

What about cv.wait_list...

> +            //
> +            // Before the wait list is destroyed, the destructor of `PollCondVar` will clear
> +            // everything in the wait list, so the wait list is not used after it is freed.
> +            unsafe { qproc(file.as_ptr() as _, cv.wait_list.get(), self.0.get()) };
> +        }
> +    }
> +}
> +
> +/// A wrapper around [`CondVar`] that makes it usable with [`PollTable`].
> +///
> +/// # Invariant
> +///
> +/// If `needs_synchronize_rcu` is false, then there is nothing registered with `register_wait`.

Not able to find `needs_synchronize_rcu` anywhere else, should this be
here?

> +///
> +/// [`CondVar`]: crate::sync::CondVar
> +#[pin_data(PinnedDrop)]
> +pub struct PollCondVar {
> +    #[pin]
> +    inner: CondVar,
> +}

[..]

> +#[pinned_drop]
> +impl PinnedDrop for PollCondVar {
> +    fn drop(self: Pin<&mut Self>) {
> +        // Clear anything registered using `register_wait`.
> +        //
> +        // SAFETY: The pointer points at a valid wait list.

I was a bit confused by "wait list", since the C type is named
`wait_queue_head`, maybe just use the type name?
  
Alice Ryhl Dec. 12, 2023, 9:59 a.m. UTC | #2
On Fri, Dec 8, 2023 at 6:53 PM Benno Lossin <benno.lossin@proton.me> wrote:
>
> On 12/6/23 12:59, Alice Ryhl wrote:
> > diff --git a/rust/bindings/lib.rs b/rust/bindings/lib.rs
> > index 9bcbea04dac3..eeb291cc60db 100644
> > --- a/rust/bindings/lib.rs
> > +++ b/rust/bindings/lib.rs
> > @@ -51,3 +51,4 @@ mod bindings_helper {
> >
> >  pub const GFP_KERNEL: gfp_t = BINDINGS_GFP_KERNEL;
> >  pub const __GFP_ZERO: gfp_t = BINDINGS___GFP_ZERO;
> > +pub const POLLFREE: __poll_t = BINDINGS_POLLFREE;
>
> You are no longer using this constant, should this still exist?

Nice catch, thanks!

> > +    fn get_qproc(&self) -> bindings::poll_queue_proc {
> > +        let ptr = self.0.get();
> > +        // SAFETY: The `ptr` is valid because it originates from a reference, and the `_qproc`
> > +        // field is not modified concurrently with this call since we have an immutable reference.
>
> This needs an invariant on `PollTable` (i.e. `self.0` is valid).

How would you phrase it?

> > +        unsafe { (*ptr)._qproc }
> > +    }
> > +
> > +    /// Register this [`PollTable`] with the provided [`PollCondVar`], so that it can be notified
> > +    /// using the condition variable.
> > +    pub fn register_wait(&mut self, file: &File, cv: &PollCondVar) {
> > +        if let Some(qproc) = self.get_qproc() {
> > +            // SAFETY: The pointers to `self` and `file` are valid because they are references.
>
> What about cv.wait_list...

I can add it to the list of things that are valid due to references.

> > +            //
> > +            // Before the wait list is destroyed, the destructor of `PollCondVar` will clear
> > +            // everything in the wait list, so the wait list is not used after it is freed.
> > +            unsafe { qproc(file.as_ptr() as _, cv.wait_list.get(), self.0.get()) };
> > +        }
> > +    }
> > +}
> > +
> > +/// A wrapper around [`CondVar`] that makes it usable with [`PollTable`].
> > +///
> > +/// # Invariant
> > +///
> > +/// If `needs_synchronize_rcu` is false, then there is nothing registered with `register_wait`.
>
> Not able to find `needs_synchronize_rcu` anywhere else, should this be
> here?

Sorry, this shouldn't be there. It was something I experimented with,
but gave up on.

> > +#[pinned_drop]
> > +impl PinnedDrop for PollCondVar {
> > +    fn drop(self: Pin<&mut Self>) {
> > +        // Clear anything registered using `register_wait`.
> > +        //
> > +        // SAFETY: The pointer points at a valid wait list.
>
> I was a bit confused by "wait list", since the C type is named
> `wait_queue_head`, maybe just use the type name?

I will update all instances of "wait list" to "wait_queue_head". It's
because I incorrectly remembered the C type name to be "wait_list".

Alice
  
Benno Lossin Dec. 12, 2023, 5:01 p.m. UTC | #3
On 12/12/23 10:59, Alice Ryhl wrote:
> On Fri, Dec 8, 2023 at 6:53 PM Benno Lossin <benno.lossin@proton.me> wrote:
>> On 12/6/23 12:59, Alice Ryhl wrote:
>>> +    fn get_qproc(&self) -> bindings::poll_queue_proc {
>>> +        let ptr = self.0.get();
>>> +        // SAFETY: The `ptr` is valid because it originates from a reference, and the `_qproc`
>>> +        // field is not modified concurrently with this call since we have an immutable reference.
>>
>> This needs an invariant on `PollTable` (i.e. `self.0` is valid).
> 
> How would you phrase it?

- `self.0` contains a valid `bindings::poll_table`.
- `self.0` is only modified via references to `Self`.

>>> +        unsafe { (*ptr)._qproc }
>>> +    }
>>> +
>>> +    /// Register this [`PollTable`] with the provided [`PollCondVar`], so that it can be notified
>>> +    /// using the condition variable.
>>> +    pub fn register_wait(&mut self, file: &File, cv: &PollCondVar) {
>>> +        if let Some(qproc) = self.get_qproc() {
>>> +            // SAFETY: The pointers to `self` and `file` are valid because they are references.
>>
>> What about cv.wait_list...
> 
> I can add it to the list of things that are valid due to references.

Yes, this is getting a bit tedious.

What if we create a newtype wrapping `Opaque<T>` with the invariant
that it contains a valid value? Then we could have a specially named
getter whose returned pointer we would always assume to be valid, which
would let you leave it out of the SAFETY comment.
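One possible shape for such a newtype, sketched in userspace Rust under stated assumptions: `std::cell::UnsafeCell` stands in for the kernel's `Opaque<T>` (which can be created uninitialized, for example via `Opaque::uninit`, and therefore carries no validity invariant of its own). `ValidOpaque`, `valid_ptr` and `FakePollTable` are hypothetical names, and the `qproc` signature is simplified.

```rust
use std::cell::UnsafeCell;

/// Hypothetical newtype: like `Opaque<T>`, but carrying a validity invariant.
///
/// # Invariants
///
/// `self.0` always contains a valid, initialized `T`.
#[repr(transparent)]
pub struct ValidOpaque<T>(UnsafeCell<T>);

impl<T> ValidOpaque<T> {
    /// Building from an owned `T` trivially establishes the invariant.
    pub fn new(value: T) -> Self {
        Self(UnsafeCell::new(value))
    }

    /// Specially named getter: by the type invariant, the returned pointer
    /// is valid for as long as `&self` is, so SAFETY comments can cite this
    /// method instead of re-arguing validity each time.
    pub fn valid_ptr(&self) -> *mut T {
        self.0.get()
    }
}

/// Simplified stand-in for `bindings::poll_table`; the real `_qproc`
/// takes a file, a wait queue head and the table.
pub struct FakePollTable {
    pub qproc: Option<fn(u32) -> u32>,
}

/// # Invariants
///
/// The wrapped table is only modified through `&mut PollTable`.
pub struct PollTable(ValidOpaque<FakePollTable>);

impl PollTable {
    fn get_qproc(&self) -> Option<fn(u32) -> u32> {
        // SAFETY: `valid_ptr` is valid per the `ValidOpaque` invariant, and
        // by the `PollTable` invariant nothing writes `qproc` while `&self`
        // is held.
        unsafe { (*self.0.valid_ptr()).qproc }
    }
}
```

With validity attached to the wrapper, the remaining SAFETY argument in `get_qproc` is only the "modified only through references to `Self`" rule.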

[...]

>>> +#[pinned_drop]
>>> +impl PinnedDrop for PollCondVar {
>>> +    fn drop(self: Pin<&mut Self>) {
>>> +        // Clear anything registered using `register_wait`.
>>> +        //
>>> +        // SAFETY: The pointer points at a valid wait list.
>>
>> I was a bit confused by "wait list", since the C type is named
>> `wait_queue_head`, maybe just use the type name?
> 
> I will update all instances of "wait list" to "wait_queue_head". It's
> because I incorrectly remembered the C type name to be "wait_list".

Maybe we should also change the name of the field on `CondVar`?

If you guys agree, I can open a good-first-issue, since it is a very
simple change.
  
Boqun Feng Dec. 13, 2023, 1:35 a.m. UTC | #4
On Tue, Dec 12, 2023 at 05:01:28PM +0000, Benno Lossin wrote:
> On 12/12/23 10:59, Alice Ryhl wrote:
> > On Fri, Dec 8, 2023 at 6:53 PM Benno Lossin <benno.lossin@proton.me> wrote:
> >> On 12/6/23 12:59, Alice Ryhl wrote:
> >>> +    fn get_qproc(&self) -> bindings::poll_queue_proc {
> >>> +        let ptr = self.0.get();
> >>> +        // SAFETY: The `ptr` is valid because it originates from a reference, and the `_qproc`
> >>> +        // field is not modified concurrently with this call since we have an immutable reference.
> >>
> >> This needs an invariant on `PollTable` (i.e. `self.0` is valid).
> > 
> > How would you phrase it?
> 
> - `self.0` contains a valid `bindings::poll_table`.
> - `self.0` is only modified via references to `Self`.
> 
> >>> +        unsafe { (*ptr)._qproc }
> >>> +    }
> >>> +
> >>> +    /// Register this [`PollTable`] with the provided [`PollCondVar`], so that it can be notified
> >>> +    /// using the condition variable.
> >>> +    pub fn register_wait(&mut self, file: &File, cv: &PollCondVar) {
> >>> +        if let Some(qproc) = self.get_qproc() {
> >>> +            // SAFETY: The pointers to `self` and `file` are valid because they are references.
> >>
> >> What about cv.wait_list...
> > 
> > I can add it to the list of things that are valid due to references.
> 

Actually, there is an implied safety requirement here, and it's about how
qproc is implemented. As we can see, PollCondVar::drop() will wait for an
RCU grace period, which means the waiter (a file or something) has to use
RCU to access cv.wait_list; otherwise, the synchronize_rcu() in
PollCondVar::drop() won't help.

To phrase it, it's more like:

(in the safety requirement of `PollTable::from_ptr` and the type
invariant of `PollTable`):

", further, if the qproc function in poll_table publishes the pointer of
the wait_queue_head, it must publish it in such a way that every read of
the published pointer happens inside an RCU read-side critical section."

and here we can say,

"per type invariant, `qproc` cannot publish `cv.wait_list` without
proper RCU protection, so it's safe to use `cv.wait_list` here, and with
the synchronize_rcu() in PollCondVar::drop(), freeing the wait_list will
be delayed until all usages are done."

I know, this is quite verbose, but just imagine someone removes the
rcu_read_lock() and rcu_read_unlock() in ep_remove_wait_queue(): the
poll table from epoll (using ep_ptable_queue_proc()) is still a valid one
according to the current safety requirement, but now there is a
use-after-free in the following case:

	CPU 0                        CPU1
	                             ep_remove_wait_queue():
				       struct wait_queue_head *whead;
	                               whead = smp_load_acquire(...);
	                               if (whead) { // not null
	PollCondVar::drop():
	  __wake_up_pollfree();
	  synchronize_rcu(); // no current RCU readers, yay.
	  <free the wait_queue_head>
	                                 remove_wait_queue(whead, ...); // BOOM, use-after-free

Regards,
Boqun
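The scenario above can be modeled in userspace to show why the load and the use of `whead` must sit in one read-side section. This is a sketch under loud assumptions: an `RwLock<()>` stands in for RCU (readers hold the shared guard and "synchronize" waits for the exclusive guard, which real RCU does not impose on readers), swapping `whead` simplifies epoll's release store and acquire load, and `FakeRcu`, `WaitQueueHead`, `Waiter` and `destroy` are hypothetical names.

```rust
use std::ptr;
use std::sync::atomic::{AtomicPtr, AtomicUsize, Ordering};
use std::sync::{RwLock, RwLockReadGuard};

/// Crude stand-in for RCU (assumption for illustration only).
pub struct FakeRcu(RwLock<()>);

impl FakeRcu {
    pub fn new() -> Self {
        FakeRcu(RwLock::new(()))
    }

    pub fn read_lock(&self) -> RwLockReadGuard<'_, ()> {
        self.0.read().unwrap()
    }

    /// Waits until every reader that started earlier has finished.
    pub fn synchronize(&self) {
        drop(self.0.write().unwrap());
    }
}

/// Models a `wait_queue_head`: counts entries still linked to it.
pub struct WaitQueueHead {
    pub linked: AtomicUsize,
}

/// Models an epoll entry that published a pointer to the head in `whead`.
pub struct Waiter {
    pub whead: AtomicPtr<WaitQueueHead>,
}

impl Waiter {
    /// Models `ep_remove_wait_queue()`: loading `whead` and using it happen
    /// inside the same read-side critical section. Returns whether this call
    /// did the unlinking.
    pub fn remove(&self, rcu: &FakeRcu) -> bool {
        let _guard = rcu.read_lock();
        let head = self.whead.swap(ptr::null_mut(), Ordering::AcqRel);
        if head.is_null() {
            return false; // already detached by the owner
        }
        // SAFETY: `destroy` cannot free `head` until `_guard` is dropped.
        unsafe { (*head).linked.fetch_sub(1, Ordering::Relaxed) };
        true
    }
}

/// Models `PollCondVar::drop()`: detach every waiter (the job of
/// `__wake_up_pollfree`), wait for a grace period, and only then free.
///
/// # Safety
///
/// `head` must come from `Box::into_raw`, be reachable only through
/// `waiters`, and not be used again after this call.
pub unsafe fn destroy(head: *mut WaitQueueHead, waiters: &[&Waiter], rcu: &FakeRcu) {
    for w in waiters {
        if !w.whead.swap(ptr::null_mut(), Ordering::AcqRel).is_null() {
            // SAFETY: `head` has not been freed yet.
            unsafe { (*head).linked.fetch_sub(1, Ordering::Relaxed) };
        }
    }
    // Without this wait, a `remove` that swapped `whead` just before the
    // loop above could still be touching `head` when it is freed.
    rcu.synchronize();
    // SAFETY: no waiter can reach `head` and no reader is still running.
    let head = unsafe { Box::from_raw(head) };
    assert_eq!(head.linked.load(Ordering::Relaxed), 0);
}
```

If `remove` dropped its guard before touching `head`, or if `destroy` skipped `synchronize`, the window from the diagram would reappear.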
  
Benno Lossin Dec. 13, 2023, 9:12 a.m. UTC | #5
On 12/13/23 02:35, Boqun Feng wrote:
> On Tue, Dec 12, 2023 at 05:01:28PM +0000, Benno Lossin wrote:
>> On 12/12/23 10:59, Alice Ryhl wrote:
>>> On Fri, Dec 8, 2023 at 6:53 PM Benno Lossin <benno.lossin@proton.me> wrote:
>>>> On 12/6/23 12:59, Alice Ryhl wrote:
>>>>> +    fn get_qproc(&self) -> bindings::poll_queue_proc {
>>>>> +        let ptr = self.0.get();
>>>>> +        // SAFETY: The `ptr` is valid because it originates from a reference, and the `_qproc`
>>>>> +        // field is not modified concurrently with this call since we have an immutable reference.
>>>>
>>>> This needs an invariant on `PollTable` (i.e. `self.0` is valid).
>>>
>>> How would you phrase it?
>>
>> - `self.0` contains a valid `bindings::poll_table`.
>> - `self.0` is only modified via references to `Self`.
>>
>>>>> +        unsafe { (*ptr)._qproc }
>>>>> +    }
>>>>> +
>>>>> +    /// Register this [`PollTable`] with the provided [`PollCondVar`], so that it can be notified
>>>>> +    /// using the condition variable.
>>>>> +    pub fn register_wait(&mut self, file: &File, cv: &PollCondVar) {
>>>>> +        if let Some(qproc) = self.get_qproc() {
>>>>> +            // SAFETY: The pointers to `self` and `file` are valid because they are references.
>>>>
>>>> What about cv.wait_list...
>>>
>>> I can add it to the list of things that are valid due to references.
>>
> 
> Actually, there is an implied safety requirement here, and it's about how
> qproc is implemented. As we can see, PollCondVar::drop() will wait for an
> RCU grace period, which means the waiter (a file or something) has to use
> RCU to access cv.wait_list; otherwise, the synchronize_rcu() in
> PollCondVar::drop() won't help.

Good catch, this is rather important. I did not find the implementation
of `qproc`, since it is a function pointer. Since this pattern is
common, what is the way to find the implementation of those in general?

I imagine that the pattern is used to enable dynamic selection of the
concrete implementation, but there must be some general specification of
what the function does, is this documented somewhere?

> To phrase it, it's more like:
> 
> (in the safety requirement of `PollTable::from_ptr` and the type
> invariant of `PollTable`):
> 
> ", further, if the qproc function in poll_table publishes the pointer of
> the wait_queue_head, it must publish it in such a way that every read of
> the published pointer happens inside an RCU read-side critical section."

What do you mean by `publish`?

> and here we can say,
> 
> "per type invariant, `qproc` cannot publish `cv.wait_list` without
> proper RCU protection, so it's safe to use `cv.wait_list` here, and with
> the synchronize_rcu() in PollCondVar::drop(), freeing the wait_list will
> be delayed until all usages are done."

I think I am missing how the call to `__wake_up_pollfree` ensures that
nobody uses the `PollCondVar` any longer. How is it removed from the
table?
  
Alice Ryhl Dec. 13, 2023, 10:09 a.m. UTC | #6
Benno Lossin <benno.lossin@proton.me> writes:
>> and here we can say,
>> 
>> "per type invariant, `qproc` cannot publish `cv.wait_list` without
>> proper RCU protection, so it's safe to use `cv.wait_list` here, and with
>> the synchronize_rcu() in PollCondVar::drop(), freeing the wait_list will
>> be delayed until all usages are done."
> 
> I think I am missing how the call to `__wake_up_pollfree` ensures that
> nobody uses the `PollCondVar` any longer. How is it removed from the
> table?

The __wake_up_pollfree function clears the queue. Here is its
documentation:

/**
 * wake_up_pollfree - signal that a polled waitqueue is going away
 * @wq_head: the wait queue head
 *
 * In the very rare cases where a ->poll() implementation uses a waitqueue whose
 * lifetime is tied to a task rather than to the 'struct file' being polled,
 * this function must be called before the waitqueue is freed so that
 * non-blocking polls (e.g. epoll) are notified that the queue is going away.
 *
 * The caller must also RCU-delay the freeing of the wait_queue_head, e.g. via
 * an explicit synchronize_rcu() or call_rcu(), or via SLAB_TYPESAFE_BY_RCU.
 */

The only way for another thread to touch the queue after it has been
cleared is if they are concurrently removing themselves from the queue
under RCU. Because of that, we have to wait for an RCU grace period
after the call to __wake_up_pollfree to ensure that any such concurrent
users have gone away.

Alice
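The contract in that comment can be sketched as a single-threaded userspace model, assuming each queued entry's wake callback reports whether it unlinked itself. In the kernel, `__wake_up_pollfree` wakes the queue with `EPOLLHUP | POLLFREE` and then warns if the queue is still active. `Entry`, `WaitQueueHead`, `epoll_like` and `ignores_pollfree` are hypothetical names, and the bit values are for illustration.

```rust
/// Event bits for illustration.
pub const EPOLLHUP: u32 = 0x10;
pub const POLLFREE: u32 = 0x4000;

/// A queued waiter. `on_wake` receives the wake key and returns `true`
/// if the waiter unlinked itself from the queue.
pub struct Entry {
    pub on_wake: fn(u32) -> bool,
}

/// Hypothetical, single-threaded model of a `wait_queue_head`.
pub struct WaitQueueHead {
    pub entries: Vec<Entry>,
}

impl WaitQueueHead {
    /// Models `__wake_up_pollfree`: wake every entry with
    /// `EPOLLHUP | POLLFREE` and expect each one to detach itself.
    /// Returns `false` where the kernel would hit its warning.
    pub fn wake_up_pollfree(&mut self) -> bool {
        let key = EPOLLHUP | POLLFREE;
        // Keep only the entries whose callback did NOT unlink itself.
        self.entries.retain(|e| !(e.on_wake)(key));
        self.entries.is_empty()
    }
}

/// A well-behaved callback in the spirit of epoll's: on `POLLFREE` it
/// detaches (in epoll this also clears its published `whead` pointer).
pub fn epoll_like(key: u32) -> bool {
    (key & POLLFREE) != 0
}

/// A buggy callback that ignores `POLLFREE` and stays queued.
pub fn ignores_pollfree(_key: u32) -> bool {
    false
}
```

The second case in the test shows why every `qproc` used with a `PollCondVar` has to honour `POLLFREE`: an entry that stays queued would still point at the `wait_queue_head` after it is freed.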
  
Alice Ryhl Dec. 13, 2023, 11:02 a.m. UTC | #7
Benno Lossin <benno.lossin@proton.me> writes:
>>>> +#[pinned_drop]
>>>> +impl PinnedDrop for PollCondVar {
>>>> +    fn drop(self: Pin<&mut Self>) {
>>>> +        // Clear anything registered using `register_wait`.
>>>> +        //
>>>> +        // SAFETY: The pointer points at a valid wait list.
>>>
>>> I was a bit confused by "wait list", since the C type is named
>>> `wait_queue_head`, maybe just use the type name?
>> 
>> I will update all instances of "wait list" to "wait_queue_head". It's
>> because I incorrectly remembered the C type name to be "wait_list".
> 
> Maybe we should also change the name of the field on `CondVar`?
> 
> If you guys agree, I can open a good-first-issue, since it is a very
> simple change.

I think that change is fine, but let's not add it to this patchset,
since it would need to be an eighth patch. I'll let you open an issue for
it.

Alice
  
Boqun Feng Dec. 13, 2023, 5:05 p.m. UTC | #8
On Wed, Dec 13, 2023 at 09:12:45AM +0000, Benno Lossin wrote:
[...]
> > 
> > Actually, there is an implied safety requirement here, and it's about how
> > qproc is implemented. As we can see, PollCondVar::drop() will wait for an
> > RCU grace period, which means the waiter (a file or something) has to use
> > RCU to access cv.wait_list; otherwise, the synchronize_rcu() in
> > PollCondVar::drop() won't help.
> 
> Good catch, this is rather important. I did not find the implementation
> of `qproc`, since it is a function pointer. Since this pattern is
> common, what is the way to find the implementation of those in general?
> 

Actually, I couldn't find any. Ping vfs ;-)

Personally, it took me a while to get a rough understanding of the API:
it's similar to `Future::poll` (or at least the waker-registration part).
It basically should register a waiter, so that when an event happens
later, the waiter gets notified. The waiter registration can also have an
(optional?) cancel mechanism (like an async drop of a Future ;-)), and
that's what gives us a headache here: cancellation needs to remove the
waiter from the wait_queue_head, which means the wait_queue_head must be
valid during the removal, and that means the kfree of the wait_queue_head
must be delayed until no one can access it during waiter removal.
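The `Future::poll` analogy can be made concrete with a small std-only sketch. `WaitQueue`, `WaitFor` and `CountingWaker` are hypothetical types: polling registers the task's `Waker` (the analogue of `qproc` adding an entry), and dropping the future cancels by unlinking that entry, which is only sound while the queue is alive.

```rust
use std::collections::HashMap;
use std::future::Future;
use std::pin::Pin;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};

#[derive(Default)]
struct State {
    next_id: u64,
    waiters: HashMap<u64, Waker>,
    fired: bool,
}

/// Hypothetical wait queue: registered wakers keyed by entry id.
#[derive(Default)]
pub struct WaitQueue {
    state: Mutex<State>,
}

impl WaitQueue {
    /// Fire the event and wake (and unlink) every registered waiter.
    pub fn notify_all(&self) {
        let mut st = self.state.lock().unwrap();
        st.fired = true;
        for (_, waker) in st.waiters.drain() {
            waker.wake();
        }
    }

    pub fn waiter_count(&self) -> usize {
        self.state.lock().unwrap().waiters.len()
    }
}

/// Hypothetical future that completes once the queue has fired.
pub struct WaitFor {
    queue: Arc<WaitQueue>,
    id: Option<u64>,
}

impl WaitFor {
    pub fn new(queue: Arc<WaitQueue>) -> Self {
        Self { queue, id: None }
    }
}

impl Future for WaitFor {
    type Output = ();

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        let queue = Arc::clone(&self.queue);
        let mut st = queue.state.lock().unwrap();
        if st.fired {
            return Poll::Ready(());
        }
        // Registration: the analogue of `qproc` adding an entry to the queue.
        let id = self.id.unwrap_or_else(|| {
            st.next_id += 1;
            st.next_id
        });
        st.waiters.insert(id, cx.waker().clone());
        self.id = Some(id);
        Poll::Pending
    }
}

impl Drop for WaitFor {
    // Cancellation: unlink our entry. This touches the queue, so the queue
    // must still be alive; the `Arc` guarantees that here.
    fn drop(&mut self) {
        if let Some(id) = self.id {
            self.queue.state.lock().unwrap().waiters.remove(&id);
        }
    }
}

/// Test helper: a waker that just counts how often it was woken.
pub struct CountingWaker(pub AtomicUsize);

impl Wake for CountingWaker {
    fn wake(self: Arc<Self>) {
        self.0.fetch_add(1, Ordering::SeqCst);
    }
}
```

In safe Rust the `Arc<WaitQueue>` held by each future keeps the queue alive until every cancellation has run. A `wait_queue_head` embedded in a larger C object has no such reference count, which is why the kernel relies on `POLLFREE` plus an RCU grace period instead.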

> I imagine that the pattern is used to enable dynamic selection of the
> concrete implementation, but there must be some general specification of
> what the function does, is this documented somewhere?
> 
> > To phrase it, it's more like:
> > 
> > (in the safety requirement of `PollTable::from_ptr` and the type
> > invariant of `PollTable`):
> > 
> > ", further, if the qproc function in poll_table publishes the pointer of
> > the wait_queue_head, it must publish it in such a way that every read of
> > the published pointer happens inside an RCU read-side critical section."
> 
> What do you mean by `publish`?
> 

Publishing a pointer is like `Send`ing a `&T` (or putting the pointer in
a global variable), so that other threads can access it. Note that since
the cancel mechanism is optional (to the best of my knowledge), a qproc
call may not publish the pointer.
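A small illustration of publishing in that sense, assuming a global `AtomicPtr` as the publication point (`PUBLISHED`, `publish` and `read_published` are hypothetical names). The `Release` store pairs with the `Acquire` load, so a thread that observes the pointer also observes the pointee's initialized contents.

```rust
use std::ptr;
use std::sync::atomic::{AtomicPtr, Ordering};

/// Hypothetical global slot through which a pointer is published.
static PUBLISHED: AtomicPtr<u64> = AtomicPtr::new(ptr::null_mut());

/// Publish `value` so other threads can reach it. The `Release` store makes
/// the pointee's initialization visible to any `Acquire` load that sees it.
pub fn publish(value: &'static mut u64) {
    PUBLISHED.store(value, Ordering::Release);
}

/// Reader side, callable from any thread; `None` if nothing is published yet.
pub fn read_published() -> Option<u64> {
    let p = PUBLISHED.load(Ordering::Acquire);
    if p.is_null() {
        return None;
    }
    // SAFETY: only `publish` stores non-null pointers, and those come from
    // `&'static mut` borrows (e.g. `Box::leak`) that are never freed.
    Some(unsafe { *p })
}
```

The sketch sidesteps the hard part by leaking the pointee so it is never freed. Once the publisher may free it, readers need something like RCU or a reference count, which is the requirement being discussed for `qproc`.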

Regards,
Boqun
  

Patch

diff --git a/rust/bindings/bindings_helper.h b/rust/bindings/bindings_helper.h
index c8daee341df6..14f84aeef62d 100644
--- a/rust/bindings/bindings_helper.h
+++ b/rust/bindings/bindings_helper.h
@@ -13,6 +13,7 @@ 
 #include <linux/file.h>
 #include <linux/fs.h>
 #include <linux/pid_namespace.h>
+#include <linux/poll.h>
 #include <linux/security.h>
 #include <linux/slab.h>
 #include <linux/refcount.h>
@@ -25,3 +26,4 @@ 
 const size_t BINDINGS_ARCH_SLAB_MINALIGN = ARCH_SLAB_MINALIGN;
 const gfp_t BINDINGS_GFP_KERNEL = GFP_KERNEL;
 const gfp_t BINDINGS___GFP_ZERO = __GFP_ZERO;
+const __poll_t BINDINGS_POLLFREE = POLLFREE;
diff --git a/rust/bindings/lib.rs b/rust/bindings/lib.rs
index 9bcbea04dac3..eeb291cc60db 100644
--- a/rust/bindings/lib.rs
+++ b/rust/bindings/lib.rs
@@ -51,3 +51,4 @@  mod bindings_helper {
 
 pub const GFP_KERNEL: gfp_t = BINDINGS_GFP_KERNEL;
 pub const __GFP_ZERO: gfp_t = BINDINGS___GFP_ZERO;
+pub const POLLFREE: __poll_t = BINDINGS_POLLFREE;
diff --git a/rust/kernel/sync.rs b/rust/kernel/sync.rs
index d219ee518eff..84726f80c406 100644
--- a/rust/kernel/sync.rs
+++ b/rust/kernel/sync.rs
@@ -11,6 +11,7 @@ 
 mod condvar;
 pub mod lock;
 mod locked_by;
+pub mod poll;
 
 pub use arc::{Arc, ArcBorrow, UniqueArc};
 pub use condvar::CondVar;
diff --git a/rust/kernel/sync/poll.rs b/rust/kernel/sync/poll.rs
new file mode 100644
index 000000000000..e1dded9b7b9d
--- /dev/null
+++ b/rust/kernel/sync/poll.rs
@@ -0,0 +1,103 @@ 
+// SPDX-License-Identifier: GPL-2.0
+
+//! Utilities for working with `struct poll_table`.
+
+use crate::{
+    bindings,
+    file::File,
+    prelude::*,
+    sync::{CondVar, LockClassKey},
+    types::Opaque,
+};
+use core::ops::Deref;
+
+/// Creates a [`PollCondVar`] initialiser with the given name and a newly-created lock class.
+#[macro_export]
+macro_rules! new_poll_condvar {
+    ($($name:literal)?) => {
+        $crate::sync::poll::PollCondVar::new($crate::optional_name!($($name)?), $crate::static_lock_class!())
+    };
+}
+
+/// Wraps the kernel's `struct poll_table`.
+#[repr(transparent)]
+pub struct PollTable(Opaque<bindings::poll_table>);
+
+impl PollTable {
+    /// Creates a reference to a [`PollTable`] from a valid pointer.
+    ///
+    /// # Safety
+    ///
+    /// The caller must ensure that for the duration of 'a, the pointer will point at a valid poll
+    /// table, and that it is only accessed via the returned reference.
+    pub unsafe fn from_ptr<'a>(ptr: *mut bindings::poll_table) -> &'a mut PollTable {
+        // SAFETY: The safety requirements guarantee the validity of the dereference, while the
+        // `PollTable` type being transparent makes the cast ok.
+        unsafe { &mut *ptr.cast() }
+    }
+
+    fn get_qproc(&self) -> bindings::poll_queue_proc {
+        let ptr = self.0.get();
+        // SAFETY: The `ptr` is valid because it originates from a reference, and the `_qproc`
+        // field is not modified concurrently with this call since we have an immutable reference.
+        unsafe { (*ptr)._qproc }
+    }
+
+    /// Register this [`PollTable`] with the provided [`PollCondVar`], so that it can be notified
+    /// using the condition variable.
+    pub fn register_wait(&mut self, file: &File, cv: &PollCondVar) {
+        if let Some(qproc) = self.get_qproc() {
+            // SAFETY: The pointers to `self` and `file` are valid because they are references.
+            //
+            // Before the wait list is destroyed, the destructor of `PollCondVar` will clear
+            // everything in the wait list, so the wait list is not used after it is freed.
+            unsafe { qproc(file.as_ptr() as _, cv.wait_list.get(), self.0.get()) };
+        }
+    }
+}
+
+/// A wrapper around [`CondVar`] that makes it usable with [`PollTable`].
+///
+/// # Invariant
+///
+/// If `needs_synchronize_rcu` is false, then there is nothing registered with `register_wait`.
+///
+/// [`CondVar`]: crate::sync::CondVar
+#[pin_data(PinnedDrop)]
+pub struct PollCondVar {
+    #[pin]
+    inner: CondVar,
+}
+
+impl PollCondVar {
+    /// Constructs a new condvar initialiser.
+    pub fn new(name: &'static CStr, key: &'static LockClassKey) -> impl PinInit<Self> {
+        pin_init!(Self {
+            inner <- CondVar::new(name, key),
+        })
+    }
+}
+
+// Make the `CondVar` methods callable on `PollCondVar`.
+impl Deref for PollCondVar {
+    type Target = CondVar;
+
+    fn deref(&self) -> &CondVar {
+        &self.inner
+    }
+}
+
+#[pinned_drop]
+impl PinnedDrop for PollCondVar {
+    fn drop(self: Pin<&mut Self>) {
+        // Clear anything registered using `register_wait`.
+        //
+        // SAFETY: The pointer points at a valid wait list.
+        unsafe { bindings::__wake_up_pollfree(self.inner.wait_list.get()) };
+
+        // Wait for epoll items to be properly removed.
+        //
+        // SAFETY: Just an FFI call.
+        unsafe { bindings::synchronize_rcu() };
+    }
+}