[v2] char: xillybus: Prevent use-after-free due to race condition

Message ID 20221030094209.65916-1-eli.billauer@gmail.com
State New
Headers
Series [v2] char: xillybus: Prevent use-after-free due to race condition |

Commit Message

Eli Billauer Oct. 30, 2022, 9:42 a.m. UTC
  The driver for XillyUSB devices maintains a kref reference count on each
xillyusb_dev structure, which represents a physical device. This reference
count reaches zero when the device has been disconnected and there are no
open file descriptors that are related to the device. When this occurs,
kref_put() calls cleanup_dev(), which clears up the device's data,
including the structure itself.

However, when xillyusb_open() is called, this reference count becomes
tricky: This function needs to obtain the xillyusb_dev structure that
relates to the inode's major and minor (as there can be several such).
xillybus_find_inode() (which is defined in xillybus_class.c) is called
for this purpose. xillybus_find_inode() holds a mutex that is global in
xillybus_class.c to protect the list of devices, and releases this
mutex before returning. As a result, nothing protects the xillyusb_dev's
reference counter from being decremented to zero before xillyusb_open()
increments it on its own behalf. Hence the structure can be freed
due to a rare race condition.

To solve this, a mutex is added. It is locked by xillyusb_open() before
the call to xillybus_find_inode() and is released only after the kref
counter has been incremented on behalf of the newly opened inode. This
protects the kref reference counters of all xillyusb_dev structs from
being decremented by xillyusb_disconnect() during this time segment, as
the call to kref_put() in this function is done with the same lock held.

There is no need to hold the lock on other calls to kref_put(), because
if xillybus_find_inode() finds a struct, xillyusb_disconnect() has not
made the call to remove it, and hence not made its call to kref_put(),
which takes place afterwards. Hence preventing xillyusb_disconnect's
call to kref_put() is enough to ensure that the reference doesn't reach
zero before it's incremented by xillyusb_open().

It would have been more natural to increment the reference count in
xillybus_find_inode() of course, however this function is also called by
Xillybus' driver for PCIe / OF, which registers a completely different
structure. Therefore, xillybus_find_inode() treats these structures as
void pointers, and accordingly can't make any changes.

Reported-by: Hyunwoo Kim <imv4bel@gmail.com>
Suggested-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Eli Billauer <eli.billauer@gmail.com>
---

Notes:
    Changelog:
    
    v2
      -- Rewrite completely: Instead of juggling with a mutex, add a new
         one.

 drivers/char/xillybus/xillyusb.c | 22 +++++++++++++++++++---
 1 file changed, 19 insertions(+), 3 deletions(-)
  

Comments

Alan Stern Oct. 30, 2022, 4:23 p.m. UTC | #1
On Sun, Oct 30, 2022 at 11:42:09AM +0200, Eli Billauer wrote:
> The driver for XillyUSB devices maintains a kref reference count on each
> xillyusb_dev structure, which represents a physical device. This reference
> count reaches zero when the device has been disconnected and there are no
> open file descriptors that are related to the device. When this occurs,
> kref_put() calls cleanup_dev(), which clears up the device's data,
> including the structure itself.
> 
> However, when xillyusb_open() is called, this reference count becomes
> tricky: This function needs to obtain the xillyusb_dev structure that
> relates to the inode's major and minor (as there can be several such).
> xillybus_find_inode() (which is defined in xillybus_class.c) is called
> for this purpose. xillybus_find_inode() holds a mutex that is global in
> xillybus_class.c to protect the list of devices, and releases this
> mutex before returning. As a result, nothing protects the xillyusb_dev's
> reference counter from being decremented to zero before xillyusb_open()
> increments it on its own behalf. Hence the structure can be freed
> due to a rare race condition.
> 
> To solve this, a mutex is added. It is locked by xillyusb_open() before
> the call to xillybus_find_inode() and is released only after the kref
> counter has been incremented on behalf of the newly opened inode. This
> protects the kref reference counters of all xillyusb_dev structs from
> being decremented by xillyusb_disconnect() during this time segment, as
> the call to kref_put() in this function is done with the same lock held.
> 
> There is no need to hold the lock on other calls to kref_put(), because
> if xillybus_find_inode() finds a struct, xillyusb_disconnect() has not
> made the call to remove it, and hence not made its call to kref_put(),
> which takes place afterwards. Hence preventing xillyusb_disconnect's
> call to kref_put() is enough to ensure that the reference doesn't reach
> zero before it's incremented by xillyusb_open().
> 
> It would have been more natural to increment the reference count in
> xillybus_find_inode() of course, however this function is also called by
> Xillybus' driver for PCIe / OF, which registers a completely different
> structure. Therefore, xillybus_find_inode() treats these structures as
> void pointers, and accordingly can't make any changes.
> 
> Reported-by: Hyunwoo Kim <imv4bel@gmail.com>
> Suggested-by: Alan Stern <stern@rowland.harvard.edu>
> Signed-off-by: Eli Billauer <eli.billauer@gmail.com>

It looks like the xillybus driver already has a private mutex that would 
have been very well suited for this task: unit_mutex defined in 
xillybus_class.c.  Of course, there's nothing wrong with using a new 
mutex instead -- just make sure there aren't any ABBA locking order 
problems.

Alan Stern
  
Eli Billauer Oct. 30, 2022, 6:55 p.m. UTC | #2
On 30/10/2022 18:23, Alan Stern wrote:
> It looks like the xillybus driver already has a private mutex that would
> have been very well suited for this task: unit_mutex defined in
> xillybus_class.c.

Indeed so. The problem is that unit_mutex is global to xillybus_class.c, 
and I need the mutex to be released in xillyusb.c: The kref counter, 
which needs to be incremented with a mutex held, is inside a structure 
that only the XillyUSB driver knows about, so xillybus_class can't do 
that. Which is why xillybus_find_inode() passed a pointer to unit_mutex 
  to its caller in the v1 version of this patch. But that turned out to 
be too odd.

> Of course, there's nothing wrong with using a new
> mutex instead -- just make sure there aren't any ABBA locking order
> problems.

The kref_mutex is always taken with no other (Xillybus-related) mutex 
taken. So the locking order is assured.

But thanks for the reminder. Never hurts checking again.

Regards,
    Eli
  
Hyunwoo Kim Nov. 13, 2022, 8:05 a.m. UTC | #3
Dear,

Sorry for the late review.

This patch cannot prevent the UAF scenario I presented:
```
                cpu0                                                cpu1
       1. xillyusb_open()
          mutex_lock(&kref_mutex);    // unaffected lock
          xillybus_find_inode()
          mutex_lock(&unit_mutex);
          unit = iter;
          mutex_unlock(&unit_mutex);
                                                             2. xillyusb_disconnect()
                                                                xillybus_cleanup_chrdev()
                                                                mutex_lock(&unit_mutex);
                                                                kfree(unit);
                                                                mutex_unlock(&unit_mutex);
       3. *private_data = unit->private_data;   // UAF

```

This is a UAF for 'unit', not a UAF for 'xdev'.
So, the added 'kref_mutex' has no effect.


Regards,
Hyunwoo Kim
  
Eli Billauer Nov. 13, 2022, 8:30 a.m. UTC | #4
On 13/11/2022 10:05, Hyunwoo Kim wrote:
> Dear,
> 
> Sorry for the late review.
> 
> This patch cannot prevent the UAF scenario I presented:
> ```
>                  cpu0                                                cpu1
>         1. xillyusb_open()
>            mutex_lock(&kref_mutex);    // unaffected lock
>            xillybus_find_inode()
>            mutex_lock(&unit_mutex);
>            unit = iter;
>            mutex_unlock(&unit_mutex);
>                                                               2. xillyusb_disconnect()
>                                                                  xillybus_cleanup_chrdev()
>                                                                  mutex_lock(&unit_mutex);
>                                                                  kfree(unit);
>                                                                  mutex_unlock(&unit_mutex);
>         3. *private_data = unit->private_data;   // UAF
> 
> ```
> 
> This is a UAF for 'unit', not a UAF for 'xdev'.
> So, the added 'kref_mutex' has no effect.
> 

You're correct. This submitted patch solves only one problem out of two. 
It prevents the content of @private_data to be freed, but you're right 
that @unit itself isn't protected well enough.

The problem you're pointing at is that @unit can be freed before its 
attempted use, because the mutex is released too early.

This is easily solved by moving down the mutex_unlock() call to after 
@unit has been used. Do you want the pleasure to submit this patch, or 
should I?

Thanks,
   Eli
  
Hyunwoo Kim Nov. 13, 2022, 8:47 a.m. UTC | #5
Dear,

On Sun, Nov 13, 2022 at 10:30:16AM +0200, Eli Billauer wrote:
> On 13/11/2022 10:05, Hyunwoo Kim wrote:
> > Dear,
> > 
> > Sorry for the late review.
> > 
> > This patch cannot prevent the UAF scenario I presented:
> > ```
> >                  cpu0                                                cpu1
> >         1. xillyusb_open()
> >            mutex_lock(&kref_mutex);    // unaffected lock
> >            xillybus_find_inode()
> >            mutex_lock(&unit_mutex);
> >            unit = iter;
> >            mutex_unlock(&unit_mutex);
> >                                                               2. xillyusb_disconnect()
> >                                                                  xillybus_cleanup_chrdev()
> >                                                                  mutex_lock(&unit_mutex);
> >                                                                  kfree(unit);
> >                                                                  mutex_unlock(&unit_mutex);
> >         3. *private_data = unit->private_data;   // UAF
> > 
> > ```
> > 
> > This is a UAF for 'unit', not a UAF for 'xdev'.
> > So, the added 'kref_mutex' has no effect.
> > 
> 
> You're correct. This submitted patch solves only one problem out of two. It
> prevents the content of @private_data to be freed, but you're right that
> @unit itself isn't protected well enough.
> 
> The problem you're pointing at is that @unit can be freed before its
> attempted use, because the mutex is released too early.
> 
> This is easily solved by moving down the mutex_unlock() call to after @unit
> has been used. Do you want the pleasure to submit this patch, or should I?

I hope you work as before.

And, even if the mutex_unlock(&unit_mutex); of xillybus_find_inode() 
is finally moved, xdev may be released before kref_get() is executed 
if xillyusb_disconnect() ends just before the function returns. 
(Of course, this is an extremely rare case.)

So, in xillyusb_open() we need to move kref_get() above xillybus_find_inode().


Regards,
Hyunwoo Kim
  
Eli Billauer Nov. 13, 2022, 9:03 a.m. UTC | #6
On 13/11/2022 10:47, Hyunwoo Kim wrote:
> And, even if the mutex_unlock(&unit_mutex); of xillybus_find_inode()
> is finally moved, xdev may be released before kref_get() is executed
> if xillyusb_disconnect() ends just before the function returns.
> (Of course, this is an extremely rare case.)
> 
> So, in xillyusb_open() we need to move kref_get() above xillybus_find_inode().

First of all, that's impossible. kref_get() is called on a member of a 
specific @xdev's struct, and it's xillybus_find_inode()'s job to find 
it. So before the call to xillybus_find_inode(), we don't know which 
@xdev it is. That's the tricky part of all this.

The solution of this submitted patch was a lock that briefly prevents 
the kref_put() of all @xdevs. The way it works is that if an @xdev is 
found by xillybus_find_inode(), it necessarily means that 
xillyusb_disconnect()'s call to xillybus_cleanup_chrdev() hasn't 
returned (yet). Therefore, holding @kref_mutex guarantees that the 
kref_put() call, which is later on, isn't reached for the @xdev that has 
been found.

If you've found a flaw in this mechanism, please be more specific about it.

Regards,
   Eli
  
Hyunwoo Kim Nov. 13, 2022, 9:14 a.m. UTC | #7
On Sun, Nov 13, 2022 at 11:03:20AM +0200, Eli Billauer wrote:
> On 13/11/2022 10:47, Hyunwoo Kim wrote:
> > And, even if the mutex_unlock(&unit_mutex); of xillybus_find_inode()
> > is finally moved, xdev may be released before kref_get() is executed
> > if xillyusb_disconnect() ends just before the function returns.
> > (Of course, this is an extremely rare case.)
> > 
> > So, in xillyusb_open() we need to move kref_get() above xillybus_find_inode().
> 
> First of all, that's impossible. kref_get() is called on a member of a
> specific @xdev's struct, and it's xillybus_find_inode()'s job to find it. So
> before the call to xillybus_find_inode(), we don't know which @xdev it is.
> That's the tricky part of all this.
> 
> The solution of this submitted patch was a lock that briefly prevents the
> kref_put() of all @xdevs. The way it works is that if an @xdev is found by
> xillybus_find_inode(), it necessarily means that xillyusb_disconnect()'s
> call to xillybus_cleanup_chrdev() hasn't returned (yet). Therefore, holding
> @kref_mutex guarantees that the kref_put() call, which is later on, isn't
> reached for the @xdev that has been found.
> 
> If you've found a flaw in this mechanism, please be more specific about it.

you're right.

It seems that you only need to move the location of unit_mutex 
in xillybus_find_inode().


Regards,
Hyunwoo Kim
  
Eli Billauer Nov. 13, 2022, 11:36 a.m. UTC | #8
On 13/11/2022 11:14, Hyunwoo Kim wrote:
> 
> It seems that you only need to move the location of unit_mutex
> in xillybus_find_inode().
> 

Agreed. I'll prepare a follow-up patch to fix this issue as well.

Thanks a lot for drawing my attention to this.

   Eli
  

Patch

diff --git a/drivers/char/xillybus/xillyusb.c b/drivers/char/xillybus/xillyusb.c
index 39bcbfd908b4..5a5afa14ca8c 100644
--- a/drivers/char/xillybus/xillyusb.c
+++ b/drivers/char/xillybus/xillyusb.c
@@ -184,6 +184,14 @@  struct xillyusb_dev {
 	struct mutex process_in_mutex; /* synchronize wakeup_all() */
 };
 
+/*
+ * kref_mutex is used in xillyusb_open() to prevent the xillyusb_dev
+ * struct from being freed during the gap between being found by
+ * xillybus_find_inode() and having its reference count incremented.
+ */
+
+static DEFINE_MUTEX(kref_mutex);
+
 /* FPGA to host opcodes */
 enum {
 	OPCODE_DATA = 0,
@@ -1237,9 +1245,16 @@  static int xillyusb_open(struct inode *inode, struct file *filp)
 	int rc;
 	int index;
 
+	mutex_lock(&kref_mutex);
+
 	rc = xillybus_find_inode(inode, (void **)&xdev, &index);
-	if (rc)
+	if (rc) {
+		mutex_unlock(&kref_mutex);
 		return rc;
+	}
+
+	kref_get(&xdev->kref);
+	mutex_unlock(&kref_mutex);
 
 	chan = &xdev->channels[index];
 	filp->private_data = chan;
@@ -1275,8 +1290,6 @@  static int xillyusb_open(struct inode *inode, struct file *filp)
 	    ((filp->f_mode & FMODE_WRITE) && chan->open_for_write))
 		goto unmutex_fail;
 
-	kref_get(&xdev->kref);
-
 	if (filp->f_mode & FMODE_READ)
 		chan->open_for_read = 1;
 
@@ -1413,6 +1426,7 @@  static int xillyusb_open(struct inode *inode, struct file *filp)
 	return rc;
 
 unmutex_fail:
+	kref_put(&xdev->kref, cleanup_dev);
 	mutex_unlock(&chan->lock);
 	return rc;
 }
@@ -2227,7 +2241,9 @@  static void xillyusb_disconnect(struct usb_interface *interface)
 
 	xdev->dev = NULL;
 
+	mutex_lock(&kref_mutex);
 	kref_put(&xdev->kref, cleanup_dev);
+	mutex_unlock(&kref_mutex);
 }
 
 static struct usb_driver xillyusb_driver = {