[v6] media: uvcvideo: Fix race condition with usb_kill_urb

Message ID 20221212-uvc-race-v6-0-2a662f8de011@chromium.org
State New
Headers
Series [v6] media: uvcvideo: Fix race condition with usb_kill_urb |

Commit Message

Ricardo Ribalda Jan. 2, 2023, 2:48 p.m. UTC
  usb_kill_urb warranties that all the handlers are finished when it
returns, but does not protect against threads that might be handling
asynchronously the urb.

For UVC, the function uvc_ctrl_status_event_async() takes care of
control changes asynchronously.

 If the code is executed in the following order:

CPU 0					CPU 1
===== 					=====
uvc_status_complete()
					uvc_status_stop()
uvc_ctrl_status_event_work()
					uvc_status_start() -> FAIL

Then uvc_status_start will keep failing and this error will be shown:

<4>[    5.540139] URB 0000000000000000 submitted while active
drivers/usb/core/urb.c:378 usb_submit_urb+0x4c3/0x528

Let's improve the current situation, by not re-submiting the urb if
we are stopping the status event. Also process the queued work
(if any) during stop.

CPU 0					CPU 1
===== 					=====
uvc_status_complete()
					uvc_status_stop()
					uvc_status_start()
uvc_ctrl_status_event_work() -> FAIL

Hopefully, with the usb layer protection this should be enough to cover
all the cases.

Cc: stable@vger.kernel.org
Fixes: e5225c820c05 ("media: uvcvideo: Send a control event when a Control Change interrupt arrives")
Reviewed-by: Yunke Cao <yunkec@chromium.org>
Signed-off-by: Ricardo Ribalda <ribalda@chromium.org>
---
uvc: Fix race condition on uvc

Make sure that all the async work is finished when we stop the status urb.

To: Yunke Cao <yunkec@chromium.org>
To: Sergey Senozhatsky <senozhatsky@chromium.org>
To: Max Staudt <mstaudt@google.com>
To: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
To: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: linux-media@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
---
Changes in v6:
- Improve comments. (Thanks Laurent).
- Use true/false instead of 1/0 (Thanks Laurent).
- Link to v5: https://lore.kernel.org/r/20221212-uvc-race-v5-0-3db3933d1608@chromium.org

Changes in v5:
- atomic_t do not impose barriers, use smp_mb() instead. (Thanks Laurent)
- Add an extra cancel_work_sync().
- Link to v4: https://lore.kernel.org/r/20221212-uvc-race-v4-0-38d7075b03f5@chromium.org

Changes in v4:
- Replace bool with atomic_t to avoid compiler reordering.
- First complete the async work and then kill the urb to avoid race (Thanks Laurent!)
- Link to v3: https://lore.kernel.org/r/20221212-uvc-race-v3-0-954efc752c9a@chromium.org

Changes in v3:
- Remove the patch for dev->status, makes more sense in another series, and makes
  the zero day less nervous.
- Update reviewed-by (thanks Yunke!).
- Link to v2: https://lore.kernel.org/r/20221212-uvc-race-v2-0-54496cc3b8ab@chromium.org

Changes in v2:
- Add a patch for not kalloc dev->status
- Redo the logic mechanism, so it also works with suspend (Thanks Yunke!)
- Link to v1: https://lore.kernel.org/r/20221212-uvc-race-v1-0-c52e1783c31d@chromium.org
---
 drivers/media/usb/uvc/uvc_ctrl.c   |  3 +++
 drivers/media/usb/uvc/uvc_status.c | 40 ++++++++++++++++++++++++++++++++++++++
 drivers/media/usb/uvc/uvcvideo.h   |  1 +
 3 files changed, 44 insertions(+)


---
base-commit: 0ec5a38bf8499f403f81cb81a0e3a60887d1993c
change-id: 20221212-uvc-race-09276ea68bf8

Best regards,
  

Comments

Ricardo Ribalda Jan. 3, 2023, 9:08 a.m. UTC | #1
Hi Hillf

Thanks for the heads up

On Tue, 3 Jan 2023 at 03:25, Hillf Danton <hdanton@sina.com> wrote:
>
> On 02 Jan 2023 15:48:01 +0100 Ricardo Ribalda <ribalda@chromium.org>
> > --- a/drivers/media/usb/uvc/uvc_ctrl.c
> > +++ b/drivers/media/usb/uvc/uvc_ctrl.c
> > @@ -1442,6 +1442,9 @@ static void uvc_ctrl_status_event_work(struct work_struct *work)
> >
> >       uvc_ctrl_status_event(w->chain, w->ctrl, w->data);
> >
> > +     if (dev->flush_status)
> > +             return;
> > +
> >       /* Resubmit the URB. */
> >       w->urb->interval = dev->int_ep->desc.bInterval;
> >       ret = usb_submit_urb(w->urb, GFP_KERNEL);
> > diff --git a/drivers/media/usb/uvc/uvc_status.c b/drivers/media/usb/uvc/uvc_status.c
> > index 7518ffce22ed..e457889345a3 100644
> > --- a/drivers/media/usb/uvc/uvc_status.c
> > +++ b/drivers/media/usb/uvc/uvc_status.c
> > @@ -6,6 +6,7 @@
> >   *          Laurent Pinchart (laurent.pinchart@ideasonboard.com)
> >   */
> >
> > +#include <asm/barrier.h>
> >  #include <linux/kernel.h>
> >  #include <linux/input.h>
> >  #include <linux/slab.h>
> > @@ -309,5 +310,44 @@ int uvc_status_start(struct uvc_device *dev, gfp_t flags)
> >
> >  void uvc_status_stop(struct uvc_device *dev)
> >  {
> > +     struct uvc_ctrl_work *w = &dev->async_ctrl;
> > +
> > +     /* Prevent the asynchronous control handler from requeing the URB */
> > +     dev->flush_status = true;
> > +
> > +     /*
> > +      * The barrier is needed so the flush_status change is visible to other
> > +      * CPUs running the asynchronous handler before usb_kill_urb() is
> > +      * called below.
> > +      */
> > +     smp_mb();
>
> Given unpaired mb, take a look at the release/acquire memory barrier pairing
> in c5b2cbdbdac5 ("ipc/mqueue.c: update/document memory barriers")

Would it work? to replace:

dev->flush_status = true;
smp_mb();

with:
smp_store_release(&dev->flush_status, 1);

and then read it always with:

smp_load_acquire(&dev->flush_status);

Thanks!

>
> > +
> > +     /* If there is any status event on the queue, process it. */
> > +     if (cancel_work_sync(&w->work))
> > +             uvc_ctrl_status_event(w->chain, w->ctrl, w->data);
> > +
> > +     /* Kill the urb. */
> >       usb_kill_urb(dev->int_urb);
> > +
> > +     /*
> > +      * The URB completion handler may have queued asynchronous work. This
> > +      * won't resubmit the URB as flush_status is set, but it needs to be
> > +      * cancelled before returning or it could then race with a future
> > +      * uvc_status_start() call.
> > +      */
> > +     if (cancel_work_sync(&w->work))
> > +             uvc_ctrl_status_event(w->chain, w->ctrl, w->data);
> > +
> > +     /*
> > +      * From this point, there are no events on the queue and the status URB
> > +      * is dead, this is, no events will be queued until uvc_status_start()
> > +      * is called.
> > +      */
> > +     dev->flush_status = false;
> > +
> > +     /*
> > +      * Write to memory the value of flush_status before uvc_status_start()
> > +      * is called again.
> > +      */
> > +     smp_mb();
> >  }
  
Ricardo Ribalda Jan. 4, 2023, 9:11 p.m. UTC | #2
Hi Hillf

On Wed, 4 Jan 2023 at 21:54, Hillf Danton <hdanton@sina.com> wrote:
>
> On 3 Jan 2023 10:08:56 +0100 Ricardo Ribalda <ribalda@chromium.org>
> >
> > Would it work?
>
> How did you test your patch v6?

I have not tested v6 besides the compile test. The last one that I
device tested was v4 (with atomics) and it didn't crash on x86. I
assumed that since this is more restrictive it was less likely to
fail.

I put this patch on top of my "granular power saving" patch, because
it is more likely to run into race conditions, and then try to
exercise the driver.

So you have a suggestion to trigger the race conditions more effectively?

To be clear, what I mean by would it work with smp_store_release and
smp_load_aqcuire, is that based on the doc it seems like the right
choe, but I am sure that you have more experience than me here :). I
usually rely on mutexes for this.

Thanks!
  

Patch

diff --git a/drivers/media/usb/uvc/uvc_ctrl.c b/drivers/media/usb/uvc/uvc_ctrl.c
index c95a2229f4fa..5160facc8e20 100644
--- a/drivers/media/usb/uvc/uvc_ctrl.c
+++ b/drivers/media/usb/uvc/uvc_ctrl.c
@@ -1442,6 +1442,9 @@  static void uvc_ctrl_status_event_work(struct work_struct *work)
 
 	uvc_ctrl_status_event(w->chain, w->ctrl, w->data);
 
+	if (dev->flush_status)
+		return;
+
 	/* Resubmit the URB. */
 	w->urb->interval = dev->int_ep->desc.bInterval;
 	ret = usb_submit_urb(w->urb, GFP_KERNEL);
diff --git a/drivers/media/usb/uvc/uvc_status.c b/drivers/media/usb/uvc/uvc_status.c
index 7518ffce22ed..e457889345a3 100644
--- a/drivers/media/usb/uvc/uvc_status.c
+++ b/drivers/media/usb/uvc/uvc_status.c
@@ -6,6 +6,7 @@ 
  *          Laurent Pinchart (laurent.pinchart@ideasonboard.com)
  */
 
+#include <asm/barrier.h>
 #include <linux/kernel.h>
 #include <linux/input.h>
 #include <linux/slab.h>
@@ -309,5 +310,44 @@  int uvc_status_start(struct uvc_device *dev, gfp_t flags)
 
 void uvc_status_stop(struct uvc_device *dev)
 {
+	struct uvc_ctrl_work *w = &dev->async_ctrl;
+
+	/* Prevent the asynchronous control handler from requeing the URB */
+	dev->flush_status = true;
+
+	/*
+	 * The barrier is needed so the flush_status change is visible to other
+	 * CPUs running the asynchronous handler before usb_kill_urb() is
+	 * called below.
+	 */
+	smp_mb();
+
+	/* If there is any status event on the queue, process it. */
+	if (cancel_work_sync(&w->work))
+		uvc_ctrl_status_event(w->chain, w->ctrl, w->data);
+
+	/* Kill the urb. */
 	usb_kill_urb(dev->int_urb);
+
+	/*
+	 * The URB completion handler may have queued asynchronous work. This
+	 * won't resubmit the URB as flush_status is set, but it needs to be
+	 * cancelled before returning or it could then race with a future
+	 * uvc_status_start() call.
+	 */
+	if (cancel_work_sync(&w->work))
+		uvc_ctrl_status_event(w->chain, w->ctrl, w->data);
+
+	/*
+	 * From this point, there are no events on the queue and the status URB
+	 * is dead, this is, no events will be queued until uvc_status_start()
+	 * is called.
+	 */
+	dev->flush_status = false;
+
+	/*
+	 * Write to memory the value of flush_status before uvc_status_start()
+	 * is called again.
+	 */
+	smp_mb();
 }
diff --git a/drivers/media/usb/uvc/uvcvideo.h b/drivers/media/usb/uvc/uvcvideo.h
index df93db259312..6a9b72d6789e 100644
--- a/drivers/media/usb/uvc/uvcvideo.h
+++ b/drivers/media/usb/uvc/uvcvideo.h
@@ -560,6 +560,7 @@  struct uvc_device {
 	struct usb_host_endpoint *int_ep;
 	struct urb *int_urb;
 	u8 *status;
+	bool flush_status;
 	struct input_dev *input;
 	char input_phys[64];