xhci: Do not create endpoint debugfs while holding the bandwidth mutex

Message ID 20230531-xhci-deadlock-v1-1-57780bff5124@chromium.org
State New
Series xhci: Do not create endpoint debugfs while holding the bandwidth mutex

Commit Message

Ricardo Ribalda May 31, 2023, 12:40 p.m. UTC
  xhci_debugfs_create_endpoint needs to take the mm->mmap_sem, which is
not serialized with the hcd->bandwidth_mutex across the codebase.

Without this patch a deadlock has been observed with the uvc driver at
the functions v4l2_mmap() and usb_set_interface().

Cc: Stephen Boyd <swboyd@chromium.org
Fixes: 167657a1bb5f ("xhci: don't create endpoint debugfs entry before ring buffer is set.")
Signed-off-by: Ricardo Ribalda Delgado <ribalda@chromium.org>
---
I do not have a proper reproducer for this and I am not used to this
subsystem, so please take a careful look at this patch :).

Thanks!
---
 drivers/usb/host/xhci-debugfs.c | 4 ++++
 drivers/usb/host/xhci.c         | 4 ++--
 2 files changed, 6 insertions(+), 2 deletions(-)


---
base-commit: 48b1320a674e1ff5de2fad8606bee38f724594dc
change-id: 20230531-xhci-deadlock-de2ab21c90bc

Best regards,
  

Comments

Greg KH May 31, 2023, 1:10 p.m. UTC | #1
On Wed, May 31, 2023 at 02:40:02PM +0200, Ricardo Ribalda Delgado wrote:
> xhci_debugfs_create_endpoint needs to take the mm->mmap_sem, which is
> not serialized with the hcd->bandwidth_mutex across the codebase.
> 
> Without this patch a deadlock has been observed with the uvc driver at
> the functions v4l2_mmap() and usb_set_interface().
> 
> Cc: Stephen Boyd <swboyd@chromium.org

Missing ">" :(
  
Mathias Nyman June 1, 2023, 1:45 p.m. UTC | #2
On 31.5.2023 15.40, Ricardo Ribalda Delgado wrote:
> xhci_debugfs_create_endpoint needs to take the mm->mmap_sem, which is
> not serialized with the hcd->bandwidth_mutex across the codebase.
> 
> Without this patch a deadlock has been observed with the uvc driver at
> the functions v4l2_mmap() and usb_set_interface().
> 
> Cc: Stephen Boyd <swboyd@chromium.org
> Fixes: 167657a1bb5f ("xhci: don't create endpoint debugfs entry before ring buffer is set.")
> Signed-off-by: Ricardo Ribalda Delgado <ribalda@chromium.org>
> ---
> I do not have a proper reproducer for this and I am not used to this
> subsystem, so please take a careful look at this patch :).
> 
> Thanks!

Do you still have the lockdep output showing the deadlock?

I'm not sure how calling xhci_debugfs_create_endpoint() from
xhci_add_endpoint() instead of xhci_check_bandwidth() helps.

Both are called with hcd->bandwidth_mutex held:

usb_set_interface()
	mutex_lock(hcd->bandwidth_mutex);
	usb_hcd_alloc_bandwidth()
		hcd->driver->add_endpoint()    -> xhci_add_endpoint()
		hcd->driver->check_bandwidth() -> xhci_check_bandwidth()
	mutex_unlock(hcd->bandwidth_mutex);

Thanks
Mathias
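
[Editor's note: the nesting Mathias describes can be written out as a rough C sketch. This is a simplified illustration of the USB core flow, not a literal excerpt of the kernel sources; the point is only that both xhci callbacks, and therefore xhci_debugfs_create_endpoint(), run with hcd->bandwidth_mutex held either way.]

/* Simplified sketch of the flow described above; not a literal excerpt
 * of drivers/usb/core. Both hc_driver callbacks run inside the same
 * bandwidth_mutex critical section. */
static int set_interface_sketch(struct usb_hcd *hcd, struct usb_device *udev,
				struct usb_host_endpoint *ep)
{
	int ret;

	mutex_lock(hcd->bandwidth_mutex);

	/* usb_hcd_alloc_bandwidth() invokes the hc_driver hooks here */
	ret = hcd->driver->add_endpoint(hcd, udev, ep);	       /* xhci_add_endpoint() */
	if (!ret)
		ret = hcd->driver->check_bandwidth(hcd, udev); /* xhci_check_bandwidth() */

	mutex_unlock(hcd->bandwidth_mutex);

	/*
	 * Whichever of the two callbacks calls xhci_debugfs_create_endpoint(),
	 * the debugfs entries are still created under bandwidth_mutex, so the
	 * patch does not change the lock context of the debugfs call.
	 */
	return ret;
}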
  
Ricardo Ribalda June 1, 2023, 4:05 p.m. UTC | #3
Hi Mathias

On Thu, 1 Jun 2023 at 16:13, Mathias Nyman
<mathias.nyman@linux.intel.com> wrote:
>
> On 31.5.2023 15.40, Ricardo Ribalda Delgado wrote:
> > xhci_debugfs_create_endpoint needs to take the mm->mmap_sem, which is
> > not serialized with the hcd->bandwidth_mutex across the codebase.
> >
> > Without this patch a deadlock has been observed with the uvc driver at
> > the functions v4l2_mmap() and usb_set_interface().
> >
> > Cc: Stephen Boyd <swboyd@chromium.org
> > Fixes: 167657a1bb5f ("xhci: don't create endpoint debugfs entry before ring buffer is set.")
> > Signed-off-by: Ricardo Ribalda Delgado <ribalda@chromium.org>
> > ---
> > I do not have a proper reproducer for this and I am not used to this
> > subsystem, so please take a careful look at this patch :).
> >
> > Thanks!
>
> Do you still have the lockdep output showing the deadlock?

[  459.731142] ======================================================
[  459.731150] WARNING: possible circular locking dependency detected
[  459.731161] 5.4.169-lockdep-17434-g505c8a10e6fe #1 Not tainted
[  459.731168] ------------------------------------------------------
[  459.731176] syz-executor.3/15308 is trying to acquire lock:
[  459.731184] ffffff80c63e0ee0 (&queue->mutex){+.+.}, at:
uvc_queue_mmap+0x30/0xa0 [uvcvideo]
[  459.731226]
               but task is already holding lock:
[  459.731232] ffffff80a748eea8 (&mm->mmap_sem){++++}, at:
vm_mmap_pgoff+0x10c/0x1f4
[  459.731255]
               which lock already depends on the new lock.

[  459.731262]
               the existing dependency chain (in reverse order) is:
[  459.731269]
               -> #3 (&mm->mmap_sem){++++}:
[  459.731286]        __might_fault+0xec/0x150
[  459.731298]        filldir64+0x2e0/0x15dc
[  459.731310]        dcache_readdir+0x134/0x660
[  459.731320]        iterate_dir+0x200/0x40c
[  459.731331]        ksys_getdents64+0x218/0x78c
[  459.731342]        __arm64_sys_getdents64+0x7c/0x90
[  459.731353]        el0_svc_common+0x1c0/0x3dc
[  459.731363]        el0_svc_compat_handler+0x88/0xd4
[  459.731373]        el0_svc_compat+0x8/0x2c
[  459.731379]
               -> #2 (&sb->s_type->i_mutex_key#4){++++}:
[  459.731398]        down_write+0x60/0x118
[  459.731409]        start_creating+0xf8/0x260
[  459.731419]        debugfs_create_dir+0x30/0x290
[  459.731430]        xhci_debugfs_create_endpoint+0x118/0x1c8
[  459.731442]        xhci_check_bandwidth+0x520/0x6c0
[  459.731453]        usb_hcd_alloc_bandwidth+0x800/0x900
[  459.731464]        usb_set_configuration+0x4c4/0x1258
[  459.731475]        usb_generic_driver_probe+0x80/0x140
[  459.731486]        usb_probe_device+0xc8/0x240
[  459.731497]        really_probe+0x2f0/0x9e4
[  459.731507]        driver_probe_device+0xf0/0x2e4
[  459.731517]        __device_attach_driver+0x180/0x254
[  459.731528]        bus_for_each_drv+0x114/0x184
[  459.731537]        __device_attach+0x224/0x398
[  459.731547]        device_initial_probe+0x24/0x30
[  459.731557]        bus_probe_device+0xa8/0x1b8
[  459.731567]        device_add+0x63c/0x940
[  459.731577]        usb_new_device+0x810/0xe68
[  459.731587]        hub_port_connect+0xaf0/0x16bc
[  459.731597]        hub_port_connect_change+0x32c/0x5c0
[  459.731606]        port_event+0x9e8/0xe4c
[  459.731616]        hub_event+0x3fc/0xaec
[  459.731628]        process_one_work+0x604/0xa2c
[  459.731638]        worker_thread+0x930/0xea8
[  459.731648]        kthread+0x2dc/0x350
[  459.731658]        ret_from_fork+0x10/0x18
[  459.731664]
               -> #1 (hcd->bandwidth_mutex){+.+.}:
[  459.731680]        __mutex_lock_common+0x140/0x18c8
[  459.731690]        mutex_lock_nested+0x48/0x58
[  459.731701]        usb_set_interface+0x108/0x778
[  459.731724]        uvc_video_start_transfer+0x824/0x12a4 [uvcvideo]
[  459.731746]        uvc_video_start_streaming+0x148/0x2c8 [uvcvideo]
[  459.731767]        uvc_start_streaming+0x108/0x214 [uvcvideo]
[  459.731779]        vb2_start_streaming+0x110/0x3f0
[  459.731789]        vb2_core_streamon+0x234/0x340
[  459.731799]        vb2_streamon+0x80/0xac
[  459.731819]        uvc_queue_streamon+0x3c/0x5c [uvcvideo]
[  459.731840]        uvc_ioctl_streamon+0xd0/0x118 [uvcvideo]
[  459.731850]        v4l_streamon+0x6c/0x9c
[  459.731860]        __video_do_ioctl+0x940/0xaa8
[  459.731870]        video_usercopy+0x528/0x920
[  459.731880]        video_ioctl2+0x3c/0x4c
[  459.731889]        v4l2_ioctl+0x120/0x158
[  459.731900]        do_video_ioctl+0xdec/0x1784
[  459.731910]        v4l2_compat_ioctl32+0xc0/0x198
[  459.731921]        __arm64_compat_sys_ioctl+0x314/0x778
[  459.731931]        el0_svc_common+0x1c0/0x3dc
[  459.731941]        el0_svc_compat_handler+0x88/0xd4
[  459.731950]        el0_svc_compat+0x8/0x2c
[  459.731957]
               -> #0 (&queue->mutex){+.+.}:
[  459.731974]        __lock_acquire+0x1b74/0x4f04
[  459.731985]        lock_acquire+0xd0/0x168
[  459.731995]        __mutex_lock_common+0x140/0x18c8
[  459.732004]        mutex_lock_nested+0x48/0x58
[  459.732024]        uvc_queue_mmap+0x30/0xa0 [uvcvideo]
[  459.732045]        uvc_v4l2_mmap+0xa4/0x138 [uvcvideo]
[  459.732054]        v4l2_mmap+0x114/0x1f8
[  459.732065]        mmap_region+0x8b8/0xc90
[  459.732075]        do_mmap+0x654/0xaec
[  459.732084]        vm_mmap_pgoff+0x15c/0x1f4
[  459.732094]        ksys_mmap_pgoff+0x124/0x194
[  459.732105]        __arm64_compat_sys_aarch32_mmap2+0xd8/0xf0
[  459.732114]        el0_svc_common+0x1c0/0x3dc
[  459.732124]        el0_svc_compat_handler+0x88/0xd4
[  459.732134]        el0_svc_compat+0x8/0x2c
[  459.732141]
               other info that might help us debug this:

[  459.732148] Chain exists of:
                 &queue->mutex --> &sb->s_type->i_mutex_key#4 --> &mm->mmap_sem

[  459.732165]  Possible unsafe locking scenario:

[  459.732172]        CPU0                    CPU1
[  459.732178]        ----                    ----
[  459.732184]   lock(&mm->mmap_sem);
[  459.732193]                                lock(&sb->s_type->i_mutex_key#4);
[  459.732204]                                lock(&mm->mmap_sem);
[  459.732212]   lock(&queue->mutex);
[  459.732221]
                *** DEADLOCK ***

[  459.732230] 1 lock held by syz-executor.3/15308:
[  459.732237]  #0: ffffff80a748eea8 (&mm->mmap_sem){++++}, at:
vm_mmap_pgoff+0x10c/0x1f4
[  459.732256]
               stack backtrace:
[  459.732269] CPU: 6 PID: 15308 Comm: syz-executor.3 Not tainted
5.4.169-lockdep-17434-g505c8a10e6fe #1
[  459.732277] Hardware name: Google Pazquel (TI,LTE) (DT)
[  459.732284] Call trace:
[  459.732294]  dump_backtrace+0x0/0x2ec
[  459.732304]  show_stack+0x24/0x30
[  459.732315]  dump_stack+0x148/0x21c
[  459.732324]  print_circular_bug+0x18c/0x1b8
[  459.732334]  check_noncircular+0x2e4/0x3c4
[  459.732344]  __lock_acquire+0x1b74/0x4f04
[  459.732355]  lock_acquire+0xd0/0x168
[  459.732364]  __mutex_lock_common+0x140/0x18c8
[  459.732374]  mutex_lock_nested+0x48/0x58
[  459.732395]  uvc_queue_mmap+0x30/0xa0 [uvcvideo]
[  459.732415]  uvc_v4l2_mmap+0xa4/0x138 [uvcvideo]
[  459.732425]  v4l2_mmap+0x114/0x1f8
[  459.732435]  mmap_region+0x8b8/0xc90
[  459.732444]  do_mmap+0x654/0xaec
[  459.732454]  vm_mmap_pgoff+0x15c/0x1f4
[  459.732463]  ksys_mmap_pgoff+0x124/0x194
[  459.732474]  __arm64_compat_sys_aarch32_mmap2+0xd8/0xf0
[  459.732483]  el0_svc_common+0x1c0/0x3dc
[  459.732493]  el0_svc_compat_handler+0x88/0xd4
[  459.732502]  el0_svc_compat+0x8/0x2c


>
> I'm not sure how calling xhci_debugfs_create_endpoint() from
> xhci_add_endpoint() instead of xhci_check_bandwidth() helps.
>
> Both are called with hcd->bandwidth_mutex held:
>
> usb_set_interface()
>         mutex_lock(hcd->bandwidth_mutex);
>         usb_hcd_alloc_bandwidth()
>                 hcd->driver->add_endpoint()    -> xhci_add_endpoint()
>                 hcd->driver->check_bandwidth() -> xhci_check_bandwidth()
>         mutex_unlock(hcd->bandwidth_mutex);

Yep, I guess I was lucky not to be able to repro again :)

The locks involved are:

hcd->bandwidth_mutex
mm->mmap_sem
[uvc] queue->mutex



>
> Thanks
> Mathias
>
  
Mathias Nyman June 12, 2023, 2:30 p.m. UTC | #4
On 1.6.2023 19.05, Ricardo Ribalda wrote:
> Hi Mathias
> 
> On Thu, 1 Jun 2023 at 16:13, Mathias Nyman
> <mathias.nyman@linux.intel.com> wrote:
>>
>> Do you still have the lockdep output showing the deadlock?
> 
> [  459.731142] ======================================================
> [  459.731150] WARNING: possible circular locking dependency detected
> [  459.731161] 5.4.169-lockdep-17434-g505c8a10e6fe #1 Not tainted
> [  459.731168] ------------------------------------------------------
> [  459.731176] syz-executor.3/15308 is trying to acquire lock:
> [  459.731184] ffffff80c63e0ee0 (&queue->mutex){+.+.}, at:
> uvc_queue_mmap+0x30/0xa0 [uvcvideo]
> [  459.731226]
>                 but task is already holding lock:
> [  459.731232] ffffff80a748eea8 (&mm->mmap_sem){++++}, at:
> vm_mmap_pgoff+0x10c/0x1f4
> [  459.731255]
>                 which lock already depends on the new lock.
> 
...
> [  459.732148] Chain exists of:
>                   &queue->mutex --> &sb->s_type->i_mutex_key#4 --> &mm->mmap_sem
> 
> [  459.732165]  Possible unsafe locking scenario:
> 
> [  459.732172]        CPU0                    CPU1
> [  459.732178]        ----                    ----
> [  459.732184]   lock(&mm->mmap_sem);
> [  459.732193]                                lock(&sb->s_type->i_mutex_key#4);
> [  459.732204]                                lock(&mm->mmap_sem);
> [  459.732212]   lock(&queue->mutex);
> [  459.732221]
>                  *** DEADLOCK ***
> 
>>
>> I'm not sure how calling xhci_debugfs_create_endpoint() from
>> xhci_add_endpoint() instead of xhci_check_bandwidth() helps.
>>
>> Both are called with hcd->bandwidth_mutex held:
>>
>> usb_set_interface()
>>          mutex_lock(hcd->bandwidth_mutex);
>>          usb_hcd_alloc_bandwidth()
>>                  hcd->driver->add_endpoint()    -> xhci_add_endpoint()
>>                  hcd->driver->check_bandwidth() -> xhci_check_bandwidth()
>>          mutex_unlock(hcd->bandwidth_mutex);
> 
> Yep, I guess I was lucky not to be able to repro again :)
> 
> The locks involved are:
> 
> hcd->bandwidth_mutex
> mm->mmap_sem
> [uvc] queue->mutex
> 

Ok, took a look at this.
I don't think the bandwidth mutex matters that much.

To my understanding this is caused by the following lock chains:

uvc_queue_mmap()
   mmap_sem --> queue->mutex

uvc_ioctl_streamon() calling usb_set_interface() calling debugfs_create_dir()
   queue->mutex --> i_mutex_key

Some debugfs error case:
   i_mutex_key --> mmap_sem

So we could end up with this deadlock:
CPU0		CPU1		CPU2
mmap_sem	queue->mutex	i_mutex_key
  
waiting for	waiting for	waiting for
queue->mutex	i_mutex_key	mmap_sem	

I have no idea if this can be triggered in real life.

Looks like that requires some specific debugfs error to trigger at the
same time we are creating a debugfs directory.

Thanks
Mathias
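
[Editor's note: the three-way cycle described above is easier to see with a self-contained userspace illustration. This is not kernel code; the pthread mutexes below only stand in for mm->mmap_sem, the uvc queue->mutex and the debugfs inode i_mutex, and each thread mimics one of the lock chains in the lockdep report.]

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t mmap_sem    = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t queue_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t i_mutex     = PTHREAD_MUTEX_INITIALIZER;

/* Each thread takes the first lock of its chain, then blocks waiting
 * for the second, which the next thread in the cycle already holds. */
static void *take_chain(void *arg)
{
	pthread_mutex_t **chain = arg;

	pthread_mutex_lock(chain[0]);
	sleep(1);			/* give the other threads time to grab theirs */
	pthread_mutex_lock(chain[1]);	/* all three block here: circular wait */
	pthread_mutex_unlock(chain[1]);
	pthread_mutex_unlock(chain[0]);
	return NULL;
}

int main(void)
{
	/* thread 0: mmap_sem     -> queue->mutex  (uvc_queue_mmap path)
	 * thread 1: queue->mutex -> i_mutex       (streamon -> debugfs_create_dir path)
	 * thread 2: i_mutex      -> mmap_sem      (debugfs readdir -> filldir path) */
	pthread_mutex_t *chains[3][2] = {
		{ &mmap_sem,    &queue_mutex },
		{ &queue_mutex, &i_mutex     },
		{ &i_mutex,     &mmap_sem    },
	};
	pthread_t t[3];
	int i;

	for (i = 0; i < 3; i++)
		pthread_create(&t[i], NULL, take_chain, chains[i]);
	for (i = 0; i < 3; i++)
		pthread_join(t[i], NULL);	/* never completes: deadlock */

	printf("not reached\n");
	return 0;
}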
  
Ricardo Ribalda June 17, 2023, 12:09 p.m. UTC | #5
Hi Mathias

Thanks for looking into this. Looking at the bug again, queue->mutex was
only a problem because of a downstream patch (which we are working on
upstreaming), so this should not affect upstream.

Sorry for the noise :S

On Mon, 12 Jun 2023 at 16:28, Mathias Nyman
<mathias.nyman@linux.intel.com> wrote:
>
> On 1.6.2023 19.05, Ricardo Ribalda wrote:
> > Hi Mathias
> >
> > On Thu, 1 Jun 2023 at 16:13, Mathias Nyman
> > <mathias.nyman@linux.intel.com> wrote:
> >>
> >> Do you still have the lockdep output showing the deadlock?
> >
> > [  459.731142] ======================================================
> > [  459.731150] WARNING: possible circular locking dependency detected
> > [  459.731161] 5.4.169-lockdep-17434-g505c8a10e6fe #1 Not tainted
> > [  459.731168] ------------------------------------------------------
> > [  459.731176] syz-executor.3/15308 is trying to acquire lock:
> > [  459.731184] ffffff80c63e0ee0 (&queue->mutex){+.+.}, at:
> > uvc_queue_mmap+0x30/0xa0 [uvcvideo]
> > [  459.731226]
> >                 but task is already holding lock:
> > [  459.731232] ffffff80a748eea8 (&mm->mmap_sem){++++}, at:
> > vm_mmap_pgoff+0x10c/0x1f4
> > [  459.731255]
> >                 which lock already depends on the new lock.
> >
> ...
> > [  459.732148] Chain exists of:
> >                   &queue->mutex --> &sb->s_type->i_mutex_key#4 --> &mm->mmap_sem
> >
> > [  459.732165]  Possible unsafe locking scenario:
> >
> > [  459.732172]        CPU0                    CPU1
> > [  459.732178]        ----                    ----
> > [  459.732184]   lock(&mm->mmap_sem);
> > [  459.732193]                                lock(&sb->s_type->i_mutex_key#4);
> > [  459.732204]                                lock(&mm->mmap_sem);
> > [  459.732212]   lock(&queue->mutex);
> > [  459.732221]
> >                  *** DEADLOCK ***
> >
> >>
> >> I'm not sure how calling xhci_debugfs_create_endpoint() from
> >> xhci_add_endpoint() instead of xhci_check_bandwidth() helps.
> >>
> >> Both are called with hcd->bandwidth_mutex held:
> >>
> >> usb_set_interface()
> >>          mutex_lock(hcd->bandwidth_mutex);
> >>          usb_hcd_alloc_bandwidth()
> >>                  hcd->driver->add_endpoint()    -> xhci_add_endpoint()
> >>                  hcd->driver->check_bandwidth() -> xhci_check_bandwidth()
> >>          mutex_unlock(hcd->bandwidth_mutex);
> >
> > Yep, I guess I was lucky not to be able to repro again :)
> >
> > The locks involved are:
> >
> > hcd->bandwidth_mutex
> > mm->mmap_sem
> > [uvc] queue->mutex
> >
>
> Ok, took a look at this.
> I don't think the bandwidth mutex matters that much.
>
> To my understanding this is caused by the following lock chains:
>
> uvc_queue_mmap()
>    mmap_sem --> queue->mutex
>
> uvc_ioctl_streamon() calling usb_set_interface() calling debugfs_create_dir()
>    queue->mutex --> i_mutex_key
>
> Some debugfs error case:
>    i_mutex_key --> mmap_sem
>
> So we could end up with this deadlock:
> CPU0            CPU1            CPU2
> mmap_sem        queue->mutex    i_mutex_key
>
> waiting for     waiting for     waiting for
> queue->mutex    i_mutex_key     mmap_sem
>
> I have no idea if this can be triggered in real life.
>
> Looks like that requires some specific debugfs error to trigger at the
> same time we are creating a debugfs directory.
>
> Thanks
> Mathias
>
  

Patch

diff --git a/drivers/usb/host/xhci-debugfs.c b/drivers/usb/host/xhci-debugfs.c
index 99baa60ef50f..2acce2af2ca9 100644
--- a/drivers/usb/host/xhci-debugfs.c
+++ b/drivers/usb/host/xhci-debugfs.c
@@ -238,6 +238,10 @@  static int xhci_ring_open(struct inode *inode, struct file *file)
 	int			i;
 	struct xhci_file_map	*f_map;
 	const char		*file_name = file_dentry(file)->d_iname;
+	struct xhci_ring	*ring = *(struct xhci_ring **)inode->i_private;
+
+	if (!ring)
+		return -EAGAIN;
 
 	for (i = 0; i < ARRAY_SIZE(ring_files); i++) {
 		f_map = &ring_files[i];
diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c
index 78790dc13c5f..2715900b2540 100644
--- a/drivers/usb/host/xhci.c
+++ b/drivers/usb/host/xhci.c
@@ -1930,6 +1930,8 @@  int xhci_add_endpoint(struct usb_hcd *hcd, struct usb_device *udev,
 	ep_ctx = xhci_get_ep_ctx(xhci, virt_dev->in_ctx, ep_index);
 	trace_xhci_add_endpoint(ep_ctx);
 
+	xhci_debugfs_create_endpoint(xhci, virt_dev, ep_index);
+
 	xhci_dbg(xhci, "add ep 0x%x, slot id %d, new drop flags = %#x, new add flags = %#x\n",
 			(unsigned int) ep->desc.bEndpointAddress,
 			udev->slot_id,
@@ -2963,7 +2965,6 @@  int xhci_check_bandwidth(struct usb_hcd *hcd, struct usb_device *udev)
 		xhci_check_bw_drop_ep_streams(xhci, virt_dev, i);
 		virt_dev->eps[i].ring = virt_dev->eps[i].new_ring;
 		virt_dev->eps[i].new_ring = NULL;
-		xhci_debugfs_create_endpoint(xhci, virt_dev, i);
 	}
 command_cleanup:
 	kfree(command->completion);
@@ -2989,7 +2990,6 @@  void xhci_reset_bandwidth(struct usb_hcd *hcd, struct usb_device *udev)
 	/* Free any rings allocated for added endpoints */
 	for (i = 0; i < 31; i++) {
 		if (virt_dev->eps[i].new_ring) {
-			xhci_debugfs_remove_endpoint(xhci, virt_dev, i);
 			xhci_ring_free(xhci, virt_dev->eps[i].new_ring);
 			virt_dev->eps[i].new_ring = NULL;
 		}
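
[Editor's note: the two hunks are meant to work together. The sketch below is reconstructed only from the diff above, not a literal excerpt of the patched sources: the endpoint's debugfs directory is now created from xhci_add_endpoint(), i.e. before xhci_check_bandwidth() installs new_ring as the live ring, so the ring debugfs file can be opened while the ring pointer is still NULL, and the first hunk makes that open fail gracefully.]

/* Sketch reconstructed from the diff above; not a literal excerpt.
 *
 *   usb_hcd_alloc_bandwidth()              <- holds hcd->bandwidth_mutex
 *     xhci_add_endpoint()
 *       xhci_debugfs_create_endpoint()     <- moved here by this patch
 *     xhci_check_bandwidth()
 *       eps[i].ring = eps[i].new_ring      <- ring becomes valid only here
 *
 * In the window between those two calls the debugfs file already exists,
 * so opening it must tolerate a NULL ring pointer:
 */
static int xhci_ring_open(struct inode *inode, struct file *file)
{
	struct xhci_ring *ring = *(struct xhci_ring **)inode->i_private;

	if (!ring)		/* endpoint added, ring not yet installed */
		return -EAGAIN;	/* caller can simply retry the open later */

	/* ... original ring_files[] matching logic continues here ... */
	return 0;		/* stand-in for the original tail */
}

Note that, as discussed in the thread above, the move by itself does not take xhci_debugfs_create_endpoint() out of the bandwidth_mutex critical section; it only changes which callback creates the entry and when the ring pointer becomes valid.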