[v1,0/2] usb: gadget: uvc: stability fixes when stopping streams

Message ID 20230912041910.726442-1-arakesh@google.com
Headers
Series usb: gadget: uvc: stability fixes when stopping streams |

Message

Avichal Rakesh Sept. 12, 2023, 4:19 a.m. UTC
  We have been seeing two main bugs when stopping stream:
1. attempting to queue usb_requests on a disabled usb endpoint, and
2. use-after-free problems for inflight requests

Avichal Rakesh (2):
  usb: gadget: uvc: prevent use of disabled endpoint
  usb: gadget: uvc: prevent de-allocating inflight usb_requests

 drivers/usb/gadget/function/f_uvc.c     | 11 ++++----
 drivers/usb/gadget/function/f_uvc.h     |  2 +-
 drivers/usb/gadget/function/uvc.h       |  5 +++-
 drivers/usb/gadget/function/uvc_v4l2.c  | 21 ++++++++++++---
 drivers/usb/gadget/function/uvc_video.c | 34 +++++++++++++++++++++++--
 5 files changed, 60 insertions(+), 13 deletions(-)

--
2.42.0.283.g2d96d420d3-goog
  

Comments

Avichal Rakesh Sept. 14, 2023, 11:05 p.m. UTC | #1
On Mon, Sep 11, 2023 at 9:19 PM Avichal Rakesh <arakesh@google.com> wrote:
>
> We have been seeing two main bugs when stopping stream:
> 1. attempting to queue usb_requests on a disabled usb endpoint, and
> 2. use-after-free problems for inflight requests
>
> Avichal Rakesh (2):
>   usb: gadget: uvc: prevent use of disabled endpoint
>   usb: gadget: uvc: prevent de-allocating inflight usb_requests
>
>  drivers/usb/gadget/function/f_uvc.c     | 11 ++++----
>  drivers/usb/gadget/function/f_uvc.h     |  2 +-
>  drivers/usb/gadget/function/uvc.h       |  5 +++-
>  drivers/usb/gadget/function/uvc_v4l2.c  | 21 ++++++++++++---
>  drivers/usb/gadget/function/uvc_video.c | 34 +++++++++++++++++++++++--
>  5 files changed, 60 insertions(+), 13 deletions(-)
>

Bumping this thread up. Laurent, Dan, and Michael could you take a look?

Thank you!
- Avi.
  
Michael Grzeschik Sept. 15, 2023, 11:16 p.m. UTC | #2
Hi Avichal

On Thu, Sep 14, 2023 at 04:05:36PM -0700, Avichal Rakesh wrote:
>On Mon, Sep 11, 2023 at 9:19 PM Avichal Rakesh <arakesh@google.com> wrote:
>>
>> We have been seeing two main bugs when stopping stream:
>> 1. attempting to queue usb_requests on a disabled usb endpoint, and
>> 2. use-after-free problems for inflight requests
>>
>> Avichal Rakesh (2):
>>   usb: gadget: uvc: prevent use of disabled endpoint
>>   usb: gadget: uvc: prevent de-allocating inflight usb_requests
>>
>>  drivers/usb/gadget/function/f_uvc.c     | 11 ++++----
>>  drivers/usb/gadget/function/f_uvc.h     |  2 +-
>>  drivers/usb/gadget/function/uvc.h       |  5 +++-
>>  drivers/usb/gadget/function/uvc_v4l2.c  | 21 ++++++++++++---
>>  drivers/usb/gadget/function/uvc_video.c | 34 +++++++++++++++++++++++--
>>  5 files changed, 60 insertions(+), 13 deletions(-)
>>
>
>Bumping this thread up. Laurent, Dan, and Michael could you take a look?

I tested the patches against my setup and it did not help.

In fact I saw two different issues when calling the streamoff event.

One issue was a stalled pipeline after the streamoff from the host came in.
The streaming application did not handle any events anymore.

The second issue was when the streamoff event is triggered sometimes the
following trace is shown, even with your patches applied.


[  104.202689] Unable to handle kernel paging request at virtual address 005bf43a692a5fd5
[  104.235122] Mem abort info:
[  104.238257]   ESR = 0x0000000096000004
[  104.242449]   EC = 0x25: DABT (current EL), IL = 32 bits
[  104.248391]   SET = 0, FnV = 0
[  104.251803]   EA = 0, S1PTW = 0
[  104.255313]   FSC = 0x04: level 0 translation fault
[  104.260765] Data abort info:
[  104.263982]   ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
[  104.270114]   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
[  104.275760]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[  104.281698] [005bf43a692a5fd5] address between user and kernel address ranges
[  104.290042] Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
[  104.297060] Dumping ftrace buffer:
[  104.300869]    (ftrace buffer empty)
[  104.304862] Modules linked in: st1232 hantro_vpu v4l2_vp9 v4l2_h264 uio_pdrv_genirq fuse [last unloaded: rockchip_vpu(C)]
[  104.312080] panfrost fde60000.gpu: Panfrost Dump: BO has no sgt, cannot dump
[  104.317137] CPU: 0 PID: 465 Comm: irq/46-dwc3 Tainted: G         C         6.5.0-20230831-2+ #5
[  104.317144] Hardware name: WolfVision PF5 (DT)
[  104.317148] pstate: 604000c9 (nZCv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[  104.317154] pc : __list_del_entry_valid+0x48/0xe8
[  104.352728] lr : dwc3_gadget_giveback+0x3c/0x1b0
[  104.357893] sp : ffffffc08381bc60
[  104.361593] x29: ffffffc08381bc60 x28: ffffff80047d4000 x27: ffffff80047de440
[  104.369576] x26: 0000000000000000 x25: ffffffc08135b2d0 x24: ffffffc08381bd00
[  104.377559] x23: 00000000ffffff98 x22: ffffff8004204880 x21: ffffff80047d4000
[  104.385541] x20: ffffff800718dea0 x19: ffffff800718dea0 x18: 0000000000000000
[  104.393523] x17: 7461747320687469 x16: 7720646574656c70 x15: 6d6f632074736575
[  104.401504] x14: 716572205356203a x13: 2e3430312d207375 x12: 7461747320687469
[  104.409486] x11: ffffffc0815c98f0 x10: 0000000000000000 x9 : ffffffc0808f4fa0
[  104.417468] x8 : ffffffc082415000 x7 : ffffffc0808f4e2c x6 : ffffffc0823d0928
[  104.425450] x5 : 0000000000000282 x4 : 0000000000000201 x3 : d85bf43a692a5fcd
[  104.433431] x2 : ffffff80047d4048 x1 : ffffff800718dea0 x0 : dead000000000122
[  104.441413] Call trace:
[  104.444142]  __list_del_entry_valid+0x48/0xe8
[  104.449013]  dwc3_gadget_giveback+0x3c/0x1b0
[  104.453786]  dwc3_gadget_ep_cleanup_cancelled_requests+0xe0/0x170
[  104.460599]  dwc3_process_event_buf+0x2a8/0xbb0
[  104.465662]  dwc3_thread_interrupt+0x4c/0x90
[  104.470435]  irq_thread_fn+0x34/0xb8
[  104.474431]  irq_thread+0x1a0/0x290
[  104.478327]  kthread+0x10c/0x120
[  104.481933]  ret_from_fork+0x10/0x20

The error path triggering these list errors are usually in the
dwc3 driver handling the cancelled or completed list.

Regards,
Michael
  
Avichal Rakesh Sept. 16, 2023, 1:35 a.m. UTC | #3
On 9/15/23 16:16, Michael Grzeschik wrote:
> Hi Avichal
> 
> On Thu, Sep 14, 2023 at 04:05:36PM -0700, Avichal Rakesh wrote:
>> On Mon, Sep 11, 2023 at 9:19 PM Avichal Rakesh <arakesh@google.com> wrote:
>>>
>>> We have been seeing two main bugs when stopping stream:
>>> 1. attempting to queue usb_requests on a disabled usb endpoint, and
>>> 2. use-after-free problems for inflight requests
>>>
>>> Avichal Rakesh (2):
>>>   usb: gadget: uvc: prevent use of disabled endpoint
>>>   usb: gadget: uvc: prevent de-allocating inflight usb_requests
>>>
>>>  drivers/usb/gadget/function/f_uvc.c     | 11 ++++----
>>>  drivers/usb/gadget/function/f_uvc.h     |  2 +-
>>>  drivers/usb/gadget/function/uvc.h       |  5 +++-
>>>  drivers/usb/gadget/function/uvc_v4l2.c  | 21 ++++++++++++---
>>>  drivers/usb/gadget/function/uvc_video.c | 34 +++++++++++++++++++++++--
>>>  5 files changed, 60 insertions(+), 13 deletions(-)
>>>
>>
>> Bumping this thread up. Laurent, Dan, and Michael could you take a look?
> 
> I tested the patches against my setup and it did not help.

Thank you for testing the patch, I really appreciate your help in testing the 
patches!

> 
> In fact I saw two different issues when calling the streamoff event.
> 
> One issue was a stalled pipeline after the streamoff from the host came in.
> The streaming application did not handle any events anymore.

This sounds like uvc_function_setup_continue was never called, so no more control
events were queued for the userspace to handle. This is a little bit of a shot in
the dark: if you are not using the Laurent's uvc-gadget
(https://git.ideasonboard.org/uvc-gadget.git) on the gadget, could you check that 
your userspace application (on the gadget side) calls VIDIOC_STREAMOFF when 
handling UVC_EVENT_STREAMOFF? 

This is similar to how it should called VIDIOC_STREAMON when handling 
UVC_EVENT_STREAMON. Before my patch, I think UVC gadget driver is functionally fine
with userspace application not calling VIDIOC_STREAMOFF. However, my patch prevents
the host from making any more control requests until the gadget's userspace
application calls VIDIOC_STREAMOFF, which looks like a stalled control ep.

> 
> The second issue was when the streamoff event is triggered sometimes the
> following trace is shown, even with your patches applied.
> 
> 
> [  104.202689] Unable to handle kernel paging request at virtual address 005bf43a692a5fd5
> [  104.235122] Mem abort info:
> [  104.238257]   ESR = 0x0000000096000004
> [  104.242449]   EC = 0x25: DABT (current EL), IL = 32 bits
> [  104.248391]   SET = 0, FnV = 0
> [  104.251803]   EA = 0, S1PTW = 0
> [  104.255313]   FSC = 0x04: level 0 translation fault
> [  104.260765] Data abort info:
> [  104.263982]   ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
> [  104.270114]   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
> [  104.275760]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
> [  104.281698] [005bf43a692a5fd5] address between user and kernel address ranges
> [  104.290042] Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
> [  104.297060] Dumping ftrace buffer:
> [  104.300869]    (ftrace buffer empty)
> [  104.304862] Modules linked in: st1232 hantro_vpu v4l2_vp9 v4l2_h264 uio_pdrv_genirq fuse [last unloaded: rockchip_vpu(C)]
> [  104.312080] panfrost fde60000.gpu: Panfrost Dump: BO has no sgt, cannot dump
> [  104.317137] CPU: 0 PID: 465 Comm: irq/46-dwc3 Tainted: G         C         6.5.0-20230831-2+ #5
> [  104.317144] Hardware name: WolfVision PF5 (DT)
> [  104.317148] pstate: 604000c9 (nZCv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> [  104.317154] pc : __list_del_entry_valid+0x48/0xe8
> [  104.352728] lr : dwc3_gadget_giveback+0x3c/0x1b0
> [  104.357893] sp : ffffffc08381bc60
> [  104.361593] x29: ffffffc08381bc60 x28: ffffff80047d4000 x27: ffffff80047de440
> [  104.369576] x26: 0000000000000000 x25: ffffffc08135b2d0 x24: ffffffc08381bd00
> [  104.377559] x23: 00000000ffffff98 x22: ffffff8004204880 x21: ffffff80047d4000
> [  104.385541] x20: ffffff800718dea0 x19: ffffff800718dea0 x18: 0000000000000000
> [  104.393523] x17: 7461747320687469 x16: 7720646574656c70 x15: 6d6f632074736575
> [  104.401504] x14: 716572205356203a x13: 2e3430312d207375 x12: 7461747320687469
> [  104.409486] x11: ffffffc0815c98f0 x10: 0000000000000000 x9 : ffffffc0808f4fa0
> [  104.417468] x8 : ffffffc082415000 x7 : ffffffc0808f4e2c x6 : ffffffc0823d0928
> [  104.425450] x5 : 0000000000000282 x4 : 0000000000000201 x3 : d85bf43a692a5fcd
> [  104.433431] x2 : ffffff80047d4048 x1 : ffffff800718dea0 x0 : dead000000000122
> [  104.441413] Call trace:
> [  104.444142]  __list_del_entry_valid+0x48/0xe8
> [  104.449013]  dwc3_gadget_giveback+0x3c/0x1b0
> [  104.453786]  dwc3_gadget_ep_cleanup_cancelled_requests+0xe0/0x170
> [  104.460599]  dwc3_process_event_buf+0x2a8/0xbb0
> [  104.465662]  dwc3_thread_interrupt+0x4c/0x90
> [  104.470435]  irq_thread_fn+0x34/0xb8
> [  104.474431]  irq_thread+0x1a0/0x290
> [  104.478327]  kthread+0x10c/0x120
> [  104.481933]  ret_from_fork+0x10/0x20
> 
> The error path triggering these list errors are usually in the
> dwc3 driver handling the cancelled or completed list.

It looks like we're still freeing un-returned requests :(. If you still have
the setup can you pull the uvc logs to see if waiting for requests to be returned timed 
out? I wonder if dwc3's interrupt handler is being scheduled too late. 500ms seemed
like a reasonable time out to me, but this seems to prove otherwise.

Thank you!
- Avi.
  
Avichal Rakesh Sept. 19, 2023, 6:24 p.m. UTC | #4
On 9/15/23 18:35, Avichal Rakesh wrote:
> 
> 
> On 9/15/23 16:16, Michael Grzeschik wrote:
>> Hi Avichal
>>
>> On Thu, Sep 14, 2023 at 04:05:36PM -0700, Avichal Rakesh wrote:
>>> On Mon, Sep 11, 2023 at 9:19 PM Avichal Rakesh <arakesh@google.com> wrote:
>>>>
>>>> We have been seeing two main bugs when stopping stream:
>>>> 1. attempting to queue usb_requests on a disabled usb endpoint, and
>>>> 2. use-after-free problems for inflight requests
>>>>
>>>> Avichal Rakesh (2):
>>>>   usb: gadget: uvc: prevent use of disabled endpoint
>>>>   usb: gadget: uvc: prevent de-allocating inflight usb_requests
>>>>
>>>>  drivers/usb/gadget/function/f_uvc.c     | 11 ++++----
>>>>  drivers/usb/gadget/function/f_uvc.h     |  2 +-
>>>>  drivers/usb/gadget/function/uvc.h       |  5 +++-
>>>>  drivers/usb/gadget/function/uvc_v4l2.c  | 21 ++++++++++++---
>>>>  drivers/usb/gadget/function/uvc_video.c | 34 +++++++++++++++++++++++--
>>>>  5 files changed, 60 insertions(+), 13 deletions(-)
>>>>
>>>
>>> Bumping this thread up. Laurent, Dan, and Michael could you take a look?
>>
>> I tested the patches against my setup and it did not help.
> 
> Thank you for testing the patch, I really appreciate your help in testing the 
> patches!
> 
>>
>> In fact I saw two different issues when calling the streamoff event.
>>
>> One issue was a stalled pipeline after the streamoff from the host came in.
>> The streaming application did not handle any events anymore.
> 
> This sounds like uvc_function_setup_continue was never called, so no more control
> events were queued for the userspace to handle. This is a little bit of a shot in
> the dark: if you are not using the Laurent's uvc-gadget
> (https://git.ideasonboard.org/uvc-gadget.git) on the gadget, could you check that 
> your userspace application (on the gadget side) calls VIDIOC_STREAMOFF when 
> handling UVC_EVENT_STREAMOFF? 
> 
> This is similar to how it should called VIDIOC_STREAMON when handling 
> UVC_EVENT_STREAMON. Before my patch, I think UVC gadget driver is functionally fine
> with userspace application not calling VIDIOC_STREAMOFF. However, my patch prevents
> the host from making any more control requests until the gadget's userspace
> application calls VIDIOC_STREAMOFF, which looks like a stalled control ep.
> 
>>
>> The second issue was when the streamoff event is triggered sometimes the
>> following trace is shown, even with your patches applied.
>>
>>
>> [  104.202689] Unable to handle kernel paging request at virtual address 005bf43a692a5fd5
>> [  104.235122] Mem abort info:
>> [  104.238257]   ESR = 0x0000000096000004
>> [  104.242449]   EC = 0x25: DABT (current EL), IL = 32 bits
>> [  104.248391]   SET = 0, FnV = 0
>> [  104.251803]   EA = 0, S1PTW = 0
>> [  104.255313]   FSC = 0x04: level 0 translation fault
>> [  104.260765] Data abort info:
>> [  104.263982]   ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
>> [  104.270114]   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
>> [  104.275760]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
>> [  104.281698] [005bf43a692a5fd5] address between user and kernel address ranges
>> [  104.290042] Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
>> [  104.297060] Dumping ftrace buffer:
>> [  104.300869]    (ftrace buffer empty)
>> [  104.304862] Modules linked in: st1232 hantro_vpu v4l2_vp9 v4l2_h264 uio_pdrv_genirq fuse [last unloaded: rockchip_vpu(C)]
>> [  104.312080] panfrost fde60000.gpu: Panfrost Dump: BO has no sgt, cannot dump
>> [  104.317137] CPU: 0 PID: 465 Comm: irq/46-dwc3 Tainted: G         C         6.5.0-20230831-2+ #5
>> [  104.317144] Hardware name: WolfVision PF5 (DT)
>> [  104.317148] pstate: 604000c9 (nZCv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>> [  104.317154] pc : __list_del_entry_valid+0x48/0xe8
>> [  104.352728] lr : dwc3_gadget_giveback+0x3c/0x1b0
>> [  104.357893] sp : ffffffc08381bc60
>> [  104.361593] x29: ffffffc08381bc60 x28: ffffff80047d4000 x27: ffffff80047de440
>> [  104.369576] x26: 0000000000000000 x25: ffffffc08135b2d0 x24: ffffffc08381bd00
>> [  104.377559] x23: 00000000ffffff98 x22: ffffff8004204880 x21: ffffff80047d4000
>> [  104.385541] x20: ffffff800718dea0 x19: ffffff800718dea0 x18: 0000000000000000
>> [  104.393523] x17: 7461747320687469 x16: 7720646574656c70 x15: 6d6f632074736575
>> [  104.401504] x14: 716572205356203a x13: 2e3430312d207375 x12: 7461747320687469
>> [  104.409486] x11: ffffffc0815c98f0 x10: 0000000000000000 x9 : ffffffc0808f4fa0
>> [  104.417468] x8 : ffffffc082415000 x7 : ffffffc0808f4e2c x6 : ffffffc0823d0928
>> [  104.425450] x5 : 0000000000000282 x4 : 0000000000000201 x3 : d85bf43a692a5fcd
>> [  104.433431] x2 : ffffff80047d4048 x1 : ffffff800718dea0 x0 : dead000000000122
>> [  104.441413] Call trace:
>> [  104.444142]  __list_del_entry_valid+0x48/0xe8
>> [  104.449013]  dwc3_gadget_giveback+0x3c/0x1b0
>> [  104.453786]  dwc3_gadget_ep_cleanup_cancelled_requests+0xe0/0x170
>> [  104.460599]  dwc3_process_event_buf+0x2a8/0xbb0
>> [  104.465662]  dwc3_thread_interrupt+0x4c/0x90
>> [  104.470435]  irq_thread_fn+0x34/0xb8
>> [  104.474431]  irq_thread+0x1a0/0x290
>> [  104.478327]  kthread+0x10c/0x120
>> [  104.481933]  ret_from_fork+0x10/0x20
>>
>> The error path triggering these list errors are usually in the
>> dwc3 driver handling the cancelled or completed list.
> 
> It looks like we're still freeing un-returned requests :(. If you still have
> the setup can you pull the uvc logs to see if waiting for requests to be returned timed 
> out? I wonder if dwc3's interrupt handler is being scheduled too late. 500ms seemed
> like a reasonable time out to me, but this seems to prove otherwise.
> 


Hey Michael, were you able to look into the comments from the previous
email? 

Avi.
  
Avichal Rakesh Sept. 27, 2023, 8:47 p.m. UTC | #5
On 9/19/23 11:24, Avichal Rakesh wrote:
> 
> 
> On 9/15/23 18:35, Avichal Rakesh wrote:
>>
>>
>> On 9/15/23 16:16, Michael Grzeschik wrote:
>>> Hi Avichal
>>>
>>> On Thu, Sep 14, 2023 at 04:05:36PM -0700, Avichal Rakesh wrote:
>>>> On Mon, Sep 11, 2023 at 9:19 PM Avichal Rakesh <arakesh@google.com> wrote:
>>>>>
>>>>> We have been seeing two main bugs when stopping stream:
>>>>> 1. attempting to queue usb_requests on a disabled usb endpoint, and
>>>>> 2. use-after-free problems for inflight requests
>>>>>

>>>
>>> The error path triggering these list errors are usually in the
>>> dwc3 driver handling the cancelled or completed list.
>>
>> It looks like we're still freeing un-returned requests :(. If you still have
>> the setup can you pull the uvc logs to see if waiting for requests to be returned timed 
>> out? I wonder if dwc3's interrupt handler is being scheduled too late. 500ms seemed
>> like a reasonable time out to me, but this seems to prove otherwise.
>>
> 
> 
> Hey Michael, were you able to look into the comments from the previous
> email? 
> 

Bumping the thread up. Laurent, Dan, and Michael: the patches are ready to
review/test. Please take a look when possible. 

Thank you!
Avi.