[RESEND] xhci: Keep interrupt disabled in initialization until host is running.

Message ID 1696847966-27555-1-git-send-email-quic_prashk@quicinc.com
State New

Commit Message

Prashanth K Oct. 9, 2023, 10:39 a.m. UTC
  From: Hongyu Xie <xy521521@gmail.com>

[ Upstream commit a808925075fb750804a60ff0710614466c396db4 ]

The IRQ is disabled in xhci_quiesce() (called by xhci_halt(), with bit 2
cleared in the USBCMD register), but xhci_run() (called by usb_add_hcd())
re-enables it. It's possible that you will receive thousands of interrupt
requests after initialization for the 2.0 roothub, along with a lot of
warnings like "xHCI dying, ignoring interrupt. Shouldn't IRQs be
disabled?". This number of interrupt requests will cause the entire
system to freeze.
This problem was first found on a device with an ASM2142 host controller.

[tidy up old code while moving it, reword header -Mathias]

Cc: stable@kernel.org
Signed-off-by: Hongyu Xie <xiehongyu1@kylinos.cn>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Link: https://lore.kernel.org/r/20220623111945.1557702-2-mathias.nyman@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: <stable@vger.kernel.org> # 5.15
Signed-off-by: Prashanth K <quic_prashk@quicinc.com>
---
 drivers/usb/host/xhci.c | 34 +++++++++++++++++++++-------------
 1 file changed, 21 insertions(+), 13 deletions(-)
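
The ordering problem described in the commit message can be illustrated with a small userspace model. This is only a sketch under stated assumptions: struct fake_regs and the functions old_run(), new_run(), run_finished() and irq_window_open() are hypothetical stand-ins, not driver code. The bit values for CMD_RUN, CMD_EIE and ER_IRQ_ENABLE() are modeled on the definitions in the xhci driver header.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* USBCMD bits: Run/Stop is bit 0, Interrupter Enable (EIE) is bit 2. */
#define CMD_RUN (1u << 0)
#define CMD_EIE (1u << 2)

/* Interrupter management helpers, modeled on the driver macros. */
#define ER_IRQ_CLEAR(p)  ((p) & 0xfffffffeu)
#define ER_IRQ_ENABLE(p) (ER_IRQ_CLEAR(p) | 0x2u)

/* Hypothetical stand-in for the memory-mapped registers. */
struct fake_regs {
	uint32_t command;
	uint32_t irq_pending;
};

/* True if interrupts are enabled while the host is not yet running. */
static bool irq_window_open(const struct fake_regs *r)
{
	bool irq_on = (r->command & CMD_EIE) && (r->irq_pending & 0x2u);

	return irq_on && !(r->command & CMD_RUN);
}

/* Old ordering: the run step enables interrupts and returns, leaving
 * them enabled until some later step finally starts the host. */
static void old_run(struct fake_regs *r)
{
	r->command |= CMD_EIE;
	r->irq_pending = ER_IRQ_ENABLE(r->irq_pending);
}

/* New ordering: the run step no longer touches the interrupt bits. */
static void new_run(struct fake_regs *r)
{
	(void)r;
}

/* New ordering: interrupts are enabled immediately before the host is
 * started. In the real patch this short sequence runs under xhci->lock. */
static void run_finished(struct fake_regs *r)
{
	r->command |= CMD_EIE;
	r->irq_pending = ER_IRQ_ENABLE(r->irq_pending);
	r->command |= CMD_RUN;
	assert(!irq_window_open(r));
}
```

In this model, irq_window_open() reports true after old_run() and false after new_run(), which is the window the patch closes by moving the enable step into xhci_run_finished().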
  

Comments

Greg KH Oct. 9, 2023, 12:52 p.m. UTC | #1
On Mon, Oct 09, 2023 at 04:09:26PM +0530, Prashanth K wrote:
> From: Hongyu Xie <xy521521@gmail.com>
> 
> [ Upstream commit a808925075fb750804a60ff0710614466c396db4 ]
> 
> irq is disabled in xhci_quiesce(called by xhci_halt, with bit:2 cleared
> in USBCMD register), but xhci_run(called by usb_add_hcd) re-enable it.
> It's possible that you will receive thousands of interrupt requests
> after initialization for 2.0 roothub. And you will get a lot of
> warning like, "xHCI dying, ignoring interrupt. Shouldn't IRQs be
> disabled?". This amount of interrupt requests will cause the entire
> system to freeze.
> This problem was first found on a device with ASM2142 host controller
> on it.
> 
> [tidy up old code while moving it, reword header -Mathias]
> 
> Cc: stable@kernel.org
> Signed-off-by: Hongyu Xie <xiehongyu1@kylinos.cn>
> Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
> Link: https://lore.kernel.org/r/20220623111945.1557702-2-mathias.nyman@linux.intel.com
> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: <stable@vger.kernel.org> # 5.15
> Signed-off-by: Prashanth K <quic_prashk@quicinc.com>
> ---

Any specific reason you missed adding the extra blank line in this
version of the backport that the original added?  That is going to cause
problems in the future if other patches are added on top of this that
would be expecting it because it is that way in Linus's tree.

And why is this only relevant for 5.15.y?

thanks,

greg k-h
  
Prashanth K Oct. 10, 2023, 9:04 a.m. UTC | #2
On 09-10-23 06:22 pm, Greg Kroah-Hartman wrote:
> On Mon, Oct 09, 2023 at 04:09:26PM +0530, Prashanth K wrote:
>> From: Hongyu Xie <xy521521@gmail.com>
>>
>> [ Upstream commit a808925075fb750804a60ff0710614466c396db4 ]
>>
>> irq is disabled in xhci_quiesce(called by xhci_halt, with bit:2 cleared
>> in USBCMD register), but xhci_run(called by usb_add_hcd) re-enable it.
>> It's possible that you will receive thousands of interrupt requests
>> after initialization for 2.0 roothub. And you will get a lot of
>> warning like, "xHCI dying, ignoring interrupt. Shouldn't IRQs be
>> disabled?". This amount of interrupt requests will cause the entire
>> system to freeze.
>> This problem was first found on a device with ASM2142 host controller
>> on it.
>>
>> [tidy up old code while moving it, reword header -Mathias]
>>
>> Cc: stable@kernel.org
>> Signed-off-by: Hongyu Xie <xiehongyu1@kylinos.cn>
>> Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
>> Link: https://lore.kernel.org/r/20220623111945.1557702-2-mathias.nyman@linux.intel.com
>> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>> Cc: <stable@vger.kernel.org> # 5.15
>> Signed-off-by: Prashanth K <quic_prashk@quicinc.com>
>> ---
> 
> Any specific reason you missed adding the extra blank line in this
> version of the backport that the original added?  That is going to cause
> problems in the future if other patches are added on top of this that
> would be expecting it because it is that way in Linus's tree.
> 

Thanks for pointing that out. I removed it while resolving some merge 
conflicts. Will add it back in the next version.

> And why is this only relevant for 5.15.y?

I'm not really sure why this was only ported from 5.19 onwards and is not 
present in older kernels (it could be because of dependencies/conflicts).

But anyway, I'm backporting it to 5.15 since an IRQ storm was seen on a 
qcom SoC running 5.15, and this patch helps solve it.

Should I change the Cc to just the stable kernel list (without mentioning 
a kernel version)?
Something like this -- Cc: <stable@vger.kernel.org>

Regards.
Prashanth K
  
Greg KH Oct. 10, 2023, 11:18 a.m. UTC | #3
On Tue, Oct 10, 2023 at 02:34:44PM +0530, Prashanth K wrote:
> 
> 
> On 09-10-23 06:22 pm, Greg Kroah-Hartman wrote:
> > On Mon, Oct 09, 2023 at 04:09:26PM +0530, Prashanth K wrote:
> > > From: Hongyu Xie <xy521521@gmail.com>
> > > 
> > > [ Upstream commit a808925075fb750804a60ff0710614466c396db4 ]
> > > 
> > > irq is disabled in xhci_quiesce(called by xhci_halt, with bit:2 cleared
> > > in USBCMD register), but xhci_run(called by usb_add_hcd) re-enable it.
> > > It's possible that you will receive thousands of interrupt requests
> > > after initialization for 2.0 roothub. And you will get a lot of
> > > warning like, "xHCI dying, ignoring interrupt. Shouldn't IRQs be
> > > disabled?". This amount of interrupt requests will cause the entire
> > > system to freeze.
> > > This problem was first found on a device with ASM2142 host controller
> > > on it.
> > > 
> > > [tidy up old code while moving it, reword header -Mathias]
> > > 
> > > Cc: stable@kernel.org
> > > Signed-off-by: Hongyu Xie <xiehongyu1@kylinos.cn>
> > > Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
> > > Link: https://lore.kernel.org/r/20220623111945.1557702-2-mathias.nyman@linux.intel.com
> > > Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> > > Cc: <stable@vger.kernel.org> # 5.15
> > > Signed-off-by: Prashanth K <quic_prashk@quicinc.com>
> > > ---
> > 
> > Any specific reason you missed adding the extra blank line in this
> > version of the backport that the original added?  That is going to cause
> > problems in the future if other patches are added on top of this that
> > would be expecting it because it is that way in Linus's tree.
> > 
> 
> Thanks for pointing that out. I removed it while resolving some merge
> conflicts. Will add it back in the next version.
> 
> > And why is this only relevant for 5.15.y?
> 
> I'm not really sure why this was only ported from 5.19 onwards and is not
> present in older kernels (it could be because of dependencies/conflicts).
> 
> But anyway, I'm backporting it to 5.15 since an IRQ storm was seen on a
> qcom SoC running 5.15, and this patch helps solve it.
> 
> Should I change the Cc to just the stable kernel list (without mentioning
> a kernel version)?
> Something like this -- Cc: <stable@vger.kernel.org>

No, let us know what kernel version this is to be applied to so we know.
If you only think this is relevant for 5.15.y because you have tested it
there, that's fine. I just wanted to be sure.

thanks,

greg k-h
  
Prashanth K Oct. 11, 2023, 6:24 a.m. UTC | #4
On 10-10-23 04:48 pm, Greg Kroah-Hartman wrote:
> On Tue, Oct 10, 2023 at 02:34:44PM +0530, Prashanth K wrote:
>>
>>
>> On 09-10-23 06:22 pm, Greg Kroah-Hartman wrote:
>>> On Mon, Oct 09, 2023 at 04:09:26PM +0530, Prashanth K wrote:
>>>> From: Hongyu Xie <xy521521@gmail.com>
>>>>
>>>> [ Upstream commit a808925075fb750804a60ff0710614466c396db4 ]
>>>>
>>>> irq is disabled in xhci_quiesce(called by xhci_halt, with bit:2 cleared
>>>> in USBCMD register), but xhci_run(called by usb_add_hcd) re-enable it.
>>>> It's possible that you will receive thousands of interrupt requests
>>>> after initialization for 2.0 roothub. And you will get a lot of
>>>> warning like, "xHCI dying, ignoring interrupt. Shouldn't IRQs be
>>>> disabled?". This amount of interrupt requests will cause the entire
>>>> system to freeze.
>>>> This problem was first found on a device with ASM2142 host controller
>>>> on it.
>>>>
>>>> [tidy up old code while moving it, reword header -Mathias]
>>>>
>>>> Cc: stable@kernel.org
>>>> Signed-off-by: Hongyu Xie <xiehongyu1@kylinos.cn>
>>>> Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
>>>> Link: https://lore.kernel.org/r/20220623111945.1557702-2-mathias.nyman@linux.intel.com
>>>> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>>>> Cc: <stable@vger.kernel.org> # 5.15
>>>> Signed-off-by: Prashanth K <quic_prashk@quicinc.com>
>>>> ---
>>>
>>> Any specific reason you missed adding the extra blank line in this
>>> version of the backport that the original added?  That is going to cause
>>> problems in the future if other patches are added on top of this that
>>> would be expecting it because it is that way in Linus's tree.
>>>
>>
>> Thanks for pointing that out. I removed it while resolving some merge
>> conflicts. Will add it back in the next version.
>>
>>> And why is this only relevant for 5.15.y?
>>
>> I'm not really sure why this was only ported from 5.19 onwards and is not
>> present in older kernels (it could be because of dependencies/conflicts).
>>
>> But anyway, I'm backporting it to 5.15 since an IRQ storm was seen on a
>> qcom SoC running 5.15, and this patch helps solve it.
>>
>> Should I change the Cc to just the stable kernel list (without mentioning
>> a kernel version)?
>> Something like this -- Cc: <stable@vger.kernel.org>
> 
> No, let us know what kernel version this is to be applied to so we know.
> If you only think this is relevant for 5.15.y because you have tested it
> there, that's fine. I just wanted to be sure.

We tested it on 5.15 for over 20 hours and didn't see any issue. Will 
send a new patch after adding the newline.

Thanks,
Prashanth K
  

Patch

diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c
index 541fe4d..7ee747e 100644
--- a/drivers/usb/host/xhci.c
+++ b/drivers/usb/host/xhci.c
@@ -607,8 +607,27 @@  static int xhci_init(struct usb_hcd *hcd)
 
 static int xhci_run_finished(struct xhci_hcd *xhci)
 {
+	unsigned long	flags;
+	u32		temp;
+
+	/*
+	 * Enable interrupts before starting the host (xhci 4.2 and 5.5.2).
+	 * Protect the short window before host is running with a lock
+	 */
+	spin_lock_irqsave(&xhci->lock, flags);
+
+	xhci_dbg_trace(xhci, trace_xhci_dbg_init, "Enable interrupts");
+	temp = readl(&xhci->op_regs->command);
+	temp |= (CMD_EIE);
+	writel(temp, &xhci->op_regs->command);
+
+	xhci_dbg_trace(xhci, trace_xhci_dbg_init, "Enable primary interrupter");
+	temp = readl(&xhci->ir_set->irq_pending);
+	writel(ER_IRQ_ENABLE(temp), &xhci->ir_set->irq_pending);
+
 	if (xhci_start(xhci)) {
 		xhci_halt(xhci);
+		spin_unlock_irqrestore(&xhci->lock, flags);
 		return -ENODEV;
 	}
 	xhci->shared_hcd->state = HC_STATE_RUNNING;
@@ -619,6 +638,8 @@  static int xhci_run_finished(struct xhci_hcd *xhci)
 
 	xhci_dbg_trace(xhci, trace_xhci_dbg_init,
 			"Finished xhci_run for USB3 roothub");
+
+	spin_unlock_irqrestore(&xhci->lock, flags);
 	return 0;
 }
 
@@ -667,19 +688,6 @@  int xhci_run(struct usb_hcd *hcd)
 	temp |= (xhci->imod_interval / 250) & ER_IRQ_INTERVAL_MASK;
 	writel(temp, &xhci->ir_set->irq_control);
 
-	/* Set the HCD state before we enable the irqs */
-	temp = readl(&xhci->op_regs->command);
-	temp |= (CMD_EIE);
-	xhci_dbg_trace(xhci, trace_xhci_dbg_init,
-			"// Enable interrupts, cmd = 0x%x.", temp);
-	writel(temp, &xhci->op_regs->command);
-
-	temp = readl(&xhci->ir_set->irq_pending);
-	xhci_dbg_trace(xhci, trace_xhci_dbg_init,
-			"// Enabling event ring interrupter %p by writing 0x%x to irq_pending",
-			xhci->ir_set, (unsigned int) ER_IRQ_ENABLE(temp));
-	writel(ER_IRQ_ENABLE(temp), &xhci->ir_set->irq_pending);
-
 	if (xhci->quirks & XHCI_NEC_HOST) {
 		struct xhci_command *command;