Message ID | 20230328160756.30520-2-quic_kriskura@quicinc.com |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:b0ea:0:b0:3b6:4342:cba0 with SMTP id b10csp2352966vqo; Tue, 28 Mar 2023 09:36:04 -0700 (PDT) X-Google-Smtp-Source: AKy350a18ZOrPRTomAMFiqyRJOwdeBpg/Rw4pUgxU2gzGb2T1UO0FaWwRzPhag1azymE1u+qUCnG X-Received: by 2002:a17:907:1c09:b0:930:f953:9608 with SMTP id nc9-20020a1709071c0900b00930f9539608mr23479052ejc.0.1680021364474; Tue, 28 Mar 2023 09:36:04 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1680021364; cv=none; d=google.com; s=arc-20160816; b=ghoWF9CLgwtDLjMcIWkor0oHYoSQ1SBmB5jAAo5iklc4RF1n8N8nSN4QBb34BXyqRS 4desWAcFC5tOrymNqlVjSYJiahcDbr9dtuelno7xDobQugEBdX98hO2r7VmxQXd2ZVzM oNVBNBQGMnhN6UvJ6OkqwZJFFbH8qLTjBiyJBVv+5QjGG3XGTYZMWqf/H3A5CDgSud4V 6AqO10Q0r+65d+lu5r1kPi2qhJQT+dL+8hbh9MlNLTKLKaHLU82xixhBdzOGMAKUskv4 MM58gQdJCx7j0OhXUBIvGjMhylvF8I5CvDyDrtH7Hzmvtvo+k96j2NTDkgmImP4mmiCo x63Q== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=QKhL0KzrR4b6znOdJrnhGJP/JXQIAdpm6o5hO8iT3zo=; b=hQnJ6+jABnByMWnuMPGk5Qc9I9u7QOrtJuURf68AZ3OoUf2+bfX7isukxYYtSI2NSF 19S2Mcs7MTG1nGgVW+UADs0yDTx5SLzo6F3fYjB+TqI9LhCvl2+Vlbp9TqSDM2RK0EKn 4Gq8q2e7gQNR27nZNNxX9UIUcLPtUxg0vbPmAfNfxFvYauN6tCmcjDkaukxM4qRuzWIT W6HAar0Us/xqwNauEVhBf3PSLfXZk4li1cjhvfGyl23Lzw0wyMDL9/Bk+8PmIknjIvNY YRZkoIkMuYNB+MJTfvhp74U6e783ZiifBldy7vrOcG/fL+yEqiQKsrf+a+r5Ph+OcIPP ETkQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@quicinc.com header.s=qcppdkim1 header.b=f5exfEoa; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=quicinc.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id gt36-20020a1709072da400b0093defbd6292si13221327ejc.1049.2023.03.28.09.35.41; Tue, 28 Mar 2023 09:36:04 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@quicinc.com header.s=qcppdkim1 header.b=f5exfEoa; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=quicinc.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233122AbjC1QJP (ORCPT <rfc822;rua109.linux@gmail.com> + 99 others); Tue, 28 Mar 2023 12:09:15 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:40852 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232987AbjC1QJF (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Tue, 28 Mar 2023 12:09:05 -0400 Received: from mx0a-0031df01.pphosted.com (mx0a-0031df01.pphosted.com [205.220.168.131]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 1A49DFF3C; Tue, 28 Mar 2023 09:08:53 -0700 (PDT) Received: from pps.filterd (m0279867.ppops.net [127.0.0.1]) by mx0a-0031df01.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 32SDUSdG029866; Tue, 28 Mar 2023 16:08:16 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=quicinc.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : content-type; s=qcppdkim1; bh=QKhL0KzrR4b6znOdJrnhGJP/JXQIAdpm6o5hO8iT3zo=; b=f5exfEoamxF6isProRqyoHRbL59Rd/PXrWEdT87yK9DIhAqhAPwiv77VGg2pnJL0DZ4Y wPzr1KTObta1Lcvm7cbsomnALQ0dUrDaJvgjpvrIxvmEY+ffQGAS5CYl9wPYkR6LMC8l 9PScKFswyjWIByMMA15FMF4upFseRVTXso+7ch0lAjfyafonLSGx/MThZB4SNpQSCoxJ 9Edri0bRSPfbbukc/CHVIfFXcDfWdOdNXjzBXHVA2JMo723MB8kwPUKH8pgYHc5HB8IP wxW7Gye7VVl1iTPSduGs8/MX7vC0dvLdMO4tlK/d8d03dZUXiruhoPgrhlJ08vFLaBfl sQ== Received: from nalasppmta03.qualcomm.com (Global_NAT1.qualcomm.com [129.46.96.20]) by mx0a-0031df01.pphosted.com (PPS) with ESMTPS id 3pk5774nu2-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Tue, 28 Mar 2023 16:08:15 +0000 Received: from nalasex01a.na.qualcomm.com (nalasex01a.na.qualcomm.com [10.47.209.196]) by NALASPPMTA03.qualcomm.com (8.17.1.5/8.17.1.5) with ESMTPS id 32SG8EJb012255 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Tue, 28 Mar 2023 16:08:14 GMT Received: from hu-kriskura-hyd.qualcomm.com (10.80.80.8) by nalasex01a.na.qualcomm.com (10.47.209.196) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.986.41; Tue, 28 Mar 2023 09:08:10 -0700 From: Krishna Kurapati <quic_kriskura@quicinc.com> To: Thinh Nguyen <Thinh.Nguyen@synopsys.com>, Greg Kroah-Hartman <gregkh@linuxfoundation.org>, Alan Stern <stern@rowland.harvard.edu>, "Geert Uytterhoeven" <geert+renesas@glider.be>, Colin Ian King <colin.i.king@gmail.com>, Jiantao Zhang <water.zhangjiantao@huawei.com>, "Rafael J . Wysocki" <rafael@kernel.org> CC: <linux-usb@vger.kernel.org>, <linux-kernel@vger.kernel.org>, <quic_ppratap@quicinc.com>, <quic_wcheng@quicinc.com>, <quic_jackp@quicinc.com>, <quic_ugoswami@quicinc.com>, Krishna Kurapati <quic_kriskura@quicinc.com> Subject: [PATCH v2 1/2] usb: dwc3: gadget: Bail out in pullup if soft reset timeout happens Date: Tue, 28 Mar 2023 21:37:55 +0530 Message-ID: <20230328160756.30520-2-quic_kriskura@quicinc.com> X-Mailer: git-send-email 2.40.0 In-Reply-To: <20230328160756.30520-1-quic_kriskura@quicinc.com> References: <20230328160756.30520-1-quic_kriskura@quicinc.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Content-Type: text/plain X-Originating-IP: [10.80.80.8] X-ClientProxiedBy: nasanex01b.na.qualcomm.com (10.46.141.250) To nalasex01a.na.qualcomm.com (10.47.209.196) X-QCInternal: smtphost X-Proofpoint-Virus-Version: vendor=nai engine=6200 definitions=5800 signatures=585085 X-Proofpoint-GUID: AoVszABDzWQTIz1CwF3bS0QWeQDBeEov X-Proofpoint-ORIG-GUID: AoVszABDzWQTIz1CwF3bS0QWeQDBeEov X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.254,Aquarius:18.0.942,Hydra:6.0.573,FMLib:17.11.170.22 definitions=2023-03-24_11,2023-03-28_02,2023-02-09_01 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 clxscore=1015 mlxscore=0 suspectscore=0 lowpriorityscore=0 impostorscore=0 bulkscore=0 adultscore=0 mlxlogscore=999 phishscore=0 malwarescore=0 spamscore=0 priorityscore=1501 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2303200000 definitions=main-2303280126 X-Spam-Status: No, score=-0.9 required=5.0 tests=DKIM_SIGNED,DKIM_VALID, DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_LOW,SPF_HELO_NONE,SPF_PASS autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1761630082184775855?= X-GMAIL-MSGID: =?utf-8?q?1761630082184775855?= |
Series |
Handle core soft reset failure in pullup
|
|
Commit Message
Krishna Kurapati
March 28, 2023, 4:07 p.m. UTC
If the core soft reset timeout happens, avoid setting up event
buffers and starting gadget as the writes to these registers
may not reflect when in reset and setting the run stop bit
can lead the controller to access wrong event buffer address
resulting in a crash.
Signed-off-by: Krishna Kurapati <quic_kriskura@quicinc.com>
---
drivers/usb/dwc3/gadget.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
Comments
Hi, On Tue, Mar 28, 2023, Krishna Kurapati wrote: > If the core soft reset timeout happens, avoid setting up event > buffers and starting gadget as the writes to these registers > may not reflect when in reset and setting the run stop bit > can lead the controller to access wrong event buffer address > resulting in a crash. > > Signed-off-by: Krishna Kurapati <quic_kriskura@quicinc.com> > --- > drivers/usb/dwc3/gadget.c | 5 ++++- > 1 file changed, 4 insertions(+), 1 deletion(-) > > diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c > index 3c63fa97a680..f0472801d9a5 100644 > --- a/drivers/usb/dwc3/gadget.c > +++ b/drivers/usb/dwc3/gadget.c > @@ -2620,13 +2620,16 @@ static int dwc3_gadget_pullup(struct usb_gadget *g, int is_on) > * device-initiated disconnect requires a core soft reset > * (DCTL.CSftRst) before enabling the run/stop bit. > */ > - dwc3_core_soft_reset(dwc); > + ret = dwc3_core_soft_reset(dwc); > + if (ret) > + goto done; > > dwc3_event_buffers_setup(dwc); > __dwc3_gadget_start(dwc); > ret = dwc3_gadget_run_stop(dwc, true, false); > } > > +done: > pm_runtime_put(dwc->dev); > > return ret; > -- > 2.40.0 > I think there's one more place that may needs this check. Can you also add this check in __dwc3_set_mode()? Thanks, Thinh
On 3/29/2023 2:50 AM, Thinh Nguyen wrote: > Hi, > > On Tue, Mar 28, 2023, Krishna Kurapati wrote: >> If the core soft reset timeout happens, avoid setting up event >> buffers and starting gadget as the writes to these registers >> may not reflect when in reset and setting the run stop bit >> can lead the controller to access wrong event buffer address >> resulting in a crash. >> >> Signed-off-by: Krishna Kurapati <quic_kriskura@quicinc.com> >> --- >> drivers/usb/dwc3/gadget.c | 5 ++++- >> 1 file changed, 4 insertions(+), 1 deletion(-) >> >> diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c >> index 3c63fa97a680..f0472801d9a5 100644 >> --- a/drivers/usb/dwc3/gadget.c >> +++ b/drivers/usb/dwc3/gadget.c >> @@ -2620,13 +2620,16 @@ static int dwc3_gadget_pullup(struct usb_gadget *g, int is_on) >> * device-initiated disconnect requires a core soft reset >> * (DCTL.CSftRst) before enabling the run/stop bit. >> */ >> - dwc3_core_soft_reset(dwc); >> + ret = dwc3_core_soft_reset(dwc); >> + if (ret) >> + goto done; >> >> dwc3_event_buffers_setup(dwc); >> __dwc3_gadget_start(dwc); >> ret = dwc3_gadget_run_stop(dwc, true, false); >> } >> >> +done: >> pm_runtime_put(dwc->dev); >> >> return ret; >> -- >> 2.40.0 >> > > I think there's one more place that may needs this check. Can you also > add this check in __dwc3_set_mode()? Hi Thinh, Sure. Will do it. Will the below be good enough ? Or would it be good to add an error/warn log there> kriskura@hu-kriskura-hyd:/local/mnt/workspace/krishna/skales2/skales/kernel$ git diff drivers/usb/ diff --git a/drivers/usb/dwc3/core.c b/drivers/usb/dwc3/core.c index 476b63618511..8d1d213d1dcd 100644 --- a/drivers/usb/dwc3/core.c +++ b/drivers/usb/dwc3/core.c @@ -210,7 +210,9 @@ static void __dwc3_set_mode(struct work_struct *work) } break; case DWC3_GCTL_PRTCAP_DEVICE: - dwc3_core_soft_reset(dwc); + ret = dwc3_core_soft_reset(dwc); + if (ret) + goto out; dwc3_event_buffers_setup(dwc); Regards, Krishna,
On Wed, Mar 29, 2023, Krishna Kurapati PSSNV wrote: > > > On 3/29/2023 2:50 AM, Thinh Nguyen wrote: > > Hi, > > > > On Tue, Mar 28, 2023, Krishna Kurapati wrote: > > > If the core soft reset timeout happens, avoid setting up event > > > buffers and starting gadget as the writes to these registers > > > may not reflect when in reset and setting the run stop bit > > > can lead the controller to access wrong event buffer address > > > resulting in a crash. > > > > > > Signed-off-by: Krishna Kurapati <quic_kriskura@quicinc.com> > > > --- > > > drivers/usb/dwc3/gadget.c | 5 ++++- > > > 1 file changed, 4 insertions(+), 1 deletion(-) > > > > > > diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c > > > index 3c63fa97a680..f0472801d9a5 100644 > > > --- a/drivers/usb/dwc3/gadget.c > > > +++ b/drivers/usb/dwc3/gadget.c > > > @@ -2620,13 +2620,16 @@ static int dwc3_gadget_pullup(struct usb_gadget *g, int is_on) > > > * device-initiated disconnect requires a core soft reset > > > * (DCTL.CSftRst) before enabling the run/stop bit. > > > */ > > > - dwc3_core_soft_reset(dwc); > > > + ret = dwc3_core_soft_reset(dwc); > > > + if (ret) > > > + goto done; > > > dwc3_event_buffers_setup(dwc); > > > __dwc3_gadget_start(dwc); > > > ret = dwc3_gadget_run_stop(dwc, true, false); > > > } > > > +done: > > > pm_runtime_put(dwc->dev); > > > return ret; > > > -- > > > 2.40.0 > > > > > > > I think there's one more place that may needs this check. Can you also > > add this check in __dwc3_set_mode()? > > Hi Thinh, > > Sure. Will do it. > Will the below be good enough ? Or would it be good to add an error/warn log > there> There's already a warning message in dwc3_core_soft_reset() if it fails. > > kriskura@hu-kriskura-hyd:/local/mnt/workspace/krishna/skales2/skales/kernel$ > git diff drivers/usb/ > diff --git a/drivers/usb/dwc3/core.c b/drivers/usb/dwc3/core.c > index 476b63618511..8d1d213d1dcd 100644 > --- a/drivers/usb/dwc3/core.c > +++ b/drivers/usb/dwc3/core.c > @@ -210,7 +210,9 @@ static void __dwc3_set_mode(struct work_struct *work) > } > break; > case DWC3_GCTL_PRTCAP_DEVICE: > - dwc3_core_soft_reset(dwc); > + ret = dwc3_core_soft_reset(dwc); > + if (ret) > + goto out; > > dwc3_event_buffers_setup(dwc); > If soft-reset failed, the controller is in a bad state. We should not perform any further operation until the next hard reset. We should flag the controller as dead. I don't think we have the equivalent of the host's HCD_FLAG_DEAD. It may require some work in the UDC core. Perhaps we can flag within dwc3 for now and prevent any further operation for a simpler fix. Thanks, Thinh
On 3/30/2023 5:40 AM, Thinh Nguyen wrote: > On Wed, Mar 29, 2023, Krishna Kurapati PSSNV wrote: >> >> >> On 3/29/2023 2:50 AM, Thinh Nguyen wrote: >>> Hi, >>> >>> On Tue, Mar 28, 2023, Krishna Kurapati wrote: >>>> If the core soft reset timeout happens, avoid setting up event >>>> buffers and starting gadget as the writes to these registers >>>> may not reflect when in reset and setting the run stop bit >>>> can lead the controller to access wrong event buffer address >>>> resulting in a crash. >>>> >>>> Signed-off-by: Krishna Kurapati <quic_kriskura@quicinc.com> >>>> --- >>>> drivers/usb/dwc3/gadget.c | 5 ++++- >>>> 1 file changed, 4 insertions(+), 1 deletion(-) >>>> >>>> diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c >>>> index 3c63fa97a680..f0472801d9a5 100644 >>>> --- a/drivers/usb/dwc3/gadget.c >>>> +++ b/drivers/usb/dwc3/gadget.c >>>> @@ -2620,13 +2620,16 @@ static int dwc3_gadget_pullup(struct usb_gadget *g, int is_on) >>>> * device-initiated disconnect requires a core soft reset >>>> * (DCTL.CSftRst) before enabling the run/stop bit. >>>> */ >>>> - dwc3_core_soft_reset(dwc); >>>> + ret = dwc3_core_soft_reset(dwc); >>>> + if (ret) >>>> + goto done; >>>> dwc3_event_buffers_setup(dwc); >>>> __dwc3_gadget_start(dwc); >>>> ret = dwc3_gadget_run_stop(dwc, true, false); >>>> } >>>> +done: >>>> pm_runtime_put(dwc->dev); >>>> return ret; >>>> -- >>>> 2.40.0 >>>> >>> >>> I think there's one more place that may needs this check. Can you also >>> add this check in __dwc3_set_mode()? >> >> Hi Thinh, >> >> Sure. Will do it. >> Will the below be good enough ? Or would it be good to add an error/warn log >> there> > > There's already a warning message in dwc3_core_soft_reset() if it fails. > >> >> kriskura@hu-kriskura-hyd:/local/mnt/workspace/krishna/skales2/skales/kernel$ >> git diff drivers/usb/ >> diff --git a/drivers/usb/dwc3/core.c b/drivers/usb/dwc3/core.c >> index 476b63618511..8d1d213d1dcd 100644 >> --- a/drivers/usb/dwc3/core.c >> +++ b/drivers/usb/dwc3/core.c >> @@ -210,7 +210,9 @@ static void __dwc3_set_mode(struct work_struct *work) >> } >> break; >> case DWC3_GCTL_PRTCAP_DEVICE: >> - dwc3_core_soft_reset(dwc); >> + ret = dwc3_core_soft_reset(dwc); >> + if (ret) >> + goto out; >> >> dwc3_event_buffers_setup(dwc); >> > > If soft-reset failed, the controller is in a bad state. We should not > perform any further operation until the next hard reset. We should flag > the controller as dead. I don't think we have the equivalent of the > host's HCD_FLAG_DEAD. It may require some work in the UDC core. Perhaps > we can flag within dwc3 for now and prevent any further operation for a > simpler fix. > Hi Thinh, Are you referring that if __dwc3_set_mode failed with core soft reset timing out, the caller i.e., dwc3_set_mode who queues the work need to know that the operation actually failed. So we can add a flag to indicate that gadget is dead and the caller of dwc3_set_mode can check the flag to see if the operation is successful or not. Or am I misunderstanding your comment ? Regards, Krishna,
On Thu, Mar 30, 2023, Krishna Kurapati PSSNV wrote: > > > On 3/30/2023 5:40 AM, Thinh Nguyen wrote: > > On Wed, Mar 29, 2023, Krishna Kurapati PSSNV wrote: > > > > > > > > > On 3/29/2023 2:50 AM, Thinh Nguyen wrote: > > > > Hi, > > > > > > > > On Tue, Mar 28, 2023, Krishna Kurapati wrote: > > > > > If the core soft reset timeout happens, avoid setting up event > > > > > buffers and starting gadget as the writes to these registers > > > > > may not reflect when in reset and setting the run stop bit > > > > > can lead the controller to access wrong event buffer address > > > > > resulting in a crash. > > > > > > > > > > Signed-off-by: Krishna Kurapati <quic_kriskura@quicinc.com> > > > > > --- > > > > > drivers/usb/dwc3/gadget.c | 5 ++++- > > > > > 1 file changed, 4 insertions(+), 1 deletion(-) > > > > > > > > > > diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c > > > > > index 3c63fa97a680..f0472801d9a5 100644 > > > > > --- a/drivers/usb/dwc3/gadget.c > > > > > +++ b/drivers/usb/dwc3/gadget.c > > > > > @@ -2620,13 +2620,16 @@ static int dwc3_gadget_pullup(struct usb_gadget *g, int is_on) > > > > > * device-initiated disconnect requires a core soft reset > > > > > * (DCTL.CSftRst) before enabling the run/stop bit. > > > > > */ > > > > > - dwc3_core_soft_reset(dwc); > > > > > + ret = dwc3_core_soft_reset(dwc); > > > > > + if (ret) > > > > > + goto done; > > > > > dwc3_event_buffers_setup(dwc); > > > > > __dwc3_gadget_start(dwc); > > > > > ret = dwc3_gadget_run_stop(dwc, true, false); > > > > > } > > > > > +done: > > > > > pm_runtime_put(dwc->dev); > > > > > return ret; > > > > > -- > > > > > 2.40.0 > > > > > > > > > > > > > I think there's one more place that may needs this check. Can you also > > > > add this check in __dwc3_set_mode()? > > > > > > Hi Thinh, > > > > > > Sure. Will do it. > > > Will the below be good enough ? Or would it be good to add an error/warn log > > > there> > > > > There's already a warning message in dwc3_core_soft_reset() if it fails. > > > > > > > > kriskura@hu-kriskura-hyd:/local/mnt/workspace/krishna/skales2/skales/kernel$ > > > git diff drivers/usb/ > > > diff --git a/drivers/usb/dwc3/core.c b/drivers/usb/dwc3/core.c > > > index 476b63618511..8d1d213d1dcd 100644 > > > --- a/drivers/usb/dwc3/core.c > > > +++ b/drivers/usb/dwc3/core.c > > > @@ -210,7 +210,9 @@ static void __dwc3_set_mode(struct work_struct *work) > > > } > > > break; > > > case DWC3_GCTL_PRTCAP_DEVICE: > > > - dwc3_core_soft_reset(dwc); > > > + ret = dwc3_core_soft_reset(dwc); > > > + if (ret) > > > + goto out; > > > > > > dwc3_event_buffers_setup(dwc); > > > > > > > If soft-reset failed, the controller is in a bad state. We should not > > perform any further operation until the next hard reset. We should flag > > the controller as dead. I don't think we have the equivalent of the > > host's HCD_FLAG_DEAD. It may require some work in the UDC core. Perhaps > > we can flag within dwc3 for now and prevent any further operation for a > > simpler fix. > > > Hi Thinh, > > Are you referring that if __dwc3_set_mode failed with core soft reset > timing out, the caller i.e., dwc3_set_mode who queues the work need to know > that the operation actually failed. So we can add a flag to indicate that > gadget is dead and the caller of dwc3_set_mode can check the flag to see if > the operation is successful or not. > > Or am I misunderstanding your comment ? > Not just in __dwc3_set_mode(). I mean any time dwc3_core_soft_reset fails, then we set this flag. So that it can prevent the user calling any gadget ops causing more crashes/invalid behavior. The dwc->softconnect is already wrong on pullup() on failure. So that we can have a check in different gadget ops. For pullup(): static int dwc3_gadget_pullup() { if (dwc->udc_is_dead) { dev_err(dev, "reset me. x_x \n"); return; } abc(); } Perhaps the effort is probably the same if we enhance the UDC core for this? In any case, I'm fine either way. Thanks, Thinh
On 4/4/2023 5:19 AM, Thinh Nguyen wrote: > On Thu, Mar 30, 2023, Krishna Kurapati PSSNV wrote: >> >> >> On 3/30/2023 5:40 AM, Thinh Nguyen wrote: >>> On Wed, Mar 29, 2023, Krishna Kurapati PSSNV wrote: >>>> >>>> >>>> On 3/29/2023 2:50 AM, Thinh Nguyen wrote: >>>>> Hi, >>>>> >>>>> On Tue, Mar 28, 2023, Krishna Kurapati wrote: >>>>>> If the core soft reset timeout happens, avoid setting up event >>>>>> buffers and starting gadget as the writes to these registers >>>>>> may not reflect when in reset and setting the run stop bit >>>>>> can lead the controller to access wrong event buffer address >>>>>> resulting in a crash. >>>>>> >>>>>> Signed-off-by: Krishna Kurapati <quic_kriskura@quicinc.com> >>>>>> --- >>>>>> drivers/usb/dwc3/gadget.c | 5 ++++- >>>>>> 1 file changed, 4 insertions(+), 1 deletion(-) >>>>>> >>>>>> diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c >>>>>> index 3c63fa97a680..f0472801d9a5 100644 >>>>>> --- a/drivers/usb/dwc3/gadget.c >>>>>> +++ b/drivers/usb/dwc3/gadget.c >>>>>> @@ -2620,13 +2620,16 @@ static int dwc3_gadget_pullup(struct usb_gadget *g, int is_on) >>>>>> * device-initiated disconnect requires a core soft reset >>>>>> * (DCTL.CSftRst) before enabling the run/stop bit. >>>>>> */ >>>>>> - dwc3_core_soft_reset(dwc); >>>>>> + ret = dwc3_core_soft_reset(dwc); >>>>>> + if (ret) >>>>>> + goto done; >>>>>> dwc3_event_buffers_setup(dwc); >>>>>> __dwc3_gadget_start(dwc); >>>>>> ret = dwc3_gadget_run_stop(dwc, true, false); >>>>>> } >>>>>> +done: >>>>>> pm_runtime_put(dwc->dev); >>>>>> return ret; >>>>>> -- >>>>>> 2.40.0 >>>>>> >>>>> >>>>> I think there's one more place that may needs this check. Can you also >>>>> add this check in __dwc3_set_mode()? >>>> >>>> Hi Thinh, >>>> >>>> Sure. Will do it. >>>> Will the below be good enough ? Or would it be good to add an error/warn log >>>> there> >>> >>> There's already a warning message in dwc3_core_soft_reset() if it fails. >>> >>>> >>>> kriskura@hu-kriskura-hyd:/local/mnt/workspace/krishna/skales2/skales/kernel$ >>>> git diff drivers/usb/ >>>> diff --git a/drivers/usb/dwc3/core.c b/drivers/usb/dwc3/core.c >>>> index 476b63618511..8d1d213d1dcd 100644 >>>> --- a/drivers/usb/dwc3/core.c >>>> +++ b/drivers/usb/dwc3/core.c >>>> @@ -210,7 +210,9 @@ static void __dwc3_set_mode(struct work_struct *work) >>>> } >>>> break; >>>> case DWC3_GCTL_PRTCAP_DEVICE: >>>> - dwc3_core_soft_reset(dwc); >>>> + ret = dwc3_core_soft_reset(dwc); >>>> + if (ret) >>>> + goto out; >>>> >>>> dwc3_event_buffers_setup(dwc); >>>> >>> >>> If soft-reset failed, the controller is in a bad state. We should not >>> perform any further operation until the next hard reset. We should flag >>> the controller as dead. I don't think we have the equivalent of the >>> host's HCD_FLAG_DEAD. It may require some work in the UDC core. Perhaps >>> we can flag within dwc3 for now and prevent any further operation for a >>> simpler fix. >>> >> Hi Thinh, >> >> Are you referring that if __dwc3_set_mode failed with core soft reset >> timing out, the caller i.e., dwc3_set_mode who queues the work need to know >> that the operation actually failed. So we can add a flag to indicate that >> gadget is dead and the caller of dwc3_set_mode can check the flag to see if >> the operation is successful or not. >> >> Or am I misunderstanding your comment ? >> > > Not just in __dwc3_set_mode(). I mean any time dwc3_core_soft_reset > fails, then we set this flag. So that it can prevent the user calling > any gadget ops causing more crashes/invalid behavior. The > dwc->softconnect is already wrong on pullup() on failure. > > So that we can have a check in different gadget ops. For pullup(): > > static int dwc3_gadget_pullup() { > if (dwc->udc_is_dead) { > dev_err(dev, "reset me. x_x \n"); > return; > } > > abc(); > } > > Perhaps the effort is probably the same if we enhance the UDC core for > this? In any case, I'm fine either way. > > Thanks, > Thinh Hi Thinh, So you don't want UDC to retry pullup if it fails the first time ? As per patch-2 of this series, I was trying to propagate the EITMEDOUT to UDC so that the caller (most probably configfs) can take appropriate action as to whether it must retry pullup or not. Regards, Krishna,
On Tue, Apr 04, 2023, Krishna Kurapati PSSNV wrote: > > > On 4/4/2023 5:19 AM, Thinh Nguyen wrote: > > On Thu, Mar 30, 2023, Krishna Kurapati PSSNV wrote: > > > > > > > > > On 3/30/2023 5:40 AM, Thinh Nguyen wrote: > > > > On Wed, Mar 29, 2023, Krishna Kurapati PSSNV wrote: > > > > > > > > > > > > > > > On 3/29/2023 2:50 AM, Thinh Nguyen wrote: > > > > > > Hi, > > > > > > > > > > > > On Tue, Mar 28, 2023, Krishna Kurapati wrote: > > > > > > > If the core soft reset timeout happens, avoid setting up event > > > > > > > buffers and starting gadget as the writes to these registers > > > > > > > may not reflect when in reset and setting the run stop bit > > > > > > > can lead the controller to access wrong event buffer address > > > > > > > resulting in a crash. > > > > > > > > > > > > > > Signed-off-by: Krishna Kurapati <quic_kriskura@quicinc.com> > > > > > > > --- > > > > > > > drivers/usb/dwc3/gadget.c | 5 ++++- > > > > > > > 1 file changed, 4 insertions(+), 1 deletion(-) > > > > > > > > > > > > > > diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c > > > > > > > index 3c63fa97a680..f0472801d9a5 100644 > > > > > > > --- a/drivers/usb/dwc3/gadget.c > > > > > > > +++ b/drivers/usb/dwc3/gadget.c > > > > > > > @@ -2620,13 +2620,16 @@ static int dwc3_gadget_pullup(struct usb_gadget *g, int is_on) > > > > > > > * device-initiated disconnect requires a core soft reset > > > > > > > * (DCTL.CSftRst) before enabling the run/stop bit. > > > > > > > */ > > > > > > > - dwc3_core_soft_reset(dwc); > > > > > > > + ret = dwc3_core_soft_reset(dwc); > > > > > > > + if (ret) > > > > > > > + goto done; > > > > > > > dwc3_event_buffers_setup(dwc); > > > > > > > __dwc3_gadget_start(dwc); > > > > > > > ret = dwc3_gadget_run_stop(dwc, true, false); > > > > > > > } > > > > > > > +done: > > > > > > > pm_runtime_put(dwc->dev); > > > > > > > return ret; > > > > > > > -- > > > > > > > 2.40.0 > > > > > > > > > > > > > > > > > > > I think there's one more place that may needs this check. Can you also > > > > > > add this check in __dwc3_set_mode()? > > > > > > > > > > Hi Thinh, > > > > > > > > > > Sure. Will do it. > > > > > Will the below be good enough ? Or would it be good to add an error/warn log > > > > > there> > > > > > > > > There's already a warning message in dwc3_core_soft_reset() if it fails. > > > > > > > > > > > > > > kriskura@hu-kriskura-hyd:/local/mnt/workspace/krishna/skales2/skales/kernel$ > > > > > git diff drivers/usb/ > > > > > diff --git a/drivers/usb/dwc3/core.c b/drivers/usb/dwc3/core.c > > > > > index 476b63618511..8d1d213d1dcd 100644 > > > > > --- a/drivers/usb/dwc3/core.c > > > > > +++ b/drivers/usb/dwc3/core.c > > > > > @@ -210,7 +210,9 @@ static void __dwc3_set_mode(struct work_struct *work) > > > > > } > > > > > break; > > > > > case DWC3_GCTL_PRTCAP_DEVICE: > > > > > - dwc3_core_soft_reset(dwc); > > > > > + ret = dwc3_core_soft_reset(dwc); > > > > > + if (ret) > > > > > + goto out; > > > > > > > > > > dwc3_event_buffers_setup(dwc); > > > > > > > > > > > > > If soft-reset failed, the controller is in a bad state. We should not > > > > perform any further operation until the next hard reset. We should flag > > > > the controller as dead. I don't think we have the equivalent of the > > > > host's HCD_FLAG_DEAD. It may require some work in the UDC core. Perhaps > > > > we can flag within dwc3 for now and prevent any further operation for a > > > > simpler fix. > > > > > > > Hi Thinh, > > > > > > Are you referring that if __dwc3_set_mode failed with core soft reset > > > timing out, the caller i.e., dwc3_set_mode who queues the work need to know > > > that the operation actually failed. So we can add a flag to indicate that > > > gadget is dead and the caller of dwc3_set_mode can check the flag to see if > > > the operation is successful or not. > > > > > > Or am I misunderstanding your comment ? > > > > > > > Not just in __dwc3_set_mode(). I mean any time dwc3_core_soft_reset > > fails, then we set this flag. So that it can prevent the user calling > > any gadget ops causing more crashes/invalid behavior. The > > dwc->softconnect is already wrong on pullup() on failure. > > > > So that we can have a check in different gadget ops. For pullup(): > > > > static int dwc3_gadget_pullup() { > > if (dwc->udc_is_dead) { > > dev_err(dev, "reset me. x_x \n"); > > return; > > } > > > > abc(); > > } > > > > Perhaps the effort is probably the same if we enhance the UDC core for > > this? In any case, I'm fine either way. > > > > Thanks, > > Thinh > > Hi Thinh, > > So you don't want UDC to retry pullup if it fails the first time ? As per > patch-2 of this series, I was trying to propagate the EITMEDOUT to UDC so > that the caller (most probably configfs) can take appropriate action as to > whether it must retry pullup or not. > Now I'm confused. If the soft-reset times out, that means that the soft-reset (self-clearing) bit isn't cleared. How can we retry if it's stuck in this state? My impression is that soft-reset would not complete at all. Is that not the case for you, or it's simply because we need a longer soft-reset timeout? Thanks, Thinh
On 4/5/2023 3:13 AM, Thinh Nguyen wrote: > On Tue, Apr 04, 2023, Krishna Kurapati PSSNV wrote: >> >> >> On 4/4/2023 5:19 AM, Thinh Nguyen wrote: >>> On Thu, Mar 30, 2023, Krishna Kurapati PSSNV wrote: >>>> >>>> >>>> On 3/30/2023 5:40 AM, Thinh Nguyen wrote: >>>>> On Wed, Mar 29, 2023, Krishna Kurapati PSSNV wrote: >>>>>> >>>>>> >>>>>> On 3/29/2023 2:50 AM, Thinh Nguyen wrote: >>>>>>> Hi, >>>>>>> >>>>>>> On Tue, Mar 28, 2023, Krishna Kurapati wrote: >>>>>>>> If the core soft reset timeout happens, avoid setting up event >>>>>>>> buffers and starting gadget as the writes to these registers >>>>>>>> may not reflect when in reset and setting the run stop bit >>>>>>>> can lead the controller to access wrong event buffer address >>>>>>>> resulting in a crash. >>>>>>>> >>>>>>>> Signed-off-by: Krishna Kurapati <quic_kriskura@quicinc.com> >>>>>>>> --- >>>>>>>> drivers/usb/dwc3/gadget.c | 5 ++++- >>>>>>>> 1 file changed, 4 insertions(+), 1 deletion(-) >>>>>>>> >>>>>>>> diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c >>>>>>>> index 3c63fa97a680..f0472801d9a5 100644 >>>>>>>> --- a/drivers/usb/dwc3/gadget.c >>>>>>>> +++ b/drivers/usb/dwc3/gadget.c >>>>>>>> @@ -2620,13 +2620,16 @@ static int dwc3_gadget_pullup(struct usb_gadget *g, int is_on) >>>>>>>> * device-initiated disconnect requires a core soft reset >>>>>>>> * (DCTL.CSftRst) before enabling the run/stop bit. >>>>>>>> */ >>>>>>>> - dwc3_core_soft_reset(dwc); >>>>>>>> + ret = dwc3_core_soft_reset(dwc); >>>>>>>> + if (ret) >>>>>>>> + goto done; >>>>>>>> dwc3_event_buffers_setup(dwc); >>>>>>>> __dwc3_gadget_start(dwc); >>>>>>>> ret = dwc3_gadget_run_stop(dwc, true, false); >>>>>>>> } >>>>>>>> +done: >>>>>>>> pm_runtime_put(dwc->dev); >>>>>>>> return ret; >>>>>>>> -- >>>>>>>> 2.40.0 >>>>>>>> >>>>>>> >>>>>>> I think there's one more place that may needs this check. Can you also >>>>>>> add this check in __dwc3_set_mode()? >>>>>> >>>>>> Hi Thinh, >>>>>> >>>>>> Sure. Will do it. >>>>>> Will the below be good enough ? Or would it be good to add an error/warn log >>>>>> there> >>>>> >>>>> There's already a warning message in dwc3_core_soft_reset() if it fails. >>>>> >>>>>> >>>>>> kriskura@hu-kriskura-hyd:/local/mnt/workspace/krishna/skales2/skales/kernel$ >>>>>> git diff drivers/usb/ >>>>>> diff --git a/drivers/usb/dwc3/core.c b/drivers/usb/dwc3/core.c >>>>>> index 476b63618511..8d1d213d1dcd 100644 >>>>>> --- a/drivers/usb/dwc3/core.c >>>>>> +++ b/drivers/usb/dwc3/core.c >>>>>> @@ -210,7 +210,9 @@ static void __dwc3_set_mode(struct work_struct *work) >>>>>> } >>>>>> break; >>>>>> case DWC3_GCTL_PRTCAP_DEVICE: >>>>>> - dwc3_core_soft_reset(dwc); >>>>>> + ret = dwc3_core_soft_reset(dwc); >>>>>> + if (ret) >>>>>> + goto out; >>>>>> >>>>>> dwc3_event_buffers_setup(dwc); >>>>>> >>>>> >>>>> If soft-reset failed, the controller is in a bad state. We should not >>>>> perform any further operation until the next hard reset. We should flag >>>>> the controller as dead. I don't think we have the equivalent of the >>>>> host's HCD_FLAG_DEAD. It may require some work in the UDC core. Perhaps >>>>> we can flag within dwc3 for now and prevent any further operation for a >>>>> simpler fix. >>>>> >>>> Hi Thinh, >>>> >>>> Are you referring that if __dwc3_set_mode failed with core soft reset >>>> timing out, the caller i.e., dwc3_set_mode who queues the work need to know >>>> that the operation actually failed. So we can add a flag to indicate that >>>> gadget is dead and the caller of dwc3_set_mode can check the flag to see if >>>> the operation is successful or not. >>>> >>>> Or am I misunderstanding your comment ? >>>> >>> >>> Not just in __dwc3_set_mode(). I mean any time dwc3_core_soft_reset >>> fails, then we set this flag. So that it can prevent the user calling >>> any gadget ops causing more crashes/invalid behavior. The >>> dwc->softconnect is already wrong on pullup() on failure. >>> >>> So that we can have a check in different gadget ops. For pullup(): >>> >>> static int dwc3_gadget_pullup() { >>> if (dwc->udc_is_dead) { >>> dev_err(dev, "reset me. x_x \n"); >>> return; >>> } >>> >>> abc(); >>> } >>> >>> Perhaps the effort is probably the same if we enhance the UDC core for >>> this? In any case, I'm fine either way. >>> >>> Thanks, >>> Thinh >> >> Hi Thinh, >> >> So you don't want UDC to retry pullup if it fails the first time ? As per >> patch-2 of this series, I was trying to propagate the EITMEDOUT to UDC so >> that the caller (most probably configfs) can take appropriate action as to >> whether it must retry pullup or not. >> > > Now I'm confused. If the soft-reset times out, that means that the > soft-reset (self-clearing) bit isn't cleared. How can we retry if it's > stuck in this state? My impression is that soft-reset would not complete > at all. Is that not the case for you, or it's simply because we need a > longer soft-reset timeout? > > Thanks, > Thinh Hi Thinh, Sorry for not being clear. The intention of patch-1 was to ensure we don't start the controller if reset times out and patch-2 was to ensure that UDC is in sync with controller by understanding that gadget_connect has failed and necessary cleanup has to be done in gadget_bind_driver. But usually since the UDC_store is the one that is causing pullup to be called, the error value is propagated back to UDC_store. If it sees a failure, it does a retry to pullup. I didn't check whether subsequent retries by UDC to pullup are helping clear the reset bit or not. But I thought retrying pullup won't be of any harm. I now get that my patches are incomplete w.r.t handling the timeout. IIRC one of the following is what you are suggesting we need to do: Option-1: Set a flag when reset times out and clear it upon core_exit / core_init. If the flag is set, block calls to all the gadget_ops in dwc3. Basically even if retry happens from configfs/UDC, we bail out in pullup/udc_start even without trying the requested gadget operation. Option-2: If gadget_connect fails with -ETIMEDOUT in UDC, handle the failure and implement the same flag in UDC and don't even call any gadget_ops. Regards, Krishna,
On Wed, Apr 05, 2023, Krishna Kurapati PSSNV wrote: > > > On 4/5/2023 3:13 AM, Thinh Nguyen wrote: > > On Tue, Apr 04, 2023, Krishna Kurapati PSSNV wrote: > > > > > > > > > On 4/4/2023 5:19 AM, Thinh Nguyen wrote: > > > > On Thu, Mar 30, 2023, Krishna Kurapati PSSNV wrote: > > > > > > > > > > > > > > > On 3/30/2023 5:40 AM, Thinh Nguyen wrote: > > > > > > On Wed, Mar 29, 2023, Krishna Kurapati PSSNV wrote: > > > > > > > > > > > > > > > > > > > > > On 3/29/2023 2:50 AM, Thinh Nguyen wrote: > > > > > > > > Hi, > > > > > > > > > > > > > > > > On Tue, Mar 28, 2023, Krishna Kurapati wrote: > > > > > > > > > If the core soft reset timeout happens, avoid setting up event > > > > > > > > > buffers and starting gadget as the writes to these registers > > > > > > > > > may not reflect when in reset and setting the run stop bit > > > > > > > > > can lead the controller to access wrong event buffer address > > > > > > > > > resulting in a crash. > > > > > > > > > > > > > > > > > > Signed-off-by: Krishna Kurapati <quic_kriskura@quicinc.com> > > > > > > > > > --- > > > > > > > > > drivers/usb/dwc3/gadget.c | 5 ++++- > > > > > > > > > 1 file changed, 4 insertions(+), 1 deletion(-) > > > > > > > > > > > > > > > > > > diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c > > > > > > > > > index 3c63fa97a680..f0472801d9a5 100644 > > > > > > > > > --- a/drivers/usb/dwc3/gadget.c > > > > > > > > > +++ b/drivers/usb/dwc3/gadget.c > > > > > > > > > @@ -2620,13 +2620,16 @@ static int dwc3_gadget_pullup(struct usb_gadget *g, int is_on) > > > > > > > > > * device-initiated disconnect requires a core soft reset > > > > > > > > > * (DCTL.CSftRst) before enabling the run/stop bit. > > > > > > > > > */ > > > > > > > > > - dwc3_core_soft_reset(dwc); > > > > > > > > > + ret = dwc3_core_soft_reset(dwc); > > > > > > > > > + if (ret) > > > > > > > > > + goto done; > > > > > > > > > dwc3_event_buffers_setup(dwc); > > > > > > > > > __dwc3_gadget_start(dwc); > > > > > > > > > ret = dwc3_gadget_run_stop(dwc, true, false); > > > > > > > > > } > > > > > > > > > +done: > > > > > > > > > pm_runtime_put(dwc->dev); > > > > > > > > > return ret; > > > > > > > > > -- > > > > > > > > > 2.40.0 > > > > > > > > > > > > > > > > > > > > > > > > > I think there's one more place that may needs this check. Can you also > > > > > > > > add this check in __dwc3_set_mode()? > > > > > > > > > > > > > > Hi Thinh, > > > > > > > > > > > > > > Sure. Will do it. > > > > > > > Will the below be good enough ? Or would it be good to add an error/warn log > > > > > > > there> > > > > > > > > > > > > There's already a warning message in dwc3_core_soft_reset() if it fails. > > > > > > > > > > > > > > > > > > > > kriskura@hu-kriskura-hyd:/local/mnt/workspace/krishna/skales2/skales/kernel$ > > > > > > > git diff drivers/usb/ > > > > > > > diff --git a/drivers/usb/dwc3/core.c b/drivers/usb/dwc3/core.c > > > > > > > index 476b63618511..8d1d213d1dcd 100644 > > > > > > > --- a/drivers/usb/dwc3/core.c > > > > > > > +++ b/drivers/usb/dwc3/core.c > > > > > > > @@ -210,7 +210,9 @@ static void __dwc3_set_mode(struct work_struct *work) > > > > > > > } > > > > > > > break; > > > > > > > case DWC3_GCTL_PRTCAP_DEVICE: > > > > > > > - dwc3_core_soft_reset(dwc); > > > > > > > + ret = dwc3_core_soft_reset(dwc); > > > > > > > + if (ret) > > > > > > > + goto out; > > > > > > > > > > > > > > dwc3_event_buffers_setup(dwc); > > > > > > > > > > > > > > > > > > > If soft-reset failed, the controller is in a bad state. We should not > > > > > > perform any further operation until the next hard reset. We should flag > > > > > > the controller as dead. I don't think we have the equivalent of the > > > > > > host's HCD_FLAG_DEAD. It may require some work in the UDC core. Perhaps > > > > > > we can flag within dwc3 for now and prevent any further operation for a > > > > > > simpler fix. > > > > > > > > > > > Hi Thinh, > > > > > > > > > > Are you referring that if __dwc3_set_mode failed with core soft reset > > > > > timing out, the caller i.e., dwc3_set_mode who queues the work need to know > > > > > that the operation actually failed. So we can add a flag to indicate that > > > > > gadget is dead and the caller of dwc3_set_mode can check the flag to see if > > > > > the operation is successful or not. > > > > > > > > > > Or am I misunderstanding your comment ? > > > > > > > > > > > > > Not just in __dwc3_set_mode(). I mean any time dwc3_core_soft_reset > > > > fails, then we set this flag. So that it can prevent the user calling > > > > any gadget ops causing more crashes/invalid behavior. The > > > > dwc->softconnect is already wrong on pullup() on failure. > > > > > > > > So that we can have a check in different gadget ops. For pullup(): > > > > > > > > static int dwc3_gadget_pullup() { > > > > if (dwc->udc_is_dead) { > > > > dev_err(dev, "reset me. x_x \n"); > > > > return; > > > > } > > > > > > > > abc(); > > > > } > > > > > > > > Perhaps the effort is probably the same if we enhance the UDC core for > > > > this? In any case, I'm fine either way. > > > > > > > > Thanks, > > > > Thinh > > > > > > Hi Thinh, > > > > > > So you don't want UDC to retry pullup if it fails the first time ? As per > > > patch-2 of this series, I was trying to propagate the EITMEDOUT to UDC so > > > that the caller (most probably configfs) can take appropriate action as to > > > whether it must retry pullup or not. > > > > > > > Now I'm confused. If the soft-reset times out, that means that the > > soft-reset (self-clearing) bit isn't cleared. How can we retry if it's > > stuck in this state? My impression is that soft-reset would not complete > > at all. Is that not the case for you, or it's simply because we need a > > longer soft-reset timeout? > > > > Thanks, > > Thinh > > Hi Thinh, > > Sorry for not being clear. The intention of patch-1 was to ensure we don't > start the controller if reset times out and patch-2 was to ensure that UDC > is in sync with controller by understanding that gadget_connect has failed > and necessary cleanup has to be done in gadget_bind_driver. That should still be there. > > But usually since the UDC_store is the one that is causing pullup to be > called, the error value is propagated back to UDC_store. If it sees a > failure, it does a retry to pullup. > > I didn't check whether subsequent retries by UDC to pullup are helping > clear the reset bit or not. But I thought retrying pullup won't be of any > harm. It's fine to retry. I'm thinking that the controller is in a bad state at this point that we can't recover (hopefully that's not the case). > > I now get that my patches are incomplete w.r.t handling the timeout. > > IIRC one of the following is what you are suggesting we need to do: > > Option-1: > Set a flag when reset times out and clear it upon core_exit / core_init. If > the flag is set, block calls to all the gadget_ops in dwc3. Basically even > if retry happens from configfs/UDC, we bail out in pullup/udc_start even > without trying the requested gadget operation. > > Option-2: > If gadget_connect fails with -ETIMEDOUT in UDC, handle the failure and > implement the same flag in UDC and don't even call any gadget_ops. > I'm thinking of option-1. For option-2, we can't control if the gadget_ops will be called. We only have control how we will respond in case they get called again. But now I'm thinking again, I think it may be ok without adding the flag. The UDC core and gadget driver won't do anything else before pullup(1) is successful. Calling other gadget_ops should be harmless (ie. it won't crash/break the system)? Sorry for the noise, but I think it may be ok without marking the controller dead. I wonder if we can confirm my suspiction on retry? I believe this is not easy to reproduce on your setup? If not, I think we can take your change as is. Thanks, Thinh
On 4/6/2023 6:15 AM, Thinh Nguyen wrote: > On Wed, Apr 05, 2023, Krishna Kurapati PSSNV wrote: >> >> >> On 4/5/2023 3:13 AM, Thinh Nguyen wrote: >>> On Tue, Apr 04, 2023, Krishna Kurapati PSSNV wrote: >>>> >>>> >>>> On 4/4/2023 5:19 AM, Thinh Nguyen wrote: >>>>> On Thu, Mar 30, 2023, Krishna Kurapati PSSNV wrote: >>>>>> >>>>>> >>>>>> On 3/30/2023 5:40 AM, Thinh Nguyen wrote: >>>>>>> On Wed, Mar 29, 2023, Krishna Kurapati PSSNV wrote: >>>>>>>> >>>>>>>> >>>>>>>> On 3/29/2023 2:50 AM, Thinh Nguyen wrote: >>>>>>>>> Hi, >>>>>>>>> >>>>>>>>> On Tue, Mar 28, 2023, Krishna Kurapati wrote: >>>>>>>>>> If the core soft reset timeout happens, avoid setting up event >>>>>>>>>> buffers and starting gadget as the writes to these registers >>>>>>>>>> may not reflect when in reset and setting the run stop bit >>>>>>>>>> can lead the controller to access wrong event buffer address >>>>>>>>>> resulting in a crash. >>>>>>>>>> >>>>>>>>>> Signed-off-by: Krishna Kurapati <quic_kriskura@quicinc.com> >>>>>>>>>> --- >>>>>>>>>> drivers/usb/dwc3/gadget.c | 5 ++++- >>>>>>>>>> 1 file changed, 4 insertions(+), 1 deletion(-) >>>>>>>>>> >>>>>>>>>> diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c >>>>>>>>>> index 3c63fa97a680..f0472801d9a5 100644 >>>>>>>>>> --- a/drivers/usb/dwc3/gadget.c >>>>>>>>>> +++ b/drivers/usb/dwc3/gadget.c >>>>>>>>>> @@ -2620,13 +2620,16 @@ static int dwc3_gadget_pullup(struct usb_gadget *g, int is_on) >>>>>>>>>> * device-initiated disconnect requires a core soft reset >>>>>>>>>> * (DCTL.CSftRst) before enabling the run/stop bit. >>>>>>>>>> */ >>>>>>>>>> - dwc3_core_soft_reset(dwc); >>>>>>>>>> + ret = dwc3_core_soft_reset(dwc); >>>>>>>>>> + if (ret) >>>>>>>>>> + goto done; >>>>>>>>>> dwc3_event_buffers_setup(dwc); >>>>>>>>>> __dwc3_gadget_start(dwc); >>>>>>>>>> ret = dwc3_gadget_run_stop(dwc, true, false); >>>>>>>>>> } >>>>>>>>>> +done: >>>>>>>>>> pm_runtime_put(dwc->dev); >>>>>>>>>> return ret; >>>>>>>>>> -- >>>>>>>>>> 2.40.0 >>>>>>>>>> >>>>>>>>> >>>>>>>>> I think there's one more place that may needs this check. Can you also >>>>>>>>> add this check in __dwc3_set_mode()? >>>>>>>> >>>>>>>> Hi Thinh, >>>>>>>> >>>>>>>> Sure. Will do it. >>>>>>>> Will the below be good enough ? Or would it be good to add an error/warn log >>>>>>>> there> >>>>>>> >>>>>>> There's already a warning message in dwc3_core_soft_reset() if it fails. >>>>>>> >>>>>>>> >>>>>>>> kriskura@hu-kriskura-hyd:/local/mnt/workspace/krishna/skales2/skales/kernel$ >>>>>>>> git diff drivers/usb/ >>>>>>>> diff --git a/drivers/usb/dwc3/core.c b/drivers/usb/dwc3/core.c >>>>>>>> index 476b63618511..8d1d213d1dcd 100644 >>>>>>>> --- a/drivers/usb/dwc3/core.c >>>>>>>> +++ b/drivers/usb/dwc3/core.c >>>>>>>> @@ -210,7 +210,9 @@ static void __dwc3_set_mode(struct work_struct *work) >>>>>>>> } >>>>>>>> break; >>>>>>>> case DWC3_GCTL_PRTCAP_DEVICE: >>>>>>>> - dwc3_core_soft_reset(dwc); >>>>>>>> + ret = dwc3_core_soft_reset(dwc); >>>>>>>> + if (ret) >>>>>>>> + goto out; >>>>>>>> >>>>>>>> dwc3_event_buffers_setup(dwc); >>>>>>>> >>>>>>> >>>>>>> If soft-reset failed, the controller is in a bad state. We should not >>>>>>> perform any further operation until the next hard reset. We should flag >>>>>>> the controller as dead. I don't think we have the equivalent of the >>>>>>> host's HCD_FLAG_DEAD. It may require some work in the UDC core. Perhaps >>>>>>> we can flag within dwc3 for now and prevent any further operation for a >>>>>>> simpler fix. >>>>>>> >>>>>> Hi Thinh, >>>>>> >>>>>> Are you referring that if __dwc3_set_mode failed with core soft reset >>>>>> timing out, the caller i.e., dwc3_set_mode who queues the work need to know >>>>>> that the operation actually failed. So we can add a flag to indicate that >>>>>> gadget is dead and the caller of dwc3_set_mode can check the flag to see if >>>>>> the operation is successful or not. >>>>>> >>>>>> Or am I misunderstanding your comment ? >>>>>> >>>>> >>>>> Not just in __dwc3_set_mode(). I mean any time dwc3_core_soft_reset >>>>> fails, then we set this flag. So that it can prevent the user calling >>>>> any gadget ops causing more crashes/invalid behavior. The >>>>> dwc->softconnect is already wrong on pullup() on failure. >>>>> >>>>> So that we can have a check in different gadget ops. For pullup(): >>>>> >>>>> static int dwc3_gadget_pullup() { >>>>> if (dwc->udc_is_dead) { >>>>> dev_err(dev, "reset me. x_x \n"); >>>>> return; >>>>> } >>>>> >>>>> abc(); >>>>> } >>>>> >>>>> Perhaps the effort is probably the same if we enhance the UDC core for >>>>> this? In any case, I'm fine either way. >>>>> >>>>> Thanks, >>>>> Thinh >>>> >>>> Hi Thinh, >>>> >>>> So you don't want UDC to retry pullup if it fails the first time ? As per >>>> patch-2 of this series, I was trying to propagate the EITMEDOUT to UDC so >>>> that the caller (most probably configfs) can take appropriate action as to >>>> whether it must retry pullup or not. >>>> >>> >>> Now I'm confused. If the soft-reset times out, that means that the >>> soft-reset (self-clearing) bit isn't cleared. How can we retry if it's >>> stuck in this state? My impression is that soft-reset would not complete >>> at all. Is that not the case for you, or it's simply because we need a >>> longer soft-reset timeout? >>> >>> Thanks, >>> Thinh >> >> Hi Thinh, >> >> Sorry for not being clear. The intention of patch-1 was to ensure we don't >> start the controller if reset times out and patch-2 was to ensure that UDC >> is in sync with controller by understanding that gadget_connect has failed >> and necessary cleanup has to be done in gadget_bind_driver. > > That should still be there. > >> >> But usually since the UDC_store is the one that is causing pullup to be >> called, the error value is propagated back to UDC_store. If it sees a >> failure, it does a retry to pullup. >> >> I didn't check whether subsequent retries by UDC to pullup are helping >> clear the reset bit or not. But I thought retrying pullup won't be of any >> harm. > > It's fine to retry. I'm thinking that the controller is in a bad state > at this point that we can't recover (hopefully that's not the case). > >> >> I now get that my patches are incomplete w.r.t handling the timeout. >> >> IIRC one of the following is what you are suggesting we need to do: >> >> Option-1: >> Set a flag when reset times out and clear it upon core_exit / core_init. If >> the flag is set, block calls to all the gadget_ops in dwc3. Basically even >> if retry happens from configfs/UDC, we bail out in pullup/udc_start even >> without trying the requested gadget operation. >> >> Option-2: >> If gadget_connect fails with -ETIMEDOUT in UDC, handle the failure and >> implement the same flag in UDC and don't even call any gadget_ops. >> > Hi Thinh, Thanks for the review. > I'm thinking of option-1. For option-2, we can't control if the > gadget_ops will be called. We only have control how we will respond in > case they get called again. > > But now I'm thinking again, I think it may be ok without adding the > flag. The UDC core and gadget driver won't do anything else before > pullup(1) is successful. Calling other gadget_ops should be harmless > (ie. it won't crash/break the system)? > I can give this a try in long run testing (For 7-14 days) and see if anything else is breaking. Most probably we do a composition switch / PIPO in between which can call usb_gadget_unregister_driver which might invoke a pullup(0) followed by udc_stop() and like you mentioned must not be a problem. > Sorry for the noise, but I think it may be ok without marking the > controller dead. I wonder if we can confirm my suspiction on retry? I > believe this is not easy to reproduce on your setup? If not, I think we > can take your change as is. > Regards, Krishna,
On 4/6/2023 7:44 AM, Krishna Kurapati PSSNV wrote: > > > On 4/6/2023 6:15 AM, Thinh Nguyen wrote: >> On Wed, Apr 05, 2023, Krishna Kurapati PSSNV wrote: >>> >>> >>> On 4/5/2023 3:13 AM, Thinh Nguyen wrote: >>>> On Tue, Apr 04, 2023, Krishna Kurapati PSSNV wrote: >>>>> >>>>> >>>>> On 4/4/2023 5:19 AM, Thinh Nguyen wrote: >>>>>> On Thu, Mar 30, 2023, Krishna Kurapati PSSNV wrote: >>>>>>> >>>>>>> >>>>>>> On 3/30/2023 5:40 AM, Thinh Nguyen wrote: >>>>>>>> On Wed, Mar 29, 2023, Krishna Kurapati PSSNV wrote: >>>>>>>>> >>>>>>>>> >>>>>>>>> On 3/29/2023 2:50 AM, Thinh Nguyen wrote: >>>>>>>>>> Hi, >>>>>>>>>> >>>>>>>>>> On Tue, Mar 28, 2023, Krishna Kurapati wrote: >>>>>>>>>>> If the core soft reset timeout happens, avoid setting up event >>>>>>>>>>> buffers and starting gadget as the writes to these registers >>>>>>>>>>> may not reflect when in reset and setting the run stop bit >>>>>>>>>>> can lead the controller to access wrong event buffer address >>>>>>>>>>> resulting in a crash. >>>>>>>>>>> >>>>>>>>>>> Signed-off-by: Krishna Kurapati <quic_kriskura@quicinc.com> >>>>>>>>>>> --- >>>>>>>>>>> drivers/usb/dwc3/gadget.c | 5 ++++- >>>>>>>>>>> 1 file changed, 4 insertions(+), 1 deletion(-) >>>>>>>>>>> >>>>>>>>>>> diff --git a/drivers/usb/dwc3/gadget.c >>>>>>>>>>> b/drivers/usb/dwc3/gadget.c >>>>>>>>>>> index 3c63fa97a680..f0472801d9a5 100644 >>>>>>>>>>> --- a/drivers/usb/dwc3/gadget.c >>>>>>>>>>> +++ b/drivers/usb/dwc3/gadget.c >>>>>>>>>>> @@ -2620,13 +2620,16 @@ static int dwc3_gadget_pullup(struct >>>>>>>>>>> usb_gadget *g, int is_on) >>>>>>>>>>> * device-initiated disconnect requires a core >>>>>>>>>>> soft reset >>>>>>>>>>> * (DCTL.CSftRst) before enabling the run/stop >>>>>>>>>>> bit. >>>>>>>>>>> */ >>>>>>>>>>> - dwc3_core_soft_reset(dwc); >>>>>>>>>>> + ret = dwc3_core_soft_reset(dwc); >>>>>>>>>>> + if (ret) >>>>>>>>>>> + goto done; >>>>>>>>>>> dwc3_event_buffers_setup(dwc); >>>>>>>>>>> __dwc3_gadget_start(dwc); >>>>>>>>>>> ret = dwc3_gadget_run_stop(dwc, true, false); >>>>>>>>>>> } >>>>>>>>>>> +done: >>>>>>>>>>> pm_runtime_put(dwc->dev); >>>>>>>>>>> return ret; >>>>>>>>>>> -- >>>>>>>>>>> 2.40.0 >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> I think there's one more place that may needs this check. Can >>>>>>>>>> you also >>>>>>>>>> add this check in __dwc3_set_mode()? >>>>>>>>> >>>>>>>>> Hi Thinh, >>>>>>>>> >>>>>>>>> Sure. Will do it. >>>>>>>>> Will the below be good enough ? Or would it be good to add an >>>>>>>>> error/warn log >>>>>>>>> there> >>>>>>>> >>>>>>>> There's already a warning message in dwc3_core_soft_reset() if >>>>>>>> it fails. >>>>>>>> >>>>>>>>> >>>>>>>>> kriskura@hu-kriskura-hyd:/local/mnt/workspace/krishna/skales2/skales/kernel$ >>>>>>>>> git diff drivers/usb/ >>>>>>>>> diff --git a/drivers/usb/dwc3/core.c b/drivers/usb/dwc3/core.c >>>>>>>>> index 476b63618511..8d1d213d1dcd 100644 >>>>>>>>> --- a/drivers/usb/dwc3/core.c >>>>>>>>> +++ b/drivers/usb/dwc3/core.c >>>>>>>>> @@ -210,7 +210,9 @@ static void __dwc3_set_mode(struct >>>>>>>>> work_struct *work) >>>>>>>>> } >>>>>>>>> break; >>>>>>>>> case DWC3_GCTL_PRTCAP_DEVICE: >>>>>>>>> - dwc3_core_soft_reset(dwc); >>>>>>>>> + ret = dwc3_core_soft_reset(dwc); >>>>>>>>> + if (ret) >>>>>>>>> + goto out; >>>>>>>>> >>>>>>>>> dwc3_event_buffers_setup(dwc); >>>>>>>>> >>>>>>>> >>>>>>>> If soft-reset failed, the controller is in a bad state. We >>>>>>>> should not >>>>>>>> perform any further operation until the next hard reset. We >>>>>>>> should flag >>>>>>>> the controller as dead. I don't think we have the equivalent of the >>>>>>>> host's HCD_FLAG_DEAD. It may require some work in the UDC core. >>>>>>>> Perhaps >>>>>>>> we can flag within dwc3 for now and prevent any further >>>>>>>> operation for a >>>>>>>> simpler fix. >>>>>>>> >>>>>>> Hi Thinh, >>>>>>> >>>>>>> Are you referring that if __dwc3_set_mode failed with core >>>>>>> soft reset >>>>>>> timing out, the caller i.e., dwc3_set_mode who queues the work >>>>>>> need to know >>>>>>> that the operation actually failed. So we can add a flag to >>>>>>> indicate that >>>>>>> gadget is dead and the caller of dwc3_set_mode can check the flag >>>>>>> to see if >>>>>>> the operation is successful or not. >>>>>>> >>>>>>> Or am I misunderstanding your comment ? >>>>>>> >>>>>> >>>>>> Not just in __dwc3_set_mode(). I mean any time dwc3_core_soft_reset >>>>>> fails, then we set this flag. So that it can prevent the user calling >>>>>> any gadget ops causing more crashes/invalid behavior. The >>>>>> dwc->softconnect is already wrong on pullup() on failure. >>>>>> >>>>>> So that we can have a check in different gadget ops. For pullup(): >>>>>> >>>>>> static int dwc3_gadget_pullup() { >>>>>> if (dwc->udc_is_dead) { >>>>>> dev_err(dev, "reset me. x_x \n"); >>>>>> return; >>>>>> } >>>>>> >>>>>> abc(); >>>>>> } >>>>>> >>>>>> Perhaps the effort is probably the same if we enhance the UDC core >>>>>> for >>>>>> this? In any case, I'm fine either way. >>>>>> >>>>>> Thanks, >>>>>> Thinh >>>>> >>>>> Hi Thinh, >>>>> >>>>> So you don't want UDC to retry pullup if it fails the first time >>>>> ? As per >>>>> patch-2 of this series, I was trying to propagate the EITMEDOUT to >>>>> UDC so >>>>> that the caller (most probably configfs) can take appropriate >>>>> action as to >>>>> whether it must retry pullup or not. >>>>> >>>> >>>> Now I'm confused. If the soft-reset times out, that means that the >>>> soft-reset (self-clearing) bit isn't cleared. How can we retry if it's >>>> stuck in this state? My impression is that soft-reset would not >>>> complete >>>> at all. Is that not the case for you, or it's simply because we need a >>>> longer soft-reset timeout? >>>> >>>> Thanks, >>>> Thinh >>> >>> Hi Thinh, >>> >>> Sorry for not being clear. The intention of patch-1 was to ensure >>> we don't >>> start the controller if reset times out and patch-2 was to ensure >>> that UDC >>> is in sync with controller by understanding that gadget_connect has >>> failed >>> and necessary cleanup has to be done in gadget_bind_driver. >> >> That should still be there. >> >>> >>> But usually since the UDC_store is the one that is causing pullup to be >>> called, the error value is propagated back to UDC_store. If it sees a >>> failure, it does a retry to pullup. >>> >>> I didn't check whether subsequent retries by UDC to pullup are helping >>> clear the reset bit or not. But I thought retrying pullup won't be of >>> any >>> harm. >> >> It's fine to retry. I'm thinking that the controller is in a bad state >> at this point that we can't recover (hopefully that's not the case). >> >>> >>> I now get that my patches are incomplete w.r.t handling the timeout. >>> >>> IIRC one of the following is what you are suggesting we need to do: >>> >>> Option-1: >>> Set a flag when reset times out and clear it upon core_exit / >>> core_init. If >>> the flag is set, block calls to all the gadget_ops in dwc3. Basically >>> even >>> if retry happens from configfs/UDC, we bail out in pullup/udc_start even >>> without trying the requested gadget operation. >>> >>> Option-2: >>> If gadget_connect fails with -ETIMEDOUT in UDC, handle the failure and >>> implement the same flag in UDC and don't even call any gadget_ops. >>> >> > Hi Thinh, > > Thanks for the review. > >> I'm thinking of option-1. For option-2, we can't control if the >> gadget_ops will be called. We only have control how we will respond in >> case they get called again. >> >> But now I'm thinking again, I think it may be ok without adding the >> flag. The UDC core and gadget driver won't do anything else before >> pullup(1) is successful. Calling other gadget_ops should be harmless >> (ie. it won't crash/break the system)? >> > I can give this a try in long run testing (For 7-14 days) and see if > anything else is breaking. > > Most probably we do a composition switch / PIPO in between which can > call usb_gadget_unregister_driver which might invoke a pullup(0) > followed by udc_stop() and like you mentioned must not be a problem. > >> Sorry for the noise, but I think it may be ok without marking the >> controller dead. I wonder if we can confirm my suspiction on retry? I >> believe this is not easy to reproduce on your setup? If not, I think we >> can take your change as is. Hi Thinh, I got this patch tested on two diff Gen-2 targets for around 10 days and no issues were seen. (No SMMU fault seen on a 10 day run). Let me know of any other concerns that might come up with this patch. Else I can rebase it to get merged. Regards, Krishna,
On Tue, Apr 25, 2023, Krishna Kurapati PSSNV wrote: > > > On 4/6/2023 7:44 AM, Krishna Kurapati PSSNV wrote: > > > > > > On 4/6/2023 6:15 AM, Thinh Nguyen wrote: > > > On Wed, Apr 05, 2023, Krishna Kurapati PSSNV wrote: > > > > > > > > > > > > On 4/5/2023 3:13 AM, Thinh Nguyen wrote: > > > > > On Tue, Apr 04, 2023, Krishna Kurapati PSSNV wrote: > > > > > > > > > > > > > > > > > > On 4/4/2023 5:19 AM, Thinh Nguyen wrote: > > > > > > > On Thu, Mar 30, 2023, Krishna Kurapati PSSNV wrote: > > > > > > > > > > > > > > > > > > > > > > > > On 3/30/2023 5:40 AM, Thinh Nguyen wrote: > > > > > > > > > On Wed, Mar 29, 2023, Krishna Kurapati PSSNV wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On 3/29/2023 2:50 AM, Thinh Nguyen wrote: > > > > > > > > > > > Hi, > > > > > > > > > > > > > > > > > > > > > > On Tue, Mar 28, 2023, Krishna Kurapati wrote: > > > > > > > > > > > > If the core soft reset timeout happens, avoid setting up event > > > > > > > > > > > > buffers and starting gadget as the writes to these registers > > > > > > > > > > > > may not reflect when in reset and setting the run stop bit > > > > > > > > > > > > can lead the controller to access wrong event buffer address > > > > > > > > > > > > resulting in a crash. > > > > > > > > > > > > > > > > > > > > > > > > Signed-off-by: Krishna Kurapati <quic_kriskura@quicinc.com> > > > > > > > > > > > > --- > > > > > > > > > > > > drivers/usb/dwc3/gadget.c | 5 ++++- > > > > > > > > > > > > 1 file changed, 4 insertions(+), 1 deletion(-) > > > > > > > > > > > > > > > > > > > > > > > > diff --git > > > > > > > > > > > > a/drivers/usb/dwc3/gadget.c > > > > > > > > > > > > b/drivers/usb/dwc3/gadget.c > > > > > > > > > > > > index 3c63fa97a680..f0472801d9a5 100644 > > > > > > > > > > > > --- a/drivers/usb/dwc3/gadget.c > > > > > > > > > > > > +++ b/drivers/usb/dwc3/gadget.c > > > > > > > > > > > > @@ -2620,13 +2620,16 @@ static > > > > > > > > > > > > int dwc3_gadget_pullup(struct > > > > > > > > > > > > usb_gadget *g, int is_on) > > > > > > > > > > > > * > > > > > > > > > > > > device-initiated disconnect > > > > > > > > > > > > requires a core soft reset > > > > > > > > > > > > * (DCTL.CSftRst) > > > > > > > > > > > > before enabling the run/stop > > > > > > > > > > > > bit. > > > > > > > > > > > > */ > > > > > > > > > > > > - dwc3_core_soft_reset(dwc); > > > > > > > > > > > > + ret = dwc3_core_soft_reset(dwc); > > > > > > > > > > > > + if (ret) > > > > > > > > > > > > + goto done; > > > > > > > > > > > > dwc3_event_buffers_setup(dwc); > > > > > > > > > > > > __dwc3_gadget_start(dwc); > > > > > > > > > > > > ret = dwc3_gadget_run_stop(dwc, true, false); > > > > > > > > > > > > } > > > > > > > > > > > > +done: > > > > > > > > > > > > pm_runtime_put(dwc->dev); > > > > > > > > > > > > return ret; > > > > > > > > > > > > -- > > > > > > > > > > > > 2.40.0 > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > I think there's one more place that > > > > > > > > > > > may needs this check. Can you also > > > > > > > > > > > add this check in __dwc3_set_mode()? > > > > > > > > > > > > > > > > > > > > Hi Thinh, > > > > > > > > > > > > > > > > > > > > Sure. Will do it. > > > > > > > > > > Will the below be good enough ? Or would > > > > > > > > > > it be good to add an error/warn log > > > > > > > > > > there> > > > > > > > > > > > > > > > > > > There's already a warning message in > > > > > > > > > dwc3_core_soft_reset() if it fails. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > kriskura@hu-kriskura-hyd:/local/mnt/workspace/krishna/skales2/skales/kernel$ > > > > > > > > > > git diff drivers/usb/ > > > > > > > > > > diff --git a/drivers/usb/dwc3/core.c b/drivers/usb/dwc3/core.c > > > > > > > > > > index 476b63618511..8d1d213d1dcd 100644 > > > > > > > > > > --- a/drivers/usb/dwc3/core.c > > > > > > > > > > +++ b/drivers/usb/dwc3/core.c > > > > > > > > > > @@ -210,7 +210,9 @@ static void > > > > > > > > > > __dwc3_set_mode(struct work_struct > > > > > > > > > > *work) > > > > > > > > > > } > > > > > > > > > > break; > > > > > > > > > > case DWC3_GCTL_PRTCAP_DEVICE: > > > > > > > > > > - dwc3_core_soft_reset(dwc); > > > > > > > > > > + ret = dwc3_core_soft_reset(dwc); > > > > > > > > > > + if (ret) > > > > > > > > > > + goto out; > > > > > > > > > > > > > > > > > > > > dwc3_event_buffers_setup(dwc); > > > > > > > > > > > > > > > > > > > > > > > > > > > > If soft-reset failed, the controller is in a > > > > > > > > > bad state. We should not > > > > > > > > > perform any further operation until the next > > > > > > > > > hard reset. We should flag > > > > > > > > > the controller as dead. I don't think we have the equivalent of the > > > > > > > > > host's HCD_FLAG_DEAD. It may require some > > > > > > > > > work in the UDC core. Perhaps > > > > > > > > > we can flag within dwc3 for now and prevent > > > > > > > > > any further operation for a > > > > > > > > > simpler fix. > > > > > > > > > > > > > > > > > Hi Thinh, > > > > > > > > > > > > > > > > Are you referring that if __dwc3_set_mode > > > > > > > > failed with core soft reset > > > > > > > > timing out, the caller i.e., dwc3_set_mode who > > > > > > > > queues the work need to know > > > > > > > > that the operation actually failed. So we can > > > > > > > > add a flag to indicate that > > > > > > > > gadget is dead and the caller of dwc3_set_mode > > > > > > > > can check the flag to see if > > > > > > > > the operation is successful or not. > > > > > > > > > > > > > > > > Or am I misunderstanding your comment ? > > > > > > > > > > > > > > > > > > > > > > Not just in __dwc3_set_mode(). I mean any time dwc3_core_soft_reset > > > > > > > fails, then we set this flag. So that it can prevent the user calling > > > > > > > any gadget ops causing more crashes/invalid behavior. The > > > > > > > dwc->softconnect is already wrong on pullup() on failure. > > > > > > > > > > > > > > So that we can have a check in different gadget ops. For pullup(): > > > > > > > > > > > > > > static int dwc3_gadget_pullup() { > > > > > > > if (dwc->udc_is_dead) { > > > > > > > dev_err(dev, "reset me. x_x \n"); > > > > > > > return; > > > > > > > } > > > > > > > > > > > > > > abc(); > > > > > > > } > > > > > > > > > > > > > > Perhaps the effort is probably the same if we > > > > > > > enhance the UDC core for > > > > > > > this? In any case, I'm fine either way. > > > > > > > > > > > > > > Thanks, > > > > > > > Thinh > > > > > > > > > > > > Hi Thinh, > > > > > > > > > > > > So you don't want UDC to retry pullup if it fails the > > > > > > first time ? As per > > > > > > patch-2 of this series, I was trying to propagate the > > > > > > EITMEDOUT to UDC so > > > > > > that the caller (most probably configfs) can take > > > > > > appropriate action as to > > > > > > whether it must retry pullup or not. > > > > > > > > > > > > > > > > Now I'm confused. If the soft-reset times out, that means that the > > > > > soft-reset (self-clearing) bit isn't cleared. How can we retry if it's > > > > > stuck in this state? My impression is that soft-reset would > > > > > not complete > > > > > at all. Is that not the case for you, or it's simply because we need a > > > > > longer soft-reset timeout? > > > > > > > > > > Thanks, > > > > > Thinh > > > > > > > > Hi Thinh, > > > > > > > > Sorry for not being clear. The intention of patch-1 was to > > > > ensure we don't > > > > start the controller if reset times out and patch-2 was to > > > > ensure that UDC > > > > is in sync with controller by understanding that gadget_connect > > > > has failed > > > > and necessary cleanup has to be done in gadget_bind_driver. > > > > > > That should still be there. > > > > > > > > > > > But usually since the UDC_store is the one that is causing pullup to be > > > > called, the error value is propagated back to UDC_store. If it sees a > > > > failure, it does a retry to pullup. > > > > > > > > I didn't check whether subsequent retries by UDC to pullup are helping > > > > clear the reset bit or not. But I thought retrying pullup won't > > > > be of any > > > > harm. > > > > > > It's fine to retry. I'm thinking that the controller is in a bad state > > > at this point that we can't recover (hopefully that's not the case). > > > > > > > > > > > I now get that my patches are incomplete w.r.t handling the timeout. > > > > > > > > IIRC one of the following is what you are suggesting we need to do: > > > > > > > > Option-1: > > > > Set a flag when reset times out and clear it upon core_exit / > > > > core_init. If > > > > the flag is set, block calls to all the gadget_ops in dwc3. > > > > Basically even > > > > if retry happens from configfs/UDC, we bail out in pullup/udc_start even > > > > without trying the requested gadget operation. > > > > > > > > Option-2: > > > > If gadget_connect fails with -ETIMEDOUT in UDC, handle the failure and > > > > implement the same flag in UDC and don't even call any gadget_ops. > > > > > > > > > Hi Thinh, > > > > Thanks for the review. > > > > > I'm thinking of option-1. For option-2, we can't control if the > > > gadget_ops will be called. We only have control how we will respond in > > > case they get called again. > > > > > > But now I'm thinking again, I think it may be ok without adding the > > > flag. The UDC core and gadget driver won't do anything else before > > > pullup(1) is successful. Calling other gadget_ops should be harmless > > > (ie. it won't crash/break the system)? > > > > > I can give this a try in long run testing (For 7-14 days) and see if > > anything else is breaking. > > > > Most probably we do a composition switch / PIPO in between which can > > call usb_gadget_unregister_driver which might invoke a pullup(0) > > followed by udc_stop() and like you mentioned must not be a problem. > > > > > Sorry for the noise, but I think it may be ok without marking the > > > controller dead. I wonder if we can confirm my suspiction on retry? I > > > believe this is not easy to reproduce on your setup? If not, I think we > > > can take your change as is. > > Hi Thinh, > > I got this patch tested on two diff Gen-2 targets for around 10 days and > no issues were seen. (No SMMU fault seen on a 10 day run). Let me know of > any other concerns that might come up with this patch. Else I can rebase it > to get merged. > > Regards, > Krishna, Thanks for the tests. So you were able to observe the failure and able to recover from it without SMMU fault right? Thanks, Thinh
On 4/26/2023 5:52 AM, Thinh Nguyen wrote: > On Tue, Apr 25, 2023, Krishna Kurapati PSSNV wrote: >> >> >> On 4/6/2023 7:44 AM, Krishna Kurapati PSSNV wrote: >>> >>> >>> On 4/6/2023 6:15 AM, Thinh Nguyen wrote: >>>> On Wed, Apr 05, 2023, Krishna Kurapati PSSNV wrote: >>>>> >>>>> >>>>> On 4/5/2023 3:13 AM, Thinh Nguyen wrote: >>>>>> On Tue, Apr 04, 2023, Krishna Kurapati PSSNV wrote: >>>>>>> >>>>>>> >>>>>>> On 4/4/2023 5:19 AM, Thinh Nguyen wrote: >>>>>>>> On Thu, Mar 30, 2023, Krishna Kurapati PSSNV wrote: >>>>>>>>> >>>>>>>>> >>>>>>>>> On 3/30/2023 5:40 AM, Thinh Nguyen wrote: >>>>>>>>>> On Wed, Mar 29, 2023, Krishna Kurapati PSSNV wrote: >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On 3/29/2023 2:50 AM, Thinh Nguyen wrote: >>>>>>>>>>>> Hi, >>>>>>>>>>>> >>>>>>>>>>>> On Tue, Mar 28, 2023, Krishna Kurapati wrote: >>>>>>>>>>>>> If the core soft reset timeout happens, avoid setting up event >>>>>>>>>>>>> buffers and starting gadget as the writes to these registers >>>>>>>>>>>>> may not reflect when in reset and setting the run stop bit >>>>>>>>>>>>> can lead the controller to access wrong event buffer address >>>>>>>>>>>>> resulting in a crash. >>>>>>>>>>>>> >>>>>>>>>>>>> Signed-off-by: Krishna Kurapati <quic_kriskura@quicinc.com> >>>>>>>>>>>>> --- >>>>>>>>>>>>> drivers/usb/dwc3/gadget.c | 5 ++++- >>>>>>>>>>>>> 1 file changed, 4 insertions(+), 1 deletion(-) >>>>>>>>>>>>> >>>>>>>>>>>>> diff --git >>>>>>>>>>>>> a/drivers/usb/dwc3/gadget.c >>>>>>>>>>>>> b/drivers/usb/dwc3/gadget.c >>>>>>>>>>>>> index 3c63fa97a680..f0472801d9a5 100644 >>>>>>>>>>>>> --- a/drivers/usb/dwc3/gadget.c >>>>>>>>>>>>> +++ b/drivers/usb/dwc3/gadget.c >>>>>>>>>>>>> @@ -2620,13 +2620,16 @@ static >>>>>>>>>>>>> int dwc3_gadget_pullup(struct >>>>>>>>>>>>> usb_gadget *g, int is_on) >>>>>>>>>>>>> * >>>>>>>>>>>>> device-initiated disconnect >>>>>>>>>>>>> requires a core soft reset >>>>>>>>>>>>> * (DCTL.CSftRst) >>>>>>>>>>>>> before enabling the run/stop >>>>>>>>>>>>> bit. >>>>>>>>>>>>> */ >>>>>>>>>>>>> - dwc3_core_soft_reset(dwc); >>>>>>>>>>>>> + ret = dwc3_core_soft_reset(dwc); >>>>>>>>>>>>> + if (ret) >>>>>>>>>>>>> + goto done; >>>>>>>>>>>>> dwc3_event_buffers_setup(dwc); >>>>>>>>>>>>> __dwc3_gadget_start(dwc); >>>>>>>>>>>>> ret = dwc3_gadget_run_stop(dwc, true, false); >>>>>>>>>>>>> } >>>>>>>>>>>>> +done: >>>>>>>>>>>>> pm_runtime_put(dwc->dev); >>>>>>>>>>>>> return ret; >>>>>>>>>>>>> -- >>>>>>>>>>>>> 2.40.0 >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> I think there's one more place that >>>>>>>>>>>> may needs this check. Can you also >>>>>>>>>>>> add this check in __dwc3_set_mode()? >>>>>>>>>>> >>>>>>>>>>> Hi Thinh, >>>>>>>>>>> >>>>>>>>>>> Sure. Will do it. >>>>>>>>>>> Will the below be good enough ? Or would >>>>>>>>>>> it be good to add an error/warn log >>>>>>>>>>> there> >>>>>>>>>> >>>>>>>>>> There's already a warning message in >>>>>>>>>> dwc3_core_soft_reset() if it fails. >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> kriskura@hu-kriskura-hyd:/local/mnt/workspace/krishna/skales2/skales/kernel$ >>>>>>>>>>> git diff drivers/usb/ >>>>>>>>>>> diff --git a/drivers/usb/dwc3/core.c b/drivers/usb/dwc3/core.c >>>>>>>>>>> index 476b63618511..8d1d213d1dcd 100644 >>>>>>>>>>> --- a/drivers/usb/dwc3/core.c >>>>>>>>>>> +++ b/drivers/usb/dwc3/core.c >>>>>>>>>>> @@ -210,7 +210,9 @@ static void >>>>>>>>>>> __dwc3_set_mode(struct work_struct >>>>>>>>>>> *work) >>>>>>>>>>> } >>>>>>>>>>> break; >>>>>>>>>>> case DWC3_GCTL_PRTCAP_DEVICE: >>>>>>>>>>> - dwc3_core_soft_reset(dwc); >>>>>>>>>>> + ret = dwc3_core_soft_reset(dwc); >>>>>>>>>>> + if (ret) >>>>>>>>>>> + goto out; >>>>>>>>>>> >>>>>>>>>>> dwc3_event_buffers_setup(dwc); >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> If soft-reset failed, the controller is in a >>>>>>>>>> bad state. We should not >>>>>>>>>> perform any further operation until the next >>>>>>>>>> hard reset. We should flag >>>>>>>>>> the controller as dead. I don't think we have the equivalent of the >>>>>>>>>> host's HCD_FLAG_DEAD. It may require some >>>>>>>>>> work in the UDC core. Perhaps >>>>>>>>>> we can flag within dwc3 for now and prevent >>>>>>>>>> any further operation for a >>>>>>>>>> simpler fix. >>>>>>>>>> >>>>>>>>> Hi Thinh, >>>>>>>>> >>>>>>>>> Are you referring that if __dwc3_set_mode >>>>>>>>> failed with core soft reset >>>>>>>>> timing out, the caller i.e., dwc3_set_mode who >>>>>>>>> queues the work need to know >>>>>>>>> that the operation actually failed. So we can >>>>>>>>> add a flag to indicate that >>>>>>>>> gadget is dead and the caller of dwc3_set_mode >>>>>>>>> can check the flag to see if >>>>>>>>> the operation is successful or not. >>>>>>>>> >>>>>>>>> Or am I misunderstanding your comment ? >>>>>>>>> >>>>>>>> >>>>>>>> Not just in __dwc3_set_mode(). I mean any time dwc3_core_soft_reset >>>>>>>> fails, then we set this flag. So that it can prevent the user calling >>>>>>>> any gadget ops causing more crashes/invalid behavior. The >>>>>>>> dwc->softconnect is already wrong on pullup() on failure. >>>>>>>> >>>>>>>> So that we can have a check in different gadget ops. For pullup(): >>>>>>>> >>>>>>>> static int dwc3_gadget_pullup() { >>>>>>>> if (dwc->udc_is_dead) { >>>>>>>> dev_err(dev, "reset me. x_x \n"); >>>>>>>> return; >>>>>>>> } >>>>>>>> >>>>>>>> abc(); >>>>>>>> } >>>>>>>> >>>>>>>> Perhaps the effort is probably the same if we >>>>>>>> enhance the UDC core for >>>>>>>> this? In any case, I'm fine either way. >>>>>>>> >>>>>>>> Thanks, >>>>>>>> Thinh >>>>>>> >>>>>>> Hi Thinh, >>>>>>> >>>>>>> So you don't want UDC to retry pullup if it fails the >>>>>>> first time ? As per >>>>>>> patch-2 of this series, I was trying to propagate the >>>>>>> EITMEDOUT to UDC so >>>>>>> that the caller (most probably configfs) can take >>>>>>> appropriate action as to >>>>>>> whether it must retry pullup or not. >>>>>>> >>>>>> >>>>>> Now I'm confused. If the soft-reset times out, that means that the >>>>>> soft-reset (self-clearing) bit isn't cleared. How can we retry if it's >>>>>> stuck in this state? My impression is that soft-reset would >>>>>> not complete >>>>>> at all. Is that not the case for you, or it's simply because we need a >>>>>> longer soft-reset timeout? >>>>>> >>>>>> Thanks, >>>>>> Thinh >>>>> >>>>> Hi Thinh, >>>>> >>>>> Sorry for not being clear. The intention of patch-1 was to >>>>> ensure we don't >>>>> start the controller if reset times out and patch-2 was to >>>>> ensure that UDC >>>>> is in sync with controller by understanding that gadget_connect >>>>> has failed >>>>> and necessary cleanup has to be done in gadget_bind_driver. >>>> >>>> That should still be there. >>>> >>>>> >>>>> But usually since the UDC_store is the one that is causing pullup to be >>>>> called, the error value is propagated back to UDC_store. If it sees a >>>>> failure, it does a retry to pullup. >>>>> >>>>> I didn't check whether subsequent retries by UDC to pullup are helping >>>>> clear the reset bit or not. But I thought retrying pullup won't >>>>> be of any >>>>> harm. >>>> >>>> It's fine to retry. I'm thinking that the controller is in a bad state >>>> at this point that we can't recover (hopefully that's not the case). >>>> >>>>> >>>>> I now get that my patches are incomplete w.r.t handling the timeout. >>>>> >>>>> IIRC one of the following is what you are suggesting we need to do: >>>>> >>>>> Option-1: >>>>> Set a flag when reset times out and clear it upon core_exit / >>>>> core_init. If >>>>> the flag is set, block calls to all the gadget_ops in dwc3. >>>>> Basically even >>>>> if retry happens from configfs/UDC, we bail out in pullup/udc_start even >>>>> without trying the requested gadget operation. >>>>> >>>>> Option-2: >>>>> If gadget_connect fails with -ETIMEDOUT in UDC, handle the failure and >>>>> implement the same flag in UDC and don't even call any gadget_ops. >>>>> >>>> >>> Hi Thinh, >>> >>> Thanks for the review. >>> >>>> I'm thinking of option-1. For option-2, we can't control if the >>>> gadget_ops will be called. We only have control how we will respond in >>>> case they get called again. >>>> >>>> But now I'm thinking again, I think it may be ok without adding the >>>> flag. The UDC core and gadget driver won't do anything else before >>>> pullup(1) is successful. Calling other gadget_ops should be harmless >>>> (ie. it won't crash/break the system)? >>>> >>> I can give this a try in long run testing (For 7-14 days) and see if >>> anything else is breaking. >>> >>> Most probably we do a composition switch / PIPO in between which can >>> call usb_gadget_unregister_driver which might invoke a pullup(0) >>> followed by udc_stop() and like you mentioned must not be a problem. >>> >>>> Sorry for the noise, but I think it may be ok without marking the >>>> controller dead. I wonder if we can confirm my suspiction on retry? I >>>> believe this is not easy to reproduce on your setup? If not, I think we >>>> can take your change as is. >> >> Hi Thinh, >> >> I got this patch tested on two diff Gen-2 targets for around 10 days and >> no issues were seen. (No SMMU fault seen on a 10 day run). Let me know of >> any other concerns that might come up with this patch. Else I can rebase it >> to get merged. >> >> Regards, >> Krishna, > > Thanks for the tests. So you were able to observe the failure and able > to recover from it without SMMU fault right? > Hi Thinh, Ideally if the issue didn't come up for more than a week straight, I believe that we are able to recover from the issue. Regards, Krishna,
On Wed, Apr 26, 2023, Krishna Kurapati PSSNV wrote: > > > On 4/26/2023 5:52 AM, Thinh Nguyen wrote: > > On Tue, Apr 25, 2023, Krishna Kurapati PSSNV wrote: > > > > > > > > > On 4/6/2023 7:44 AM, Krishna Kurapati PSSNV wrote: > > > > > > > > > > > > On 4/6/2023 6:15 AM, Thinh Nguyen wrote: > > > > > On Wed, Apr 05, 2023, Krishna Kurapati PSSNV wrote: > > > > > > > > > > > > > > > > > > On 4/5/2023 3:13 AM, Thinh Nguyen wrote: > > > > > > > On Tue, Apr 04, 2023, Krishna Kurapati PSSNV wrote: > > > > > > > > > > > > > > > > > > > > > > > > On 4/4/2023 5:19 AM, Thinh Nguyen wrote: > > > > > > > > > On Thu, Mar 30, 2023, Krishna Kurapati PSSNV wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On 3/30/2023 5:40 AM, Thinh Nguyen wrote: > > > > > > > > > > > On Wed, Mar 29, 2023, Krishna Kurapati PSSNV wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On 3/29/2023 2:50 AM, Thinh Nguyen wrote: > > > > > > > > > > > > > Hi, > > > > > > > > > > > > > > > > > > > > > > > > > > On Tue, Mar 28, 2023, Krishna Kurapati wrote: > > > > > > > > > > > > > > If the core soft reset timeout happens, avoid setting up event > > > > > > > > > > > > > > buffers and starting gadget as the writes to these registers > > > > > > > > > > > > > > may not reflect when in reset and setting the run stop bit > > > > > > > > > > > > > > can lead the controller to access wrong event buffer address > > > > > > > > > > > > > > resulting in a crash. > > > > > > > > > > > > > > > > > > > > > > > > > > > > Signed-off-by: Krishna Kurapati <quic_kriskura@quicinc.com> > > > > > > > > > > > > > > --- > > > > > > > > > > > > > > drivers/usb/dwc3/gadget.c | 5 ++++- > > > > > > > > > > > > > > 1 file changed, 4 insertions(+), 1 deletion(-) > > > > > > > > > > > > > > > > > > > > > > > > > > > > diff --git > > > > > > > > > > > > > > a/drivers/usb/dwc3/gadget.c > > > > > > > > > > > > > > b/drivers/usb/dwc3/gadget.c > > > > > > > > > > > > > > index 3c63fa97a680..f0472801d9a5 100644 > > > > > > > > > > > > > > --- a/drivers/usb/dwc3/gadget.c > > > > > > > > > > > > > > +++ b/drivers/usb/dwc3/gadget.c > > > > > > > > > > > > > > @@ -2620,13 +2620,16 @@ static > > > > > > > > > > > > > > int dwc3_gadget_pullup(struct > > > > > > > > > > > > > > usb_gadget *g, int is_on) > > > > > > > > > > > > > > * > > > > > > > > > > > > > > device-initiated disconnect > > > > > > > > > > > > > > requires a core soft reset > > > > > > > > > > > > > > * (DCTL.CSftRst) > > > > > > > > > > > > > > before enabling the run/stop > > > > > > > > > > > > > > bit. > > > > > > > > > > > > > > */ > > > > > > > > > > > > > > - dwc3_core_soft_reset(dwc); > > > > > > > > > > > > > > + ret = dwc3_core_soft_reset(dwc); > > > > > > > > > > > > > > + if (ret) > > > > > > > > > > > > > > + goto done; > > > > > > > > > > > > > > dwc3_event_buffers_setup(dwc); > > > > > > > > > > > > > > __dwc3_gadget_start(dwc); > > > > > > > > > > > > > > ret = dwc3_gadget_run_stop(dwc, true, false); > > > > > > > > > > > > > > } > > > > > > > > > > > > > > +done: > > > > > > > > > > > > > > pm_runtime_put(dwc->dev); > > > > > > > > > > > > > > return ret; > > > > > > > > > > > > > > -- > > > > > > > > > > > > > > 2.40.0 > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > I think there's one more place that > > > > > > > > > > > > > may needs this check. Can you also > > > > > > > > > > > > > add this check in __dwc3_set_mode()? > > > > > > > > > > > > > > > > > > > > > > > > Hi Thinh, > > > > > > > > > > > > > > > > > > > > > > > > Sure. Will do it. > > > > > > > > > > > > Will the below be good enough ? Or would > > > > > > > > > > > > it be good to add an error/warn log > > > > > > > > > > > > there> > > > > > > > > > > > > > > > > > > > > > > There's already a warning message in > > > > > > > > > > > dwc3_core_soft_reset() if it fails. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > kriskura@hu-kriskura-hyd:/local/mnt/workspace/krishna/skales2/skales/kernel$ > > > > > > > > > > > > git diff drivers/usb/ > > > > > > > > > > > > diff --git a/drivers/usb/dwc3/core.c b/drivers/usb/dwc3/core.c > > > > > > > > > > > > index 476b63618511..8d1d213d1dcd 100644 > > > > > > > > > > > > --- a/drivers/usb/dwc3/core.c > > > > > > > > > > > > +++ b/drivers/usb/dwc3/core.c > > > > > > > > > > > > @@ -210,7 +210,9 @@ static void > > > > > > > > > > > > __dwc3_set_mode(struct work_struct > > > > > > > > > > > > *work) > > > > > > > > > > > > } > > > > > > > > > > > > break; > > > > > > > > > > > > case DWC3_GCTL_PRTCAP_DEVICE: > > > > > > > > > > > > - dwc3_core_soft_reset(dwc); > > > > > > > > > > > > + ret = dwc3_core_soft_reset(dwc); > > > > > > > > > > > > + if (ret) > > > > > > > > > > > > + goto out; > > > > > > > > > > > > > > > > > > > > > > > > dwc3_event_buffers_setup(dwc); > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > If soft-reset failed, the controller is in a > > > > > > > > > > > bad state. We should not > > > > > > > > > > > perform any further operation until the next > > > > > > > > > > > hard reset. We should flag > > > > > > > > > > > the controller as dead. I don't think we have the equivalent of the > > > > > > > > > > > host's HCD_FLAG_DEAD. It may require some > > > > > > > > > > > work in the UDC core. Perhaps > > > > > > > > > > > we can flag within dwc3 for now and prevent > > > > > > > > > > > any further operation for a > > > > > > > > > > > simpler fix. > > > > > > > > > > > > > > > > > > > > > Hi Thinh, > > > > > > > > > > > > > > > > > > > > Are you referring that if __dwc3_set_mode > > > > > > > > > > failed with core soft reset > > > > > > > > > > timing out, the caller i.e., dwc3_set_mode who > > > > > > > > > > queues the work need to know > > > > > > > > > > that the operation actually failed. So we can > > > > > > > > > > add a flag to indicate that > > > > > > > > > > gadget is dead and the caller of dwc3_set_mode > > > > > > > > > > can check the flag to see if > > > > > > > > > > the operation is successful or not. > > > > > > > > > > > > > > > > > > > > Or am I misunderstanding your comment ? > > > > > > > > > > > > > > > > > > > > > > > > > > > > Not just in __dwc3_set_mode(). I mean any time dwc3_core_soft_reset > > > > > > > > > fails, then we set this flag. So that it can prevent the user calling > > > > > > > > > any gadget ops causing more crashes/invalid behavior. The > > > > > > > > > dwc->softconnect is already wrong on pullup() on failure. > > > > > > > > > > > > > > > > > > So that we can have a check in different gadget ops. For pullup(): > > > > > > > > > > > > > > > > > > static int dwc3_gadget_pullup() { > > > > > > > > > if (dwc->udc_is_dead) { > > > > > > > > > dev_err(dev, "reset me. x_x \n"); > > > > > > > > > return; > > > > > > > > > } > > > > > > > > > > > > > > > > > > abc(); > > > > > > > > > } > > > > > > > > > > > > > > > > > > Perhaps the effort is probably the same if we > > > > > > > > > enhance the UDC core for > > > > > > > > > this? In any case, I'm fine either way. > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > Thinh > > > > > > > > > > > > > > > > Hi Thinh, > > > > > > > > > > > > > > > > So you don't want UDC to retry pullup if it fails the > > > > > > > > first time ? As per > > > > > > > > patch-2 of this series, I was trying to propagate the > > > > > > > > EITMEDOUT to UDC so > > > > > > > > that the caller (most probably configfs) can take > > > > > > > > appropriate action as to > > > > > > > > whether it must retry pullup or not. > > > > > > > > > > > > > > > > > > > > > > Now I'm confused. If the soft-reset times out, that means that the > > > > > > > soft-reset (self-clearing) bit isn't cleared. How can we retry if it's > > > > > > > stuck in this state? My impression is that soft-reset would > > > > > > > not complete > > > > > > > at all. Is that not the case for you, or it's simply because we need a > > > > > > > longer soft-reset timeout? > > > > > > > > > > > > > > Thanks, > > > > > > > Thinh > > > > > > > > > > > > Hi Thinh, > > > > > > > > > > > > Sorry for not being clear. The intention of patch-1 was to > > > > > > ensure we don't > > > > > > start the controller if reset times out and patch-2 was to > > > > > > ensure that UDC > > > > > > is in sync with controller by understanding that gadget_connect > > > > > > has failed > > > > > > and necessary cleanup has to be done in gadget_bind_driver. > > > > > > > > > > That should still be there. > > > > > > > > > > > > > > > > > But usually since the UDC_store is the one that is causing pullup to be > > > > > > called, the error value is propagated back to UDC_store. If it sees a > > > > > > failure, it does a retry to pullup. > > > > > > > > > > > > I didn't check whether subsequent retries by UDC to pullup are helping > > > > > > clear the reset bit or not. But I thought retrying pullup won't > > > > > > be of any > > > > > > harm. > > > > > > > > > > It's fine to retry. I'm thinking that the controller is in a bad state > > > > > at this point that we can't recover (hopefully that's not the case). > > > > > > > > > > > > > > > > > I now get that my patches are incomplete w.r.t handling the timeout. > > > > > > > > > > > > IIRC one of the following is what you are suggesting we need to do: > > > > > > > > > > > > Option-1: > > > > > > Set a flag when reset times out and clear it upon core_exit / > > > > > > core_init. If > > > > > > the flag is set, block calls to all the gadget_ops in dwc3. > > > > > > Basically even > > > > > > if retry happens from configfs/UDC, we bail out in pullup/udc_start even > > > > > > without trying the requested gadget operation. > > > > > > > > > > > > Option-2: > > > > > > If gadget_connect fails with -ETIMEDOUT in UDC, handle the failure and > > > > > > implement the same flag in UDC and don't even call any gadget_ops. > > > > > > > > > > > > > > > Hi Thinh, > > > > > > > > Thanks for the review. > > > > > > > > > I'm thinking of option-1. For option-2, we can't control if the > > > > > gadget_ops will be called. We only have control how we will respond in > > > > > case they get called again. > > > > > > > > > > But now I'm thinking again, I think it may be ok without adding the > > > > > flag. The UDC core and gadget driver won't do anything else before > > > > > pullup(1) is successful. Calling other gadget_ops should be harmless > > > > > (ie. it won't crash/break the system)? > > > > > > > > > I can give this a try in long run testing (For 7-14 days) and see if > > > > anything else is breaking. > > > > > > > > Most probably we do a composition switch / PIPO in between which can > > > > call usb_gadget_unregister_driver which might invoke a pullup(0) > > > > followed by udc_stop() and like you mentioned must not be a problem. > > > > > > > > > Sorry for the noise, but I think it may be ok without marking the > > > > > controller dead. I wonder if we can confirm my suspiction on retry? I > > > > > believe this is not easy to reproduce on your setup? If not, I think we > > > > > can take your change as is. > > > > > > Hi Thinh, > > > > > > I got this patch tested on two diff Gen-2 targets for around 10 days and > > > no issues were seen. (No SMMU fault seen on a 10 day run). Let me know of > > > any other concerns that might come up with this patch. Else I can rebase it > > > to get merged. > > > > > > Regards, > > > Krishna, > > > > Thanks for the tests. So you were able to observe the failure and able > > to recover from it without SMMU fault right? > > > Hi Thinh, > > Ideally if the issue didn't come up for more than a week straight, I > believe that we are able to recover from the issue. > > Regards, > Krishna, Hi Krishna, I wanted to confirm whether the controller is dead and whether any attempt to access the controller at this point can be fatal to the system (such as SMMU fault). I believe your patches prevents SMMU fault on first failure. I'm not sure what will happen after if we attempt a retry. Regardless, your changes are good for now. If we need to make further enhancement in the future, we'll do that later. Many thanks for the tests, Thinh
On Tue, Mar 28, 2023, Krishna Kurapati wrote: > If the core soft reset timeout happens, avoid setting up event > buffers and starting gadget as the writes to these registers > may not reflect when in reset and setting the run stop bit > can lead the controller to access wrong event buffer address > resulting in a crash. > > Signed-off-by: Krishna Kurapati <quic_kriskura@quicinc.com> > --- > drivers/usb/dwc3/gadget.c | 5 ++++- > 1 file changed, 4 insertions(+), 1 deletion(-) > > diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c > index 3c63fa97a680..f0472801d9a5 100644 > --- a/drivers/usb/dwc3/gadget.c > +++ b/drivers/usb/dwc3/gadget.c > @@ -2620,13 +2620,16 @@ static int dwc3_gadget_pullup(struct usb_gadget *g, int is_on) > * device-initiated disconnect requires a core soft reset > * (DCTL.CSftRst) before enabling the run/stop bit. > */ > - dwc3_core_soft_reset(dwc); > + ret = dwc3_core_soft_reset(dwc); > + if (ret) > + goto done; > > dwc3_event_buffers_setup(dwc); > __dwc3_gadget_start(dwc); > ret = dwc3_gadget_run_stop(dwc, true, false); > } > > +done: > pm_runtime_put(dwc->dev); > > return ret; > -- > 2.40.0 > Acked-by: Thinh Nguyen <Thinh.Nguyen@synopsys.com> Thanks, Thinh
diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c index 3c63fa97a680..f0472801d9a5 100644 --- a/drivers/usb/dwc3/gadget.c +++ b/drivers/usb/dwc3/gadget.c @@ -2620,13 +2620,16 @@ static int dwc3_gadget_pullup(struct usb_gadget *g, int is_on) * device-initiated disconnect requires a core soft reset * (DCTL.CSftRst) before enabling the run/stop bit. */ - dwc3_core_soft_reset(dwc); + ret = dwc3_core_soft_reset(dwc); + if (ret) + goto done; dwc3_event_buffers_setup(dwc); __dwc3_gadget_start(dwc); ret = dwc3_gadget_run_stop(dwc, true, false); } +done: pm_runtime_put(dwc->dev); return ret;