Message ID | 1699939661-7385-3-git-send-email-quic_qianyu@quicinc.com |
---|---|
State | New |
Headers |
From: Qiang Yu <quic_qianyu@quicinc.com>
To: mani@kernel.org, quic_jhugo@quicinc.com
Cc: mhi@lists.linux.dev, linux-arm-msm@vger.kernel.org, linux-kernel@vger.kernel.org, quic_cang@quicinc.com, quic_mrana@quicinc.com, Qiang Yu <quic_qianyu@quicinc.com>
Subject: [PATCH v4 2/4] bus: mhi: host: Drop chan lock before queuing buffers
Date: Tue, 14 Nov 2023 13:27:39 +0800
Message-Id: <1699939661-7385-3-git-send-email-quic_qianyu@quicinc.com>
In-Reply-To: <1699939661-7385-1-git-send-email-quic_qianyu@quicinc.com>
References: <1699939661-7385-1-git-send-email-quic_qianyu@quicinc.com> |
Series | bus: mhi: host: Add lock to avoid race when ringing channel DB |
Commit Message
Qiang Yu
Nov. 14, 2023, 5:27 a.m. UTC
Ensure the read and write locks for the channel are not taken in succession by
dropping the read lock in parse_xfer_event() before notifying the client, so
that the client's callback can queue buffers and take the write lock in the
process. Any queueing of buffers must be done without the channel read lock
held, since taking the write lock on top of it acquires the same lock twice
and results in a soft lockup.
Signed-off-by: Qiang Yu <quic_qianyu@quicinc.com>
---
drivers/bus/mhi/host/main.c | 4 ++++
1 file changed, 4 insertions(+)
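For context, the hazard this patch avoids only arises once patch 1/4 makes mhi_gen_tre() take the channel lock for writing: a client callback that queues a buffer from inside parse_xfer_event() would then nest a write acquisition of mhi_chan->lock inside the read acquisition held around the callback. Below is a minimal user-space sketch of that nesting using a pthread rwlock; the *_model() names are hypothetical stand-ins for the MHI code paths, not the driver API.

```c
/*
 * Minimal model of the nested-lock hazard, assuming the write lock added by
 * patch 1/4 in mhi_gen_tre(). chan_lock stands in for mhi_chan->lock; none
 * of these functions are the real MHI host code.
 *
 * Build: gcc -pthread lock_model.c -o lock_model
 */
#include <pthread.h>
#include <stdio.h>

static pthread_rwlock_t chan_lock = PTHREAD_RWLOCK_INITIALIZER;

/* Queueing a TRE takes the channel lock for writing (as in patch 1/4). */
static void mhi_gen_tre_model(void)
{
	pthread_rwlock_wrlock(&chan_lock);
	/* ... update the ring write pointer, ring the channel doorbell ... */
	pthread_rwlock_unlock(&chan_lock);
}

/* A client xfer callback that immediately re-queues a buffer. */
static void client_xfer_cb_model(void)
{
	mhi_gen_tre_model();
}

/* Event processing, with (patched) or without (original) the lock drop. */
static void parse_xfer_event_model(int drop_lock_around_cb)
{
	pthread_rwlock_rdlock(&chan_lock);

	if (drop_lock_around_cb)
		pthread_rwlock_unlock(&chan_lock);

	client_xfer_cb_model();		/* needs the write lock */

	if (drop_lock_around_cb)
		pthread_rwlock_rdlock(&chan_lock);

	pthread_rwlock_unlock(&chan_lock);
	printf("completed, drop_lock_around_cb=%d\n", drop_lock_around_cb);
}

int main(void)
{
	parse_xfer_event_model(1);	/* patched flow: completes */
	parse_xfer_event_model(0);	/* original flow: write inside read never returns */
	return 0;
}
```

With a kernel rwlock the nested acquisition spins forever in softirq context, which is the soft lockup the commit message refers to; the pthread version above simply blocks on the second call.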
Comments
On Tue, Nov 14, 2023 at 01:27:39PM +0800, Qiang Yu wrote:
> Ensure read and write locks for the channel are not taken in succession by
> dropping the read lock from parse_xfer_event() such that a callback given
> to client can potentially queue buffers and acquire the write lock in that
> process. Any queueing of buffers should be done without channel read lock
> acquired as it can result in multiple locks and a soft lockup.
>

Is this patch trying to fix an existing issue in client drivers or a potential
issue in the future drivers?

Even if you take care of disabled channels, "mhi_event->lock" acquired during
mhi_mark_stale_events() can cause deadlock, since event lock is already held by
mhi_ev_task().

I'd prefer not to open the window unless this patch is fixing a real issue.

- Mani

> Signed-off-by: Qiang Yu <quic_qianyu@quicinc.com>
> ---
>  drivers/bus/mhi/host/main.c | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/drivers/bus/mhi/host/main.c b/drivers/bus/mhi/host/main.c
> index 6c6d253..c4215b0 100644
> --- a/drivers/bus/mhi/host/main.c
> +++ b/drivers/bus/mhi/host/main.c
> @@ -642,6 +642,8 @@ static int parse_xfer_event(struct mhi_controller *mhi_cntrl,
>  		mhi_del_ring_element(mhi_cntrl, tre_ring);
>  		local_rp = tre_ring->rp;
>
> +		read_unlock_bh(&mhi_chan->lock);
> +
>  		/* notify client */
>  		mhi_chan->xfer_cb(mhi_chan->mhi_dev, &result);
>
> @@ -667,6 +669,8 @@ static int parse_xfer_event(struct mhi_controller *mhi_cntrl,
>  				kfree(buf_info->cb_buf);
>  			}
>  		}
> +
> +		read_lock_bh(&mhi_chan->lock);
>  	}
>  	break;
>  } /* CC_EOT */
> --
> 2.7.4
>
>
On 11/24/2023 6:04 PM, Manivannan Sadhasivam wrote: > On Tue, Nov 14, 2023 at 01:27:39PM +0800, Qiang Yu wrote: >> Ensure read and write locks for the channel are not taken in succession by >> dropping the read lock from parse_xfer_event() such that a callback given >> to client can potentially queue buffers and acquire the write lock in that >> process. Any queueing of buffers should be done without channel read lock >> acquired as it can result in multiple locks and a soft lockup. >> > Is this patch trying to fix an existing issue in client drivers or a potential > issue in the future drivers? > > Even if you take care of disabled channels, "mhi_event->lock" acquired during > mhi_mark_stale_events() can cause deadlock, since event lock is already held by > mhi_ev_task(). > > I'd prefer not to open the window unless this patch is fixing a real issue. > > - Mani In [PATCH v4 1/4] bus: mhi: host: Add spinlock to protect WP access when queueing TREs, we add write_lock_bh(&mhi_chan->lock)/write_unlock_bh(&mhi_chan->lock) in mhi_gen_tre, which may be invoked as part of mhi_queue in client xfer callback, so we have to use read_unlock_bh(&mhi_chan->lock) here to avoid acquiring mhi_chan->lock twice. Sorry for confusing you. Do you think we need to sqush this two patch into one? > >> Signed-off-by: Qiang Yu <quic_qianyu@quicinc.com> >> --- >> drivers/bus/mhi/host/main.c | 4 ++++ >> 1 file changed, 4 insertions(+) >> >> diff --git a/drivers/bus/mhi/host/main.c b/drivers/bus/mhi/host/main.c >> index 6c6d253..c4215b0 100644 >> --- a/drivers/bus/mhi/host/main.c >> +++ b/drivers/bus/mhi/host/main.c >> @@ -642,6 +642,8 @@ static int parse_xfer_event(struct mhi_controller *mhi_cntrl, >> mhi_del_ring_element(mhi_cntrl, tre_ring); >> local_rp = tre_ring->rp; >> >> + read_unlock_bh(&mhi_chan->lock); >> + >> /* notify client */ >> mhi_chan->xfer_cb(mhi_chan->mhi_dev, &result); >> >> @@ -667,6 +669,8 @@ static int parse_xfer_event(struct mhi_controller *mhi_cntrl, >> kfree(buf_info->cb_buf); >> } >> } >> + >> + read_lock_bh(&mhi_chan->lock); >> } >> break; >> } /* CC_EOT */ >> -- >> 2.7.4 >> >>
On Mon, Nov 27, 2023 at 03:13:55PM +0800, Qiang Yu wrote: > > On 11/24/2023 6:04 PM, Manivannan Sadhasivam wrote: > > On Tue, Nov 14, 2023 at 01:27:39PM +0800, Qiang Yu wrote: > > > Ensure read and write locks for the channel are not taken in succession by > > > dropping the read lock from parse_xfer_event() such that a callback given > > > to client can potentially queue buffers and acquire the write lock in that > > > process. Any queueing of buffers should be done without channel read lock > > > acquired as it can result in multiple locks and a soft lockup. > > > > > Is this patch trying to fix an existing issue in client drivers or a potential > > issue in the future drivers? > > > > Even if you take care of disabled channels, "mhi_event->lock" acquired during > > mhi_mark_stale_events() can cause deadlock, since event lock is already held by > > mhi_ev_task(). > > > > I'd prefer not to open the window unless this patch is fixing a real issue. > > > > - Mani > In [PATCH v4 1/4] bus: mhi: host: Add spinlock to protect WP access when > queueing > TREs, we add > write_lock_bh(&mhi_chan->lock)/write_unlock_bh(&mhi_chan->lock) > in mhi_gen_tre, which may be invoked as part of mhi_queue in client xfer > callback, > so we have to use read_unlock_bh(&mhi_chan->lock) here to avoid acquiring > mhi_chan->lock > twice. > > Sorry for confusing you. Do you think we need to sqush this two patch into > one? Well, if patch 1 is introducing a potential deadlock, then we should fix patch 1 itself and not introduce a follow up patch. But there is one more issue that I pointed out in my previous reply. Also, I'm planning to cleanup the locking mess within MHI in the coming days. Perhaps we can revisit this series at that point of time. Will that be OK for you? - Mani > > > Signed-off-by: Qiang Yu <quic_qianyu@quicinc.com> > > > --- > > > drivers/bus/mhi/host/main.c | 4 ++++ > > > 1 file changed, 4 insertions(+) > > > > > > diff --git a/drivers/bus/mhi/host/main.c b/drivers/bus/mhi/host/main.c > > > index 6c6d253..c4215b0 100644 > > > --- a/drivers/bus/mhi/host/main.c > > > +++ b/drivers/bus/mhi/host/main.c > > > @@ -642,6 +642,8 @@ static int parse_xfer_event(struct mhi_controller *mhi_cntrl, > > > mhi_del_ring_element(mhi_cntrl, tre_ring); > > > local_rp = tre_ring->rp; > > > + read_unlock_bh(&mhi_chan->lock); > > > + > > > /* notify client */ > > > mhi_chan->xfer_cb(mhi_chan->mhi_dev, &result); > > > @@ -667,6 +669,8 @@ static int parse_xfer_event(struct mhi_controller *mhi_cntrl, > > > kfree(buf_info->cb_buf); > > > } > > > } > > > + > > > + read_lock_bh(&mhi_chan->lock); > > > } > > > break; > > > } /* CC_EOT */ > > > -- > > > 2.7.4 > > > > > > >
On 11/28/2023 9:32 PM, Manivannan Sadhasivam wrote: > On Mon, Nov 27, 2023 at 03:13:55PM +0800, Qiang Yu wrote: >> On 11/24/2023 6:04 PM, Manivannan Sadhasivam wrote: >>> On Tue, Nov 14, 2023 at 01:27:39PM +0800, Qiang Yu wrote: >>>> Ensure read and write locks for the channel are not taken in succession by >>>> dropping the read lock from parse_xfer_event() such that a callback given >>>> to client can potentially queue buffers and acquire the write lock in that >>>> process. Any queueing of buffers should be done without channel read lock >>>> acquired as it can result in multiple locks and a soft lockup. >>>> >>> Is this patch trying to fix an existing issue in client drivers or a potential >>> issue in the future drivers? >>> >>> Even if you take care of disabled channels, "mhi_event->lock" acquired during >>> mhi_mark_stale_events() can cause deadlock, since event lock is already held by >>> mhi_ev_task(). >>> >>> I'd prefer not to open the window unless this patch is fixing a real issue. >>> >>> - Mani >> In [PATCH v4 1/4] bus: mhi: host: Add spinlock to protect WP access when >> queueing >> TREs, we add >> write_lock_bh(&mhi_chan->lock)/write_unlock_bh(&mhi_chan->lock) >> in mhi_gen_tre, which may be invoked as part of mhi_queue in client xfer >> callback, >> so we have to use read_unlock_bh(&mhi_chan->lock) here to avoid acquiring >> mhi_chan->lock >> twice. >> >> Sorry for confusing you. Do you think we need to sqush this two patch into >> one? > Well, if patch 1 is introducing a potential deadlock, then we should fix patch > 1 itself and not introduce a follow up patch. > > But there is one more issue that I pointed out in my previous reply. Sorry, I can not understand why "mhi_event->lock" acquired during mhi_mark_stale_events() can cause deadlock. In mhi_ev_task(), we will not invoke mhi_mark_stale_events(). Can you provide some interpretation? > > Also, I'm planning to cleanup the locking mess within MHI in the coming days. > Perhaps we can revisit this series at that point of time. Will that be OK for > you? Sure, that will be great. > > - Mani > >>>> Signed-off-by: Qiang Yu <quic_qianyu@quicinc.com> >>>> --- >>>> drivers/bus/mhi/host/main.c | 4 ++++ >>>> 1 file changed, 4 insertions(+) >>>> >>>> diff --git a/drivers/bus/mhi/host/main.c b/drivers/bus/mhi/host/main.c >>>> index 6c6d253..c4215b0 100644 >>>> --- a/drivers/bus/mhi/host/main.c >>>> +++ b/drivers/bus/mhi/host/main.c >>>> @@ -642,6 +642,8 @@ static int parse_xfer_event(struct mhi_controller *mhi_cntrl, >>>> mhi_del_ring_element(mhi_cntrl, tre_ring); >>>> local_rp = tre_ring->rp; >>>> + read_unlock_bh(&mhi_chan->lock); >>>> + >>>> /* notify client */ >>>> mhi_chan->xfer_cb(mhi_chan->mhi_dev, &result); >>>> @@ -667,6 +669,8 @@ static int parse_xfer_event(struct mhi_controller *mhi_cntrl, >>>> kfree(buf_info->cb_buf); >>>> } >>>> } >>>> + >>>> + read_lock_bh(&mhi_chan->lock); >>>> } >>>> break; >>>> } /* CC_EOT */ >>>> -- >>>> 2.7.4 >>>> >>>>
On Wed, Nov 29, 2023 at 11:29:07AM +0800, Qiang Yu wrote: > > On 11/28/2023 9:32 PM, Manivannan Sadhasivam wrote: > > On Mon, Nov 27, 2023 at 03:13:55PM +0800, Qiang Yu wrote: > > > On 11/24/2023 6:04 PM, Manivannan Sadhasivam wrote: > > > > On Tue, Nov 14, 2023 at 01:27:39PM +0800, Qiang Yu wrote: > > > > > Ensure read and write locks for the channel are not taken in succession by > > > > > dropping the read lock from parse_xfer_event() such that a callback given > > > > > to client can potentially queue buffers and acquire the write lock in that > > > > > process. Any queueing of buffers should be done without channel read lock > > > > > acquired as it can result in multiple locks and a soft lockup. > > > > > > > > > Is this patch trying to fix an existing issue in client drivers or a potential > > > > issue in the future drivers? > > > > > > > > Even if you take care of disabled channels, "mhi_event->lock" acquired during > > > > mhi_mark_stale_events() can cause deadlock, since event lock is already held by > > > > mhi_ev_task(). > > > > > > > > I'd prefer not to open the window unless this patch is fixing a real issue. > > > > > > > > - Mani > > > In [PATCH v4 1/4] bus: mhi: host: Add spinlock to protect WP access when > > > queueing > > > TREs, we add > > > write_lock_bh(&mhi_chan->lock)/write_unlock_bh(&mhi_chan->lock) > > > in mhi_gen_tre, which may be invoked as part of mhi_queue in client xfer > > > callback, > > > so we have to use read_unlock_bh(&mhi_chan->lock) here to avoid acquiring > > > mhi_chan->lock > > > twice. > > > > > > Sorry for confusing you. Do you think we need to sqush this two patch into > > > one? > > Well, if patch 1 is introducing a potential deadlock, then we should fix patch > > 1 itself and not introduce a follow up patch. > > > > But there is one more issue that I pointed out in my previous reply. > Sorry, I can not understand why "mhi_event->lock" acquired during > mhi_mark_stale_events() can cause deadlock. In mhi_ev_task(), we will > not invoke mhi_mark_stale_events(). Can you provide some interpretation? Going by your theory that if a channel gets disabled while processing the event, the process trying to disable the channel will try to acquire "mhi_event->lock" which is already held by the process processing the event. - Mani > > > > Also, I'm planning to cleanup the locking mess within MHI in the coming days. > > Perhaps we can revisit this series at that point of time. Will that be OK for > > you? > Sure, that will be great. > > > > - Mani > > > > > > > Signed-off-by: Qiang Yu <quic_qianyu@quicinc.com> > > > > > --- > > > > > drivers/bus/mhi/host/main.c | 4 ++++ > > > > > 1 file changed, 4 insertions(+) > > > > > > > > > > diff --git a/drivers/bus/mhi/host/main.c b/drivers/bus/mhi/host/main.c > > > > > index 6c6d253..c4215b0 100644 > > > > > --- a/drivers/bus/mhi/host/main.c > > > > > +++ b/drivers/bus/mhi/host/main.c > > > > > @@ -642,6 +642,8 @@ static int parse_xfer_event(struct mhi_controller *mhi_cntrl, > > > > > mhi_del_ring_element(mhi_cntrl, tre_ring); > > > > > local_rp = tre_ring->rp; > > > > > + read_unlock_bh(&mhi_chan->lock); > > > > > + > > > > > /* notify client */ > > > > > mhi_chan->xfer_cb(mhi_chan->mhi_dev, &result); > > > > > @@ -667,6 +669,8 @@ static int parse_xfer_event(struct mhi_controller *mhi_cntrl, > > > > > kfree(buf_info->cb_buf); > > > > > } > > > > > } > > > > > + > > > > > + read_lock_bh(&mhi_chan->lock); > > > > > } > > > > > break; > > > > > } /* CC_EOT */ > > > > > -- > > > > > 2.7.4 > > > > > > > > > > >
On 11/30/2023 1:31 PM, Manivannan Sadhasivam wrote: > On Wed, Nov 29, 2023 at 11:29:07AM +0800, Qiang Yu wrote: >> On 11/28/2023 9:32 PM, Manivannan Sadhasivam wrote: >>> On Mon, Nov 27, 2023 at 03:13:55PM +0800, Qiang Yu wrote: >>>> On 11/24/2023 6:04 PM, Manivannan Sadhasivam wrote: >>>>> On Tue, Nov 14, 2023 at 01:27:39PM +0800, Qiang Yu wrote: >>>>>> Ensure read and write locks for the channel are not taken in succession by >>>>>> dropping the read lock from parse_xfer_event() such that a callback given >>>>>> to client can potentially queue buffers and acquire the write lock in that >>>>>> process. Any queueing of buffers should be done without channel read lock >>>>>> acquired as it can result in multiple locks and a soft lockup. >>>>>> >>>>> Is this patch trying to fix an existing issue in client drivers or a potential >>>>> issue in the future drivers? >>>>> >>>>> Even if you take care of disabled channels, "mhi_event->lock" acquired during >>>>> mhi_mark_stale_events() can cause deadlock, since event lock is already held by >>>>> mhi_ev_task(). >>>>> >>>>> I'd prefer not to open the window unless this patch is fixing a real issue. >>>>> >>>>> - Mani >>>> In [PATCH v4 1/4] bus: mhi: host: Add spinlock to protect WP access when >>>> queueing >>>> TREs, we add >>>> write_lock_bh(&mhi_chan->lock)/write_unlock_bh(&mhi_chan->lock) >>>> in mhi_gen_tre, which may be invoked as part of mhi_queue in client xfer >>>> callback, >>>> so we have to use read_unlock_bh(&mhi_chan->lock) here to avoid acquiring >>>> mhi_chan->lock >>>> twice. >>>> >>>> Sorry for confusing you. Do you think we need to sqush this two patch into >>>> one? >>> Well, if patch 1 is introducing a potential deadlock, then we should fix patch >>> 1 itself and not introduce a follow up patch. >>> >>> But there is one more issue that I pointed out in my previous reply. >> Sorry, I can not understand why "mhi_event->lock" acquired during >> mhi_mark_stale_events() can cause deadlock. In mhi_ev_task(), we will >> not invoke mhi_mark_stale_events(). Can you provide some interpretation? > Going by your theory that if a channel gets disabled while processing the event, > the process trying to disable the channel will try to acquire "mhi_event->lock" > which is already held by the process processing the event. > > - Mani OK, I get you. Thank you for kind explanation. Hopefully I didn't intrude too much. > >>> Also, I'm planning to cleanup the locking mess within MHI in the coming days. >>> Perhaps we can revisit this series at that point of time. Will that be OK for >>> you? >> Sure, that will be great. >>> - Mani >>> >>>>>> Signed-off-by: Qiang Yu <quic_qianyu@quicinc.com> >>>>>> --- >>>>>> drivers/bus/mhi/host/main.c | 4 ++++ >>>>>> 1 file changed, 4 insertions(+) >>>>>> >>>>>> diff --git a/drivers/bus/mhi/host/main.c b/drivers/bus/mhi/host/main.c >>>>>> index 6c6d253..c4215b0 100644 >>>>>> --- a/drivers/bus/mhi/host/main.c >>>>>> +++ b/drivers/bus/mhi/host/main.c >>>>>> @@ -642,6 +642,8 @@ static int parse_xfer_event(struct mhi_controller *mhi_cntrl, >>>>>> mhi_del_ring_element(mhi_cntrl, tre_ring); >>>>>> local_rp = tre_ring->rp; >>>>>> + read_unlock_bh(&mhi_chan->lock); >>>>>> + >>>>>> /* notify client */ >>>>>> mhi_chan->xfer_cb(mhi_chan->mhi_dev, &result); >>>>>> @@ -667,6 +669,8 @@ static int parse_xfer_event(struct mhi_controller *mhi_cntrl, >>>>>> kfree(buf_info->cb_buf); >>>>>> } >>>>>> } >>>>>> + >>>>>> + read_lock_bh(&mhi_chan->lock); >>>>>> } >>>>>> break; >>>>>> } /* CC_EOT */ >>>>>> -- >>>>>> 2.7.4 >>>>>> >>>>>>
On Wed, Dec 06, 2023 at 10:25:12AM +0800, Qiang Yu wrote: > > On 11/30/2023 1:31 PM, Manivannan Sadhasivam wrote: > > On Wed, Nov 29, 2023 at 11:29:07AM +0800, Qiang Yu wrote: > > > On 11/28/2023 9:32 PM, Manivannan Sadhasivam wrote: > > > > On Mon, Nov 27, 2023 at 03:13:55PM +0800, Qiang Yu wrote: > > > > > On 11/24/2023 6:04 PM, Manivannan Sadhasivam wrote: > > > > > > On Tue, Nov 14, 2023 at 01:27:39PM +0800, Qiang Yu wrote: > > > > > > > Ensure read and write locks for the channel are not taken in succession by > > > > > > > dropping the read lock from parse_xfer_event() such that a callback given > > > > > > > to client can potentially queue buffers and acquire the write lock in that > > > > > > > process. Any queueing of buffers should be done without channel read lock > > > > > > > acquired as it can result in multiple locks and a soft lockup. > > > > > > > > > > > > > Is this patch trying to fix an existing issue in client drivers or a potential > > > > > > issue in the future drivers? > > > > > > > > > > > > Even if you take care of disabled channels, "mhi_event->lock" acquired during > > > > > > mhi_mark_stale_events() can cause deadlock, since event lock is already held by > > > > > > mhi_ev_task(). > > > > > > > > > > > > I'd prefer not to open the window unless this patch is fixing a real issue. > > > > > > > > > > > > - Mani > > > > > In [PATCH v4 1/4] bus: mhi: host: Add spinlock to protect WP access when > > > > > queueing > > > > > TREs, we add > > > > > write_lock_bh(&mhi_chan->lock)/write_unlock_bh(&mhi_chan->lock) > > > > > in mhi_gen_tre, which may be invoked as part of mhi_queue in client xfer > > > > > callback, > > > > > so we have to use read_unlock_bh(&mhi_chan->lock) here to avoid acquiring > > > > > mhi_chan->lock > > > > > twice. > > > > > > > > > > Sorry for confusing you. Do you think we need to sqush this two patch into > > > > > one? > > > > Well, if patch 1 is introducing a potential deadlock, then we should fix patch > > > > 1 itself and not introduce a follow up patch. > > > > > > > > But there is one more issue that I pointed out in my previous reply. > > > Sorry, I can not understand why "mhi_event->lock" acquired during > > > mhi_mark_stale_events() can cause deadlock. In mhi_ev_task(), we will > > > not invoke mhi_mark_stale_events(). Can you provide some interpretation? > > Going by your theory that if a channel gets disabled while processing the event, > > the process trying to disable the channel will try to acquire "mhi_event->lock" > > which is already held by the process processing the event. > > > > - Mani > OK, I get you. Thank you for kind explanation. Hopefully I didn't intrude > too much. Not at all. Btw, did you actually encounter any issue that this patch is trying to fix? Or just fixing based on code inspection. - Mani > > > > > > Also, I'm planning to cleanup the locking mess within MHI in the coming days. > > > > Perhaps we can revisit this series at that point of time. Will that be OK for > > > > you? > > > Sure, that will be great. 
> > > > - Mani > > > > > > > > > > > Signed-off-by: Qiang Yu <quic_qianyu@quicinc.com> > > > > > > > --- > > > > > > > drivers/bus/mhi/host/main.c | 4 ++++ > > > > > > > 1 file changed, 4 insertions(+) > > > > > > > > > > > > > > diff --git a/drivers/bus/mhi/host/main.c b/drivers/bus/mhi/host/main.c > > > > > > > index 6c6d253..c4215b0 100644 > > > > > > > --- a/drivers/bus/mhi/host/main.c > > > > > > > +++ b/drivers/bus/mhi/host/main.c > > > > > > > @@ -642,6 +642,8 @@ static int parse_xfer_event(struct mhi_controller *mhi_cntrl, > > > > > > > mhi_del_ring_element(mhi_cntrl, tre_ring); > > > > > > > local_rp = tre_ring->rp; > > > > > > > + read_unlock_bh(&mhi_chan->lock); > > > > > > > + > > > > > > > /* notify client */ > > > > > > > mhi_chan->xfer_cb(mhi_chan->mhi_dev, &result); > > > > > > > @@ -667,6 +669,8 @@ static int parse_xfer_event(struct mhi_controller *mhi_cntrl, > > > > > > > kfree(buf_info->cb_buf); > > > > > > > } > > > > > > > } > > > > > > > + > > > > > > > + read_lock_bh(&mhi_chan->lock); > > > > > > > } > > > > > > > break; > > > > > > > } /* CC_EOT */ > > > > > > > -- > > > > > > > 2.7.4 > > > > > > > > > > > > > >
On 12/6/2023 9:48 PM, Manivannan Sadhasivam wrote: > On Wed, Dec 06, 2023 at 10:25:12AM +0800, Qiang Yu wrote: >> On 11/30/2023 1:31 PM, Manivannan Sadhasivam wrote: >>> On Wed, Nov 29, 2023 at 11:29:07AM +0800, Qiang Yu wrote: >>>> On 11/28/2023 9:32 PM, Manivannan Sadhasivam wrote: >>>>> On Mon, Nov 27, 2023 at 03:13:55PM +0800, Qiang Yu wrote: >>>>>> On 11/24/2023 6:04 PM, Manivannan Sadhasivam wrote: >>>>>>> On Tue, Nov 14, 2023 at 01:27:39PM +0800, Qiang Yu wrote: >>>>>>>> Ensure read and write locks for the channel are not taken in succession by >>>>>>>> dropping the read lock from parse_xfer_event() such that a callback given >>>>>>>> to client can potentially queue buffers and acquire the write lock in that >>>>>>>> process. Any queueing of buffers should be done without channel read lock >>>>>>>> acquired as it can result in multiple locks and a soft lockup. >>>>>>>> >>>>>>> Is this patch trying to fix an existing issue in client drivers or a potential >>>>>>> issue in the future drivers? >>>>>>> >>>>>>> Even if you take care of disabled channels, "mhi_event->lock" acquired during >>>>>>> mhi_mark_stale_events() can cause deadlock, since event lock is already held by >>>>>>> mhi_ev_task(). >>>>>>> >>>>>>> I'd prefer not to open the window unless this patch is fixing a real issue. >>>>>>> >>>>>>> - Mani >>>>>> In [PATCH v4 1/4] bus: mhi: host: Add spinlock to protect WP access when >>>>>> queueing >>>>>> TREs, we add >>>>>> write_lock_bh(&mhi_chan->lock)/write_unlock_bh(&mhi_chan->lock) >>>>>> in mhi_gen_tre, which may be invoked as part of mhi_queue in client xfer >>>>>> callback, >>>>>> so we have to use read_unlock_bh(&mhi_chan->lock) here to avoid acquiring >>>>>> mhi_chan->lock >>>>>> twice. >>>>>> >>>>>> Sorry for confusing you. Do you think we need to sqush this two patch into >>>>>> one? >>>>> Well, if patch 1 is introducing a potential deadlock, then we should fix patch >>>>> 1 itself and not introduce a follow up patch. >>>>> >>>>> But there is one more issue that I pointed out in my previous reply. >>>> Sorry, I can not understand why "mhi_event->lock" acquired during >>>> mhi_mark_stale_events() can cause deadlock. In mhi_ev_task(), we will >>>> not invoke mhi_mark_stale_events(). Can you provide some interpretation? >>> Going by your theory that if a channel gets disabled while processing the event, >>> the process trying to disable the channel will try to acquire "mhi_event->lock" >>> which is already held by the process processing the event. >>> >>> - Mani >> OK, I get you. Thank you for kind explanation. Hopefully I didn't intrude >> too much. > Not at all. Btw, did you actually encounter any issue that this patch is trying > to fix? Or just fixing based on code inspection. > > - Mani Yes, we actually meet the race issue in downstream driver. But I can not find more details about the issue. >>>>> Also, I'm planning to cleanup the locking mess within MHI in the coming days. >>>>> Perhaps we can revisit this series at that point of time. Will that be OK for >>>>> you? >>>> Sure, that will be great. 
>>>>> - Mani >>>>> >>>>>>>> Signed-off-by: Qiang Yu <quic_qianyu@quicinc.com> >>>>>>>> --- >>>>>>>> drivers/bus/mhi/host/main.c | 4 ++++ >>>>>>>> 1 file changed, 4 insertions(+) >>>>>>>> >>>>>>>> diff --git a/drivers/bus/mhi/host/main.c b/drivers/bus/mhi/host/main.c >>>>>>>> index 6c6d253..c4215b0 100644 >>>>>>>> --- a/drivers/bus/mhi/host/main.c >>>>>>>> +++ b/drivers/bus/mhi/host/main.c >>>>>>>> @@ -642,6 +642,8 @@ static int parse_xfer_event(struct mhi_controller *mhi_cntrl, >>>>>>>> mhi_del_ring_element(mhi_cntrl, tre_ring); >>>>>>>> local_rp = tre_ring->rp; >>>>>>>> + read_unlock_bh(&mhi_chan->lock); >>>>>>>> + >>>>>>>> /* notify client */ >>>>>>>> mhi_chan->xfer_cb(mhi_chan->mhi_dev, &result); >>>>>>>> @@ -667,6 +669,8 @@ static int parse_xfer_event(struct mhi_controller *mhi_cntrl, >>>>>>>> kfree(buf_info->cb_buf); >>>>>>>> } >>>>>>>> } >>>>>>>> + >>>>>>>> + read_lock_bh(&mhi_chan->lock); >>>>>>>> } >>>>>>>> break; >>>>>>>> } /* CC_EOT */ >>>>>>>> -- >>>>>>>> 2.7.4 >>>>>>>> >>>>>>>>
On Thu, Dec 07, 2023 at 01:27:19PM +0800, Qiang Yu wrote: > > On 12/6/2023 9:48 PM, Manivannan Sadhasivam wrote: > > On Wed, Dec 06, 2023 at 10:25:12AM +0800, Qiang Yu wrote: > > > On 11/30/2023 1:31 PM, Manivannan Sadhasivam wrote: > > > > On Wed, Nov 29, 2023 at 11:29:07AM +0800, Qiang Yu wrote: > > > > > On 11/28/2023 9:32 PM, Manivannan Sadhasivam wrote: > > > > > > On Mon, Nov 27, 2023 at 03:13:55PM +0800, Qiang Yu wrote: > > > > > > > On 11/24/2023 6:04 PM, Manivannan Sadhasivam wrote: > > > > > > > > On Tue, Nov 14, 2023 at 01:27:39PM +0800, Qiang Yu wrote: > > > > > > > > > Ensure read and write locks for the channel are not taken in succession by > > > > > > > > > dropping the read lock from parse_xfer_event() such that a callback given > > > > > > > > > to client can potentially queue buffers and acquire the write lock in that > > > > > > > > > process. Any queueing of buffers should be done without channel read lock > > > > > > > > > acquired as it can result in multiple locks and a soft lockup. > > > > > > > > > > > > > > > > > Is this patch trying to fix an existing issue in client drivers or a potential > > > > > > > > issue in the future drivers? > > > > > > > > > > > > > > > > Even if you take care of disabled channels, "mhi_event->lock" acquired during > > > > > > > > mhi_mark_stale_events() can cause deadlock, since event lock is already held by > > > > > > > > mhi_ev_task(). > > > > > > > > > > > > > > > > I'd prefer not to open the window unless this patch is fixing a real issue. > > > > > > > > > > > > > > > > - Mani > > > > > > > In [PATCH v4 1/4] bus: mhi: host: Add spinlock to protect WP access when > > > > > > > queueing > > > > > > > TREs, we add > > > > > > > write_lock_bh(&mhi_chan->lock)/write_unlock_bh(&mhi_chan->lock) > > > > > > > in mhi_gen_tre, which may be invoked as part of mhi_queue in client xfer > > > > > > > callback, > > > > > > > so we have to use read_unlock_bh(&mhi_chan->lock) here to avoid acquiring > > > > > > > mhi_chan->lock > > > > > > > twice. > > > > > > > > > > > > > > Sorry for confusing you. Do you think we need to sqush this two patch into > > > > > > > one? > > > > > > Well, if patch 1 is introducing a potential deadlock, then we should fix patch > > > > > > 1 itself and not introduce a follow up patch. > > > > > > > > > > > > But there is one more issue that I pointed out in my previous reply. > > > > > Sorry, I can not understand why "mhi_event->lock" acquired during > > > > > mhi_mark_stale_events() can cause deadlock. In mhi_ev_task(), we will > > > > > not invoke mhi_mark_stale_events(). Can you provide some interpretation? > > > > Going by your theory that if a channel gets disabled while processing the event, > > > > the process trying to disable the channel will try to acquire "mhi_event->lock" > > > > which is already held by the process processing the event. > > > > > > > > - Mani > > > OK, I get you. Thank you for kind explanation. Hopefully I didn't intrude > > > too much. > > Not at all. Btw, did you actually encounter any issue that this patch is trying > > to fix? Or just fixing based on code inspection. > > > > - Mani > Yes, we actually meet the race issue in downstream driver. But I can not > find more details about the issue. Hmm. I think it is OK to accept this patch and ignore the channel disabling concern since the event lock is in place to prevent that. There would be no deadlock as I mentioned above, since the process that is parsing the xfer event is not the one that is going to disable the channel in parallel. 
Could you please respin this series dropping patch 3/4 and also addressing the issue I mentioned in patch 4/4? - Mani
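To make the disable-versus-event-processing discussion concrete, here is a rough two-thread model built on the assumptions stated in the thread: event processing holds the event lock for the whole event (as mhi_ev_task() is said to), and the disable path takes the channel lock for writing and, separately, the event lock (as mhi_mark_stale_events() is said to). All names are hypothetical stand-ins for the MHI paths; the point is that the disabling thread blocks behind the event thread rather than forming a deadlock cycle.

```c
/*
 * Two-thread sketch of the scenario above. event_lock models
 * mhi_event->lock and chan_lock models mhi_chan->lock. It assumes the
 * disable path does not hold the channel lock while taking the event lock;
 * none of this is the real MHI host code.
 *
 * Build: gcc -pthread disable_vs_event.c -o disable_vs_event
 */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t event_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_rwlock_t chan_lock = PTHREAD_RWLOCK_INITIALIZER;

/* Thread A: processes a transfer event, dropping chan_lock around the callback. */
static void *event_processing(void *arg)
{
	pthread_mutex_lock(&event_lock);
	pthread_rwlock_rdlock(&chan_lock);

	pthread_rwlock_unlock(&chan_lock);
	puts("A: client callback runs without chan_lock held");
	sleep(1);				/* callback may queue buffers here */
	pthread_rwlock_rdlock(&chan_lock);

	pthread_rwlock_unlock(&chan_lock);
	pthread_mutex_unlock(&event_lock);
	return NULL;
}

/* Thread B: disables the channel, then marks stale events. */
static void *channel_disable(void *arg)
{
	pthread_rwlock_wrlock(&chan_lock);	/* mark the channel disabled */
	pthread_rwlock_unlock(&chan_lock);

	pthread_mutex_lock(&event_lock);	/* waits for A; no deadlock */
	puts("B: marking stale events after event processing finished");
	pthread_mutex_unlock(&event_lock);
	return NULL;
}

int main(void)
{
	pthread_t a, b;

	pthread_create(&a, NULL, event_processing, NULL);
	pthread_create(&b, NULL, channel_disable, NULL);
	pthread_join(a, NULL);
	pthread_join(b, NULL);
	return 0;
}
```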
On 12/7/2023 2:43 PM, Manivannan Sadhasivam wrote: > On Thu, Dec 07, 2023 at 01:27:19PM +0800, Qiang Yu wrote: >> On 12/6/2023 9:48 PM, Manivannan Sadhasivam wrote: >>> On Wed, Dec 06, 2023 at 10:25:12AM +0800, Qiang Yu wrote: >>>> On 11/30/2023 1:31 PM, Manivannan Sadhasivam wrote: >>>>> On Wed, Nov 29, 2023 at 11:29:07AM +0800, Qiang Yu wrote: >>>>>> On 11/28/2023 9:32 PM, Manivannan Sadhasivam wrote: >>>>>>> On Mon, Nov 27, 2023 at 03:13:55PM +0800, Qiang Yu wrote: >>>>>>>> On 11/24/2023 6:04 PM, Manivannan Sadhasivam wrote: >>>>>>>>> On Tue, Nov 14, 2023 at 01:27:39PM +0800, Qiang Yu wrote: >>>>>>>>>> Ensure read and write locks for the channel are not taken in succession by >>>>>>>>>> dropping the read lock from parse_xfer_event() such that a callback given >>>>>>>>>> to client can potentially queue buffers and acquire the write lock in that >>>>>>>>>> process. Any queueing of buffers should be done without channel read lock >>>>>>>>>> acquired as it can result in multiple locks and a soft lockup. >>>>>>>>>> >>>>>>>>> Is this patch trying to fix an existing issue in client drivers or a potential >>>>>>>>> issue in the future drivers? >>>>>>>>> >>>>>>>>> Even if you take care of disabled channels, "mhi_event->lock" acquired during >>>>>>>>> mhi_mark_stale_events() can cause deadlock, since event lock is already held by >>>>>>>>> mhi_ev_task(). >>>>>>>>> >>>>>>>>> I'd prefer not to open the window unless this patch is fixing a real issue. >>>>>>>>> >>>>>>>>> - Mani >>>>>>>> In [PATCH v4 1/4] bus: mhi: host: Add spinlock to protect WP access when >>>>>>>> queueing >>>>>>>> TREs, we add >>>>>>>> write_lock_bh(&mhi_chan->lock)/write_unlock_bh(&mhi_chan->lock) >>>>>>>> in mhi_gen_tre, which may be invoked as part of mhi_queue in client xfer >>>>>>>> callback, >>>>>>>> so we have to use read_unlock_bh(&mhi_chan->lock) here to avoid acquiring >>>>>>>> mhi_chan->lock >>>>>>>> twice. >>>>>>>> >>>>>>>> Sorry for confusing you. Do you think we need to sqush this two patch into >>>>>>>> one? >>>>>>> Well, if patch 1 is introducing a potential deadlock, then we should fix patch >>>>>>> 1 itself and not introduce a follow up patch. >>>>>>> >>>>>>> But there is one more issue that I pointed out in my previous reply. >>>>>> Sorry, I can not understand why "mhi_event->lock" acquired during >>>>>> mhi_mark_stale_events() can cause deadlock. In mhi_ev_task(), we will >>>>>> not invoke mhi_mark_stale_events(). Can you provide some interpretation? >>>>> Going by your theory that if a channel gets disabled while processing the event, >>>>> the process trying to disable the channel will try to acquire "mhi_event->lock" >>>>> which is already held by the process processing the event. >>>>> >>>>> - Mani >>>> OK, I get you. Thank you for kind explanation. Hopefully I didn't intrude >>>> too much. >>> Not at all. Btw, did you actually encounter any issue that this patch is trying >>> to fix? Or just fixing based on code inspection. >>> >>> - Mani >> Yes, we actually meet the race issue in downstream driver. But I can not >> find more details about the issue. > Hmm. I think it is OK to accept this patch and ignore the channel disabling > concern since the event lock is in place to prevent that. There would be no > deadlock as I mentioned above, since the process that is parsing the xfer event > is not the one that is going to disable the channel in parallel. > > Could you please respin this series dropping patch 3/4 and also addressing the > issue I mentioned in patch 4/4? 
> > - Mani Thank you for tirelessly review these patches. Will do this in next version.
diff --git a/drivers/bus/mhi/host/main.c b/drivers/bus/mhi/host/main.c
index 6c6d253..c4215b0 100644
--- a/drivers/bus/mhi/host/main.c
+++ b/drivers/bus/mhi/host/main.c
@@ -642,6 +642,8 @@ static int parse_xfer_event(struct mhi_controller *mhi_cntrl,
 			mhi_del_ring_element(mhi_cntrl, tre_ring);
 			local_rp = tre_ring->rp;
 
+			read_unlock_bh(&mhi_chan->lock);
+
 			/* notify client */
 			mhi_chan->xfer_cb(mhi_chan->mhi_dev, &result);
 
@@ -667,6 +669,8 @@ static int parse_xfer_event(struct mhi_controller *mhi_cntrl,
 					kfree(buf_info->cb_buf);
 				}
 			}
+
+			read_lock_bh(&mhi_chan->lock);
 		}
 		break;
 	} /* CC_EOT */
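For illustration of the call path the new unlock protects, below is a hedged sketch of what a client driver's download-complete callback might look like. Re-queueing the completed buffer goes through mhi_queue_buf(), which with patch 1/4 applied ends up taking mhi_chan->lock for writing; that is why parse_xfer_event() above must not hold the read lock across xfer_cb. The callback name and buffer size are made up for the example; mhi_queue_buf() and the mhi_result fields are the existing MHI client API.

```c
/* Hypothetical client driver callback, sketched for illustration only. */
#include <linux/mhi.h>
#include <linux/device.h>
#include <linux/dma-direction.h>

#define EXAMPLE_BUF_LEN	4096	/* made-up size for this sketch */

static void example_dl_xfer_cb(struct mhi_device *mhi_dev,
			       struct mhi_result *result)
{
	int ret;

	if (result->transaction_status) {
		dev_err(&mhi_dev->dev, "transfer failed: %d\n",
			result->transaction_status);
		return;
	}

	/* ... hand result->buf_addr / result->bytes_xferd to the upper layer ... */

	/*
	 * Re-queue the buffer for the next inbound transfer. This path reaches
	 * mhi_gen_tre(), which (after patch 1/4) takes the channel lock for
	 * writing, so it must not run under the channel read lock.
	 */
	ret = mhi_queue_buf(mhi_dev, DMA_FROM_DEVICE, result->buf_addr,
			    EXAMPLE_BUF_LEN, MHI_EOT);
	if (ret)
		dev_err(&mhi_dev->dev, "failed to re-queue buffer: %d\n", ret);
}
```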