Message ID: 20231220172112.763539-1-cristian.marussi@arm.com
State: New
Headers:
From: Cristian Marussi <cristian.marussi@arm.com>
To: linux-kernel@vger.kernel.org, linux-arm-kernel@lists.infradead.org
Cc: sudeep.holla@arm.com, vincent.guittot@linaro.org, Cristian Marussi <cristian.marussi@arm.com>, Xinglong Yang <xinglong.yang@cixtech.com>, stable@vger.kernel.org
Subject: [PATCH] firmware: arm_scmi: Check Mailbox/SMT channel for consistency
Date: Wed, 20 Dec 2023 17:21:12 +0000
Message-Id: <20231220172112.763539-1-cristian.marussi@arm.com>
Series: firmware: arm_scmi: Check Mailbox/SMT channel for consistency
Commit Message
Cristian Marussi
Dec. 20, 2023, 5:21 p.m. UTC
On reception of a completion interrupt, the SMT memory area is accessed
first to retrieve the message header. Then, if the message sequence number
identifies a transaction that is still pending, the related payload is
fetched too.

When an SCMI command times out, channel ownership remains with the platform
until a late reply is eventually received. As a consequence, any further
transmission attempt remains pending, waiting for the platform to
relinquish the channel.

Once that late reply is received, channel ownership is given back to the
agent. Any pending request is then allowed to proceed and overwrite the SMT
area of the just-delivered late reply, and the wait for the reply to the
new request starts.

It has been observed that the spurious IRQ related to the late reply can be
wrongly associated with the freshly enqueued request. When that happens, the
SCMI stack's in-flight lookup procedure is fooled because the message header
now present in the SMT area belongs to the new pending transaction, even
though the real reply has yet to arrive.

This race condition on the A2P channel can be detected by looking at the
channel status bits: a genuine reply from the platform will have set the
channel-free bit before triggering the completion IRQ.

Add a consistency check to validate this condition in the A2P ISR.

Reported-by: Xinglong Yang <xinglong.yang@cixtech.com>
Closes: https://lore.kernel.org/all/PUZPR06MB54981E6FA00D82BFDBB864FBF08DA@PUZPR06MB5498.apcprd06.prod.outlook.com/
Fixes: 5c8a47a5a91d ("firmware: arm_scmi: Make scmi core independent of the transport type")
CC: stable@vger.kernel.org # 5.15+
Signed-off-by: Cristian Marussi <cristian.marussi@arm.com>
---
drivers/firmware/arm_scmi/common.h | 1 +
drivers/firmware/arm_scmi/mailbox.c | 14 ++++++++++++++
drivers/firmware/arm_scmi/shmem.c | 6 ++++++
3 files changed, 21 insertions(+)
Comments
Hi, Cristian

This patch successfully solves the bug.

Thanks,
Xinglong
On Thu, Dec 21, 2023 at 10:31:29AM +0000, Xinglong Yang wrote:
> Hi, Cristian
>
> This patch successfully solves the bug.
>

Hi Xinglong,

thanks for reporting and testing!

Cristian
On Thu, Dec 21, 2023 at 10:31:29AM +0000, Xinglong Yang wrote:
> Hi, Cristian
>
> This patch successfully solves the bug.
>

I take this as:

Tested-by: Xinglong Yang <xinglong.yang@cixtech.com>

Please shout if you don't imply that.
It is ok.

From: Sudeep Holla <sudeep.holla@arm.com>
Sent: Thursday, December 21, 2023 6:52 PM
> On Thu, Dec 21, 2023 at 10:31:29AM +0000, Xinglong Yang wrote:
> > Hi, Cristian
> >
> > This patch successfully solves the bug.
> >
>
> I take this as:
>
> Tested-by: Xinglong Yang <xinglong.yang@cixtech.com>
>
> Please shout if you don't imply that.
>
> --
> Regards,
> Sudeep

Thanks,
Xinglong
On Wed, 20 Dec 2023 17:21:12 +0000, Cristian Marussi wrote:
> On reception of a completion interrupt the SMT memory area is accessed to
> retrieve the message header at first and then, if the message sequence
> number identifies a transaction which is still pending, the related
> payload is fetched too.
>
> When an SCMI command times out the channel ownership remains with the
> platform until eventually a late reply is received and, as a consequence,
> any further transmission attempt remains pending, waiting for the channel
> to be relinquished by the platform.
>
> [...]

Applied to sudeep.holla/linux (for-next/scmi/fixes), thanks!

[1/1] firmware: arm_scmi: Check Mailbox/SMT channel for consistency
      https://git.kernel.org/sudeep.holla/c/437a310b2224
--
Regards,
Sudeep
diff --git a/drivers/firmware/arm_scmi/common.h b/drivers/firmware/arm_scmi/common.h
index 3b7c68a11fd0..0956c2443840 100644
--- a/drivers/firmware/arm_scmi/common.h
+++ b/drivers/firmware/arm_scmi/common.h
@@ -329,6 +329,7 @@ void shmem_fetch_notification(struct scmi_shared_mem __iomem *shmem,
 void shmem_clear_channel(struct scmi_shared_mem __iomem *shmem);
 bool shmem_poll_done(struct scmi_shared_mem __iomem *shmem,
 		     struct scmi_xfer *xfer);
+bool shmem_channel_free(struct scmi_shared_mem __iomem *shmem);
 
 /* declarations for message passing transports */
 struct scmi_msg_payld;
diff --git a/drivers/firmware/arm_scmi/mailbox.c b/drivers/firmware/arm_scmi/mailbox.c
index 19246ed1f01f..b8d470417e8f 100644
--- a/drivers/firmware/arm_scmi/mailbox.c
+++ b/drivers/firmware/arm_scmi/mailbox.c
@@ -45,6 +45,20 @@ static void rx_callback(struct mbox_client *cl, void *m)
 {
 	struct scmi_mailbox *smbox = client_to_scmi_mailbox(cl);
 
+	/*
+	 * An A2P IRQ is NOT valid when received while the platform still has
+	 * the ownership of the channel, because the platform at first releases
+	 * the SMT channel and then sends the completion interrupt.
+	 *
+	 * This addresses a possible race condition in which a spurious IRQ from
+	 * a previous timed-out reply which arrived late could be wrongly
+	 * associated with the next pending transaction.
+	 */
+	if (cl->knows_txdone && !shmem_channel_free(smbox->shmem)) {
+		dev_warn(smbox->cinfo->dev, "Ignoring spurious A2P IRQ !\n");
+		return;
+	}
+
 	scmi_rx_callback(smbox->cinfo, shmem_read_header(smbox->shmem), NULL);
 }
 
diff --git a/drivers/firmware/arm_scmi/shmem.c b/drivers/firmware/arm_scmi/shmem.c
index 87b4f4d35f06..517d52fb3bcb 100644
--- a/drivers/firmware/arm_scmi/shmem.c
+++ b/drivers/firmware/arm_scmi/shmem.c
@@ -122,3 +122,9 @@ bool shmem_poll_done(struct scmi_shared_mem __iomem *shmem,
 		(SCMI_SHMEM_CHAN_STAT_CHANNEL_ERROR |
 		 SCMI_SHMEM_CHAN_STAT_CHANNEL_FREE);
 }
+
+bool shmem_channel_free(struct scmi_shared_mem __iomem *shmem)
+{
+	return (ioread32(&shmem->channel_status) &
+			SCMI_SHMEM_CHAN_STAT_CHANNEL_FREE);
+}