[v4,2/4] platform/x86: intel_scu_ipc: Check status upon timeout in ipc_wait_for_interrupt()
Message ID | 20230913212723.3055315-3-swboyd@chromium.org |
---|---|
State | New |
Headers |
From: Stephen Boyd <swboyd@chromium.org>
To: Mika Westerberg <mika.westerberg@linux.intel.com>, Hans de Goede <hdegoede@redhat.com>, Mark Gross <markgross@kernel.org>
Cc: linux-kernel@vger.kernel.org, patches@lists.linux.dev, platform-driver-x86@vger.kernel.org, Andy Shevchenko <andriy.shevchenko@linux.intel.com>, Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>, Prashant Malani <pmalani@chromium.org>
Subject: [PATCH v4 2/4] platform/x86: intel_scu_ipc: Check status upon timeout in ipc_wait_for_interrupt()
Date: Wed, 13 Sep 2023 14:27:20 -0700
Message-ID: <20230913212723.3055315-3-swboyd@chromium.org>
In-Reply-To: <20230913212723.3055315-1-swboyd@chromium.org>
References: <20230913212723.3055315-1-swboyd@chromium.org>
List-ID: <linux-kernel.vger.kernel.org> |
Series | platform/x86: intel_scu_ipc: Timeout fixes |
Commit Message
Stephen Boyd
Sept. 13, 2023, 9:27 p.m. UTC
It's possible for the completion in ipc_wait_for_interrupt() to time out
simply because the interrupt was delayed in being processed. A timeout
in itself is not an error. This driver should check the status register
upon a timeout to ensure that scheduling or interrupt processing delays
don't affect the outcome of the IPC return value.

 CPU0                                                   SCU
 ----                                                   ---
 ipc_wait_for_interrupt()
  wait_for_completion_timeout(&scu->cmd_complete)
  [TIMEOUT]                                             status[IPC_STATUS_BUSY]=0

Fix this problem by reading the status bit in all cases, regardless of
the timeout. If the completion times out, we'll assume the problem was
that the IPC_STATUS_BUSY bit was still set, but if the status bit is
cleared in the meantime we know that we hit some scheduling delay and we
should just check the error bit.

Cc: Prashant Malani <pmalani@chromium.org>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Fixes: ed12f295bfd5 ("ipc: Added support for IPC interrupt mode")
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
---
 drivers/platform/x86/intel_scu_ipc.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)
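The scenario in the commit message can be modelled in userspace with a minimal sketch. This is not the driver code: `old_wait_result()` and `new_wait_result()` are hypothetical names, `completed` stands in for the return value of wait_for_completion_timeout() (0 meaning it timed out), `status` stands in for what ipc_read_status() would return afterwards, and the bit positions are assumptions made for the sketch.

```c
#include <errno.h>

/* Assumed bit layout for this sketch; check the driver for the real values. */
#define IPC_STATUS_BUSY (1u << 0)
#define IPC_STATUS_ERR  (1u << 1)

/*
 * Pre-patch decision: a timed-out wait is reported as -ETIMEDOUT without
 * ever looking at the status register.
 */
int old_wait_result(unsigned long completed, unsigned int status)
{
	if (!completed)
		return -ETIMEDOUT;
	if (status & IPC_STATUS_ERR)
		return -EIO;
	return 0;
}

/*
 * Post-patch decision: the wait result is ignored and the outcome is
 * derived from the status value alone.
 */
int new_wait_result(unsigned long completed, unsigned int status)
{
	(void)completed;
	if (status & IPC_STATUS_BUSY)
		return -ETIMEDOUT;
	if (status & IPC_STATUS_ERR)
		return -EIO;
	return 0;
}
```

With `completed == 0` and `status == 0` (the wait timed out, but the SCU had already cleared BUSY), the old logic reports -ETIMEDOUT while the new logic reports success, which is the case the diagram describes.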
Comments
On Wed, 13 Sep 2023, Stephen Boyd wrote:

> It's possible for the completion in ipc_wait_for_interrupt() to time out
> simply because the interrupt was delayed in being processed. A timeout
> in itself is not an error. This driver should check the status register
> upon a timeout to ensure that scheduling or interrupt processing delays
> don't affect the outcome of the IPC return value.
>
>  CPU0                                                   SCU
>  ----                                                   ---
>  ipc_wait_for_interrupt()
>   wait_for_completion_timeout(&scu->cmd_complete)
>   [TIMEOUT]                                             status[IPC_STATUS_BUSY]=0
>
> Fix this problem by reading the status bit in all cases, regardless of
> the timeout. If the completion times out, we'll assume the problem was
> that the IPC_STATUS_BUSY bit was still set, but if the status bit is
> cleared in the meantime we know that we hit some scheduling delay and we
> should just check the error bit.

Hi,

I don't understand the intent here. What prevents IPC_STATUS_BUSY from
changing right after you've read it in ipc_read_status(scu)? Doesn't that
land you in exactly the same situation, where the returned value is stale?
I cannot see how this fixes anything; at best it just plays around with the
race window, which seems to still be there after this fix.
Hi Ilpo,

On 9/15/23 15:49, Ilpo Järvinen wrote:
> On Wed, 13 Sep 2023, Stephen Boyd wrote:
>
>> It's possible for the completion in ipc_wait_for_interrupt() to time out
>> simply because the interrupt was delayed in being processed. A timeout
>> in itself is not an error. This driver should check the status register
>> upon a timeout to ensure that scheduling or interrupt processing delays
>> don't affect the outcome of the IPC return value.
>>
>>  CPU0                                                   SCU
>>  ----                                                   ---
>>  ipc_wait_for_interrupt()
>>   wait_for_completion_timeout(&scu->cmd_complete)
>>   [TIMEOUT]                                             status[IPC_STATUS_BUSY]=0
>>
>> Fix this problem by reading the status bit in all cases, regardless of
>> the timeout. If the completion times out, we'll assume the problem was
>> that the IPC_STATUS_BUSY bit was still set, but if the status bit is
>> cleared in the meantime we know that we hit some scheduling delay and we
>> should just check the error bit.
>
> Hi,
>
> I don't understand the intent here. What prevents IPC_STATUS_BUSY from
> changing right after you've read it in ipc_read_status(scu)? Doesn't that
> land you in exactly the same situation, where the returned value is stale?
> I cannot see how this fixes anything; at best it just plays around with the
> race window, which seems to still be there after this fix.

As I understand it, the problem before was that the function would
return -ETIMEDOUT purely based on wait_for_completion_timeout(),
without ever actually checking the BUSY bit:

Old code:

	if (!wait_for_completion_timeout(&scu->cmd_complete, IPC_TIMEOUT))
		return -ETIMEDOUT;

This allows for a scenario where, when the IRQ processing gets delayed
(on, say, another core) and the timeout triggers,
ipc_wait_for_interrupt() returns -ETIMEDOUT even though
the BUSY flag was already cleared by the SCU.

This patch adds an explicit check for the BUSY flag after
the wait_for_completion_timeout(), rather than relying on the
wait_for_completion_timeout() return value, which implies things
are still busy.

As for "What prevents IPC_STATUS_BUSY from
changing right after you've read it in ipc_read_status(scu)?"

AFAICT in this code path the bit is only ever supposed to go
from being set (busy) to unset (not busy), not the other
way around, since no new commands can be submitted until
this function has completed. So that scenario cannot happen.

Regards,

Hans
On Mon, 18 Sep 2023, Hans de Goede wrote:
> On 9/15/23 15:49, Ilpo Järvinen wrote:
> > On Wed, 13 Sep 2023, Stephen Boyd wrote:
> >
> >> It's possible for the completion in ipc_wait_for_interrupt() to time out
> >> simply because the interrupt was delayed in being processed. A timeout
> >> in itself is not an error. This driver should check the status register
> >> upon a timeout to ensure that scheduling or interrupt processing delays
> >> don't affect the outcome of the IPC return value.
> >>
> >>  CPU0                                                   SCU
> >>  ----                                                   ---
> >>  ipc_wait_for_interrupt()
> >>   wait_for_completion_timeout(&scu->cmd_complete)
> >>   [TIMEOUT]                                             status[IPC_STATUS_BUSY]=0
> >>
> >> Fix this problem by reading the status bit in all cases, regardless of
> >> the timeout. If the completion times out, we'll assume the problem was
> >> that the IPC_STATUS_BUSY bit was still set, but if the status bit is
> >> cleared in the meantime we know that we hit some scheduling delay and we
> >> should just check the error bit.
> >
> > Hi,
> >
> > I don't understand the intent here. What prevents IPC_STATUS_BUSY from
> > changing right after you've read it in ipc_read_status(scu)? Doesn't that
> > land you in exactly the same situation, where the returned value is stale?
> > I cannot see how this fixes anything; at best it just plays around with the
> > race window, which seems to still be there after this fix.
>
> As I understand it, the problem before was that the function would
> return -ETIMEDOUT purely based on wait_for_completion_timeout(),
> without ever actually checking the BUSY bit:
>
> Old code:
>
> 	if (!wait_for_completion_timeout(&scu->cmd_complete, IPC_TIMEOUT))
> 		return -ETIMEDOUT;
>
> This allows for a scenario where, when the IRQ processing gets delayed
> (on, say, another core) and the timeout triggers,
> ipc_wait_for_interrupt() returns -ETIMEDOUT even though
> the BUSY flag was already cleared by the SCU.
>
> This patch adds an explicit check for the BUSY flag after
> the wait_for_completion_timeout(), rather than relying on the
> wait_for_completion_timeout() return value, which implies things
> are still busy.

Oh, I see, it's because the code is waiting for the completion rather
than the actual condition.

> As for "What prevents IPC_STATUS_BUSY from
> changing right after you've read it in ipc_read_status(scu)?"
>
> AFAICT in this code path the bit is only ever supposed to go
> from being set (busy) to unset (not busy), not the other
> way around, since no new commands can be submitted until
> this function has completed. So that scenario cannot happen.

This is not what I meant. I meant that once the code has decided to
return -ETIMEDOUT, the status bit can still change at that point, which
makes the return value no longer match the status. This race is still
there, and given that the changelog was a bit sparse on which race it was
fixing, I ended up noticing this detail.
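The remaining window described in this reply can be illustrated with a small userspace model. It is only a sketch under stated assumptions: `struct fake_scu`, `fake_read_status()` and `wait_result_after_timeout()` are hypothetical names, the bit positions are assumed, and the scripted reads merely imitate an SCU that clears BUSY just after the driver samples it. It says nothing about how likely or how harmful this is on real hardware.

```c
#include <errno.h>
#include <stddef.h>

/* Assumed bit layout for this sketch; check the driver for the real values. */
#define IPC_STATUS_BUSY (1u << 0)
#define IPC_STATUS_ERR  (1u << 1)

/* Hypothetical stand-in for the SCU: replays a scripted sequence of reads. */
struct fake_scu {
	const unsigned int *reads;	/* scripted status values, nreads >= 1 */
	size_t nreads;
	size_t pos;
};

unsigned int fake_read_status(struct fake_scu *scu)
{
	/* Once the script runs out, keep returning the last value. */
	size_t i = scu->pos < scu->nreads ? scu->pos++ : scu->nreads - 1;

	return scu->reads[i];
}

/* Post-patch decision logic applied to one sampled status value. */
int wait_result_after_timeout(struct fake_scu *scu)
{
	unsigned int status = fake_read_status(scu);

	if (status & IPC_STATUS_BUSY)
		return -ETIMEDOUT;
	if (status & IPC_STATUS_ERR)
		return -EIO;
	return 0;
}
```

With a script of `{ IPC_STATUS_BUSY, 0 }`, the function returns -ETIMEDOUT from the first sample, while a second read immediately afterwards already shows BUSY cleared. Any single sample can go stale in this way; the patch narrows the window to the interval after the status read rather than closing it.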
diff --git a/drivers/platform/x86/intel_scu_ipc.c b/drivers/platform/x86/intel_scu_ipc.c
index 4c774ee8bb1b..299c15312acb 100644
--- a/drivers/platform/x86/intel_scu_ipc.c
+++ b/drivers/platform/x86/intel_scu_ipc.c
@@ -248,10 +248,12 @@ static inline int ipc_wait_for_interrupt(struct intel_scu_ipc_dev *scu)
 {
 	int status;
 
-	if (!wait_for_completion_timeout(&scu->cmd_complete, IPC_TIMEOUT))
-		return -ETIMEDOUT;
+	wait_for_completion_timeout(&scu->cmd_complete, IPC_TIMEOUT);
 
 	status = ipc_read_status(scu);
+	if (status & IPC_STATUS_BUSY)
+		return -ETIMEDOUT;
+
 	if (status & IPC_STATUS_ERR)
 		return -EIO;
 