Message ID | 20230809073432.4193-1-johan+linaro@kernel.org |
---|---|
State | New |
Headers |
From: Johan Hovold <johan+linaro@kernel.org> To: Kalle Valo <kvalo@kernel.org> Cc: Jeff Johnson <quic_jjohnson@quicinc.com>, Bjorn Andersson <quic_bjorande@quicinc.com>, Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>, Konrad Dybcio <konrad.dybcio@linaro.org>, Manikanta Pubbisetty <quic_mpubbise@quicinc.com>, ath11k@lists.infradead.org, linux-wireless@vger.kernel.org, linux-kernel@vger.kernel.org, Johan Hovold <johan+linaro@kernel.org> Subject: [PATCH] Revert "Revert "wifi: ath11k: Enable threaded NAPI"" Date: Wed, 9 Aug 2023 09:34:32 +0200 Message-ID: <20230809073432.4193-1-johan+linaro@kernel.org> X-Mailer: git-send-email 2.41.0 |
Series |
Revert "Revert "wifi: ath11k: Enable threaded NAPI""
|
|
Commit Message
Johan Hovold
Aug. 9, 2023, 7:34 a.m. UTC
This reverts commit d265ebe41c911314bd273c218a37088835959fa1.
Disabling threaded NAPI causes the Lenovo ThinkPad X13s to hang (e.g. no
more interrupts received) almost immediately during RX.
Apparently something broke since commit 13aa2fb692d3 ("wifi: ath11k:
Enable threaded NAPI") so that a simple revert is no longer possible.
As commit d265ebe41c91 ("Revert "wifi: ath11k: Enable threaded NAPI"")
does not address the underlying issue reported with QCN9074, it seems we
need to reenable threaded NAPI before fixing both bugs properly.
Fixes: d265ebe41c91 ("Revert "wifi: ath11k: Enable threaded NAPI"")
Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
---
Hi Kalle,
Disabling threaded NAPI caused a severe regression in 6.5-rc5 by making
the X13s completely unusable (e.g. no keyboard input, I've seen an RCU
splat once).
I'm supposed to be on holiday this week, but thanks to the rain I gave
rc5 a try and ran into this.
I've added Bjorn, Mani and Konrad on CC who may be able to help with
debugging this further if needed while I'm out-of-office.
Johan
drivers/net/wireless/ath/ath11k/ahb.c | 1 +
drivers/net/wireless/ath/ath11k/pcic.c | 1 +
2 files changed, 2 insertions(+)
Comments
On 8/9/2023 1:04 PM, Johan Hovold wrote:
> This reverts commit d265ebe41c911314bd273c218a37088835959fa1.
>
> Disabling threaded NAPI causes the Lenovo ThinkPad X13s to hang (e.g. no
> more interrupts received) almost immediately during RX.
>
> Apparently something broke since commit 13aa2fb692d3 ("wifi: ath11k:
> Enable threaded NAPI") so that a simple revert is no longer possible.
>

This is getting as weird as it can get :)

> As commit d265ebe41c91 ("Revert "wifi: ath11k: Enable threaded NAPI"")
> does not address the underlying issue reported with QCN9074, it seems we
> need to reenable threaded NAPI before fixing both bugs properly.
>

It seems that the revert has actually solved the issue reported with
QCN9074.

https://bugzilla.kernel.org/show_bug.cgi?id=217536

We have been trying to reproduce the problem on x86 + QCN9074 (with
threaded NAPI) for quite some time, but there is no repro yet.

Actually, enabling/disabling threaded NAPI is a simple affair; I'm
surprised to hear that interrupts are blocked due to not having
threaded NAPI.

Which chip does the Lenovo ThinkPad X13s have?

Thanks,
Manikanta
On Wed, Aug 09, 2023 at 02:32:37PM +0530, Manikanta Pubbisetty wrote:
> On 8/9/2023 1:04 PM, Johan Hovold wrote:
> > This reverts commit d265ebe41c911314bd273c218a37088835959fa1.
> >
> > Disabling threaded NAPI causes the Lenovo ThinkPad X13s to hang (e.g. no
> > more interrupts received) almost immediately during RX.
> >
> > Apparently something broke since commit 13aa2fb692d3 ("wifi: ath11k:
> > Enable threaded NAPI") so that a simple revert is no longer possible.
> >
>
> This is getting as weird as it can get :)
>
> > As commit d265ebe41c91 ("Revert "wifi: ath11k: Enable threaded NAPI"")
> > does not address the underlying issue reported with QCN9074, it seems we
> > need to reenable threaded NAPI before fixing both bugs properly.
> >
>
> It seems that the revert has actually solved the issue reported with
> QCN9074.
>
> https://bugzilla.kernel.org/show_bug.cgi?id=217536

Sure, but it's only a workaround as the underlying cause has not been
identified.

> We have been trying to reproduce the problem on x86 + QCN9074 (with
> threaded NAPI) for quite some time, but there is no repro yet.
>
> Actually, enabling/disabling threaded NAPI is a simple affair; I'm
> surprised to hear that interrupts are blocked due to not having
> threaded NAPI.

It sounds to me like the driver's locking is broken if moving to softirq
processing hangs the machine like this. But I have not had time to try
to track it down besides verifying that reenabling threaded NAPI makes
the problem go away.

> Which chip does the Lenovo ThinkPad X13s have?

It's a WCN6855 (QCNFA765).

Johan
On 8/9/2023 2:46 PM, Johan Hovold wrote:
> On Wed, Aug 09, 2023 at 02:32:37PM +0530, Manikanta Pubbisetty wrote:
[...]
>> Which chip does the Lenovo ThinkPad X13s have?
>
> It's a WCN6855 (QCNFA765).
>

WCN6855 and QCN9074 share the same driver code base since both are PCIe
devices. One working and the other not working is surprising. Do you
have a dmesg log from when this problem occurred?

We are working on root-causing the original problem. The hindrance as of
today is that we have not been able to reproduce it at Qualcomm so far.
We are planning to work with the reporter to get more logs.

Thanks,
Manikanta
On 8/9/2023 2:46 PM, Johan Hovold wrote:
> On Wed, Aug 09, 2023 at 02:32:37PM +0530, Manikanta Pubbisetty wrote:
[...]
>> Which chip does the Lenovo ThinkPad X13s have?
>
> It's a WCN6855 (QCNFA765).
>

Also, it is worth giving this patch a try:

https://patchwork.kernel.org/project/linux-wireless/patch/20230601033840.2997-1-quic_bqiang@quicinc.com/

It seems to fix a known interrupt issue on WCN6855. Could you please
give it a try?

Thanks,
Manikanta
On Tue, Aug 22, 2023 at 03:56:24PM +0300, Kalle Valo wrote:
> Johan Hovold <johan@kernel.org> writes:
> > On Wed, Aug 09, 2023 at 09:34:32AM +0200, Johan Hovold wrote:
> >
> >> Disabling threaded NAPI caused a severe regression in 6.5-rc5 by making
> >> the X13s completely unusable (e.g. no keyboard input, I've seen an RCU
> >> splat once).
> >
> > Any chance we can get the offending commit reverted before 6.5 is
> > released?
>
> The problem here is that would break QCN9074 again so there is no good
> solution. I suspect we have a fundamental issue in ath11k which we just
> haven't discovered yet. I would prefer to get to the bottom of this
> before reverting anything.

Sure, ideally we can find and fix the underlying issues these next few
days, but since this regression was introduced in rc5 in an attempt to
address the QCN9074 issue which has been there since 6.1 I think we need
to revert otherwise.

> > I'll take a closer look at this meanwhile.
>
> Thanks, much appreciated. Did you try enabling all kernel debug
> features, maybe they would give some hints?

Yes, I have a bunch of those enabled. Lockdep does not complain, but the
hard lockup detector triggers and it looks like CPU0 (which handles most
interrupts on this machine currently) has got stuck while processing an
interrupt.

RCU also detects the stall on CPU0 and provides a task dump for
ksoftirqd with the following call trace:

	__switch_to
	run_ksoftirqd
	smpboot_thread_fn
	kthread
	ret_from_fork

I just tried the out-of-tree pseudo NMI series [0] to get a stack trace,
but CPU0 does not respond to those either when I hit this.

Note that it takes a bit of RX to trigger this, but I hit it as soon as
I try to download something substantial (e.g. after a couple of MB).

Johan

[0] https://lore.kernel.org/lkml/20230419225604.21204-1-dianders@chromium.org/
diff --git a/drivers/net/wireless/ath/ath11k/ahb.c b/drivers/net/wireless/ath/ath11k/ahb.c
index 139da578831a..1cebba7889d7 100644
--- a/drivers/net/wireless/ath/ath11k/ahb.c
+++ b/drivers/net/wireless/ath/ath11k/ahb.c
@@ -376,6 +376,7 @@ static void ath11k_ahb_ext_irq_enable(struct ath11k_base *ab)
 		struct ath11k_ext_irq_grp *irq_grp = &ab->ext_irq_grp[i];
 
 		if (!irq_grp->napi_enabled) {
+			dev_set_threaded(&irq_grp->napi_ndev, true);
 			napi_enable(&irq_grp->napi);
 			irq_grp->napi_enabled = true;
 		}
diff --git a/drivers/net/wireless/ath/ath11k/pcic.c b/drivers/net/wireless/ath/ath11k/pcic.c
index c63083633b37..c899616fbee4 100644
--- a/drivers/net/wireless/ath/ath11k/pcic.c
+++ b/drivers/net/wireless/ath/ath11k/pcic.c
@@ -466,6 +466,7 @@ void ath11k_pcic_ext_irq_enable(struct ath11k_base *ab)
 		struct ath11k_ext_irq_grp *irq_grp = &ab->ext_irq_grp[i];
 
 		if (!irq_grp->napi_enabled) {
+			dev_set_threaded(&irq_grp->napi_ndev, true);
 			napi_enable(&irq_grp->napi);
 			irq_grp->napi_enabled = true;
 		}
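
For readers who want to see the shape of the change outside the kernel tree, the following is a minimal userspace sketch of the guarded enable sequence that both hunks share: select threaded polling, enable, then set the `napi_enabled` guard so a second call does nothing. Every `mock_*` name here is a hypothetical stand-in invented for illustration; none of them are kernel APIs, and the sketch says nothing about the cause of the hang discussed in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; these are not kernel types. */
struct mock_napi {
	bool threaded;    /* models the effect of dev_set_threaded(..., true) */
	bool enabled;     /* models the effect of napi_enable() */
	int enable_calls; /* counts how often the mock enable ran */
};

struct mock_irq_grp {
	struct mock_napi napi;
	bool napi_enabled; /* same guard flag name as in the diff */
};

/* Mock of selecting threaded polling; called before the mock enable,
 * in the same order as the diff. */
static void mock_set_threaded(struct mock_napi *napi, bool threaded)
{
	assert(napi != NULL);
	napi->threaded = threaded;
}

/* Mock of enabling polling for one group. */
static void mock_napi_enable(struct mock_napi *napi)
{
	assert(napi != NULL);
	napi->enabled = true;
	napi->enable_calls++;
}

/* Mirrors the loop body in the diff: configure, enable, mark enabled.
 * Groups that are already enabled are skipped. */
static void mock_ext_irq_enable(struct mock_irq_grp *grps, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		struct mock_irq_grp *grp = &grps[i];

		if (!grp->napi_enabled) {
			mock_set_threaded(&grp->napi, true);
			mock_napi_enable(&grp->napi);
			grp->napi_enabled = true;
		}
	}
}
```

Calling `mock_ext_irq_enable()` twice on the same array leaves each group enabled exactly once with its threaded flag set, which is the behaviour the `napi_enabled` guard in the diff provides.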