Message ID: 20221222221244.1290833-1-kuba@kernel.org
Headers
From: Jakub Kicinski <kuba@kernel.org>
To: peterz@infradead.org, tglx@linutronix.de
Cc: jstultz@google.com, edumazet@google.com, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, Jakub Kicinski <kuba@kernel.org>
Subject: [PATCH 0/3] softirq: uncontroversial change
Date: Thu, 22 Dec 2022 14:12:41 -0800
Message-Id: <20221222221244.1290833-1-kuba@kernel.org>
List-ID: <linux-kernel.vger.kernel.org>
Series: softirq: uncontroversial change
Message
Jakub Kicinski
Dec. 22, 2022, 10:12 p.m. UTC
Catching up on LWN I ran across the article about softirq changes, and then I noticed fresh patches in Peter's tree. So it's probably wise for me to throw these out there.

My (can I say Meta's?) problem is the opposite of what the RT-sensitive people complain about. In the current scheme, once ksoftirqd is woken no network processing happens until it runs.

When networking gets overloaded - that's probably fair. The problem is that we confuse latency tweaks with overload protection. We have a need_resched() in the loop condition (which is a latency tweak). Most often we defer to ksoftirqd because we're trying to be nice and let user space respond quickly, not because there is an overload. But user space may not be nice, and may sit on the CPU for 10ms+. Also the sirq's "work allowance" is 2ms, which is uncomfortably close to the timer tick, but that's another story.

We have a sirq latency tracker in our prod kernel which catches 8ms+ stalls of net Tx (packets queued to the NIC but no NAPI cleanup within 8ms), and with these patches applied on 5.19 a fully loaded web machine sees a drop in stalls from 1.8 stalls/sec to 0.16/sec. I also see a 50% drop in outgoing TCP retransmissions and a ~10% drop in non-TLP incoming ones. This is not a network-heavy workload, so most of the rtx are due to scheduling artifacts.

The network latency in a datacenter (around 10us) is a neat 1000x lower than scheduling granularity.

These patches (patch 2 is "the meat") change what we recognize as overload. Instead of just checking whether "ksoftirqd is woken", we also cap how long we consider ourselves to be in overload - a time limit which differs based on whether we yield due to real resource exhaustion vs just hitting that need_resched().

I hope the core concept is not entirely idiotic.
It'd be great if we could get this in or fold an equivalent concept into ongoing work from others, because due to various "scheduler improvements" every time we upgrade the production kernel this problem is getting worse :(

Jakub Kicinski (3):
  softirq: rename ksoftirqd_running() -> ksoftirqd_should_handle()
  softirq: avoid spurious stalls due to need_resched()
  softirq: don't yield if only expedited handlers are pending

 kernel/softirq.c | 29 ++++++++++++++++++++++-------
 1 file changed, 22 insertions(+), 7 deletions(-)
Comments
Hi all,

On Thu, 2022-12-22 at 14:12 -0800, Jakub Kicinski wrote:
> [...]
> I hope the core concept is not entirely idiotic. It'd be great
> if we could get this in or fold an equivalent concept into ongoing
> work from others, because due to various "scheduler improvements"
> every time we upgrade the production kernel this problem is getting
> worse :(

Please allow me to revive this old thread.

My understanding is that we want to avoid adding more heuristics here, preferring a consistent refactor.

I would like to propose a revert of:

4cd13c21b207 softirq: Let ksoftirqd do its job

and its follow-ups:

3c53776e29f8 Mark HI and TASKLET softirq synchronous
0f50524789fc softirq: Don't skip softirq execution when softirq thread is parking

The problem originally addressed by 4cd13c21b207 can now be tackled with threaded NAPI, available since:

29863d41bb6e net: implement threaded-able napi poll loop support

Reverting the mentioned commit should address the latency issues mentioned by Jakub - I verified it solves a somewhat related problem in my setup - and reduces the layering of heuristics in this area.

A refactor introducing uniform overload detection and proper resource control would be better, but I admit it's beyond me, and anyway it could still land afterwards.

Any opinion more than welcome!

Thanks,

Paolo
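Threaded NAPI, which Paolo's proposal relies on, is opt-in per device via a sysfs knob (added alongside commit 29863d41bb6e). A minimal sketch of enabling it - "eth0" is a placeholder interface name, and the knob only exists on kernels that have the feature:

```shell
iface=eth0
knob="/sys/class/net/$iface/threaded"

# Guard so this is a no-op on machines without the device or the knob.
if [ -w "$knob" ]; then
    echo 1 > "$knob"      # move this device's NAPI polling into kthreads
    cat "$knob"           # 1 => napi/<iface>-<id> threads now do the work
fi
```

With threaded NAPI enabled, RX processing runs in schedulable `napi/...` kernel threads instead of softirq context, which is why the ksoftirqd deferral heuristic of 4cd13c21b207 becomes less necessary.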
On Thu, Apr 20, 2023 at 7:24 PM Paolo Abeni <pabeni@redhat.com> wrote:
> [...]
> I would like to propose a revert of:
>
> 4cd13c21b207 softirq: Let ksoftirqd do its job
> [...]
> Reverting the mentioned commit should address the latency issues
> mentioned by Jakub - I verified it solves a somewhat related problem in
> my setup - and reduces the layering of heuristics in this area.
> [...]
> Any opinion more than welcome!

Seems fine, but I think a few things need to be fixed first in napi_threaded_poll() to enable some important features that are currently in net_rx_action() only.
On Thu, 2023-04-20 at 19:41 +0200, Eric Dumazet wrote:
> Seems fine, but I think a few things need to be fixed first in
> napi_threaded_poll() to enable some important features that are
> currently in net_rx_action() only.

Thanks for the feedback. I fear I'll miss some relevant bits. Off the top of my head I can think of RPS and skb_defer_free. Both should work even when threaded NAPI is enabled - with an additional softirq ;)

Do you think we should be able to handle both inside the NAPI thread? Or do you refer to other features?

Thanks!

Paolo
On Fri, Apr 21, 2023 at 1:34 AM Paolo Abeni <pabeni@redhat.com> wrote:
> [...]
> Please allow me to revive this old thread.

Hi Paolo,

So good to hear this :)

> My understanding is that we want to avoid adding more heuristics here,
> preferring a consistent refactor.
>
> I would like to propose a revert of:
>
> 4cd13c21b207 softirq: Let ksoftirqd do its job
>
> and its follow-ups:
>
> 3c53776e29f8 Mark HI and TASKLET softirq synchronous
> 0f50524789fc softirq: Don't skip softirq execution when softirq thread is parking

More than these, here are some related patches mentioned in the above commit 3c53776e29f8:

1ff688209e2e ("watchdog: core: make sure the watchdog_worker is not deferred")
8d5755b3f77b ("watchdog: softdog: fire watchdog even if softirqs do not get to run")
217f69743681 ("net: busy-poll: allow preemption in sk_busy_loop()")

> The problem originally addressed by 4cd13c21b207 can now be tackled
> with threaded NAPI, available since:
>
> 29863d41bb6e net: implement threaded-able napi poll loop support
>
> Reverting the mentioned commit should address the latency issues
> mentioned by Jakub - I verified it solves a somewhat related problem in
> my setup - and reduces the layering of heuristics in this area.

Sure. I can also verify its usefulness under a real workload. Some days ago I sent a heuristics patch [1] that can bypass ksoftirqd if the user chooses to mask some types of softirq - let the user decide.

But I observed that if we mask some softirqs, or completely revert commit 4cd13c21b207, the load goes higher and the kernel itself may occupy/consume more time than before. These were tested under a similar workload launched by our applications.

[1]: https://lore.kernel.org/all/20230410023041.49857-1-kerneljasonxing@gmail.com/

> A refactor introducing uniform overload detection and proper resource
> control would be better, but I admit it's beyond me, and anyway it
> could still land afterwards.

+1

Thanks,

Jason
On Fri, 2023-04-21 at 10:48 +0800, Jason Xing wrote:
> More than these, here are some related patches mentioned in the above
> commit 3c53776e29f8:
> 1ff688209e2e ("watchdog: core: make sure the watchdog_worker is not deferred")
> 8d5755b3f77b ("watchdog: softdog: fire watchdog even if softirqs do not get to run")
> 217f69743681 ("net: busy-poll: allow preemption in sk_busy_loop()")

The first 2 changes replace plain timers with HR ones and could possibly be reverted, too, but it should not be a big deal either way.

I think instead we want to keep the third commit above, as it should be useful when threaded NAPI is enabled.

Generally speaking, I would keep the initial revert to the bare minimum.

> Sure. I can also verify its usefulness under a real workload. Some
> days ago I sent a heuristics patch [1] that can bypass ksoftirqd if
> the user chooses to mask some types of softirq - let the user decide.
>
> But I observed that if we mask some softirqs, or completely revert
> commit 4cd13c21b207, the load goes higher and the kernel itself may
> occupy/consume more time than before.
>
> [1]: https://lore.kernel.org/all/20230410023041.49857-1-kerneljasonxing@gmail.com/

Thanks for the reference, I would have missed that patch otherwise.

My understanding is that adding more knobs here is in the opposite direction of what Thomas is suggesting, and IMHO the 'now mask' should not be exposed to user-space.

Thanks for the feedback,

Paolo
On Fri, Apr 21, 2023 at 5:33 PM Paolo Abeni <pabeni@redhat.com> wrote:
> [...]
> Generally speaking, I would keep the initial revert to the bare minimum.

I agree with you :)

> [...]
> My understanding is that adding more knobs here is in the opposite
> direction of what Thomas is suggesting, and IMHO the 'now mask' should
> not be exposed to user-space.

Could you please share the link to what Thomas is suggesting? I missed it.

At the beginning, I didn't have the guts to revert the commit directly. Instead I wrote a compromise patch that is not as elegant, as you said. Anyway, the idea is common, but reverting the whole commit may involve more work. I will spend some time digging into this part. More suggestions are also welcome :)

Thanks,

Jason