Message ID | 20230325152417.5403-1-kerneljasonxing@gmail.com |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:b0ea:0:b0:3b6:4342:cba0 with SMTP id b10csp480631vqo; Sat, 25 Mar 2023 08:49:26 -0700 (PDT) X-Google-Smtp-Source: AKy350b/Vwg8mCVn49TW8RFOgRhruJw3rN8vtAvg12IeNt9eXWRCpMi6co7X2STlYAsPjp6dF9Ar X-Received: by 2002:a17:902:e744:b0:1a1:cc0c:a5c2 with SMTP id p4-20020a170902e74400b001a1cc0ca5c2mr7029784plf.62.1679759366431; Sat, 25 Mar 2023 08:49:26 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1679759366; cv=none; d=google.com; s=arc-20160816; b=dymtninT+aSZbkLqxi0hqscmbhQXyfVNigJvJIpnpsDwfflje0QOTQBjPrhJkRRF0s c8dMdQERPpVDzszVqL/esLNgQLhGEx4W8FMG6/plMkZ+qNzJJP2wsUoluURH6LAHT/aU 85KexgywXLcQ8djsJH+I3ScxThJTBSLI2HFAWo2qp7l9wd3aXkkUzaI7WIbjOtX9ms7b 5jtTpFylNQhr4natZ9r/kbwX5E88/GkEzTqExoy4jtATHdAWEGllJ77LGjAlm0R4C5RY WIuPgfJovAAMsU235hdd5UGJR2/HP77JiKnUnbAm4PgqL1gI7IeRAGBR/1MaNqfU7jZJ 5cTw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :message-id:date:subject:cc:to:from:dkim-signature; bh=niKFnY0t+4MX3R2lOhT5fCEyXxOfUgTVhDHjO9rpeS8=; b=FiQKWM6XUtUYerIcXjxon8aGzJgpCAEz6d5Y0f0+n3uuSx6SDrI3aSfiST3TcYMGtA DcxC1iWl7dH/59kpDom0sWhSu/4zSXzpsenBkKVpNWw8NXQaIf08OseKVCwcULGnXpbd YmTvUDzrFS5vhKEntZHFA84IkgLikui1jlGMAJMiKS3H2Yw9OvFAOo0t02RQKPrlzoOn WRaYlV9XPotRmGKCZYXLZIUQNmxu5+5HF0PsL7RdnPNCuo8z6/DH3vJcARcgUv/kHFu1 Ek4bS6xNzExkeudbAAdRuUkNKOMZ+vsFTUqi58e2PrPZiMg265hC/zmLa2xrFlWp2HLN i1ow== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@gmail.com header.s=20210112 header.b=Xa07t1Ip; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id bs184-20020a6328c1000000b0050f6925a3fesi18547188pgb.596.2023.03.25.08.49.14; Sat, 25 Mar 2023 08:49:26 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@gmail.com header.s=20210112 header.b=Xa07t1Ip; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230079AbjCYP0J (ORCPT <rfc822;makky5685@gmail.com> + 99 others); Sat, 25 Mar 2023 11:26:09 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50222 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229460AbjCYP0G (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Sat, 25 Mar 2023 11:26:06 -0400 Received: from mail-pj1-x102b.google.com (mail-pj1-x102b.google.com [IPv6:2607:f8b0:4864:20::102b]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id E2747D500; Sat, 25 Mar 2023 08:26:05 -0700 (PDT) Received: by mail-pj1-x102b.google.com with SMTP id fy10-20020a17090b020a00b0023b4bcf0727so4332417pjb.0; Sat, 25 Mar 2023 08:26:05 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; t=1679757965; h=content-transfer-encoding:mime-version:message-id:date:subject:cc :to:from:from:to:cc:subject:date:message-id:reply-to; bh=niKFnY0t+4MX3R2lOhT5fCEyXxOfUgTVhDHjO9rpeS8=; b=Xa07t1IpvMdvYSqRRh11jFCR38BL5ga6Yq7Df88By3VZeMYJd7qoZIVkP6F7dOCQr4 IDnEP4uo+iZ/zJe/CZLjQMx3vplTGf5LNfTG1fVdD+Ft8efSVOxoTA4nIXje4MCoxei4 I+cGdOYUe+wd2o6gye6P+OmMFi91cEIRdeyWKajPV7f4IKujitg+24ZTEvzD15BX63rW auFzqVeLwfn9C8KCfgJvnhsobGQgq9H0dhvoWTsm+7AvPvzBgqjWQ1XML8if3p5vmmZL cl5QGrlQs9ZNx2jD9Prc6m6fH9WUWzfBcFSJ1XZeUbJwYUC2HZpqB8fctkOhq4cXH78z DKaQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; t=1679757965; h=content-transfer-encoding:mime-version:message-id:date:subject:cc :to:from:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=niKFnY0t+4MX3R2lOhT5fCEyXxOfUgTVhDHjO9rpeS8=; b=VWw1EQarHN43Zs+Trv1gx68PZo4qaYEnr3CwtUop3S6UdJc3H6zQHjzJjGDs169ltP ChC/0kbRct0zaSvmZMi9C3fddsNeDIRxJK9yZwHDoZ81stMj2xeU8qygZUwmBMsdCT6Q pedrvA5ZcyMbUKK/OLJWCmsP3h9yXdWZPESqo+GQIadCXsgcz0Lwvod2yHxlSDH2G7C7 fpL8nzXnpRrvd+gG22/i4RwqGDicZBvh3TdzKEcXQyq/CpBlLYvrY4JsHxl7Q/aw4vKi lKNSJmC18fVPc/TaqhZKwG9s2LuBZ5oEfPFbOoQW4Qu9WsW/khy7ruIy/eVSJrljMqvF xl7Q== X-Gm-Message-State: AAQBX9dW4QwR+pqnb86qakZyXE2ELwKGMKDmebRlW6af9sqhFNewoRE6 fkWxsOKdpMJ7mXKNbTeT8J4= X-Received: by 2002:a17:903:64e:b0:1a1:b3c0:4228 with SMTP id kh14-20020a170903064e00b001a1b3c04228mr5154310plb.19.1679757965044; Sat, 25 Mar 2023 08:26:05 -0700 (PDT) Received: from KERNELXING-MB0.tencent.com ([114.253.37.157]) by smtp.gmail.com with ESMTPSA id d16-20020a63f250000000b004ff6b744248sm15249470pgk.48.2023.03.25.08.26.02 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sat, 25 Mar 2023 08:26:04 -0700 (PDT) From: Jason Xing <kerneljasonxing@gmail.com> To: davem@davemloft.net, edumazet@google.com, kuba@kernel.org, pabeni@redhat.com, therbert@google.com Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org, kerneljasonxing@gmail.com, Jason Xing <kernelxing@tencent.com> Subject: [PATCH net] net: fix raising a softirq on the current cpu with rps enabled Date: Sat, 25 Mar 2023 23:24:17 +0800 Message-Id: <20230325152417.5403-1-kerneljasonxing@gmail.com> X-Mailer: git-send-email 2.33.0 MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-0.2 required=5.0 tests=DKIM_SIGNED,DKIM_VALID, DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM,RCVD_IN_DNSWL_NONE, SPF_HELO_NONE,SPF_PASS autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1761355357441402196?= X-GMAIL-MSGID: =?utf-8?q?1761355357441402196?= |
Series |
[net] net: fix raising a softirq on the current cpu with rps enabled
|
|
Commit Message
Jason Xing
March 25, 2023, 3:24 p.m. UTC
From: Jason Xing <kernelxing@tencent.com> Since we decide to put the skb into a backlog queue of another cpu, we should not raise the softirq for the current cpu. When to raise a softirq is based on whether we have more data left to process later. As to the current cpu, there is no indication of more data enqueued, so we do not need this action. After enqueuing to another cpu, net_rx_action() function will call ipi and then another cpu will raise the softirq as expected. Also, raising more softirqs which set the corresponding bit field can make the IRQ mechanism think we probably need to start ksoftirqd on the current cpu. Actually it shouldn't happen. Fixes: 0a9627f2649a ("rps: Receive Packet Steering") Signed-off-by: Jason Xing <kernelxing@tencent.com> --- net/core/dev.c | 2 -- 1 file changed, 2 deletions(-)
Comments
On Sat, Mar 25, 2023 at 8:26 AM Jason Xing <kerneljasonxing@gmail.com> wrote: > > From: Jason Xing <kernelxing@tencent.com> > > Since we decide to put the skb into a backlog queue of another > cpu, we should not raise the softirq for the current cpu. When > to raise a softirq is based on whether we have more data left to > process later. As to the current cpu, there is no indication of > more data enqueued, so we do not need this action. After enqueuing > to another cpu, net_rx_action() function will call ipi and then > another cpu will raise the softirq as expected. > > Also, raising more softirqs which set the corresponding bit field > can make the IRQ mechanism think we probably need to start ksoftirqd > on the current cpu. Actually it shouldn't happen. > > Fixes: 0a9627f2649a ("rps: Receive Packet Steering") > Signed-off-by: Jason Xing <kernelxing@tencent.com> > --- > net/core/dev.c | 2 -- > 1 file changed, 2 deletions(-) > > diff --git a/net/core/dev.c b/net/core/dev.c > index 1518a366783b..bfaaa652f50c 100644 > --- a/net/core/dev.c > +++ b/net/core/dev.c > @@ -4594,8 +4594,6 @@ static int napi_schedule_rps(struct softnet_data *sd) > if (sd != mysd) { > sd->rps_ipi_next = mysd->rps_ipi_list; > mysd->rps_ipi_list = sd; > - > - __raise_softirq_irqoff(NET_RX_SOFTIRQ); > return 1; > } > #endif /* CONFIG_RPS */ > -- > 2.37.3 > This is not going to work in some cases. Please take a deeper look. I have to run, if you (or others) do not find the reason, I will give more details when I am done traveling.
On Sat, Mar 25, 2023 at 11:57 PM Eric Dumazet <edumazet@google.com> wrote: > > On Sat, Mar 25, 2023 at 8:26 AM Jason Xing <kerneljasonxing@gmail.com> wrote: > > > > From: Jason Xing <kernelxing@tencent.com> > > > > Since we decide to put the skb into a backlog queue of another > > cpu, we should not raise the softirq for the current cpu. When > > to raise a softirq is based on whether we have more data left to > > process later. As to the current cpu, there is no indication of > > more data enqueued, so we do not need this action. After enqueuing > > to another cpu, net_rx_action() function will call ipi and then > > another cpu will raise the softirq as expected. > > > > Also, raising more softirqs which set the corresponding bit field > > can make the IRQ mechanism think we probably need to start ksoftirqd > > on the current cpu. Actually it shouldn't happen. > > > > Fixes: 0a9627f2649a ("rps: Receive Packet Steering") > > Signed-off-by: Jason Xing <kernelxing@tencent.com> > > --- > > net/core/dev.c | 2 -- > > 1 file changed, 2 deletions(-) > > > > diff --git a/net/core/dev.c b/net/core/dev.c > > index 1518a366783b..bfaaa652f50c 100644 > > --- a/net/core/dev.c > > +++ b/net/core/dev.c > > @@ -4594,8 +4594,6 @@ static int napi_schedule_rps(struct softnet_data *sd) > > if (sd != mysd) { > > sd->rps_ipi_next = mysd->rps_ipi_list; > > mysd->rps_ipi_list = sd; > > - > > - __raise_softirq_irqoff(NET_RX_SOFTIRQ); > > return 1; > > } > > #endif /* CONFIG_RPS */ > > -- > > 2.37.3 > > > > This is not going to work in some cases. Please take a deeper look. I'll do it. I've already been struggling with this for a few days. I still have no clue. I found out in some cases that frequently starting ksoftirqd is not that good because this thread may be blocked when waiting in the runqueue. So I tried to avoid raising softirqs to mitigate the issue and then I noticed this one. > > I have to run, if you (or others) do not find the reason, I will give > more details when I am done traveling. Happy traveling :) Thanks, Jason
On Sun, Mar 26, 2023 at 9:39 AM Hillf Danton <hdanton@sina.com> wrote: > > On Sat, Mar 25, 2023 at 8:26 AM Jason Xing <kerneljasonxing@gmail.com> > > > > @@ -4594,8 +4594,6 @@ static int napi_schedule_rps(struct softnet_data *sd) > > if (sd !=3D mysd) { > > sd->rps_ipi_next =3D mysd->rps_ipi_list; > > mysd->rps_ipi_list =3D sd; > > - > > - __raise_softirq_irqoff(NET_RX_SOFTIRQ); > > return 1; > > } > > Nope, ipi should be sent. But no ipi can go without irq enabled. > Sorry, I didn't get it. IPI is sent in net_rx_action() and apparently I didn't touch this part in this patch. Here is only about whether we should raise an IRQ even if the skb will be enqueued into another cpu. > So feel free to work out what sense made by disabling irq at the call site > then directly send ipi instead.
On Sat, Mar 25, 2023 at 11:57 PM Eric Dumazet <edumazet@google.com> wrote: > > On Sat, Mar 25, 2023 at 8:26 AM Jason Xing <kerneljasonxing@gmail.com> wrote: > > > > From: Jason Xing <kernelxing@tencent.com> > > > > Since we decide to put the skb into a backlog queue of another > > cpu, we should not raise the softirq for the current cpu. When > > to raise a softirq is based on whether we have more data left to > > process later. As to the current cpu, there is no indication of > > more data enqueued, so we do not need this action. After enqueuing > > to another cpu, net_rx_action() function will call ipi and then > > another cpu will raise the softirq as expected. > > > > Also, raising more softirqs which set the corresponding bit field > > can make the IRQ mechanism think we probably need to start ksoftirqd > > on the current cpu. Actually it shouldn't happen. > > > > Fixes: 0a9627f2649a ("rps: Receive Packet Steering") > > Signed-off-by: Jason Xing <kernelxing@tencent.com> > > --- > > net/core/dev.c | 2 -- > > 1 file changed, 2 deletions(-) > > > > diff --git a/net/core/dev.c b/net/core/dev.c > > index 1518a366783b..bfaaa652f50c 100644 > > --- a/net/core/dev.c > > +++ b/net/core/dev.c > > @@ -4594,8 +4594,6 @@ static int napi_schedule_rps(struct softnet_data *sd) > > if (sd != mysd) { > > sd->rps_ipi_next = mysd->rps_ipi_list; > > mysd->rps_ipi_list = sd; > > - > > - __raise_softirq_irqoff(NET_RX_SOFTIRQ); > > return 1; > > } > > #endif /* CONFIG_RPS */ > > -- > > 2.37.3 > > > > This is not going to work in some cases. Please take a deeper look. > > I have to run, if you (or others) do not find the reason, I will give > more details when I am done traveling. I'm wondering whether we could use @mysd instead of @sd like this: if (!__test_and_set_bit(NAPI_STATE_SCHED, &mysd->backlog.state)) __raise_softirq_irqoff(NET_RX_SOFTIRQ); I traced back to some historical changes and saw some relations with this commit ("net: solve a NAPI race"): https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=39e6c8208d7b6fb9d2047850fb3327db567b564b Thanks, Jason
On Sun, Mar 26, 2023 at 12:04 PM Jason Xing <kerneljasonxing@gmail.com> wrote: > > On Sat, Mar 25, 2023 at 11:57 PM Eric Dumazet <edumazet@google.com> wrote: > > > > On Sat, Mar 25, 2023 at 8:26 AM Jason Xing <kerneljasonxing@gmail.com> wrote: > > > > > > From: Jason Xing <kernelxing@tencent.com> > > > > > > Since we decide to put the skb into a backlog queue of another > > > cpu, we should not raise the softirq for the current cpu. When > > > to raise a softirq is based on whether we have more data left to > > > process later. As to the current cpu, there is no indication of > > > more data enqueued, so we do not need this action. After enqueuing > > > to another cpu, net_rx_action() function will call ipi and then > > > another cpu will raise the softirq as expected. > > > > > > Also, raising more softirqs which set the corresponding bit field > > > can make the IRQ mechanism think we probably need to start ksoftirqd > > > on the current cpu. Actually it shouldn't happen. > > > > > > Fixes: 0a9627f2649a ("rps: Receive Packet Steering") > > > Signed-off-by: Jason Xing <kernelxing@tencent.com> > > > --- > > > net/core/dev.c | 2 -- > > > 1 file changed, 2 deletions(-) > > > > > > diff --git a/net/core/dev.c b/net/core/dev.c > > > index 1518a366783b..bfaaa652f50c 100644 > > > --- a/net/core/dev.c > > > +++ b/net/core/dev.c > > > @@ -4594,8 +4594,6 @@ static int napi_schedule_rps(struct softnet_data *sd) > > > if (sd != mysd) { > > > sd->rps_ipi_next = mysd->rps_ipi_list; > > > mysd->rps_ipi_list = sd; > > > - > > > - __raise_softirq_irqoff(NET_RX_SOFTIRQ); > > > return 1; > > > } > > > #endif /* CONFIG_RPS */ > > > -- > > > 2.37.3 > > > > > > > This is not going to work in some cases. Please take a deeper look. > > > > I have to run, if you (or others) do not find the reason, I will give > > more details when I am done traveling. > > I'm wondering whether we could use @mysd instead of @sd like this: > > if (!__test_and_set_bit(NAPI_STATE_SCHED, &mysd->backlog.state)) > __raise_softirq_irqoff(NET_RX_SOFTIRQ); Ah, I have to add more precise code because the above codes may mislead people. diff --git a/net/core/dev.c b/net/core/dev.c index 1518a366783b..9ac9b32e392f 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -4594,8 +4594,9 @@ static int napi_schedule_rps(struct softnet_data *sd) if (sd != mysd) { sd->rps_ipi_next = mysd->rps_ipi_list; mysd->rps_ipi_list = sd; + if (!__test_and_set_bit(NAPI_STATE_SCHED, &mysd->backlog.state)) + __raise_softirq_irqoff(NET_RX_SOFTIRQ); - __raise_softirq_irqoff(NET_RX_SOFTIRQ); return 1; } #endif /* CONFIG_RPS */ Eric, I realized that some paths don't call the ipi to notify another cpu. If someone grabs the NAPI_STATE_SCHED flag, we know that at the end of net_rx_action() or the beginning of process_backlog(), the net_rps_action_and_irq_enable() will handle the information delivery. However, if no one grabs the flag, in some paths we could not have a chance immediately to tell another cpu to raise the softirq and then process those pending data. Thus, I have to make sure if someone owns the napi poll as shown above. If I get this wrong, please correct me if you're available. Thanks in advance. > > I traced back to some historical changes and saw some relations with > this commit ("net: solve a NAPI race"): > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=39e6c8208d7b6fb9d2047850fb3327db567b564b > > Thanks, > Jason
On Sun, Mar 26, 2023 at 6:10 PM Jason Xing <kerneljasonxing@gmail.com> wrote: > > On Sun, Mar 26, 2023 at 12:04 PM Jason Xing <kerneljasonxing@gmail.com> wrote: > > > > On Sat, Mar 25, 2023 at 11:57 PM Eric Dumazet <edumazet@google.com> wrote: > > > > > > On Sat, Mar 25, 2023 at 8:26 AM Jason Xing <kerneljasonxing@gmail.com> wrote: > > > > > > > > From: Jason Xing <kernelxing@tencent.com> > > > > > > > > Since we decide to put the skb into a backlog queue of another > > > > cpu, we should not raise the softirq for the current cpu. When > > > > to raise a softirq is based on whether we have more data left to > > > > process later. As to the current cpu, there is no indication of > > > > more data enqueued, so we do not need this action. After enqueuing > > > > to another cpu, net_rx_action() function will call ipi and then > > > > another cpu will raise the softirq as expected. > > > > > > > > Also, raising more softirqs which set the corresponding bit field > > > > can make the IRQ mechanism think we probably need to start ksoftirqd > > > > on the current cpu. Actually it shouldn't happen. > > > > > > > > Fixes: 0a9627f2649a ("rps: Receive Packet Steering") > > > > Signed-off-by: Jason Xing <kernelxing@tencent.com> > > > > --- > > > > net/core/dev.c | 2 -- > > > > 1 file changed, 2 deletions(-) > > > > > > > > diff --git a/net/core/dev.c b/net/core/dev.c > > > > index 1518a366783b..bfaaa652f50c 100644 > > > > --- a/net/core/dev.c > > > > +++ b/net/core/dev.c > > > > @@ -4594,8 +4594,6 @@ static int napi_schedule_rps(struct softnet_data *sd) > > > > if (sd != mysd) { > > > > sd->rps_ipi_next = mysd->rps_ipi_list; > > > > mysd->rps_ipi_list = sd; > > > > - > > > > - __raise_softirq_irqoff(NET_RX_SOFTIRQ); > > > > return 1; > > > > } > > > > #endif /* CONFIG_RPS */ > > > > -- > > > > 2.37.3 > > > > > > > > > > This is not going to work in some cases. Please take a deeper look. > > > > > > I have to run, if you (or others) do not find the reason, I will give > > > more details when I am done traveling. > > > > I'm wondering whether we could use @mysd instead of @sd like this: > > > > if (!__test_and_set_bit(NAPI_STATE_SCHED, &mysd->backlog.state)) > > __raise_softirq_irqoff(NET_RX_SOFTIRQ); > > Ah, I have to add more precise code because the above codes may mislead people. > > diff --git a/net/core/dev.c b/net/core/dev.c > index 1518a366783b..9ac9b32e392f 100644 > --- a/net/core/dev.c > +++ b/net/core/dev.c > @@ -4594,8 +4594,9 @@ static int napi_schedule_rps(struct softnet_data *sd) > if (sd != mysd) { > sd->rps_ipi_next = mysd->rps_ipi_list; > mysd->rps_ipi_list = sd; > + if (!__test_and_set_bit(NAPI_STATE_SCHED, &mysd->backlog.state)) Forgive me. Really I need some coffee. I made a mistake. This line above should be: + if (!test_bit(NAPI_STATE_SCHED, &mysd->backlog.state)) But the whole thing doesn't feel right. I need a few days to dig into this part until Eric can help me with more of it. Thanks, Jason > + __raise_softirq_irqoff(NET_RX_SOFTIRQ); > > - __raise_softirq_irqoff(NET_RX_SOFTIRQ); > return 1; > } > #endif /* CONFIG_RPS */ > > Eric, I realized that some paths don't call the ipi to notify another > cpu. If someone grabs the NAPI_STATE_SCHED flag, we know that at the > end of net_rx_action() or the beginning of process_backlog(), the > net_rps_action_and_irq_enable() will handle the information delivery. > However, if no one grabs the flag, in some paths we could not have a > chance immediately to tell another cpu to raise the softirq and then > process those pending data. Thus, I have to make sure if someone owns > the napi poll as shown above. > > If I get this wrong, please correct me if you're available. Thanks in advance. > > > > > I traced back to some historical changes and saw some relations with > > this commit ("net: solve a NAPI race"): > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=39e6c8208d7b6fb9d2047850fb3327db567b564b > > > > Thanks, > > Jason
> > Forgive me. Really I need some coffee. I made a mistake. This line > above should be: > > + if (!test_bit(NAPI_STATE_SCHED, &mysd->backlog.state)) > > But the whole thing doesn't feel right. I need a few days to dig into > this part until Eric can help me with more of it. > I am still traveling, and this is weekend time :/ It should not be too hard to read net/core/dev.c and remember that not _all_ drivers (or some core networking functions) use the NAPI model. eg look at netif_rx() and ask yourself why your patch is buggy. Just look at callers of enqueue_to_backlog() and ask yourself if all of them are called from net_rx_action() [The answer is no, just in case you wonder] In order to add your optimization, more work is needed, like adding new parameters so that we do not miss critical __raise_softirq_irqoff(NET_RX_SOFTIRQ) when _needed_. We keep going circles around softirq deficiencies, I feel you are trying to fix a second-order 'issue'. Real cause is elsewhere, look at recent patches from Jakub. Thanks.
On Mon, Mar 27, 2023 at 1:35 AM Eric Dumazet <edumazet@google.com> wrote: > > > > > Forgive me. Really I need some coffee. I made a mistake. This line > > above should be: > > > > + if (!test_bit(NAPI_STATE_SCHED, &mysd->backlog.state)) > > > > But the whole thing doesn't feel right. I need a few days to dig into > > this part until Eric can help me with more of it. > > > > I am still traveling, and this is weekend time :/ Thanks for your time, Eric, really appreciate it. > > It should not be too hard to read net/core/dev.c and remember that not > _all_ drivers (or some core networking functions) use the NAPI model. > > eg look at netif_rx() and ask yourself why your patch is buggy. Yes, it is. In my last email I sent yesterday I encountered one issue which exactly happened when I started hundreds of iperf processes transmitting data to loopback. It got stuck :( So I realized it is the non-napi case that triggers such a problem. > > Just look at callers of enqueue_to_backlog() and ask yourself if all > of them are called from net_rx_action() > > [The answer is no, just in case you wonder] > > In order to add your optimization, more work is needed, like adding > new parameters so that we do not miss critical > __raise_softirq_irqoff(NET_RX_SOFTIRQ) when _needed_. Thanks, I need to do more work/study on it. > > We keep going circles around softirq deficiencies, I feel you are > trying to fix a second-order 'issue'. Right, going circles gives me a headache. > > Real cause is elsewhere, look at recent patches from Jakub. After you pointed out, I searched and found there is indeed one patchset in 2022 The tile like this: [PATCH 0/3] softirq: uncontroversial change Thanks, Jason > > Thanks.
diff --git a/net/core/dev.c b/net/core/dev.c index 1518a366783b..bfaaa652f50c 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -4594,8 +4594,6 @@ static int napi_schedule_rps(struct softnet_data *sd) if (sd != mysd) { sd->rps_ipi_next = mysd->rps_ipi_list; mysd->rps_ipi_list = sd; - - __raise_softirq_irqoff(NET_RX_SOFTIRQ); return 1; } #endif /* CONFIG_RPS */