Message ID | 20230125185230.3574681-1-leitao@debian.org |
---|---|
State | New |
Headers |
From: Breno Leitao <leitao@debian.org> To: kuba@kernel.org, netdev@vger.kernel.org Cc: leitao@debian.org, leit@fb.com, davem@davemloft.net, edumazet@google.com, pabeni@redhat.com, andrew@lunn.ch, linux-kernel@vger.kernel.org, Michael van der Westhuizen <rmikey@meta.com> Subject: [PATCH v3] netpoll: Remove 4s sleep during carrier detection Date: Wed, 25 Jan 2023 10:52:30 -0800 Message-Id: <20230125185230.3574681-1-leitao@debian.org> X-Mailer: git-send-email 2.30.2 |
Series |
[v3] netpoll: Remove 4s sleep during carrier detection
|
|
Commit Message
Breno Leitao
Jan. 25, 2023, 6:52 p.m. UTC
This patch removes the 4-second msleep() that netpoll_setup() performs
when the carrier appears instantly.
Here are some scenarios where this workaround is counter-productive on
modern hardware:
- Servers whose BMC communicates over NC-SI via the same NIC that is
  used for netconsole. The BMC keeps the PHY up, so the carrier
  appears instantly.
- The link is fibre: SERDES sync can complete within the HZ/10 (100 ms)
  window the code checks, so the carrier also appears instantly.
Other than that, if a driver reports instant carrier and then loses
it, this is probably a driver bug.
Reported-by: Michael van der Westhuizen <rmikey@meta.com>
Signed-off-by: Breno Leitao <leitao@debian.org>
--
v1->v2: added "RFC" in the subject
v2->v3: improved the commit message
---
net/core/netpoll.c | 12 +-----------
1 file changed, 1 insertion(+), 11 deletions(-)
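The behaviour change described in the commit message can be modelled in userspace. What follows is only a sketch under stated assumptions: `struct sim_link`, `sim_carrier_ok()` and `sim_wait_for_carrier()` are hypothetical names, the clock is simulated in milliseconds rather than jiffies, and the timeout is passed in by the caller rather than read from the kernel. It is not the kernel code itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical simulated link: carrier reads "up" once now_ms reaches carrier_up_ms. */
struct sim_link {
	unsigned long now_ms;        /* simulated clock, stands in for jiffies */
	unsigned long carrier_up_ms; /* moment the carrier first reads as up */
};

static bool sim_carrier_ok(const struct sim_link *l)
{
	return l->now_ms >= l->carrier_up_ms;
}

/*
 * Userspace model of the carrier wait in netpoll_setup().
 * legacy_pause = true models the pre-patch behaviour (extra 4 s pause
 * when carrier shows up in under 100 ms); false models the patched code.
 * Returns the simulated milliseconds spent waiting.
 */
static unsigned long sim_wait_for_carrier(struct sim_link *l,
					  unsigned long timeout_ms,
					  bool legacy_pause)
{
	assert(l != NULL);

	unsigned long start = l->now_ms;
	unsigned long atleast = start + 100;       /* models jiffies + HZ/10 */
	unsigned long atmost = start + timeout_ms; /* models carrier_timeout * HZ */

	while (!sim_carrier_ok(l)) {
		if (l->now_ms > atmost)            /* models time_after() */
			break;
		l->now_ms += 1;                    /* models msleep(1) */
	}

	if (legacy_pause && l->now_ms < atleast)   /* models time_before() */
		l->now_ms += 4000;                 /* models msleep(4000) */

	return l->now_ms - start;
}
```

In this model, a link whose carrier is already up costs 4000 ms with `legacy_pause` set and 0 ms without it, while a link that needs 500 ms to come up costs 500 ms either way.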
Comments
From: Breno Leitao
> Sent: 25 January 2023 18:53
> This patch removes the msleep(4s) during netpoll_setup() if the carrier
> appears instantly.
>
> Here are some scenarios where this workaround is counter-productive in
> modern ages:
>
> Servers which have BMC communicating over NC-SI via the same NIC as gets
> used for netconsole. BMC will keep the PHY up, hence the carrier
> appearing instantly.
>
> The link is fibre, SERDES getting sync could happen within 0.1Hz, and
> the carrier also appears instantly.
>
> Other than that, if a driver is reporting instant carrier and then
> losing it, this is probably a driver bug.

I can't help feeling that this will break something.
The 4 second delay does look counter productive though.
Obvious alternatives are 'wait a bit before the first check'
and 'require carrier to be present for a few checks'.

It also has to be said that checking every ms seems over enthusiastic.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
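The second alternative mentioned above, 'require carrier to be present for a few checks', amounts to a debounce. A minimal sketch follows, assuming the poll results are available as an array of samples; `debounce_carrier()` and its parameters are hypothetical and were not proposed as code in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical debounce: accept the carrier only after `needed`
 * consecutive polls report it as up. Returns the index of the poll at
 * which the carrier is accepted, or -1 if that never happens within
 * the n samples.
 */
static int debounce_carrier(const bool *samples, size_t n, unsigned int needed)
{
	unsigned int streak = 0;

	assert(samples != NULL && needed > 0);

	for (size_t i = 0; i < n; i++) {
		/* Any "down" sample resets the streak. */
		streak = samples[i] ? streak + 1 : 0;
		if (streak >= needed)
			return (int)i;
	}
	return -1;
}
```

A debounce like this only helps when the reported carrier state actually flaps; it cannot help if the hardware keeps reporting the carrier as up while the link is not yet usable.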
On 26/01/2023 09:04, David Laight wrote:
>> This patch removes the msleep(4s) during netpoll_setup() if the carrier
>> appears instantly.
>>
>> Here are some scenarios where this workaround is counter-productive in
>> modern ages:
>>
>> Servers which have BMC communicating over NC-SI via the same NIC as gets
>> used for netconsole. BMC will keep the PHY up, hence the carrier
>> appearing instantly.
>>
>> The link is fibre, SERDES getting sync could happen within 0.1Hz, and
>> the carrier also appears instantly.
>>
>> Other than that, if a driver is reporting instant carrier and then
>> losing it, this is probably a driver bug.
>
> I can't help feeling that this will break something.

If we see breakage after this patch, then we can identify the broken
drivers and fix the drivers themselves.

On the other hand, if we keep this workaround, we are penalizing the
boot of every modern machine by 4s, just because we might have some
broken driver somewhere.
On Thu, Jan 26, 2023 at 09:04:42AM +0000, David Laight wrote:
> From: Breno Leitao
> > Sent: 25 January 2023 18:53
> > This patch removes the msleep(4s) during netpoll_setup() if the carrier
> > appears instantly.
> >
> > Here are some scenarios where this workaround is counter-productive in
> > modern ages:
> >
> > Servers which have BMC communicating over NC-SI via the same NIC as gets
> > used for netconsole. BMC will keep the PHY up, hence the carrier
> > appearing instantly.
> >
> > The link is fibre, SERDES getting sync could happen within 0.1Hz, and
> > the carrier also appears instantly.
> >
> > Other than that, if a driver is reporting instant carrier and then
> > losing it, this is probably a driver bug.
>
> I can't help feeling that this will break something.
> The 4 second delay does look counter productive though.
> Obvious alternatives are 'wait a bit before the first check'
> and 'require carrier to be present for a few checks'.

I'm guessing, but I think the issue is that the MAC reports the
carrier is up, even though autoneg has not completed, and so packets
are getting dropped. Autoneg takes around 1.5 seconds, so you need to
wait this long before starting to send to prevent packets landing in
the bit bucket. And I guess polling as you suggest does not help,
since it never returns the true status.

But this is pure guesswork. Maybe some mailing list archaeology can
help explain this code.

I guess the likely breaking scenario is simply that the first 1.5
seconds of the kernel log goes to the bit bucket for broken MACs.
Which is not fatal, just annoying for somebody trying to debug a crash
in the first few seconds. I suppose dhcp might also take longer for
broken MACs, since its first requests also get lost, and it might get
into exponential back off.

I guess the risks are small here. But I use the word guess a lot...

	Andrew
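To put rough numbers on the DHCP concern above, here is a small sketch of how a short blackout interacts with exponential back off. The retry schedule (first attempt at t = 0, a base interval that doubles up to a cap) and the function `first_delivered_ms()` are illustrative assumptions; they are not taken from this thread and do not describe any particular DHCP client.

```c
#include <assert.h>

/*
 * Hypothetical model: attempts are sent at t = 0, then after base_ms,
 * then after intervals that double each time up to cap_ms. Every
 * attempt sent before blackout_ms is lost. Returns the send time (ms)
 * of the first attempt that gets through, or -1 if none of the
 * max_attempts does.
 */
static long first_delivered_ms(long blackout_ms, long base_ms,
			       long cap_ms, int max_attempts)
{
	long t = 0;
	long interval = base_ms;

	assert(base_ms > 0 && cap_ms >= base_ms);

	for (int i = 0; i < max_attempts; i++) {
		if (t >= blackout_ms)
			return t;       /* this attempt is sent on a usable link */

		t += interval;          /* wait for the next retry */
		interval *= 2;          /* exponential back off */
		if (interval > cap_ms)
			interval = cap_ms;
	}
	return -1;
}
```

With an assumed 4000 ms base interval, a 1500 ms blackout loses only the first attempt, but the next one is not sent until 4000 ms, so under these assumptions the cost of a lost first request is set by the retry interval rather than by the length of the blackout.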
Hello:

This patch was applied to netdev/net-next.git (master)
by Jakub Kicinski <kuba@kernel.org>:

On Wed, 25 Jan 2023 10:52:30 -0800 you wrote:
> This patch removes the msleep(4s) during netpoll_setup() if the carrier
> appears instantly.
>
> Here are some scenarios where this workaround is counter-productive in
> modern ages:
>
> Servers which have BMC communicating over NC-SI via the same NIC as gets
> used for netconsole. BMC will keep the PHY up, hence the carrier
> appearing instantly.
>
> [...]

Here is the summary with links:
  - [v3] netpoll: Remove 4s sleep during carrier detection
    https://git.kernel.org/netdev/net-next/c/d8afe2f8a92d

You are awesome, thank you!
diff --git a/net/core/netpoll.c b/net/core/netpoll.c
index 9be762e1d..a089b704b 100644
--- a/net/core/netpoll.c
+++ b/net/core/netpoll.c
@@ -682,7 +682,7 @@ int netpoll_setup(struct netpoll *np)
 	}
 
 	if (!netif_running(ndev)) {
-		unsigned long atmost, atleast;
+		unsigned long atmost;
 
 		np_info(np, "device %s not up yet, forcing it\n", np->dev_name);
 
@@ -694,7 +694,6 @@ int netpoll_setup(struct netpoll *np)
 		}
 
 		rtnl_unlock();
-		atleast = jiffies + HZ/10;
 		atmost = jiffies + carrier_timeout * HZ;
 		while (!netif_carrier_ok(ndev)) {
 			if (time_after(jiffies, atmost)) {
@@ -704,15 +703,6 @@ int netpoll_setup(struct netpoll *np)
 			msleep(1);
 		}
 
-		/* If carrier appears to come up instantly, we don't
-		 * trust it and pause so that we don't pump all our
-		 * queued console messages into the bitbucket.
-		 */
-
-		if (time_before(jiffies, atleast)) {
-			np_notice(np, "carrier detect appears untrustworthy, waiting 4 seconds\n");
-			msleep(4000);
-		}
 		rtnl_lock();
 	}
 