From patchwork Sat Oct 22 07:22:40 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Greg KH X-Patchwork-Id: 7367 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:4242:0:0:0:0:0 with SMTP id s2csp1095902wrr; Sat, 22 Oct 2022 01:03:44 -0700 (PDT) X-Google-Smtp-Source: AMsMyM5gg7wKsSKWu4I0cazhpID5ZD2oVHmkHlezAYRhajtUI3Yd5Q8GxmemAP+tp7w5XyF+oyw0 X-Received: by 2002:a63:4e66:0:b0:456:b3a7:7a80 with SMTP id o38-20020a634e66000000b00456b3a77a80mr19110190pgl.467.1666425824288; Sat, 22 Oct 2022 01:03:44 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1666425824; cv=none; d=google.com; s=arc-20160816; b=cyvloWpgyNAuLJ6OfdtN5LcRfQwWw8hMOmA96EnUtLS+mDzV90MrZcSaUI66DKng5t 323/qkEeYNgpPHqcPZGThJxwk4kJ+EqmPYULM0/7shpnXu8SdpUV0rxsrixqi/emjgVR 783w8iTkf56hKA2g5DMxkZT06YmjOe6nMUDBvYdLPVtfT78W7/p/7aBOMrUUdb+lEfBI 4SpxTlKxZ0+mY+VCNOtvHzK5LNIycQbJ/hLlAV5zRryQfi3exrfvg21y04goX36BvYqd FV7KZy6R4F8JO0b2utUypoYE4YOaFMgl49UMp1UWYdR6El3Z7HJ7mCConj9iDHvAq1rE +E4A== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :user-agent:references:in-reply-to:message-id:date:subject:cc:to :from:dkim-signature; bh=Ww6W2wzWF9XrXMmEd+aTVZUrUmxleOhDQLQfFRHqTjw=; b=k4n7bUtuLlr6lp2uursmfOueQY1tfOpVPVJo2UuwJQDHhD2sBC5mXqMXCiAGU/WGhP AC7wM8y/iaMlTFuPI8B3SzS/JVBqazUKqc5j11APDIuw7KcYJxzlW69d60ScB57I94fW NyKAkR6I0lKhIRoQCgawbVAqpMvfcIhuuCa6TY3CPTr9lufKb2AzasGyIQCO7TJN9vXM UEMYP5EKAks9fi0FN9nz0u3IksFDMMjvV9nik7XTGfCaFy/9jqo5UDP/4+6F656PSQJJ eG/O7uPJ51T/zwH9FgRJJ334Dku5kH/mVOO5ZyVSB7mmlli0pV7b5EUBj0a6bktTccS4 Lpxg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linuxfoundation.org header.s=korg header.b=CFNYtnh9; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linuxfoundation.org Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id a192-20020a6390c9000000b0046af665ed92si2005164pge.480.2022.10.22.01.03.31; Sat, 22 Oct 2022 01:03:44 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@linuxfoundation.org header.s=korg header.b=CFNYtnh9; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linuxfoundation.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232637AbiJVIDL (ORCPT + 99 others); Sat, 22 Oct 2022 04:03:11 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:47414 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232408AbiJVHyz (ORCPT ); Sat, 22 Oct 2022 03:54:55 -0400 Received: from ams.source.kernel.org (ams.source.kernel.org [IPv6:2604:1380:4601:e00::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 851D2356D5; Sat, 22 Oct 2022 00:47:47 -0700 (PDT) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id 45D00B82E0C; Sat, 22 Oct 2022 07:46:42 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id A7CA1C433C1; Sat, 22 Oct 2022 07:46:40 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linuxfoundation.org; s=korg; t=1666424801; bh=Ru4G+3ge//H9NJZQ0OiqM6QGQaZIT8gVzXirC+yW40U=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=CFNYtnh9t8AMlbNi/4lgEgqedKr6w0QzY1XEjHFAtUlDsFnRxFQi4qoYSoOJ+fjwC P2yjwJQBicmGXGCGqlX5EqhzKQz/r206eC3fkQHuntvvoht6NrQYKx95840WUJJuVs e2WT/3+3Y0OM6wLNdvzvncNH7+P7j9wwjOhTQ7oQ= From: Greg Kroah-Hartman To: linux-kernel@vger.kernel.org Cc: Greg Kroah-Hartman , stable@vger.kernel.org, Neal Cardwell , "Kevin(Yudong) Yang" , Yuchung Cheng , Eric Dumazet , "David S. Miller" , Sasha Levin Subject: [PATCH 5.19 281/717] tcp: fix tcp_cwnd_validate() to not forget is_cwnd_limited Date: Sat, 22 Oct 2022 09:22:40 +0200 Message-Id: <20221022072503.542307241@linuxfoundation.org> X-Mailer: git-send-email 2.38.1 In-Reply-To: <20221022072415.034382448@linuxfoundation.org> References: <20221022072415.034382448@linuxfoundation.org> User-Agent: quilt/0.67 MIME-Version: 1.0 X-Spam-Status: No, score=-7.3 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_HI, SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1747374125266306726?= X-GMAIL-MSGID: =?utf-8?q?1747374125266306726?= From: Neal Cardwell [ Upstream commit f4ce91ce12a7c6ead19b128ffa8cff6e3ded2a14 ] This commit fixes a bug in the tracking of max_packets_out and is_cwnd_limited. This bug can cause the connection to fail to remember that is_cwnd_limited is true, causing the connection to fail to grow cwnd when it should, causing throughput to be lower than it should be. The following event sequence is an example that triggers the bug: (a) The connection is cwnd_limited, but packets_out is not at its peak due to TSO deferral deciding not to send another skb yet. In such cases the connection can advance max_packets_seq and set tp->is_cwnd_limited to true and max_packets_out to a small number. (b) Then later in the round trip the connection is pacing-limited (not cwnd-limited), and packets_out is larger. In such cases the connection would raise max_packets_out to a bigger number but (unexpectedly) flip tp->is_cwnd_limited from true to false. This commit fixes that bug. One straightforward fix would be to separately track (a) the next window after max_packets_out reaches a maximum, and (b) the next window after tp->is_cwnd_limited is set to true. But this would require consuming an extra u32 sequence number. Instead, to save space we track only the most important information. Specifically, we track the strongest available signal of the degree to which the cwnd is fully utilized: (1) If the connection is cwnd-limited then we remember that fact for the current window. (2) If the connection not cwnd-limited then we track the maximum number of outstanding packets in the current window. In particular, note that the new logic cannot trigger the buggy (a)/(b) sequence above because with the new logic a condition where tp->packets_out > tp->max_packets_out can only trigger an update of tp->is_cwnd_limited if tp->is_cwnd_limited is false. This first showed up in a testing of a BBRv2 dev branch, but this buggy behavior highlighted a general issue with the tcp_cwnd_validate() logic that can cause cwnd to fail to increase at the proper rate for any TCP congestion control, including Reno or CUBIC. Fixes: ca8a22634381 ("tcp: make cwnd-limited checks measurement-based, and gentler") Signed-off-by: Neal Cardwell Signed-off-by: Kevin(Yudong) Yang Signed-off-by: Yuchung Cheng Signed-off-by: Eric Dumazet Signed-off-by: David S. Miller Signed-off-by: Sasha Levin --- include/linux/tcp.h | 2 +- include/net/tcp.h | 5 ++++- net/ipv4/tcp.c | 2 ++ net/ipv4/tcp_output.c | 19 ++++++++++++------- 4 files changed, 19 insertions(+), 9 deletions(-) diff --git a/include/linux/tcp.h b/include/linux/tcp.h index 1168302b7927..bb31d60addac 100644 --- a/include/linux/tcp.h +++ b/include/linux/tcp.h @@ -265,7 +265,7 @@ struct tcp_sock { u32 packets_out; /* Packets which are "in flight" */ u32 retrans_out; /* Retransmitted packets out */ u32 max_packets_out; /* max packets_out in last window */ - u32 max_packets_seq; /* right edge of max_packets_out flight */ + u32 cwnd_usage_seq; /* right edge of cwnd usage tracking flight */ u16 urg_data; /* Saved octet of OOB data and control flags */ u8 ecn_flags; /* ECN status bits. */ diff --git a/include/net/tcp.h b/include/net/tcp.h index 78a64e1b33a7..788b1f17b5e3 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -1289,11 +1289,14 @@ static inline bool tcp_is_cwnd_limited(const struct sock *sk) { const struct tcp_sock *tp = tcp_sk(sk); + if (tp->is_cwnd_limited) + return true; + /* If in slow start, ensure cwnd grows to twice what was ACKed. */ if (tcp_in_slow_start(tp)) return tcp_snd_cwnd(tp) < 2 * tp->max_packets_out; - return tp->is_cwnd_limited; + return false; } /* BBR congestion control needs pacing. diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index ab03977b6578..f82cd6eb7088 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -3042,6 +3042,8 @@ int tcp_disconnect(struct sock *sk, int flags) tp->snd_ssthresh = TCP_INFINITE_SSTHRESH; tcp_snd_cwnd_set(tp, TCP_INIT_CWND); tp->snd_cwnd_cnt = 0; + tp->is_cwnd_limited = 0; + tp->max_packets_out = 0; tp->window_clamp = 0; tp->delivered = 0; tp->delivered_ce = 0; diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index 84314de754f8..a16139cacc45 100644 --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -1875,15 +1875,20 @@ static void tcp_cwnd_validate(struct sock *sk, bool is_cwnd_limited) const struct tcp_congestion_ops *ca_ops = inet_csk(sk)->icsk_ca_ops; struct tcp_sock *tp = tcp_sk(sk); - /* Track the maximum number of outstanding packets in each - * window, and remember whether we were cwnd-limited then. + /* Track the strongest available signal of the degree to which the cwnd + * is fully utilized. If cwnd-limited then remember that fact for the + * current window. If not cwnd-limited then track the maximum number of + * outstanding packets in the current window. (If cwnd-limited then we + * chose to not update tp->max_packets_out to avoid an extra else + * clause with no functional impact.) */ - if (!before(tp->snd_una, tp->max_packets_seq) || - tp->packets_out > tp->max_packets_out || - is_cwnd_limited) { - tp->max_packets_out = tp->packets_out; - tp->max_packets_seq = tp->snd_nxt; + if (!before(tp->snd_una, tp->cwnd_usage_seq) || + is_cwnd_limited || + (!tp->is_cwnd_limited && + tp->packets_out > tp->max_packets_out)) { tp->is_cwnd_limited = is_cwnd_limited; + tp->max_packets_out = tp->packets_out; + tp->cwnd_usage_seq = tp->snd_nxt; } if (tcp_is_cwnd_limited(sk)) {