Message ID | 20221026151558.4165020-1-luwei32@huawei.com |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:6687:0:0:0:0:0 with SMTP id l7csp295299wru; Wed, 26 Oct 2022 07:18:53 -0700 (PDT) X-Google-Smtp-Source: AMsMyM4ayMW5+Y7mmKizNqiClqwLcrdW3jjZbS6txwr6qkwg1cujgihMzrOsH2USvtfsUMI4eG4a X-Received: by 2002:a63:ff4f:0:b0:439:61d6:197 with SMTP id s15-20020a63ff4f000000b0043961d60197mr37330730pgk.67.1666793933429; Wed, 26 Oct 2022 07:18:53 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1666793933; cv=none; d=google.com; s=arc-20160816; b=RzDBHaa0qq6VFf/DidBzzBx8rAWQup8i5RgtEzFTYCYlWplMj13gdmhkvbzadBKOVK daw7ThVFzKBPltDC6biyUDZjMAFPVi7obqQgGink/SPGfVzE41Ye/CltF+x7YgQVm6V3 qnMc7Cgp8ohwWfJzUDguXnOeo9XvI35wORwI5aOkXW8HhFF/lz67C/2Nj4jCTq6GKe5M GINtKsQrhhOwboxjvzGuaXoDhnd+DLqeORSHB2GUamEYXIgWFyIc9acKI9ZVohvGT57l iEoW6WqCe2KKqyQkU1saRj8Q+FL8ZSTAeuMCJaNk/y76JwzawREU0ouyo1SHGD93Il9H /GOA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :message-id:date:subject:to:from; bh=f7MU0OOovDevzOWT21qlgZO+WGhZTH2AUm4dckDJmEU=; b=S/ochErhHbIVslQOKmagwoFoi4HI+2Gjt2h4JnpTE9BWgNTZTFNExcxYDOa8X3Ze6E BPrccNOx9Tktt0KNcuuXY6XfjhPvB/YLaZCYUnOeUYqxAGP7Xtei8lc93dqy0ou2K3ec leWdzcsiSpsTGnBv8SaPRhoUOB7ZL+vlCeYwqfo2Gc6fMA4/5OV/PX9gbUiOen82IlQr nx27MiuaWpbSUSi8RHyzqhGlznczsoTliUc6WsmX96TuHaOlWSBUnsR7/8vXEvifJ/nM oFY4RJIxa3QKL4oLWBJIUgv1S9zdszvW+QT8KhYZW8H0ER/LTJcR0P+dZleRRQbrUqoE bZpQ== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=huawei.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id mi10-20020a17090b4b4a00b001fde53d5d79si2901226pjb.5.2022.10.26.07.18.40; Wed, 26 Oct 2022 07:18:53 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=huawei.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233485AbiJZOMs (ORCPT <rfc822;pwkd43@gmail.com> + 99 others); Wed, 26 Oct 2022 10:12:48 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:54186 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234192AbiJZOMo (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Wed, 26 Oct 2022 10:12:44 -0400 Received: from szxga02-in.huawei.com (szxga02-in.huawei.com [45.249.212.188]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 54FB710EE54; Wed, 26 Oct 2022 07:12:43 -0700 (PDT) Received: from kwepemi500015.china.huawei.com (unknown [172.30.72.54]) by szxga02-in.huawei.com (SkyGuard) with ESMTP id 4My9cn5CFzzVj66; Wed, 26 Oct 2022 22:07:53 +0800 (CST) Received: from huawei.com (10.175.101.6) by kwepemi500015.china.huawei.com (7.221.188.92) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.31; Wed, 26 Oct 2022 22:12:40 +0800 From: Lu Wei <luwei32@huawei.com> To: <edumazet@google.com>, <davem@davemloft.net>, <yoshfuji@linux-ipv6.org>, <dsahern@kernel.org>, <kuba@kernel.org>, <pabeni@redhat.com>, <xemul@parallels.com>, <netdev@vger.kernel.org>, <linux-kernel@vger.kernel.org> Subject: [PATCH net] tcp: reset tp->sacked_out when sack is enabled Date: Wed, 26 Oct 2022 23:15:58 +0800 Message-ID: <20221026151558.4165020-1-luwei32@huawei.com> X-Mailer: git-send-email 2.31.1 MIME-Version: 1.0 Content-Transfer-Encoding: 7BIT Content-Type: text/plain; charset=US-ASCII X-Originating-IP: [10.175.101.6] X-ClientProxiedBy: dggems701-chm.china.huawei.com (10.3.19.178) To kwepemi500015.china.huawei.com (7.221.188.92) X-CFilter-Loop: Reflected X-Spam-Status: No, score=-4.2 required=5.0 tests=BAYES_00,RCVD_IN_DNSWL_MED, SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1747760115458728004?= X-GMAIL-MSGID: =?utf-8?q?1747760115458728004?= |
Series |
[net] tcp: reset tp->sacked_out when sack is enabled
|
|
Commit Message
Lu Wei
Oct. 26, 2022, 3:15 p.m. UTC
The meaning of tp->sacked_out depends on whether sack is enabled
or not. If setsockopt is called to enable sack_ok via
tcp_repair_options_est(), tp->sacked_out should be cleared, or it
will trigger warning in tcp_verify_left_out as follows:
============================================
WARNING: CPU: 8 PID: 0 at net/ipv4/tcp_input.c:2132
tcp_timeout_mark_lost+0x154/0x160
tcp_enter_loss+0x2b/0x290
tcp_retransmit_timer+0x50b/0x640
tcp_write_timer_handler+0x1c8/0x340
tcp_write_timer+0xe5/0x140
call_timer_fn+0x3a/0x1b0
__run_timers.part.0+0x1bf/0x2d0
run_timer_softirq+0x43/0xb0
__do_softirq+0xfd/0x373
__irq_exit_rcu+0xf6/0x140
This warning occurs in several steps:
Step1. If sack is not enabled, when server receives dup-ack,
it calls tcp_add_reno_sack() to increase tp->sacked_out.
Step2. Setsockopt() is called to enable sack
Step3. The retransmit timer expires, it calls tcp_timeout_mark_lost()
to increase tp->lost_out but not clear tp->sacked_out because
sack is enabled and tcp_is_reno() is false.
So tp->left_out is increased repeatly in Step1 and Step3 and it is
greater than tp->packets_out and trigger the warning. In function
tcp_timeout_mark_lost(), tp->sacked_out will be cleared if Step2 not
happen and the warning will not be triggered. So this patch clears
tp->sacked_out in tcp_repair_options_est().
Fixes: b139ba4e90dc ("tcp: Repair connection-time negotiated parameters")
Signed-off-by: Lu Wei <luwei32@huawei.com>
---
net/ipv4/tcp.c | 3 +++
1 file changed, 3 insertions(+)
Comments
On Wed, Oct 26, 2022 at 7:12 AM Lu Wei <luwei32@huawei.com> wrote: > > The meaning of tp->sacked_out depends on whether sack is enabled > or not. If setsockopt is called to enable sack_ok via > tcp_repair_options_est(), tp->sacked_out should be cleared, or it > will trigger warning in tcp_verify_left_out as follows: > > ============================================ > WARNING: CPU: 8 PID: 0 at net/ipv4/tcp_input.c:2132 > tcp_timeout_mark_lost+0x154/0x160 > tcp_enter_loss+0x2b/0x290 > tcp_retransmit_timer+0x50b/0x640 > tcp_write_timer_handler+0x1c8/0x340 > tcp_write_timer+0xe5/0x140 > call_timer_fn+0x3a/0x1b0 > __run_timers.part.0+0x1bf/0x2d0 > run_timer_softirq+0x43/0xb0 > __do_softirq+0xfd/0x373 > __irq_exit_rcu+0xf6/0x140 > > This warning occurs in several steps: > Step1. If sack is not enabled, when server receives dup-ack, > it calls tcp_add_reno_sack() to increase tp->sacked_out. > > Step2. Setsockopt() is called to enable sack > > Step3. The retransmit timer expires, it calls tcp_timeout_mark_lost() > to increase tp->lost_out but not clear tp->sacked_out because > sack is enabled and tcp_is_reno() is false. > > So tp->left_out is increased repeatly in Step1 and Step3 and it is > greater than tp->packets_out and trigger the warning. In function > tcp_timeout_mark_lost(), tp->sacked_out will be cleared if Step2 not > happen and the warning will not be triggered. So this patch clears > tp->sacked_out in tcp_repair_options_est(). > > Fixes: b139ba4e90dc ("tcp: Repair connection-time negotiated parameters") > Signed-off-by: Lu Wei <luwei32@huawei.com> > --- > net/ipv4/tcp.c | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c > index ef14efa1fb70..188d5c0e440f 100644 > --- a/net/ipv4/tcp.c > +++ b/net/ipv4/tcp.c > @@ -3282,6 +3282,9 @@ static int tcp_repair_options_est(struct sock *sk, sockptr_t optbuf, > if (opt.opt_val != 0) > return -EINVAL; > > + if (tcp_is_reno(tp)) > + tp->sacked_out = 0; > + > tp->rx_opt.sack_ok |= TCP_SACK_SEEN; > break; > case TCPOPT_TIMESTAMP: > -- > 2.31.1 > Hmm, I am not sure this is the right fix. Probably TCP_REPAIR_OPTIONS should not be allowed if data has already been sent. Pavel, what do you think ?
On Wed, Oct 26, 2022 at 7:30 AM Eric Dumazet <edumazet@google.com> wrote: > > On Wed, Oct 26, 2022 at 7:12 AM Lu Wei <luwei32@huawei.com> wrote: > > > > The meaning of tp->sacked_out depends on whether sack is enabled > > or not. If setsockopt is called to enable sack_ok via > > tcp_repair_options_est(), tp->sacked_out should be cleared, or it > > will trigger warning in tcp_verify_left_out as follows: > > > > ============================================ > > WARNING: CPU: 8 PID: 0 at net/ipv4/tcp_input.c:2132 > > tcp_timeout_mark_lost+0x154/0x160 > > tcp_enter_loss+0x2b/0x290 > > tcp_retransmit_timer+0x50b/0x640 > > tcp_write_timer_handler+0x1c8/0x340 > > tcp_write_timer+0xe5/0x140 > > call_timer_fn+0x3a/0x1b0 > > __run_timers.part.0+0x1bf/0x2d0 > > run_timer_softirq+0x43/0xb0 > > __do_softirq+0xfd/0x373 > > __irq_exit_rcu+0xf6/0x140 > > > > This warning occurs in several steps: > > Step1. If sack is not enabled, when server receives dup-ack, > > it calls tcp_add_reno_sack() to increase tp->sacked_out. > > > > Step2. Setsockopt() is called to enable sack > > > > Step3. The retransmit timer expires, it calls tcp_timeout_mark_lost() > > to increase tp->lost_out but not clear tp->sacked_out because > > sack is enabled and tcp_is_reno() is false. > > > > So tp->left_out is increased repeatly in Step1 and Step3 and it is > > greater than tp->packets_out and trigger the warning. In function > > tcp_timeout_mark_lost(), tp->sacked_out will be cleared if Step2 not > > happen and the warning will not be triggered. So this patch clears > > tp->sacked_out in tcp_repair_options_est(). > > > > Fixes: b139ba4e90dc ("tcp: Repair connection-time negotiated parameters") > > Signed-off-by: Lu Wei <luwei32@huawei.com> > > --- > > net/ipv4/tcp.c | 3 +++ > > 1 file changed, 3 insertions(+) > > > > diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c > > index ef14efa1fb70..188d5c0e440f 100644 > > --- a/net/ipv4/tcp.c > > +++ b/net/ipv4/tcp.c > > @@ -3282,6 +3282,9 @@ static int tcp_repair_options_est(struct sock *sk, sockptr_t optbuf, > > if (opt.opt_val != 0) > > return -EINVAL; > > > > + if (tcp_is_reno(tp)) > > + tp->sacked_out = 0; > > + > > tp->rx_opt.sack_ok |= TCP_SACK_SEEN; > > break; > > case TCPOPT_TIMESTAMP: > > -- > > 2.31.1 > > > > Hmm, I am not sure this is the right fix. > > Probably TCP_REPAIR_OPTIONS should not be allowed if data has already been sent. > > Pavel, what do you think ? Routing to Denis V. Lunev <den@openvz.org>, because Pavel's address no longer works. Thanks !
On 10/26/22 16:33, Eric Dumazet wrote: > On Wed, Oct 26, 2022 at 7:30 AM Eric Dumazet <edumazet@google.com> wrote: >> On Wed, Oct 26, 2022 at 7:12 AM Lu Wei <luwei32@huawei.com> wrote: >>> The meaning of tp->sacked_out depends on whether sack is enabled >>> or not. If setsockopt is called to enable sack_ok via >>> tcp_repair_options_est(), tp->sacked_out should be cleared, or it >>> will trigger warning in tcp_verify_left_out as follows: >>> >>> ============================================ >>> WARNING: CPU: 8 PID: 0 at net/ipv4/tcp_input.c:2132 >>> tcp_timeout_mark_lost+0x154/0x160 >>> tcp_enter_loss+0x2b/0x290 >>> tcp_retransmit_timer+0x50b/0x640 >>> tcp_write_timer_handler+0x1c8/0x340 >>> tcp_write_timer+0xe5/0x140 >>> call_timer_fn+0x3a/0x1b0 >>> __run_timers.part.0+0x1bf/0x2d0 >>> run_timer_softirq+0x43/0xb0 >>> __do_softirq+0xfd/0x373 >>> __irq_exit_rcu+0xf6/0x140 >>> >>> This warning occurs in several steps: >>> Step1. If sack is not enabled, when server receives dup-ack, >>> it calls tcp_add_reno_sack() to increase tp->sacked_out. >>> >>> Step2. Setsockopt() is called to enable sack >>> >>> Step3. The retransmit timer expires, it calls tcp_timeout_mark_lost() >>> to increase tp->lost_out but not clear tp->sacked_out because >>> sack is enabled and tcp_is_reno() is false. >>> >>> So tp->left_out is increased repeatly in Step1 and Step3 and it is >>> greater than tp->packets_out and trigger the warning. In function >>> tcp_timeout_mark_lost(), tp->sacked_out will be cleared if Step2 not >>> happen and the warning will not be triggered. So this patch clears >>> tp->sacked_out in tcp_repair_options_est(). >>> >>> Fixes: b139ba4e90dc ("tcp: Repair connection-time negotiated parameters") >>> Signed-off-by: Lu Wei <luwei32@huawei.com> >>> --- >>> net/ipv4/tcp.c | 3 +++ >>> 1 file changed, 3 insertions(+) >>> >>> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c >>> index ef14efa1fb70..188d5c0e440f 100644 >>> --- a/net/ipv4/tcp.c >>> +++ b/net/ipv4/tcp.c >>> @@ -3282,6 +3282,9 @@ static int tcp_repair_options_est(struct sock *sk, sockptr_t optbuf, >>> if (opt.opt_val != 0) >>> return -EINVAL; >>> >>> + if (tcp_is_reno(tp)) >>> + tp->sacked_out = 0; >>> + >>> tp->rx_opt.sack_ok |= TCP_SACK_SEEN; >>> break; >>> case TCPOPT_TIMESTAMP: >>> -- >>> 2.31.1 >>> >> Hmm, I am not sure this is the right fix. >> >> Probably TCP_REPAIR_OPTIONS should not be allowed if data has already been sent. >> >> Pavel, what do you think ? > Routing to Denis V. Lunev <den@openvz.org>, because Pavel's address no > longer works. > > Thanks ! Hi, guys! This code is used in CRIU. I have added CRIU maintainers Andrey Vagin, Pavel Tikhomirov and new address of Pavel Emelyanov to CC list. Here is the quote from Pavel Tikhomirov on the topic. "We do setsockopt with TCP_REPAIR_OPTIONS in CRIU just after calling connect to the socket here https://github.com/checkpoint-restore/criu/blob/18c6426eaeebc5fe7d0f9ca0acb592a3ec828b0c/soccr/soccr.c#L566 and before libsoccr_restore_queue. So it seems there should be no data sent in this socket at the moment, so I believe it is safe to prohibit TCP_REPAIR_OPTIONS if data was already sent. Though I'd recomend running some CRIU tests after this change just to be sure that we don't break it. E.g.: "zdtm/static/socket-tcp*" or just "./test/zdtm.py run -a --keep-going --ignore-taint". Thank you in advance, Den
On Wed, Oct 26, 2022 at 11:59 AM Denis V. Lunev <den@virtuozzo.com> wrote: > > On 10/26/22 16:33, Eric Dumazet wrote: > > On Wed, Oct 26, 2022 at 7:30 AM Eric Dumazet <edumazet@google.com> wrote: > >> On Wed, Oct 26, 2022 at 7:12 AM Lu Wei <luwei32@huawei.com> wrote: > >>> The meaning of tp->sacked_out depends on whether sack is enabled > >>> or not. If setsockopt is called to enable sack_ok via > >>> tcp_repair_options_est(), tp->sacked_out should be cleared, or it > >>> will trigger warning in tcp_verify_left_out as follows: > >>> > >>> ============================================ > >>> WARNING: CPU: 8 PID: 0 at net/ipv4/tcp_input.c:2132 > >>> tcp_timeout_mark_lost+0x154/0x160 > >>> tcp_enter_loss+0x2b/0x290 > >>> tcp_retransmit_timer+0x50b/0x640 > >>> tcp_write_timer_handler+0x1c8/0x340 > >>> tcp_write_timer+0xe5/0x140 > >>> call_timer_fn+0x3a/0x1b0 > >>> __run_timers.part.0+0x1bf/0x2d0 > >>> run_timer_softirq+0x43/0xb0 > >>> __do_softirq+0xfd/0x373 > >>> __irq_exit_rcu+0xf6/0x140 > >>> > >>> This warning occurs in several steps: > >>> Step1. If sack is not enabled, when server receives dup-ack, > >>> it calls tcp_add_reno_sack() to increase tp->sacked_out. > >>> > >>> Step2. Setsockopt() is called to enable sack > >>> > >>> Step3. The retransmit timer expires, it calls tcp_timeout_mark_lost() > >>> to increase tp->lost_out but not clear tp->sacked_out because > >>> sack is enabled and tcp_is_reno() is false. > >>> > >>> So tp->left_out is increased repeatly in Step1 and Step3 and it is > >>> greater than tp->packets_out and trigger the warning. In function > >>> tcp_timeout_mark_lost(), tp->sacked_out will be cleared if Step2 not > >>> happen and the warning will not be triggered. So this patch clears > >>> tp->sacked_out in tcp_repair_options_est(). > >>> > >>> Fixes: b139ba4e90dc ("tcp: Repair connection-time negotiated parameters") > >>> Signed-off-by: Lu Wei <luwei32@huawei.com> > >>> --- > >>> net/ipv4/tcp.c | 3 +++ > >>> 1 file changed, 3 insertions(+) > >>> > >>> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c > >>> index ef14efa1fb70..188d5c0e440f 100644 > >>> --- a/net/ipv4/tcp.c > >>> +++ b/net/ipv4/tcp.c > >>> @@ -3282,6 +3282,9 @@ static int tcp_repair_options_est(struct sock *sk, sockptr_t optbuf, > >>> if (opt.opt_val != 0) > >>> return -EINVAL; > >>> > >>> + if (tcp_is_reno(tp)) > >>> + tp->sacked_out = 0; > >>> + > >>> tp->rx_opt.sack_ok |= TCP_SACK_SEEN; > >>> break; > >>> case TCPOPT_TIMESTAMP: > >>> -- > >>> 2.31.1 > >>> > >> Hmm, I am not sure this is the right fix. > >> > >> Probably TCP_REPAIR_OPTIONS should not be allowed if data has already been sent. > >> > >> Pavel, what do you think ? > > Routing to Denis V. Lunev <den@openvz.org>, because Pavel's address no > > longer works. > > > > Thanks ! > Hi, guys! > > This code is used in CRIU. I have added CRIU maintainers > Andrey Vagin, Pavel Tikhomirov and new address of Pavel > Emelyanov to CC list. > > Here is the quote from Pavel Tikhomirov on the topic. > "We do setsockopt with TCP_REPAIR_OPTIONS in CRIU just > after calling connect to the socket here > https://github.com/checkpoint-restore/criu/blob/18c6426eaeebc5fe7d0f9ca0acb592a3ec828b0c/soccr/soccr.c#L566 > > and before libsoccr_restore_queue. > > So it seems there should be no data sent in this socket at > the moment, so I believe it is safe to prohibit > TCP_REPAIR_OPTIONS if data was already sent. > > Though I'd recomend running some CRIU tests after this > change just to be sure that we don't break it. > E.g.: "zdtm/static/socket-tcp*" or just > "./test/zdtm.py run -a --keep-going --ignore-taint". Thanks for confirming my suspicion. Please Lu submit a different patch. Your patch is only addressing the immediate issue (a WARNNG), but should really be preventing future bug reports because fuzzers will hit other points in the stack, not expecting fundamental TCP options being flipped in the middle of a connection. Thanks > > Thank you in advance, > Den
On Thu, Oct 27, 2022 at 5:00 AM luwei (O) <luwei32@huawei.com> wrote: > > > thanks, I will send next version Yeah, what about first agreeing on what the plans are ? In order to avoid confusion and lkml/netdev bloat, I think you should describe why existing checks are not enough. Since this was probably caught by a fuzzer like syzbot, do you have a repro ?
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index ef14efa1fb70..188d5c0e440f 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -3282,6 +3282,9 @@ static int tcp_repair_options_est(struct sock *sk, sockptr_t optbuf, if (opt.opt_val != 0) return -EINVAL; + if (tcp_is_reno(tp)) + tp->sacked_out = 0; + tp->rx_opt.sack_ok |= TCP_SACK_SEEN; break; case TCPOPT_TIMESTAMP: