Message ID | 108791.1695199151@warthog.procyon.org.uk |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:a05:612c:172:b0:3f2:4152:657d with SMTP id h50csp4282856vqi; Wed, 20 Sep 2023 09:58:07 -0700 (PDT) X-Google-Smtp-Source: AGHT+IFJyy2loFHAcQMwSybd6ynCJ9nIje8DUiWeCGVXb6Y12VkCY05RxsBLC4IbLnSmSlO2RtpR X-Received: by 2002:a05:6358:998a:b0:142:d678:f708 with SMTP id j10-20020a056358998a00b00142d678f708mr4519981rwb.19.1695229087152; Wed, 20 Sep 2023 09:58:07 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1695229087; cv=none; d=google.com; s=arc-20160816; b=nWbhVSk8dcbrZ1kBCSKtEAIc+32D1bv1Y2i9f+E6WoGW5m4nzJJVkyi6CVBVY6nMLR NbiD8RJFbSTVvtqBmlaiqxvuYEq5xYSvaldd1vfJfCFLHWfBJq8HUpOSfe+mHbNDl0MT XiFdIvx6HlpJuFUscCMYo4V3tHEYMSVouc5HDy1afAUpPSo0O6vih+5j4CJRYY8qQyvf 8QBCH55USFIXWlAoUqHKnZJp8RkARUNwJaMLWrDyL7pEOMvu5q5QT+GBVeGb+8m9Pq1g QMFaLoNTAqwEUT1fDFqaH06+P7UpodUyr4yn2hmWMn1DP0n/RId8pVhfyOzDXNoFGY0i b5HA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:message-id:date:content-transfer-encoding :content-id:mime-version:subject:cc:to:from:organization :dkim-signature; bh=+3yBpA5iz82pce6LfLZNujDLBSjE7ZVN1nK9Rsm3ufc=; fh=kQe84kwE5TxPvvkb4B0IYRcMTipN2dYXirw+2l8D2KU=; b=Ruf65SA3B0qx9Uh9n9/mimowJSLzEbZcr/VtlJ863Bv9q0tNIfpejS59W++eXVydR4 k6bXq77aCf42Hdt/uBu0SXccdtkIqfoCSwxQ5rv3eBeFVnIxWslrviP62Q7JsoXlT4J3 jVu6WUZd2yoXrlg5u7j+NWiSRE7ZTBX6iJeDWODfE7mB9u4+cJwzKf1el9sBPcmZ+NR/ XQiPA5GlPi7Eom1oOXUQ/+Gcuv/PldH+/+fQIjFB4cmrBYoQQeHXPUpewML9u3DeFujN ok0cyiDhzZQAgyRLNUjOCuW7zs8Nq0vOghGhA/uYAtLosGCm+HJH+cW6Pmhbt8FXlYzo Mc7A== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=czUj6an4; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.37 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: from snail.vger.email (snail.vger.email. [23.128.96.37]) by mx.google.com with ESMTPS id v184-20020a6389c1000000b00578e01b020esi2034968pgd.751.2023.09.20.09.58.06 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 20 Sep 2023 09:58:07 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.37 as permitted sender) client-ip=23.128.96.37; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=czUj6an4; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.37 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: from out1.vger.email (depot.vger.email [IPv6:2620:137:e000::3:0]) by snail.vger.email (Postfix) with ESMTP id 5D6C78047551; Wed, 20 Sep 2023 01:40:10 -0700 (PDT) X-Virus-Status: Clean X-Virus-Scanned: clamav-milter 0.103.10 at snail.vger.email Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233451AbjITIkI (ORCPT <rfc822;toshivichauhan@gmail.com> + 26 others); Wed, 20 Sep 2023 04:40:08 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:38254 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232488AbjITIkG (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Wed, 20 Sep 2023 04:40:06 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 7CBAC90 for <linux-kernel@vger.kernel.org>; Wed, 20 Sep 2023 01:39:23 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1695199162; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding; bh=+3yBpA5iz82pce6LfLZNujDLBSjE7ZVN1nK9Rsm3ufc=; b=czUj6an4tmC8yvo5TCg5v68hUY+mos8NhxGUIUWLHDkmIWnT0OBj7AHmXiH4nETIcsUj+1 rWY5CjoohHp5KXRSB3Mf6NPY0WnNdERIfQeXy4mNW2/4U8RztNFhJHj3p6FDT8diEjFKbd Wm+yOvWnd1VuywC6/NiPZZnAvwJvkf4= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-169-whr9bxVyOA-yv0PPo12ZiA-1; Wed, 20 Sep 2023 04:39:15 -0400 X-MC-Unique: whr9bxVyOA-yv0PPo12ZiA-1 Received: from smtp.corp.redhat.com (int-mx06.intmail.prod.int.rdu2.redhat.com [10.11.54.6]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 139D385A5BA; Wed, 20 Sep 2023 08:39:14 +0000 (UTC) Received: from warthog.procyon.org.uk (unknown [10.42.28.216]) by smtp.corp.redhat.com (Postfix) with ESMTP id 5788E2156701; Wed, 20 Sep 2023 08:39:12 +0000 (UTC) Organization: Red Hat UK Ltd. Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SI4 1TE, United Kingdom. Registered in England and Wales under Company Registration No. 3798903 From: David Howells <dhowells@redhat.com> To: netdev@vger.kernel.org cc: dhowells@redhat.com, syzbot+62cbf263225ae13ff153@syzkaller.appspotmail.com, Eric Dumazet <edumazet@google.com>, Willem de Bruijn <willemdebruijn.kernel@gmail.com>, "David S. Miller" <davem@davemloft.net>, David Ahern <dsahern@kernel.org>, Paolo Abeni <pabeni@redhat.com>, Jakub Kicinski <kuba@kernel.org>, bpf@vger.kernel.org, syzkaller-bugs@googlegroups.com, linux-kernel@vger.kernel.org Subject: [PATCH net v2] ipv4, ipv6: Fix handling of transhdrlen in __ip{,6}_append_data() MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-ID: <108790.1695199151.1@warthog.procyon.org.uk> Content-Transfer-Encoding: quoted-printable Date: Wed, 20 Sep 2023 09:39:11 +0100 Message-ID: <108791.1695199151@warthog.procyon.org.uk> X-Scanned-By: MIMEDefang 3.1 on 10.11.54.6 X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H4,RCVD_IN_MSPIKE_WL,SPF_HELO_NONE,SPF_NONE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.6.4 (snail.vger.email [0.0.0.0]); Wed, 20 Sep 2023 01:40:10 -0700 (PDT) X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1777576535083531293 X-GMAIL-MSGID: 1777576535083531293 |
Series |
[net,v2] ipv4, ipv6: Fix handling of transhdrlen in __ip{,6}_append_data()
|
|
Commit Message
David Howells
Sept. 20, 2023, 8:39 a.m. UTC
Including the transhdrlen in length is a problem when the packet is
partially filled (e.g. something like send(MSG_MORE) happened previously)
when appending to an IPv4 or IPv6 packet as we don't want to repeat the
transport header or account for it twice. This can happen under some
circumstances, such as splicing into an L2TP socket.
The symptom observed is a warning in __ip6_append_data():
WARNING: CPU: 1 PID: 5042 at net/ipv6/ip6_output.c:1800 __ip6_append_data.isra.0+0x1be8/0x47f0 net/ipv6/ip6_output.c:1800
that occurs when MSG_SPLICE_PAGES is used to append more data to an already
partially occupied skbuff. The warning occurs when 'copy' is larger than
the amount of data in the message iterator. This is because the requested
length includes the transport header length when it shouldn't. This can be
triggered by, for example:
sfd = socket(AF_INET6, SOCK_DGRAM, IPPROTO_L2TP);
bind(sfd, ...); // ::1
connect(sfd, ...); // ::1 port 7
send(sfd, buffer, 4100, MSG_MORE);
sendfile(sfd, dfd, NULL, 1024);
Fix this by deducting transhdrlen from length in ip{,6}_append_data() right
before we clear transhdrlen if there is already a packet that we're going
to try appending to.
Reported-by: syzbot+62cbf263225ae13ff153@syzkaller.appspotmail.com
Link: https://lore.kernel.org/r/0000000000001c12b30605378ce8@google.com/
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Eric Dumazet <edumazet@google.com>
cc: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
cc: "David S. Miller" <davem@davemloft.net>
cc: David Ahern <dsahern@kernel.org>
cc: Paolo Abeni <pabeni@redhat.com>
cc: Jakub Kicinski <kuba@kernel.org>
cc: netdev@vger.kernel.org
cc: bpf@vger.kernel.org
cc: syzkaller-bugs@googlegroups.com
Link: https://lore.kernel.org/r/75315.1695139973@warthog.procyon.org.uk/ # v1
---
net/ipv4/ip_output.c | 1 +
net/ipv6/ip6_output.c | 1 +
2 files changed, 2 insertions(+)
Comments
David Howells wrote: > Including the transhdrlen in length is a problem when the packet is > partially filled (e.g. something like send(MSG_MORE) happened previously) > when appending to an IPv4 or IPv6 packet as we don't want to repeat the > transport header or account for it twice. This can happen under some > circumstances, such as splicing into an L2TP socket. > > The symptom observed is a warning in __ip6_append_data(): > > WARNING: CPU: 1 PID: 5042 at net/ipv6/ip6_output.c:1800 __ip6_append_data.isra.0+0x1be8/0x47f0 net/ipv6/ip6_output.c:1800 > > that occurs when MSG_SPLICE_PAGES is used to append more data to an already > partially occupied skbuff. The warning occurs when 'copy' is larger than > the amount of data in the message iterator. This is because the requested > length includes the transport header length when it shouldn't. This can be > triggered by, for example: > > sfd = socket(AF_INET6, SOCK_DGRAM, IPPROTO_L2TP); > bind(sfd, ...); // ::1 > connect(sfd, ...); // ::1 port 7 > send(sfd, buffer, 4100, MSG_MORE); > sendfile(sfd, dfd, NULL, 1024); > > Fix this by deducting transhdrlen from length in ip{,6}_append_data() right > before we clear transhdrlen if there is already a packet that we're going > to try appending to. > > Reported-by: syzbot+62cbf263225ae13ff153@syzkaller.appspotmail.com > Link: https://lore.kernel.org/r/0000000000001c12b30605378ce8@google.com/ > Signed-off-by: David Howells <dhowells@redhat.com> > cc: Eric Dumazet <edumazet@google.com> > cc: Willem de Bruijn <willemdebruijn.kernel@gmail.com> > cc: "David S. Miller" <davem@davemloft.net> > cc: David Ahern <dsahern@kernel.org> > cc: Paolo Abeni <pabeni@redhat.com> > cc: Jakub Kicinski <kuba@kernel.org> > cc: netdev@vger.kernel.org > cc: bpf@vger.kernel.org > cc: syzkaller-bugs@googlegroups.com > Link: https://lore.kernel.org/r/75315.1695139973@warthog.procyon.org.uk/ # v1 > --- > net/ipv4/ip_output.c | 1 + > net/ipv6/ip6_output.c | 1 + > 2 files changed, 2 insertions(+) > > diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c > index 4ab877cf6d35..9646f2d9afcf 100644 > --- a/net/ipv4/ip_output.c > +++ b/net/ipv4/ip_output.c > @@ -1354,6 +1354,7 @@ int ip_append_data(struct sock *sk, struct flowi4 *fl4, > if (err) > return err; > } else { > + length -= transhdrlen; > transhdrlen = 0; > } > > diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c > index 54fc4c711f2c..6a4ce7f622e9 100644 > --- a/net/ipv6/ip6_output.c > +++ b/net/ipv6/ip6_output.c > @@ -1888,6 +1888,7 @@ int ip6_append_data(struct sock *sk, > length += exthdrlen; > transhdrlen += exthdrlen; > } else { > + length -= transhdrlen; > transhdrlen = 0; > } > Definitely a much simpler patch, thanks. So the current model is that callers with non-zero transhdrlen always pass to __ip_append_data payload length + transhdrlen. I do see that udp does this: ulen += sizeof(struct udphdr); This calls ip_make_skb if not corked, but directly ip_append_data if corked. Then __ip_append_data will use transhdrlen in its packet calculations, and reset that to zero after allocating the first new skb. So if corked *and* fragmentation, which would cause a new skb to be allocated, the next skb would incorrectly reserve udp header space, because the second __ip_append_data call will again pass transhdrlen. If so, then this patch fixes that. But that has never been reported, so I'm most likely misreading some part.. So on the surface this makes sense to me. But I need to read it more closely still. The most risk-averse version would limit this change explicitly to MSG_SPLICE_PAGES calls. FWIW I think MSG_ZEROCOPY is somewhat immune compared to MSG_SPLCE_PAGES solely because it is limited to TCP, UDP and RDS sockets.
On Wed, Sep 20, 2023 at 9:54 AM Willem de Bruijn <willemdebruijn.kernel@gmail.com> wrote: > > David Howells wrote: > > Including the transhdrlen in length is a problem when the packet is > > partially filled (e.g. something like send(MSG_MORE) happened previously) > > when appending to an IPv4 or IPv6 packet as we don't want to repeat the > > transport header or account for it twice. This can happen under some > > circumstances, such as splicing into an L2TP socket. > > > > The symptom observed is a warning in __ip6_append_data(): > > > > WARNING: CPU: 1 PID: 5042 at net/ipv6/ip6_output.c:1800 __ip6_append_data.isra.0+0x1be8/0x47f0 net/ipv6/ip6_output.c:1800 > > > > that occurs when MSG_SPLICE_PAGES is used to append more data to an already > > partially occupied skbuff. The warning occurs when 'copy' is larger than > > the amount of data in the message iterator. This is because the requested > > length includes the transport header length when it shouldn't. This can be > > triggered by, for example: > > > > sfd = socket(AF_INET6, SOCK_DGRAM, IPPROTO_L2TP); > > bind(sfd, ...); // ::1 > > connect(sfd, ...); // ::1 port 7 > > send(sfd, buffer, 4100, MSG_MORE); > > sendfile(sfd, dfd, NULL, 1024); > > > > Fix this by deducting transhdrlen from length in ip{,6}_append_data() right > > before we clear transhdrlen if there is already a packet that we're going > > to try appending to. > > > > Reported-by: syzbot+62cbf263225ae13ff153@syzkaller.appspotmail.com > > Link: https://lore.kernel.org/r/0000000000001c12b30605378ce8@google.com/ > > Signed-off-by: David Howells <dhowells@redhat.com> > > cc: Eric Dumazet <edumazet@google.com> > > cc: Willem de Bruijn <willemdebruijn.kernel@gmail.com> > > cc: "David S. Miller" <davem@davemloft.net> > > cc: David Ahern <dsahern@kernel.org> > > cc: Paolo Abeni <pabeni@redhat.com> > > cc: Jakub Kicinski <kuba@kernel.org> > > cc: netdev@vger.kernel.org > > cc: bpf@vger.kernel.org > > cc: syzkaller-bugs@googlegroups.com > > Link: https://lore.kernel.org/r/75315.1695139973@warthog.procyon.org.uk/ # v1 > > --- > > net/ipv4/ip_output.c | 1 + > > net/ipv6/ip6_output.c | 1 + > > 2 files changed, 2 insertions(+) > > > > diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c > > index 4ab877cf6d35..9646f2d9afcf 100644 > > --- a/net/ipv4/ip_output.c > > +++ b/net/ipv4/ip_output.c > > @@ -1354,6 +1354,7 @@ int ip_append_data(struct sock *sk, struct flowi4 *fl4, > > if (err) > > return err; > > } else { > > + length -= transhdrlen; > > transhdrlen = 0; > > } > > > > diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c > > index 54fc4c711f2c..6a4ce7f622e9 100644 > > --- a/net/ipv6/ip6_output.c > > +++ b/net/ipv6/ip6_output.c > > @@ -1888,6 +1888,7 @@ int ip6_append_data(struct sock *sk, > > length += exthdrlen; > > transhdrlen += exthdrlen; > > } else { > > + length -= transhdrlen; > > transhdrlen = 0; > > } > > > > Definitely a much simpler patch, thanks. > > So the current model is that callers with non-zero transhdrlen always > pass to __ip_append_data payload length + transhdrlen. > > I do see that udp does this: ulen += sizeof(struct udphdr); This calls > ip_make_skb if not corked, but directly ip_append_data if corked. > > Then __ip_append_data will use transhdrlen in its packet calculations, > and reset that to zero after allocating the first new skb. > > So if corked *and* fragmentation, which would cause a new skb to be > allocated, the next skb would incorrectly reserve udp header space, > because the second __ip_append_data call will again pass transhdrlen. > If so, then this patch fixes that. But that has never been reported, > so I'm most likely misreading some part.. This works today because udp only includes transhdrlen if not corked. In udpv6_sendmsg: if (up->pending) { ... goto do_append_data; } ulen += sizeof(struct udphdr); So ip6_append_data is called with ulen == len once data is pending, so subtracting transhdrlen (which is still sizeof(udphdr)) would not be correct. l2tp_ip6_sendmsg more or less follows udpv6_sendmsg, but it unconditionally sets ulen = len + transhdrlen. So maybe the fix is in L2TP: +++ b/net/l2tp/l2tp_ip6.c @@ -507,7 +507,6 @@ static int l2tp_ip6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len) */ if (len > INT_MAX - transhdrlen) return -EMSGSIZE; - ulen = len + transhdrlen; /* Mirror BSD error message compatibility */ if (msg->msg_flags & MSG_OOB) @@ -628,6 +627,7 @@ static int l2tp_ip6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len) back_from_confirm: lock_sock(sk); + ulen = len + skb_queue_empty(&sk->sk_write_queue) ? transhdrlen : 0; As said, only raw, udp and l2p can possibly pass MSG_MORE and so cause secondary invocations of ip6_append_data for the same send. With raw passing transhdrlen 0, and udp as discussed above, we only have to consider l2tp.
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c index 4ab877cf6d35..9646f2d9afcf 100644 --- a/net/ipv4/ip_output.c +++ b/net/ipv4/ip_output.c @@ -1354,6 +1354,7 @@ int ip_append_data(struct sock *sk, struct flowi4 *fl4, if (err) return err; } else { + length -= transhdrlen; transhdrlen = 0; } diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c index 54fc4c711f2c..6a4ce7f622e9 100644 --- a/net/ipv6/ip6_output.c +++ b/net/ipv6/ip6_output.c @@ -1888,6 +1888,7 @@ int ip6_append_data(struct sock *sk, length += exthdrlen; transhdrlen += exthdrlen; } else { + length -= transhdrlen; transhdrlen = 0; }