From patchwork Fri Dec 8 00:52:43 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mina Almasry X-Patchwork-Id: 175507 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:bcd1:0:b0:403:3b70:6f57 with SMTP id r17csp5163626vqy; Thu, 7 Dec 2023 16:54:31 -0800 (PST) X-Google-Smtp-Source: AGHT+IGM2+iE02x91j4kPt9sq+ql6rxpkiVZTjs4hgAOfyCDfo+xOrF2YzBft/UjiWbmBnxwHan/ X-Received: by 2002:a05:6358:51ce:b0:170:17eb:1e6 with SMTP id 14-20020a05635851ce00b0017017eb01e6mr3034684rwl.41.1701996870947; Thu, 07 Dec 2023 16:54:30 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1701996870; cv=none; d=google.com; s=arc-20160816; b=BJCFxFhvAlS8lvp4C11y5F2MoBCN6J5acPW44XISd7tKD2pHz8Dq48uUxRj+Db6PKo KqzKOowtMDoi56DUMjkty+ztLwje6jSU4S1cSg4XQr4ov9xUyjWH2giC3noa3XcO1YvO z5Sbg5yg/BXfMe+fVBWXBn1p9lgyLdc8cySrwR5icymTcz9hlA3nGfIIgLPR2jTDcG4M 7m9hQhA0Hr+kLWGhzUeIZV97Rc/v1wgMMj3G6ZqgDzgNdNSl35sEPJZ2ykep9xlCWOJr H52Zt8OyKK9GXW1OsOU0Z174sKFpxxiVhV3P2o0NuNGius5AISUL5QjOX/kq9T7HOk0S MBIg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:cc:to:from:subject:message-id:references :mime-version:in-reply-to:date:dkim-signature; bh=+q3Q0nSqUx5ks54zyV6WkT43wIv1KXasQh2KaO7Camc=; fh=jiJDDAUhxxqnLDOaLfA63yYuVUdQlpct3IK6rtRzo8Q=; b=GP6kOn+iQY1sGho7Wnd0iOYVrAUAAIVwKn0fcXOPlZuskz82HI/ysJnkKYkMe+aZb5 NnAmHKfFPgphdpA27Xbu+UAPUE8WNO17XUpnTT/0wAIQaFak2Mh15vDTpB0wZshlk4eP pbq9bMy2pRhXnSSprHeEGmPZe73BNcD/48nK6kKRWySh+P/9Sb9STM5MBdLCEv5zr04Z JoRJGsrdZv1kuswHlIwSrmvLl5vmLMNDKr8pUgvPEUo7RqigAFTiTFrrE/zQjIqDyDyI lUFE2fsZ4SpTD1fV+NMnTpna8Y5at1GGpwux2USTaQWPXbLPmffzbhvrIZI36SkmWPdX RPDw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20230601 header.b=gmqAQl+a; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:8 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: from fry.vger.email (fry.vger.email. [2620:137:e000::3:8]) by mx.google.com with ESMTPS id b15-20020a630c0f000000b005bcfc60520asi549015pgl.586.2023.12.07.16.54.30 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 07 Dec 2023 16:54:30 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:8 as permitted sender) client-ip=2620:137:e000::3:8; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20230601 header.b=gmqAQl+a; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:8 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: from out1.vger.email (depot.vger.email [IPv6:2620:137:e000::3:0]) by fry.vger.email (Postfix) with ESMTP id 0FBDB8088A84; Thu, 7 Dec 2023 16:54:27 -0800 (PST) X-Virus-Status: Clean X-Virus-Scanned: clamav-milter 0.103.11 at fry.vger.email Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1572928AbjLHAxl (ORCPT + 99 others); Thu, 7 Dec 2023 19:53:41 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:41618 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1572895AbjLHAxQ (ORCPT ); Thu, 7 Dec 2023 19:53:16 -0500 Received: from mail-yb1-xb4a.google.com (mail-yb1-xb4a.google.com [IPv6:2607:f8b0:4864:20::b4a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 6CEA01999 for ; Thu, 7 Dec 2023 16:53:20 -0800 (PST) Received: by mail-yb1-xb4a.google.com with SMTP id 3f1490d57ef6-db547d41413so1395982276.0 for ; Thu, 07 Dec 2023 16:53:20 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1701996799; x=1702601599; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=+q3Q0nSqUx5ks54zyV6WkT43wIv1KXasQh2KaO7Camc=; b=gmqAQl+aXWfBigYktGXkKzQ0WeVGdIx2AxomgDeKeLHHWTxCHEy7acwmd7fLUB9Tvw 6lsc27J39RpEpOQoU99ET3b+di0Oz+ghz9ASOJ3n6NRK5AXNwPTT6GiLUUTKzJb2dNOX sANS4MCFXy5wmjVaGVvDfK4bGFUwBQ0X8M3NiDFsRAANZNE6+hNiLVSpPL7iaVyYagbp SZ7/FPGhGcICK0j2zHEAgoQBxLtXpZRKCFGrEkX3VGHsuPOWRFEqEfVnRo9QaVA9YIwM JIJGVx5fgNVofvOD315vGHrmqJBWLLsB8BkHnO1d1uMp/+4YgBUvw/S8qHP6378KqUfQ AuFg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1701996799; x=1702601599; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=+q3Q0nSqUx5ks54zyV6WkT43wIv1KXasQh2KaO7Camc=; b=ruiMIepvwWg0hNdX71Z8KKQIIb9n5CCMvAU+dRAn5F+CPAOYTfXhelyT+lBdYLi0Bb 5B/dZ7CeZ+wWibJ0ovLfCHVEzY6BRWxZWrm1D3iLuhYcQirQMpuybM4hr6pTneuTGCfF RREAg4C/4mIspGoi01/hsvjULEjNzhUSdBnVgTxtx7asQiJmrLdX1oNwxQtFcUmenI0O W0NnWYjPflSynyX0WH/J8P4Y8ddKwN3dHWjDhNgvWfl6Aj3fF6jHotYnqWyY8ifo5TaR ntH8YmvWeTnwstDmtrmsJXKbmhSg2IwoSbe5TNQ2FGmq/bP64Maw0p1XSN7BEY4ItzfW hY9A== X-Gm-Message-State: AOJu0Yx2BkSrRnezCjfegWSC0Qqs+ZUKyBxwiIoAcRoGlBWIPfpEKq6S 45igXYH5eHGaxHKzO2G1hwM5u7aMIwetYtqu1g== X-Received: from almasrymina.svl.corp.google.com ([2620:15c:2c4:200:f1cf:c733:235b:9fff]) (user=almasrymina job=sendgmr) by 2002:a05:6902:14d:b0:db5:3aaf:5207 with SMTP id p13-20020a056902014d00b00db53aaf5207mr1545ybh.3.1701996799475; Thu, 07 Dec 2023 16:53:19 -0800 (PST) Date: Thu, 7 Dec 2023 16:52:43 -0800 In-Reply-To: <20231208005250.2910004-1-almasrymina@google.com> Mime-Version: 1.0 References: <20231208005250.2910004-1-almasrymina@google.com> X-Mailer: git-send-email 2.43.0.472.g3155946c3a-goog Message-ID: <20231208005250.2910004-13-almasrymina@google.com> Subject: [net-next v1 12/16] net: add support for skbs with unreadable frags From: Mina Almasry To: Shailend Chand , netdev@vger.kernel.org, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-arch@vger.kernel.org, linux-kselftest@vger.kernel.org, bpf@vger.kernel.org, linux-media@vger.kernel.org, dri-devel@lists.freedesktop.org Cc: Mina Almasry , "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , Jonathan Corbet , Jeroen de Borst , Praveen Kaligineedi , Jesper Dangaard Brouer , Ilias Apalodimas , Arnd Bergmann , David Ahern , Willem de Bruijn , Shuah Khan , Sumit Semwal , " =?utf-8?q?Christian_K=C3=B6nig?= " , Yunsheng Lin , Harshitha Ramamurthy , Shakeel Butt , Willem de Bruijn , Kaiyuan Zhang X-Spam-Status: No, score=-8.4 required=5.0 tests=DKIMWL_WL_MED,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE, USER_IN_DEF_DKIM_WL autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on fry.vger.email Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.6.4 (fry.vger.email [0.0.0.0]); Thu, 07 Dec 2023 16:54:27 -0800 (PST) X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1784673071106408973 X-GMAIL-MSGID: 1784673071106408973 For device memory TCP, we expect the skb headers to be available in host memory for access, and we expect the skb frags to be in device memory and unaccessible to the host. We expect there to be no mixing and matching of device memory frags (unaccessible) with host memory frags (accessible) in the same skb. Add a skb->devmem flag which indicates whether the frags in this skb are device memory frags or not. __skb_fill_page_desc() now checks frags added to skbs for page_pool_iovs, and marks the skb as skb->devmem accordingly. Add checks through the network stack to avoid accessing the frags of devmem skbs and avoid coalescing devmem skbs with non devmem skbs. Signed-off-by: Willem de Bruijn Signed-off-by: Kaiyuan Zhang Signed-off-by: Mina Almasry --- Changes in v1: - Rename devmem -> dmabuf (David). - Flip skb_frags_not_readable (Jakub). --- include/linux/skbuff.h | 14 +++++++- include/net/tcp.h | 5 +-- net/core/datagram.c | 6 ++++ net/core/gro.c | 5 ++- net/core/skbuff.c | 77 ++++++++++++++++++++++++++++++++++++------ net/ipv4/tcp.c | 3 ++ net/ipv4/tcp_input.c | 13 +++++-- net/ipv4/tcp_output.c | 5 ++- net/packet/af_packet.c | 4 +-- 9 files changed, 112 insertions(+), 20 deletions(-) diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 851f448d2181..61de32ab04ea 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -817,6 +817,8 @@ typedef unsigned char *sk_buff_data_t; * @csum_level: indicates the number of consecutive checksums found in * the packet minus one that have been verified as * CHECKSUM_UNNECESSARY (max 3) + * @dmabuf: indicates that all the fragments in this skb are backed by + * dmabuf. * @dst_pending_confirm: need to confirm neighbour * @decrypted: Decrypted SKB * @slow_gro: state present at GRO time, slower prepare step required @@ -1003,7 +1005,7 @@ struct sk_buff { #if IS_ENABLED(CONFIG_IP_SCTP) __u8 csum_not_inet:1; #endif - + __u8 dmabuf:1; #if defined(CONFIG_NET_SCHED) || defined(CONFIG_NET_XGRESS) __u16 tc_index; /* traffic control index */ #endif @@ -1778,6 +1780,12 @@ static inline void skb_zcopy_downgrade_managed(struct sk_buff *skb) __skb_zcopy_downgrade_managed(skb); } +/* Return true if frags in this skb are readable by the host. */ +static inline bool skb_frags_readable(const struct sk_buff *skb) +{ + return !skb->dmabuf; +} + static inline void skb_mark_not_on_list(struct sk_buff *skb) { skb->next = NULL; @@ -2480,6 +2488,10 @@ static inline void __skb_fill_page_desc(struct sk_buff *skb, int i, struct page *page, int off, int size) { __skb_fill_page_desc_noacc(skb_shinfo(skb), i, page, off, size); + if (page_is_page_pool_iov(page)) { + skb->dmabuf = true; + return; + } /* Propagate page pfmemalloc to the skb if we can. The problem is * that not all callers have unique ownership of the page but rely diff --git a/include/net/tcp.h b/include/net/tcp.h index 973555cb1d3f..0fbf198bdb55 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -1017,7 +1017,7 @@ static inline int tcp_skb_mss(const struct sk_buff *skb) static inline bool tcp_skb_can_collapse_to(const struct sk_buff *skb) { - return likely(!TCP_SKB_CB(skb)->eor); + return likely(!TCP_SKB_CB(skb)->eor && skb_frags_readable(skb)); } static inline bool tcp_skb_can_collapse(const struct sk_buff *to, @@ -1025,7 +1025,8 @@ static inline bool tcp_skb_can_collapse(const struct sk_buff *to, { return likely(tcp_skb_can_collapse_to(to) && mptcp_skb_can_collapse(to, from) && - skb_pure_zcopy_same(to, from)); + skb_pure_zcopy_same(to, from) && + skb_frags_readable(to) == skb_frags_readable(from)); } /* Events passed to congestion control interface */ diff --git a/net/core/datagram.c b/net/core/datagram.c index 103d46fa0eeb..f28472ddbaa4 100644 --- a/net/core/datagram.c +++ b/net/core/datagram.c @@ -426,6 +426,9 @@ static int __skb_datagram_iter(const struct sk_buff *skb, int offset, return 0; } + if (!skb_frags_readable(skb)) + goto short_copy; + /* Copy paged appendix. Hmm... why does this look so complicated? */ for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) { int end; @@ -638,6 +641,9 @@ int __zerocopy_sg_from_iter(struct msghdr *msg, struct sock *sk, if (msg && msg->msg_ubuf && msg->sg_from_iter) return msg->sg_from_iter(sk, skb, from, length); + if (!skb_frags_readable(skb)) + return -EFAULT; + frag = skb_shinfo(skb)->nr_frags; while (length && iov_iter_count(from)) { diff --git a/net/core/gro.c b/net/core/gro.c index 42d7f6755f32..26df48f1b355 100644 --- a/net/core/gro.c +++ b/net/core/gro.c @@ -390,6 +390,9 @@ static void gro_pull_from_frag0(struct sk_buff *skb, int grow) { struct skb_shared_info *pinfo = skb_shinfo(skb); + if (WARN_ON_ONCE(!skb_frags_readable(skb))) + return; + BUG_ON(skb->end - skb->tail < grow); memcpy(skb_tail_pointer(skb), NAPI_GRO_CB(skb)->frag0, grow); @@ -411,7 +414,7 @@ static void gro_try_pull_from_frag0(struct sk_buff *skb) { int grow = skb_gro_offset(skb) - skb_headlen(skb); - if (grow > 0) + if (grow > 0 && skb_frags_readable(skb)) gro_pull_from_frag0(skb, grow); } diff --git a/net/core/skbuff.c b/net/core/skbuff.c index 2ce64f57a0f6..50b1b7c2ef7b 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -1235,6 +1235,14 @@ void skb_dump(const char *level, const struct sk_buff *skb, bool full_pkt) struct page *p; u8 *vaddr; + if (skb_frag_is_page_pool_iov(frag)) { + printk("%sskb frag %d: not readable\n", level, i); + len -= frag->bv_len; + if (!len) + break; + continue; + } + skb_frag_foreach_page(frag, skb_frag_off(frag), skb_frag_size(frag), p, p_off, p_len, copied) { @@ -1812,6 +1820,9 @@ int skb_copy_ubufs(struct sk_buff *skb, gfp_t gfp_mask) if (skb_shared(skb) || skb_unclone(skb, gfp_mask)) return -EINVAL; + if (!skb_frags_readable(skb)) + return -EFAULT; + if (!num_frags) goto release; @@ -1982,8 +1993,12 @@ struct sk_buff *skb_copy(const struct sk_buff *skb, gfp_t gfp_mask) { int headerlen = skb_headroom(skb); unsigned int size = skb_end_offset(skb) + skb->data_len; - struct sk_buff *n = __alloc_skb(size, gfp_mask, - skb_alloc_rx_flag(skb), NUMA_NO_NODE); + struct sk_buff *n; + + if (!skb_frags_readable(skb)) + return NULL; + + n = __alloc_skb(size, gfp_mask, skb_alloc_rx_flag(skb), NUMA_NO_NODE); if (!n) return NULL; @@ -2309,14 +2324,16 @@ struct sk_buff *skb_copy_expand(const struct sk_buff *skb, int newheadroom, int newtailroom, gfp_t gfp_mask) { - /* - * Allocate the copy buffer - */ - struct sk_buff *n = __alloc_skb(newheadroom + skb->len + newtailroom, - gfp_mask, skb_alloc_rx_flag(skb), - NUMA_NO_NODE); int oldheadroom = skb_headroom(skb); int head_copy_len, head_copy_off; + struct sk_buff *n; + + if (!skb_frags_readable(skb)) + return NULL; + + /* Allocate the copy buffer */ + n = __alloc_skb(newheadroom + skb->len + newtailroom, gfp_mask, + skb_alloc_rx_flag(skb), NUMA_NO_NODE); if (!n) return NULL; @@ -2655,6 +2672,9 @@ void *__pskb_pull_tail(struct sk_buff *skb, int delta) */ int i, k, eat = (skb->tail + delta) - skb->end; + if (!skb_frags_readable(skb)) + return NULL; + if (eat > 0 || skb_cloned(skb)) { if (pskb_expand_head(skb, 0, eat > 0 ? eat + 128 : 0, GFP_ATOMIC)) @@ -2808,6 +2828,9 @@ int skb_copy_bits(const struct sk_buff *skb, int offset, void *to, int len) to += copy; } + if (!skb_frags_readable(skb)) + goto fault; + for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) { int end; skb_frag_t *f = &skb_shinfo(skb)->frags[i]; @@ -2996,6 +3019,9 @@ static bool __skb_splice_bits(struct sk_buff *skb, struct pipe_inode_info *pipe, /* * then map the fragments */ + if (!skb_frags_readable(skb)) + return false; + for (seg = 0; seg < skb_shinfo(skb)->nr_frags; seg++) { const skb_frag_t *f = &skb_shinfo(skb)->frags[seg]; @@ -3219,6 +3245,9 @@ int skb_store_bits(struct sk_buff *skb, int offset, const void *from, int len) from += copy; } + if (!skb_frags_readable(skb)) + goto fault; + for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) { skb_frag_t *frag = &skb_shinfo(skb)->frags[i]; int end; @@ -3298,6 +3327,9 @@ __wsum __skb_checksum(const struct sk_buff *skb, int offset, int len, pos = copy; } + if (!skb_frags_readable(skb)) + return 0; + for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) { int end; skb_frag_t *frag = &skb_shinfo(skb)->frags[i]; @@ -3398,6 +3430,9 @@ __wsum skb_copy_and_csum_bits(const struct sk_buff *skb, int offset, pos = copy; } + if (!skb_frags_readable(skb)) + return 0; + for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) { int end; @@ -3888,7 +3923,9 @@ static inline void skb_split_inside_header(struct sk_buff *skb, skb_shinfo(skb1)->frags[i] = skb_shinfo(skb)->frags[i]; skb_shinfo(skb1)->nr_frags = skb_shinfo(skb)->nr_frags; + skb1->dmabuf = skb->dmabuf; skb_shinfo(skb)->nr_frags = 0; + skb->dmabuf = 0; skb1->data_len = skb->data_len; skb1->len += skb1->data_len; skb->data_len = 0; @@ -3902,6 +3939,7 @@ static inline void skb_split_no_header(struct sk_buff *skb, { int i, k = 0; const int nfrags = skb_shinfo(skb)->nr_frags; + const int dmabuf = skb->dmabuf; skb_shinfo(skb)->nr_frags = 0; skb1->len = skb1->data_len = skb->len - len; @@ -3935,6 +3973,16 @@ static inline void skb_split_no_header(struct sk_buff *skb, pos += size; } skb_shinfo(skb1)->nr_frags = k; + + if (skb_shinfo(skb)->nr_frags) + skb->dmabuf = dmabuf; + else + skb->dmabuf = 0; + + if (skb_shinfo(skb1)->nr_frags) + skb1->dmabuf = dmabuf; + else + skb1->dmabuf = 0; } /** @@ -4170,6 +4218,9 @@ unsigned int skb_seq_read(unsigned int consumed, const u8 **data, return block_limit - abs_offset; } + if (!skb_frags_readable(st->cur_skb)) + return 0; + if (st->frag_idx == 0 && !st->frag_data) st->stepped_offset += skb_headlen(st->cur_skb); @@ -5784,7 +5835,10 @@ bool skb_try_coalesce(struct sk_buff *to, struct sk_buff *from, (from->pp_recycle && skb_cloned(from))) return false; - if (len <= skb_tailroom(to)) { + if (skb_frags_readable(from) != skb_frags_readable(to)) + return false; + + if (len <= skb_tailroom(to) && skb_frags_readable(from)) { if (len) BUG_ON(skb_copy_bits(from, 0, skb_put(to, len), len)); *delta_truesize = 0; @@ -5959,6 +6013,9 @@ int skb_ensure_writable(struct sk_buff *skb, unsigned int write_len) if (!pskb_may_pull(skb, write_len)) return -ENOMEM; + if (!skb_frags_readable(skb)) + return -EFAULT; + if (!skb_cloned(skb) || skb_clone_writable(skb, write_len)) return 0; @@ -6613,7 +6670,7 @@ void skb_condense(struct sk_buff *skb) { if (skb->data_len) { if (skb->data_len > skb->end - skb->tail || - skb_cloned(skb)) + skb_cloned(skb) || !skb_frags_readable(skb)) return; /* Nice, we can free page frag(s) right now */ diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index e22681c4bfac..5a3135e93d3d 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -2140,6 +2140,9 @@ static int tcp_zerocopy_receive(struct sock *sk, skb = tcp_recv_skb(sk, seq, &offset); } + if (!skb_frags_readable(skb)) + break; + if (TCP_SKB_CB(skb)->has_rxtstamp) { tcp_update_recv_tstamps(skb, tss); zc->msg_flags |= TCP_CMSG_TS; diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index 0548f0c12155..a47f98187656 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -5309,6 +5309,9 @@ tcp_collapse(struct sock *sk, struct sk_buff_head *list, struct rb_root *root, for (end_of_skbs = true; skb != NULL && skb != tail; skb = n) { n = tcp_skb_next(skb, list); + if (!skb_frags_readable(skb)) + goto skip_this; + /* No new bits? It is possible on ofo queue. */ if (!before(start, TCP_SKB_CB(skb)->end_seq)) { skb = tcp_collapse_one(sk, skb, list, root); @@ -5329,17 +5332,20 @@ tcp_collapse(struct sock *sk, struct sk_buff_head *list, struct rb_root *root, break; } - if (n && n != tail && mptcp_skb_can_collapse(skb, n) && + if (n && n != tail && skb_frags_readable(n) && + mptcp_skb_can_collapse(skb, n) && TCP_SKB_CB(skb)->end_seq != TCP_SKB_CB(n)->seq) { end_of_skbs = false; break; } +skip_this: /* Decided to skip this, advance start seq. */ start = TCP_SKB_CB(skb)->end_seq; } if (end_of_skbs || - (TCP_SKB_CB(skb)->tcp_flags & (TCPHDR_SYN | TCPHDR_FIN))) + (TCP_SKB_CB(skb)->tcp_flags & (TCPHDR_SYN | TCPHDR_FIN)) || + !skb_frags_readable(skb)) return; __skb_queue_head_init(&tmp); @@ -5383,7 +5389,8 @@ tcp_collapse(struct sock *sk, struct sk_buff_head *list, struct rb_root *root, if (!skb || skb == tail || !mptcp_skb_can_collapse(nskb, skb) || - (TCP_SKB_CB(skb)->tcp_flags & (TCPHDR_SYN | TCPHDR_FIN))) + (TCP_SKB_CB(skb)->tcp_flags & (TCPHDR_SYN | TCPHDR_FIN)) || + !skb_frags_readable(skb)) goto end; #ifdef CONFIG_TLS_DEVICE if (skb->decrypted != nskb->decrypted) diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index eb13a55d660c..c8c0a1cbaca5 100644 --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -2343,7 +2343,8 @@ static bool tcp_can_coalesce_send_queue_head(struct sock *sk, int len) if (unlikely(TCP_SKB_CB(skb)->eor) || tcp_has_tx_tstamp(skb) || - !skb_pure_zcopy_same(skb, next)) + !skb_pure_zcopy_same(skb, next) || + skb_frags_readable(skb) != skb_frags_readable(next)) return false; len -= skb->len; @@ -3227,6 +3228,8 @@ static bool tcp_can_collapse(const struct sock *sk, const struct sk_buff *skb) return false; if (skb_cloned(skb)) return false; + if (!skb_frags_readable(skb)) + return false; /* Some heuristics for collapsing over SACK'd could be invented */ if (TCP_SKB_CB(skb)->sacked & TCPCB_SACKED_ACKED) return false; diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c index f92edba4c40f..33988106f237 100644 --- a/net/packet/af_packet.c +++ b/net/packet/af_packet.c @@ -2156,7 +2156,7 @@ static int packet_rcv(struct sk_buff *skb, struct net_device *dev, } } - snaplen = skb->len; + snaplen = skb_frags_readable(skb) ? skb->len : skb_headlen(skb); res = run_filter(skb, sk, snaplen); if (!res) @@ -2276,7 +2276,7 @@ static int tpacket_rcv(struct sk_buff *skb, struct net_device *dev, } } - snaplen = skb->len; + snaplen = skb_frags_readable(skb) ? skb->len : skb_headlen(skb); res = run_filter(skb, sk, snaplen); if (!res)