From patchwork Thu Apr 6 09:42:27 2023
X-Patchwork-Submitter: David Howells
X-Patchwork-Id: 80152
From: David Howells <dhowells@redhat.com>
To: netdev@vger.kernel.org
Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , Willem de Bruijn , Matthew Wilcox , Al Viro , Christoph Hellwig , Jens Axboe , Jeff Layton , Christian Brauner , Chuck Lever III , Linus Torvalds , linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Willem de Bruijn Subject: [PATCH net-next v5 01/19] net: Declare MSG_SPLICE_PAGES internal sendmsg() flag Date: Thu, 6 Apr 2023 10:42:27 +0100 Message-Id: <20230406094245.3633290-2-dhowells@redhat.com> In-Reply-To: <20230406094245.3633290-1-dhowells@redhat.com> References: <20230406094245.3633290-1-dhowells@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.10 X-Spam-Status: No, score=-0.2 required=5.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_NONE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1762419646797857881?= X-GMAIL-MSGID: =?utf-8?q?1762419646797857881?= Declare MSG_SPLICE_PAGES, an internal sendmsg() flag, that hints to a network protocol that it should splice pages from the source iterator rather than copying the data if it can. This flag is added to a list that is cleared by sendmsg syscalls on entry. This is intended as a replacement for the ->sendpage() op, allowing a way to splice in several multipage folios in one go. Signed-off-by: David Howells Reviewed-by: Willem de Bruijn cc: "David S. Miller" cc: Eric Dumazet cc: Jakub Kicinski cc: Paolo Abeni cc: Jens Axboe cc: Matthew Wilcox cc: netdev@vger.kernel.org --- include/linux/socket.h | 3 +++ net/socket.c | 2 ++ 2 files changed, 5 insertions(+) diff --git a/include/linux/socket.h b/include/linux/socket.h index 13c3a237b9c9..bd1cc3238851 100644 --- a/include/linux/socket.h +++ b/include/linux/socket.h @@ -327,6 +327,7 @@ struct ucred { */ #define MSG_ZEROCOPY 0x4000000 /* Use user data in kernel path */ +#define MSG_SPLICE_PAGES 0x8000000 /* Splice the pages from the iterator in sendmsg() */ #define MSG_FASTOPEN 0x20000000 /* Send data in TCP SYN */ #define MSG_CMSG_CLOEXEC 0x40000000 /* Set close_on_exec for file descriptor received through @@ -337,6 +338,8 @@ struct ucred { #define MSG_CMSG_COMPAT 0 /* We never have 32 bit fixups */ #endif +/* Flags to be cleared on entry by sendmsg and sendmmsg syscalls */ +#define MSG_INTERNAL_SENDMSG_FLAGS (MSG_SPLICE_PAGES) /* Setsockoptions(2) level. 
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Willem de Bruijn
cc: "David S. Miller"
cc: Eric Dumazet
cc: Jakub Kicinski
cc: Paolo Abeni
cc: Jens Axboe
cc: Matthew Wilcox
cc: netdev@vger.kernel.org
---
 include/linux/socket.h | 3 +++
 net/socket.c           | 2 ++
 2 files changed, 5 insertions(+)

diff --git a/include/linux/socket.h b/include/linux/socket.h
index 13c3a237b9c9..bd1cc3238851 100644
--- a/include/linux/socket.h
+++ b/include/linux/socket.h
@@ -327,6 +327,7 @@ struct ucred {
  */

 #define MSG_ZEROCOPY	0x4000000	/* Use user data in kernel path */
+#define MSG_SPLICE_PAGES 0x8000000	/* Splice the pages from the iterator in sendmsg() */
 #define MSG_FASTOPEN	0x20000000	/* Send data in TCP SYN */
 #define MSG_CMSG_CLOEXEC 0x40000000	/* Set close_on_exec for file
					   descriptor received through
@@ -337,6 +338,8 @@ struct ucred {
 #define MSG_CMSG_COMPAT	0		/* We never have 32 bit fixups */
 #endif

+/* Flags to be cleared on entry by sendmsg and sendmmsg syscalls */
+#define MSG_INTERNAL_SENDMSG_FLAGS (MSG_SPLICE_PAGES)

 /* Setsockoptions(2) level. Thanks to BSD these must match IPPROTO_xxx */
 #define SOL_IP		0

diff --git a/net/socket.c b/net/socket.c
index 73e493da4589..b3fd3f7f7e03 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -2136,6 +2136,7 @@ int __sys_sendto(int fd, void __user *buff, size_t len, unsigned int flags,
 		msg.msg_name = (struct sockaddr *)&address;
 		msg.msg_namelen = addr_len;
 	}
+	flags &= ~MSG_INTERNAL_SENDMSG_FLAGS;
 	if (sock->file->f_flags & O_NONBLOCK)
 		flags |= MSG_DONTWAIT;
 	msg.msg_flags = flags;
@@ -2483,6 +2484,7 @@ static int ____sys_sendmsg(struct socket *sock, struct msghdr *msg_sys,
 	}
 	msg_sys->msg_flags = flags;
+	flags &= ~MSG_INTERNAL_SENDMSG_FLAGS;
 	if (sock->file->f_flags & O_NONBLOCK)
 		msg_sys->msg_flags |= MSG_DONTWAIT;
 	/*

From patchwork Thu Apr 6 09:42:28 2023
X-Patchwork-Submitter: David Howells
X-Patchwork-Id: 80156
From: David Howells <dhowells@redhat.com>
To: netdev@vger.kernel.org
Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , Willem de Bruijn , Matthew Wilcox , Al Viro , Christoph Hellwig , Jens Axboe , Jeff Layton , Christian Brauner , Chuck Lever III , Linus Torvalds , linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Andrew Morton Subject: [PATCH net-next v5 02/19] mm: Move the page fragment allocator from page_alloc.c into its own file Date: Thu, 6 Apr 2023 10:42:28 +0100 Message-Id: <20230406094245.3633290-3-dhowells@redhat.com> In-Reply-To: <20230406094245.3633290-1-dhowells@redhat.com> References: <20230406094245.3633290-1-dhowells@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.1 X-Spam-Status: No, score=-0.2 required=5.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_NONE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1762419673741818247?= X-GMAIL-MSGID: =?utf-8?q?1762419673741818247?= Move the page fragment allocator from page_alloc.c into its own file preparatory to changing it. Signed-off-by: David Howells cc: Andrew Morton cc: "David S. Miller" cc: Eric Dumazet cc: Jakub Kicinski cc: Paolo Abeni cc: Jens Axboe cc: Matthew Wilcox cc: linux-mm@kvack.org cc: netdev@vger.kernel.org --- mm/Makefile | 2 +- mm/page_alloc.c | 126 ----------------------------------------- mm/page_frag_alloc.c | 131 +++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 132 insertions(+), 127 deletions(-) create mode 100644 mm/page_frag_alloc.c diff --git a/mm/Makefile b/mm/Makefile index 8e105e5b3e29..4e6dc12b4cbd 100644 --- a/mm/Makefile +++ b/mm/Makefile @@ -52,7 +52,7 @@ obj-y := filemap.o mempool.o oom_kill.o fadvise.o \ readahead.o swap.o truncate.o vmscan.o shmem.o \ util.o mmzone.o vmstat.o backing-dev.o \ mm_init.o percpu.o slab_common.o \ - compaction.o \ + compaction.o page_frag_alloc.o \ interval_tree.o list_lru.o workingset.o \ debug.o gup.o mmap_lock.o $(mmu-y) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 7136c36c5d01..d751e750c14b 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -5695,132 +5695,6 @@ void free_pages(unsigned long addr, unsigned int order) EXPORT_SYMBOL(free_pages); -/* - * Page Fragment: - * An arbitrary-length arbitrary-offset area of memory which resides - * within a 0 or higher order page. Multiple fragments within that page - * are individually refcounted, in the page's reference counter. - * - * The page_frag functions below provide a simple allocation framework for - * page fragments. This is used by the network stack and network device - * drivers to provide a backing region of memory for use as either an - * sk_buff->head, or to be used in the "frags" portion of skb_shared_info. - */ -static struct page *__page_frag_cache_refill(struct page_frag_cache *nc, - gfp_t gfp_mask) -{ - struct page *page = NULL; - gfp_t gfp = gfp_mask; - -#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE) - gfp_mask |= __GFP_COMP | __GFP_NOWARN | __GFP_NORETRY | - __GFP_NOMEMALLOC; - page = alloc_pages_node(NUMA_NO_NODE, gfp_mask, - PAGE_FRAG_CACHE_MAX_ORDER); - nc->size = page ? PAGE_FRAG_CACHE_MAX_SIZE : PAGE_SIZE; -#endif - if (unlikely(!page)) - page = alloc_pages_node(NUMA_NO_NODE, gfp, 0); - - nc->va = page ? 
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Andrew Morton
cc: "David S. Miller"
cc: Eric Dumazet
cc: Jakub Kicinski
cc: Paolo Abeni
cc: Jens Axboe
cc: Matthew Wilcox
cc: linux-mm@kvack.org
cc: netdev@vger.kernel.org
---
 mm/Makefile          |   2 +-
 mm/page_alloc.c      | 126 -----------------------------------
 mm/page_frag_alloc.c | 131 +++++++++++++++++++++++++++++++++++++++
 3 files changed, 132 insertions(+), 127 deletions(-)
 create mode 100644 mm/page_frag_alloc.c

diff --git a/mm/Makefile b/mm/Makefile
index 8e105e5b3e29..4e6dc12b4cbd 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -52,7 +52,7 @@ obj-y := filemap.o mempool.o oom_kill.o fadvise.o \
			   readahead.o swap.o truncate.o vmscan.o shmem.o \
			   util.o mmzone.o vmstat.o backing-dev.o \
			   mm_init.o percpu.o slab_common.o \
-			   compaction.o \
+			   compaction.o page_frag_alloc.o \
			   interval_tree.o list_lru.o workingset.o \
			   debug.o gup.o mmap_lock.o $(mmu-y)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 7136c36c5d01..d751e750c14b 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5695,132 +5695,6 @@ void free_pages(unsigned long addr, unsigned int order)
 EXPORT_SYMBOL(free_pages);

-/*
- * Page Fragment:
- *  An arbitrary-length arbitrary-offset area of memory which resides
- *  within a 0 or higher order page.  Multiple fragments within that page
- *  are individually refcounted, in the page's reference counter.
- *
- * The page_frag functions below provide a simple allocation framework for
- * page fragments.  This is used by the network stack and network device
- * drivers to provide a backing region of memory for use as either an
- * sk_buff->head, or to be used in the "frags" portion of skb_shared_info.
- */
-static struct page *__page_frag_cache_refill(struct page_frag_cache *nc,
-					     gfp_t gfp_mask)
-{
-	struct page *page = NULL;
-	gfp_t gfp = gfp_mask;
-
-#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
-	gfp_mask |= __GFP_COMP | __GFP_NOWARN | __GFP_NORETRY |
-		    __GFP_NOMEMALLOC;
-	page = alloc_pages_node(NUMA_NO_NODE, gfp_mask,
-				PAGE_FRAG_CACHE_MAX_ORDER);
-	nc->size = page ? PAGE_FRAG_CACHE_MAX_SIZE : PAGE_SIZE;
-#endif
-	if (unlikely(!page))
-		page = alloc_pages_node(NUMA_NO_NODE, gfp, 0);
-
-	nc->va = page ? page_address(page) : NULL;
-
-	return page;
-}
-
-void __page_frag_cache_drain(struct page *page, unsigned int count)
-{
-	VM_BUG_ON_PAGE(page_ref_count(page) == 0, page);
-
-	if (page_ref_sub_and_test(page, count))
-		free_the_page(page, compound_order(page));
-}
-EXPORT_SYMBOL(__page_frag_cache_drain);
-
-void *page_frag_alloc_align(struct page_frag_cache *nc,
-			    unsigned int fragsz, gfp_t gfp_mask,
-			    unsigned int align_mask)
-{
-	unsigned int size = PAGE_SIZE;
-	struct page *page;
-	int offset;
-
-	if (unlikely(!nc->va)) {
-refill:
-		page = __page_frag_cache_refill(nc, gfp_mask);
-		if (!page)
-			return NULL;
-
-#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
-		/* if size can vary use size else just use PAGE_SIZE */
-		size = nc->size;
-#endif
-		/* Even if we own the page, we do not use atomic_set().
-		 * This would break get_page_unless_zero() users.
-		 */
-		page_ref_add(page, PAGE_FRAG_CACHE_MAX_SIZE);
-
-		/* reset page count bias and offset to start of new frag */
-		nc->pfmemalloc = page_is_pfmemalloc(page);
-		nc->pagecnt_bias = PAGE_FRAG_CACHE_MAX_SIZE + 1;
-		nc->offset = size;
-	}
-
-	offset = nc->offset - fragsz;
-	if (unlikely(offset < 0)) {
-		page = virt_to_page(nc->va);
-
-		if (!page_ref_sub_and_test(page, nc->pagecnt_bias))
-			goto refill;
-
-		if (unlikely(nc->pfmemalloc)) {
-			free_the_page(page, compound_order(page));
-			goto refill;
-		}
-
-#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
-		/* if size can vary use size else just use PAGE_SIZE */
-		size = nc->size;
-#endif
-		/* OK, page count is 0, we can safely set it */
-		set_page_count(page, PAGE_FRAG_CACHE_MAX_SIZE + 1);
-
-		/* reset page count bias and offset to start of new frag */
-		nc->pagecnt_bias = PAGE_FRAG_CACHE_MAX_SIZE + 1;
-		offset = size - fragsz;
-		if (unlikely(offset < 0)) {
-			/*
-			 * The caller is trying to allocate a fragment
-			 * with fragsz > PAGE_SIZE but the cache isn't big
-			 * enough to satisfy the request, this may
-			 * happen in low memory conditions.
-			 * We don't release the cache page because
-			 * it could make memory pressure worse
-			 * so we simply return NULL here.
-			 */
-			return NULL;
-		}
-	}
-
-	nc->pagecnt_bias--;
-	offset &= align_mask;
-	nc->offset = offset;
-
-	return nc->va + offset;
-}
-EXPORT_SYMBOL(page_frag_alloc_align);
-
-/*
- * Frees a page fragment allocated out of either a compound or order 0 page.
- */
-void page_frag_free(void *addr)
-{
-	struct page *page = virt_to_head_page(addr);
-
-	if (unlikely(put_page_testzero(page)))
-		free_the_page(page, compound_order(page));
-}
-EXPORT_SYMBOL(page_frag_free);
-
 static void *make_alloc_exact(unsigned long addr, unsigned int order,
			      size_t size)
 {

diff --git a/mm/page_frag_alloc.c b/mm/page_frag_alloc.c
new file mode 100644
index 000000000000..bee95824ef8f
--- /dev/null
+++ b/mm/page_frag_alloc.c
@@ -0,0 +1,131 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Page fragment allocator
+ *
+ * Page Fragment:
+ *  An arbitrary-length arbitrary-offset area of memory which resides within a
+ *  0 or higher order page.  Multiple fragments within that page are
+ *  individually refcounted, in the page's reference counter.
+ *
+ * The page_frag functions provide a simple allocation framework for page
+ * fragments.  This is used by the network stack and network device drivers to
+ * provide a backing region of memory for use as either an sk_buff->head, or to
+ * be used in the "frags" portion of skb_shared_info.
+ */
+
+#include <linux/export.h>
+#include <linux/init.h>
+#include <linux/mm.h>
+
+static struct page *__page_frag_cache_refill(struct page_frag_cache *nc,
+					     gfp_t gfp_mask)
+{
+	struct page *page = NULL;
+	gfp_t gfp = gfp_mask;
+
+#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
+	gfp_mask |= __GFP_COMP | __GFP_NOWARN | __GFP_NORETRY |
+		    __GFP_NOMEMALLOC;
+	page = alloc_pages_node(NUMA_NO_NODE, gfp_mask,
+				PAGE_FRAG_CACHE_MAX_ORDER);
+	nc->size = page ? PAGE_FRAG_CACHE_MAX_SIZE : PAGE_SIZE;
+#endif
+	if (unlikely(!page))
+		page = alloc_pages_node(NUMA_NO_NODE, gfp, 0);
+
+	nc->va = page ? page_address(page) : NULL;
+
+	return page;
+}
+
+void __page_frag_cache_drain(struct page *page, unsigned int count)
+{
+	VM_BUG_ON_PAGE(page_ref_count(page) == 0, page);
+
+	if (page_ref_sub_and_test(page, count - 1))
+		__free_pages(page, compound_order(page));
+}
+EXPORT_SYMBOL(__page_frag_cache_drain);
+
+void *page_frag_alloc_align(struct page_frag_cache *nc,
+			    unsigned int fragsz, gfp_t gfp_mask,
+			    unsigned int align_mask)
+{
+	unsigned int size = PAGE_SIZE;
+	struct page *page;
+	int offset;
+
+	if (unlikely(!nc->va)) {
+refill:
+		page = __page_frag_cache_refill(nc, gfp_mask);
+		if (!page)
+			return NULL;
+
+#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
+		/* if size can vary use size else just use PAGE_SIZE */
+		size = nc->size;
+#endif
+		/* Even if we own the page, we do not use atomic_set().
+		 * This would break get_page_unless_zero() users.
+		 */
+		page_ref_add(page, PAGE_FRAG_CACHE_MAX_SIZE);
+
+		/* reset page count bias and offset to start of new frag */
+		nc->pfmemalloc = page_is_pfmemalloc(page);
+		nc->pagecnt_bias = PAGE_FRAG_CACHE_MAX_SIZE + 1;
+		nc->offset = size;
+	}
+
+	offset = nc->offset - fragsz;
+	if (unlikely(offset < 0)) {
+		page = virt_to_page(nc->va);
+
+		if (page_ref_count(page) != nc->pagecnt_bias)
+			goto refill;
+		if (unlikely(nc->pfmemalloc)) {
+			page_ref_sub(page, nc->pagecnt_bias - 1);
+			__free_pages(page, compound_order(page));
+			goto refill;
+		}
+
+#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
+		/* if size can vary use size else just use PAGE_SIZE */
+		size = nc->size;
+#endif
+		/* OK, page count is 0, we can safely set it */
+		set_page_count(page, PAGE_FRAG_CACHE_MAX_SIZE + 1);
+
+		/* reset page count bias and offset to start of new frag */
+		nc->pagecnt_bias = PAGE_FRAG_CACHE_MAX_SIZE + 1;
+		offset = size - fragsz;
+		if (unlikely(offset < 0)) {
+			/*
+			 * The caller is trying to allocate a fragment
+			 * with fragsz > PAGE_SIZE but the cache isn't big
+			 * enough to satisfy the request, this may
+			 * happen in low memory conditions.
+			 * We don't release the cache page because
+			 * it could make memory pressure worse
+			 * so we simply return NULL here.
+			 */
+			return NULL;
+		}
+	}
+
+	nc->pagecnt_bias--;
+	offset &= align_mask;
+	nc->offset = offset;
+
+	return nc->va + offset;
+}
+EXPORT_SYMBOL(page_frag_alloc_align);
+
+/*
+ * Frees a page fragment allocated out of either a compound or order 0 page.
+ */
+void page_frag_free(void *addr)
+{
+	struct page *page = virt_to_head_page(addr);
+
+	__free_pages(page, compound_order(page));
+}
+EXPORT_SYMBOL(page_frag_free);
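A worked illustration of the refcount-bias scheme described in the file's
comment (explanatory only, not code from the patch):

	/*
	 * refill:     page refcount = 1 + PAGE_FRAG_CACHE_MAX_SIZE and
	 *             nc->pagecnt_bias = PAGE_FRAG_CACHE_MAX_SIZE + 1
	 * each alloc: nc->pagecnt_bias-- only; no atomic op touches
	 *             page->_refcount, keeping its cache line clean
	 * page full:  subtracting the remaining bias from the refcount
	 *             cancels the unclaimed refs, leaving one real ref per
	 *             fragment still outstanding, so the page is finally
	 *             freed by the last page_frag_free() call.
	 */

From patchwork Thu Apr 6 09:42:29 2023
X-Patchwork-Submitter: David Howells
X-Patchwork-Id: 80165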
From: David Howells <dhowells@redhat.com>
To: netdev@vger.kernel.org
Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , Willem de Bruijn , Matthew Wilcox , Al Viro , Christoph Hellwig , Jens Axboe , Jeff Layton , Christian Brauner , Chuck Lever III , Linus Torvalds , linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Jeroen de Borst , Catherine Sullivan , Shailend Chand , Felix Fietkau , John Crispin , Sean Wang , Mark Lee , Lorenzo Bianconi , Matthias Brugger , AngeloGioacchino Del Regno , Keith Busch , Jens Axboe , Christoph Hellwig , Sagi Grimberg , Chaitanya Kulkarni , Andrew Morton , linux-arm-kernel@lists.infradead.org, linux-mediatek@lists.infradead.org, linux-nvme@lists.infradead.org Subject: [PATCH net-next v5 03/19] mm: Make the page_frag_cache allocator use multipage folios Date: Thu, 6 Apr 2023 10:42:29 +0100 Message-Id: <20230406094245.3633290-4-dhowells@redhat.com> In-Reply-To: <20230406094245.3633290-1-dhowells@redhat.com> References: <20230406094245.3633290-1-dhowells@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.9 X-Spam-Status: No, score=-0.2 required=5.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_NONE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1762419930622174805?= X-GMAIL-MSGID: =?utf-8?q?1762419930622174805?= Change the page_frag_cache allocator to use multipage folios rather than groups of pages. This reduces page_frag_free to just a folio_put() or put_page(). Signed-off-by: David Howells cc: "David S. Miller" cc: Eric Dumazet cc: Jakub Kicinski cc: Paolo Abeni cc: Jens Axboe cc: Jeroen de Borst cc: Catherine Sullivan cc: Shailend Chand cc: Felix Fietkau cc: John Crispin cc: Sean Wang cc: Mark Lee cc: Lorenzo Bianconi cc: Matthias Brugger cc: AngeloGioacchino Del Regno cc: Keith Busch cc: Jens Axboe cc: Christoph Hellwig cc: Sagi Grimberg cc: Chaitanya Kulkarni cc: Andrew Morton cc: Matthew Wilcox cc: netdev@vger.kernel.org cc: linux-arm-kernel@lists.infradead.org cc: linux-mediatek@lists.infradead.org cc: linux-nvme@lists.infradead.org cc: linux-mm@kvack.org --- drivers/net/ethernet/google/gve/gve_main.c | 11 +-- drivers/net/ethernet/mediatek/mtk_wed_wo.c | 15 +-- drivers/nvme/host/tcp.c | 7 +- drivers/nvme/target/tcp.c | 5 +- include/linux/gfp.h | 1 + include/linux/mm_types.h | 13 +-- mm/page_frag_alloc.c | 101 +++++++++++---------- 7 files changed, 65 insertions(+), 88 deletions(-) diff --git a/drivers/net/ethernet/google/gve/gve_main.c b/drivers/net/ethernet/google/gve/gve_main.c index 57ce74315eba..b2fc1a3e6340 100644 --- a/drivers/net/ethernet/google/gve/gve_main.c +++ b/drivers/net/ethernet/google/gve/gve_main.c @@ -1263,17 +1263,10 @@ static void gve_unreg_xdp_info(struct gve_priv *priv) static void gve_drain_page_cache(struct gve_priv *priv) { - struct page_frag_cache *nc; int i; - for (i = 0; i < priv->rx_cfg.num_queues; i++) { - nc = &priv->rx[i].page_cache; - if (nc->va) { - __page_frag_cache_drain(virt_to_page(nc->va), - nc->pagecnt_bias); - nc->va = NULL; - } - } + for (i = 0; i < priv->rx_cfg.num_queues; i++) + page_frag_cache_clear(&priv->rx[i].page_cache); } static int gve_open(struct net_device *dev) diff --git a/drivers/net/ethernet/mediatek/mtk_wed_wo.c b/drivers/net/ethernet/mediatek/mtk_wed_wo.c index 69fba29055e9..6ce532217777 
Signed-off-by: David Howells <dhowells@redhat.com>
cc: "David S. Miller"
cc: Eric Dumazet
cc: Jakub Kicinski
cc: Paolo Abeni
cc: Jens Axboe
cc: Jeroen de Borst
cc: Catherine Sullivan
cc: Shailend Chand
cc: Felix Fietkau
cc: John Crispin
cc: Sean Wang
cc: Mark Lee
cc: Lorenzo Bianconi
cc: Matthias Brugger
cc: AngeloGioacchino Del Regno
cc: Keith Busch
cc: Jens Axboe
cc: Christoph Hellwig
cc: Sagi Grimberg
cc: Chaitanya Kulkarni
cc: Andrew Morton
cc: Matthew Wilcox
cc: netdev@vger.kernel.org
cc: linux-arm-kernel@lists.infradead.org
cc: linux-mediatek@lists.infradead.org
cc: linux-nvme@lists.infradead.org
cc: linux-mm@kvack.org
---
 drivers/net/ethernet/google/gve/gve_main.c |  11 +--
 drivers/net/ethernet/mediatek/mtk_wed_wo.c |  15 +--
 drivers/nvme/host/tcp.c                    |   7 +-
 drivers/nvme/target/tcp.c                  |   5 +-
 include/linux/gfp.h                        |   1 +
 include/linux/mm_types.h                   |  13 +--
 mm/page_frag_alloc.c                       | 101 +++++++++++----------
 7 files changed, 65 insertions(+), 88 deletions(-)

diff --git a/drivers/net/ethernet/google/gve/gve_main.c b/drivers/net/ethernet/google/gve/gve_main.c
index 57ce74315eba..b2fc1a3e6340 100644
--- a/drivers/net/ethernet/google/gve/gve_main.c
+++ b/drivers/net/ethernet/google/gve/gve_main.c
@@ -1263,17 +1263,10 @@ static void gve_unreg_xdp_info(struct gve_priv *priv)

 static void gve_drain_page_cache(struct gve_priv *priv)
 {
-	struct page_frag_cache *nc;
 	int i;

-	for (i = 0; i < priv->rx_cfg.num_queues; i++) {
-		nc = &priv->rx[i].page_cache;
-		if (nc->va) {
-			__page_frag_cache_drain(virt_to_page(nc->va),
-						nc->pagecnt_bias);
-			nc->va = NULL;
-		}
-	}
+	for (i = 0; i < priv->rx_cfg.num_queues; i++)
+		page_frag_cache_clear(&priv->rx[i].page_cache);
 }

 static int gve_open(struct net_device *dev)

diff --git a/drivers/net/ethernet/mediatek/mtk_wed_wo.c b/drivers/net/ethernet/mediatek/mtk_wed_wo.c
index 69fba29055e9..6ce532217777 100644
--- a/drivers/net/ethernet/mediatek/mtk_wed_wo.c
+++ b/drivers/net/ethernet/mediatek/mtk_wed_wo.c
@@ -286,7 +286,6 @@ mtk_wed_wo_queue_free(struct mtk_wed_wo *wo, struct mtk_wed_wo_queue *q)
 static void
 mtk_wed_wo_queue_tx_clean(struct mtk_wed_wo *wo, struct mtk_wed_wo_queue *q)
 {
-	struct page *page;
 	int i;

 	for (i = 0; i < q->n_desc; i++) {
@@ -298,12 +297,7 @@ mtk_wed_wo_queue_tx_clean(struct mtk_wed_wo *wo, struct mtk_wed_wo_queue *q)
 		entry->buf = NULL;
 	}

-	if (!q->cache.va)
-		return;
-
-	page = virt_to_page(q->cache.va);
-	__page_frag_cache_drain(page, q->cache.pagecnt_bias);
-	memset(&q->cache, 0, sizeof(q->cache));
+	page_frag_cache_clear(&q->cache);
 }

 static void
@@ -320,12 +314,7 @@ mtk_wed_wo_queue_rx_clean(struct mtk_wed_wo *wo, struct mtk_wed_wo_queue *q)
 		skb_free_frag(buf);
 	}

-	if (!q->cache.va)
-		return;
-
-	page = virt_to_page(q->cache.va);
-	__page_frag_cache_drain(page, q->cache.pagecnt_bias);
-	memset(&q->cache, 0, sizeof(q->cache));
+	page_frag_cache_clear(&q->cache);
 }

 static void

diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index 42c0598c31f2..76f12ac714b0 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -1323,12 +1323,7 @@ static void nvme_tcp_free_queue(struct nvme_ctrl *nctrl, int qid)
 	if (queue->hdr_digest || queue->data_digest)
 		nvme_tcp_free_crypto(queue);

-	if (queue->pf_cache.va) {
-		page = virt_to_head_page(queue->pf_cache.va);
-		__page_frag_cache_drain(page, queue->pf_cache.pagecnt_bias);
-		queue->pf_cache.va = NULL;
-	}
-
+	page_frag_cache_clear(&queue->pf_cache);
 	noreclaim_flag = memalloc_noreclaim_save();
 	sock_release(queue->sock);
 	memalloc_noreclaim_restore(noreclaim_flag);

diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c
index 66e8f9fd0ca7..ae871c31cf00 100644
--- a/drivers/nvme/target/tcp.c
+++ b/drivers/nvme/target/tcp.c
@@ -1438,7 +1438,6 @@ static void nvmet_tcp_free_cmd_data_in_buffers(struct nvmet_tcp_queue *queue)

 static void nvmet_tcp_release_queue_work(struct work_struct *w)
 {
-	struct page *page;
 	struct nvmet_tcp_queue *queue =
		container_of(w, struct nvmet_tcp_queue, release_work);

@@ -1460,9 +1459,7 @@ static void nvmet_tcp_release_queue_work(struct work_struct *w)
 	if (queue->hdr_digest || queue->data_digest)
 		nvmet_tcp_free_crypto(queue);
 	ida_free(&nvmet_tcp_queue_ida, queue->idx);
-
-	page = virt_to_head_page(queue->pf_cache.va);
-	__page_frag_cache_drain(page, queue->pf_cache.pagecnt_bias);
+	page_frag_cache_clear(&queue->pf_cache);
 	kfree(queue);
 }

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 65a78773dcca..5e15384798eb 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -313,6 +313,7 @@ static inline void *page_frag_alloc(struct page_frag_cache *nc,
 {
 	return page_frag_alloc_align(nc, fragsz, gfp_mask, ~0u);
 }
+void page_frag_cache_clear(struct page_frag_cache *nc);

 extern void page_frag_free(void *addr);

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 0722859c3647..49a70b3f44a9 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -420,18 +420,13 @@ static inline void *folio_get_private(struct folio *folio)
 }

 struct page_frag_cache {
-	void * va;
-#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
-	__u16 offset;
-	__u16 size;
-#else
-	__u32 offset;
-#endif
+	struct folio	*folio;
+	unsigned int	offset;
 	/* we maintain a pagecount bias, so that we dont dirty cache line
 	 * containing page->_refcount every time we allocate a fragment.
 	 */
-	unsigned int		pagecnt_bias;
-	bool pfmemalloc;
+	unsigned int	pagecnt_bias;
+	bool		pfmemalloc;
 };

 typedef unsigned long vm_flags_t;

diff --git a/mm/page_frag_alloc.c b/mm/page_frag_alloc.c
index bee95824ef8f..9b138cb0e3a4 100644
--- a/mm/page_frag_alloc.c
+++ b/mm/page_frag_alloc.c
@@ -16,88 +16,95 @@
 #include <linux/init.h>
 #include <linux/mm.h>

-static struct page *__page_frag_cache_refill(struct page_frag_cache *nc,
-					     gfp_t gfp_mask)
+/*
+ * Allocate a new folio for the frag cache.
+ */
+static struct folio *page_frag_cache_refill(struct page_frag_cache *nc,
+					    gfp_t gfp_mask)
 {
-	struct page *page = NULL;
+	struct folio *folio = NULL;
 	gfp_t gfp = gfp_mask;

 #if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
-	gfp_mask |= __GFP_COMP | __GFP_NOWARN | __GFP_NORETRY |
-		    __GFP_NOMEMALLOC;
-	page = alloc_pages_node(NUMA_NO_NODE, gfp_mask,
-				PAGE_FRAG_CACHE_MAX_ORDER);
-	nc->size = page ? PAGE_FRAG_CACHE_MAX_SIZE : PAGE_SIZE;
+	gfp_mask |= __GFP_NOWARN | __GFP_NORETRY | __GFP_NOMEMALLOC;
+	folio = folio_alloc(gfp_mask, PAGE_FRAG_CACHE_MAX_ORDER);
 #endif
-	if (unlikely(!page))
-		page = alloc_pages_node(NUMA_NO_NODE, gfp, 0);
-
-	nc->va = page ? page_address(page) : NULL;
+	if (unlikely(!folio))
+		folio = folio_alloc(gfp, 0);

-	return page;
+	if (folio)
+		nc->folio = folio;
+	return folio;
 }

 void __page_frag_cache_drain(struct page *page, unsigned int count)
 {
-	VM_BUG_ON_PAGE(page_ref_count(page) == 0, page);
+	struct folio *folio = page_folio(page);

-	if (page_ref_sub_and_test(page, count - 1))
-		__free_pages(page, compound_order(page));
+	VM_BUG_ON_FOLIO(folio_ref_count(folio) == 0, folio);
+
+	folio_put_refs(folio, count);
 }
 EXPORT_SYMBOL(__page_frag_cache_drain);

+void page_frag_cache_clear(struct page_frag_cache *nc)
+{
+	struct folio *folio = nc->folio;
+
+	if (folio) {
+		VM_BUG_ON_FOLIO(folio_ref_count(folio) == 0, folio);
+		folio_put_refs(folio, nc->pagecnt_bias);
+		nc->folio = NULL;
+	}
+}
+EXPORT_SYMBOL(page_frag_cache_clear);
+
 void *page_frag_alloc_align(struct page_frag_cache *nc,
			    unsigned int fragsz, gfp_t gfp_mask,
			    unsigned int align_mask)
 {
-	unsigned int size = PAGE_SIZE;
-	struct page *page;
-	int offset;
+	struct folio *folio = nc->folio;
+	size_t offset;

-	if (unlikely(!nc->va)) {
+	if (unlikely(!folio)) {
 refill:
-		page = __page_frag_cache_refill(nc, gfp_mask);
-		if (!page)
+		folio = page_frag_cache_refill(nc, gfp_mask);
+		if (!folio)
			return NULL;

-#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
-		/* if size can vary use size else just use PAGE_SIZE */
-		size = nc->size;
-#endif
		/* Even if we own the page, we do not use atomic_set().
		 * This would break get_page_unless_zero() users.
		 */
-		page_ref_add(page, PAGE_FRAG_CACHE_MAX_SIZE);
+		folio_ref_add(folio, PAGE_FRAG_CACHE_MAX_SIZE);

		/* reset page count bias and offset to start of new frag */
-		nc->pfmemalloc = page_is_pfmemalloc(page);
+		nc->pfmemalloc = folio_is_pfmemalloc(folio);
		nc->pagecnt_bias = PAGE_FRAG_CACHE_MAX_SIZE + 1;
-		nc->offset = size;
+		nc->offset = folio_size(folio);
	}

-	offset = nc->offset - fragsz;
-	if (unlikely(offset < 0)) {
-		page = virt_to_page(nc->va);
-
-		if (page_ref_count(page) != nc->pagecnt_bias)
+	offset = nc->offset;
+	if (unlikely(fragsz > offset)) {
+		/* Reuse the folio if everyone we gave it to has finished with it. */
+		if (!folio_ref_sub_and_test(folio, nc->pagecnt_bias)) {
+			nc->folio = NULL;
			goto refill;
+		}
+
		if (unlikely(nc->pfmemalloc)) {
-			page_ref_sub(page, nc->pagecnt_bias - 1);
-			__free_pages(page, compound_order(page));
+			__folio_put(folio);
+			nc->folio = NULL;
			goto refill;
		}

-#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
-		/* if size can vary use size else just use PAGE_SIZE */
-		size = nc->size;
-#endif
		/* OK, page count is 0, we can safely set it */
-		set_page_count(page, PAGE_FRAG_CACHE_MAX_SIZE + 1);
+		folio_set_count(folio, PAGE_FRAG_CACHE_MAX_SIZE + 1);

		/* reset page count bias and offset to start of new frag */
		nc->pagecnt_bias = PAGE_FRAG_CACHE_MAX_SIZE + 1;
-		offset = size - fragsz;
-		if (unlikely(offset < 0)) {
+		offset = folio_size(folio);
+		if (unlikely(fragsz > offset)) {
			/*
			 * The caller is trying to allocate a fragment
			 * with fragsz > PAGE_SIZE but the cache isn't big
@@ -107,15 +114,17 @@ void *page_frag_alloc_align(struct page_frag_cache *nc,
			 * it could make memory pressure worse
			 * so we simply return NULL here.
			 */
+			nc->offset = offset;
			return NULL;
		}
	}

	nc->pagecnt_bias--;
+	offset -= fragsz;
	offset &= align_mask;
	nc->offset = offset;

-	return nc->va + offset;
+	return folio_address(folio) + offset;
 }
 EXPORT_SYMBOL(page_frag_alloc_align);

@@ -124,8 +133,6 @@ EXPORT_SYMBOL(page_frag_alloc_align);
  */
 void page_frag_free(void *addr)
 {
-	struct page *page = virt_to_head_page(addr);
-
-	__free_pages(page, compound_order(page));
+	folio_put(virt_to_folio(addr));
 }
 EXPORT_SYMBOL(page_frag_free);

From patchwork Thu Apr 6 09:42:30 2023
X-Patchwork-Submitter: David Howells
X-Patchwork-Id: 80154
From: David Howells <dhowells@redhat.com>
To: netdev@vger.kernel.org
Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , Willem de Bruijn , Matthew Wilcox , Al Viro , Christoph Hellwig , Jens Axboe , Jeff Layton , Christian Brauner , Chuck Lever III , Linus Torvalds , linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Jeroen de Borst , Catherine Sullivan , Shailend Chand , Felix Fietkau , John Crispin , Sean Wang , Mark Lee , Lorenzo Bianconi , Matthias Brugger , AngeloGioacchino Del Regno , Keith Busch , Jens Axboe , Christoph Hellwig , Sagi Grimberg , Chaitanya Kulkarni , Andrew Morton , linux-arm-kernel@lists.infradead.org, linux-mediatek@lists.infradead.org, linux-nvme@lists.infradead.org Subject: [PATCH net-next v5 04/19] mm: Make the page_frag_cache allocator use per-cpu Date: Thu, 6 Apr 2023 10:42:30 +0100 Message-Id: <20230406094245.3633290-5-dhowells@redhat.com> In-Reply-To: <20230406094245.3633290-1-dhowells@redhat.com> References: <20230406094245.3633290-1-dhowells@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.1 X-Spam-Status: No, score=-0.2 required=5.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_NONE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1762419659147785899?= X-GMAIL-MSGID: =?utf-8?q?1762419659147785899?= Make the page_frag_cache allocator have a separate allocation bucket for each cpu to avoid racing. This means that no lock is required, other than preempt disablement, to allocate from it, though if a softirq wants to access it, then softirq disablement will need to be added. Make the NVMe, mediatek and GVE drivers pass in NULL to page_frag_cache() and use the default allocation buckets rather than defining their own. Signed-off-by: David Howells cc: "David S. 
Miller" cc: Eric Dumazet cc: Jakub Kicinski cc: Paolo Abeni cc: Jens Axboe cc: Jeroen de Borst cc: Catherine Sullivan cc: Shailend Chand cc: Felix Fietkau cc: John Crispin cc: Sean Wang cc: Mark Lee cc: Lorenzo Bianconi cc: Matthias Brugger cc: AngeloGioacchino Del Regno cc: Keith Busch cc: Jens Axboe cc: Christoph Hellwig cc: Sagi Grimberg cc: Chaitanya Kulkarni cc: Andrew Morton cc: Matthew Wilcox cc: netdev@vger.kernel.org cc: linux-arm-kernel@lists.infradead.org cc: linux-mediatek@lists.infradead.org cc: linux-nvme@lists.infradead.org cc: linux-mm@kvack.org --- drivers/net/ethernet/google/gve/gve.h | 1 - drivers/net/ethernet/google/gve/gve_main.c | 9 - drivers/net/ethernet/google/gve/gve_rx.c | 2 +- drivers/net/ethernet/mediatek/mtk_wed_wo.c | 8 +- drivers/net/ethernet/mediatek/mtk_wed_wo.h | 2 - drivers/nvme/host/tcp.c | 14 +- drivers/nvme/target/tcp.c | 19 +- include/linux/gfp.h | 18 +- mm/page_frag_alloc.c | 195 ++++++++++++++------- net/core/skbuff.c | 32 ++-- 10 files changed, 165 insertions(+), 135 deletions(-) diff --git a/drivers/net/ethernet/google/gve/gve.h b/drivers/net/ethernet/google/gve/gve.h index e214b51d3c8b..5864d723210a 100644 --- a/drivers/net/ethernet/google/gve/gve.h +++ b/drivers/net/ethernet/google/gve/gve.h @@ -250,7 +250,6 @@ struct gve_rx_ring { struct xdp_rxq_info xdp_rxq; struct xdp_rxq_info xsk_rxq; struct xsk_buff_pool *xsk_pool; - struct page_frag_cache page_cache; /* Page cache to allocate XDP frames */ }; /* A TX desc ring entry */ diff --git a/drivers/net/ethernet/google/gve/gve_main.c b/drivers/net/ethernet/google/gve/gve_main.c index b2fc1a3e6340..6c835038a8eb 100644 --- a/drivers/net/ethernet/google/gve/gve_main.c +++ b/drivers/net/ethernet/google/gve/gve_main.c @@ -1261,14 +1261,6 @@ static void gve_unreg_xdp_info(struct gve_priv *priv) } } -static void gve_drain_page_cache(struct gve_priv *priv) -{ - int i; - - for (i = 0; i < priv->rx_cfg.num_queues; i++) - page_frag_cache_clear(&priv->rx[i].page_cache); -} - static int gve_open(struct net_device *dev) { struct gve_priv *priv = netdev_priv(dev); @@ -1352,7 +1344,6 @@ static int gve_close(struct net_device *dev) netif_carrier_off(dev); if (gve_get_device_rings_ok(priv)) { gve_turndown(priv); - gve_drain_page_cache(priv); err = gve_destroy_rings(priv); if (err) goto err; diff --git a/drivers/net/ethernet/google/gve/gve_rx.c b/drivers/net/ethernet/google/gve/gve_rx.c index d1da7413dc4d..7ae8377c394f 100644 --- a/drivers/net/ethernet/google/gve/gve_rx.c +++ b/drivers/net/ethernet/google/gve/gve_rx.c @@ -634,7 +634,7 @@ static int gve_xdp_redirect(struct net_device *dev, struct gve_rx_ring *rx, total_len = headroom + SKB_DATA_ALIGN(len) + SKB_DATA_ALIGN(sizeof(struct skb_shared_info)); - frame = page_frag_alloc(&rx->page_cache, total_len, GFP_ATOMIC); + frame = page_frag_alloc(NULL, total_len, GFP_ATOMIC); if (!frame) { u64_stats_update_begin(&rx->statss); rx->xdp_alloc_fails++; diff --git a/drivers/net/ethernet/mediatek/mtk_wed_wo.c b/drivers/net/ethernet/mediatek/mtk_wed_wo.c index 6ce532217777..859f34447f2f 100644 --- a/drivers/net/ethernet/mediatek/mtk_wed_wo.c +++ b/drivers/net/ethernet/mediatek/mtk_wed_wo.c @@ -143,7 +143,7 @@ mtk_wed_wo_queue_refill(struct mtk_wed_wo *wo, struct mtk_wed_wo_queue *q, dma_addr_t addr; void *buf; - buf = page_frag_alloc(&q->cache, q->buf_size, GFP_ATOMIC); + buf = page_frag_alloc(NULL, q->buf_size, GFP_ATOMIC); if (!buf) break; @@ -296,15 +296,11 @@ mtk_wed_wo_queue_tx_clean(struct mtk_wed_wo *wo, struct mtk_wed_wo_queue *q) skb_free_frag(entry->buf); entry->buf 
= NULL; } - - page_frag_cache_clear(&q->cache); } static void mtk_wed_wo_queue_rx_clean(struct mtk_wed_wo *wo, struct mtk_wed_wo_queue *q) { - struct page *page; - for (;;) { void *buf = mtk_wed_wo_dequeue(wo, q, NULL, true); @@ -313,8 +309,6 @@ mtk_wed_wo_queue_rx_clean(struct mtk_wed_wo *wo, struct mtk_wed_wo_queue *q) skb_free_frag(buf); } - - page_frag_cache_clear(&q->cache); } static void diff --git a/drivers/net/ethernet/mediatek/mtk_wed_wo.h b/drivers/net/ethernet/mediatek/mtk_wed_wo.h index dbcf42ce9173..6f940db67fb8 100644 --- a/drivers/net/ethernet/mediatek/mtk_wed_wo.h +++ b/drivers/net/ethernet/mediatek/mtk_wed_wo.h @@ -210,8 +210,6 @@ struct mtk_wed_wo_queue_entry { struct mtk_wed_wo_queue { struct mtk_wed_wo_queue_regs regs; - struct page_frag_cache cache; - struct mtk_wed_wo_queue_desc *desc; dma_addr_t desc_dma; diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c index 76f12ac714b0..5a92236db92a 100644 --- a/drivers/nvme/host/tcp.c +++ b/drivers/nvme/host/tcp.c @@ -147,8 +147,6 @@ struct nvme_tcp_queue { __le32 exp_ddgst; __le32 recv_ddgst; - struct page_frag_cache pf_cache; - void (*state_change)(struct sock *); void (*data_ready)(struct sock *); void (*write_space)(struct sock *); @@ -482,9 +480,8 @@ static int nvme_tcp_init_request(struct blk_mq_tag_set *set, struct nvme_tcp_queue *queue = &ctrl->queues[queue_idx]; u8 hdgst = nvme_tcp_hdgst_len(queue); - req->pdu = page_frag_alloc(&queue->pf_cache, - sizeof(struct nvme_tcp_cmd_pdu) + hdgst, - GFP_KERNEL | __GFP_ZERO); + req->pdu = page_frag_alloc(NULL, sizeof(struct nvme_tcp_cmd_pdu) + hdgst, + GFP_KERNEL | __GFP_ZERO); if (!req->pdu) return -ENOMEM; @@ -1300,9 +1297,8 @@ static int nvme_tcp_alloc_async_req(struct nvme_tcp_ctrl *ctrl) struct nvme_tcp_request *async = &ctrl->async_req; u8 hdgst = nvme_tcp_hdgst_len(queue); - async->pdu = page_frag_alloc(&queue->pf_cache, - sizeof(struct nvme_tcp_cmd_pdu) + hdgst, - GFP_KERNEL | __GFP_ZERO); + async->pdu = page_frag_alloc(NULL, sizeof(struct nvme_tcp_cmd_pdu) + hdgst, + GFP_KERNEL | __GFP_ZERO); if (!async->pdu) return -ENOMEM; @@ -1312,7 +1308,6 @@ static int nvme_tcp_alloc_async_req(struct nvme_tcp_ctrl *ctrl) static void nvme_tcp_free_queue(struct nvme_ctrl *nctrl, int qid) { - struct page *page; struct nvme_tcp_ctrl *ctrl = to_tcp_ctrl(nctrl); struct nvme_tcp_queue *queue = &ctrl->queues[qid]; unsigned int noreclaim_flag; @@ -1323,7 +1318,6 @@ static void nvme_tcp_free_queue(struct nvme_ctrl *nctrl, int qid) if (queue->hdr_digest || queue->data_digest) nvme_tcp_free_crypto(queue); - page_frag_cache_clear(&queue->pf_cache); noreclaim_flag = memalloc_noreclaim_save(); sock_release(queue->sock); memalloc_noreclaim_restore(noreclaim_flag); diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c index ae871c31cf00..d6cc557cc539 100644 --- a/drivers/nvme/target/tcp.c +++ b/drivers/nvme/target/tcp.c @@ -143,8 +143,6 @@ struct nvmet_tcp_queue { struct nvmet_tcp_cmd connect; - struct page_frag_cache pf_cache; - void (*data_ready)(struct sock *); void (*state_change)(struct sock *); void (*write_space)(struct sock *); @@ -1312,25 +1310,25 @@ static int nvmet_tcp_alloc_cmd(struct nvmet_tcp_queue *queue, c->queue = queue; c->req.port = queue->port->nport; - c->cmd_pdu = page_frag_alloc(&queue->pf_cache, - sizeof(*c->cmd_pdu) + hdgst, GFP_KERNEL | __GFP_ZERO); + c->cmd_pdu = page_frag_alloc(NULL, sizeof(*c->cmd_pdu) + hdgst, + GFP_KERNEL | __GFP_ZERO); if (!c->cmd_pdu) return -ENOMEM; c->req.cmd = &c->cmd_pdu->cmd; - c->rsp_pdu = 
page_frag_alloc(&queue->pf_cache, - sizeof(*c->rsp_pdu) + hdgst, GFP_KERNEL | __GFP_ZERO); + c->rsp_pdu = page_frag_alloc(NULL, sizeof(*c->rsp_pdu) + hdgst, + GFP_KERNEL | __GFP_ZERO); if (!c->rsp_pdu) goto out_free_cmd; c->req.cqe = &c->rsp_pdu->cqe; - c->data_pdu = page_frag_alloc(&queue->pf_cache, - sizeof(*c->data_pdu) + hdgst, GFP_KERNEL | __GFP_ZERO); + c->data_pdu = page_frag_alloc(NULL, sizeof(*c->data_pdu) + hdgst, + GFP_KERNEL | __GFP_ZERO); if (!c->data_pdu) goto out_free_rsp; - c->r2t_pdu = page_frag_alloc(&queue->pf_cache, - sizeof(*c->r2t_pdu) + hdgst, GFP_KERNEL | __GFP_ZERO); + c->r2t_pdu = page_frag_alloc(NULL, sizeof(*c->r2t_pdu) + hdgst, + GFP_KERNEL | __GFP_ZERO); if (!c->r2t_pdu) goto out_free_data; @@ -1459,7 +1457,6 @@ static void nvmet_tcp_release_queue_work(struct work_struct *w) if (queue->hdr_digest || queue->data_digest) nvmet_tcp_free_crypto(queue); ida_free(&nvmet_tcp_queue_ida, queue->idx); - page_frag_cache_clear(&queue->pf_cache); kfree(queue); } diff --git a/include/linux/gfp.h b/include/linux/gfp.h index 5e15384798eb..b208ca315882 100644 --- a/include/linux/gfp.h +++ b/include/linux/gfp.h @@ -304,16 +304,18 @@ extern void free_pages(unsigned long addr, unsigned int order); struct page_frag_cache; extern void __page_frag_cache_drain(struct page *page, unsigned int count); -extern void *page_frag_alloc_align(struct page_frag_cache *nc, - unsigned int fragsz, gfp_t gfp_mask, - unsigned int align_mask); - -static inline void *page_frag_alloc(struct page_frag_cache *nc, - unsigned int fragsz, gfp_t gfp_mask) +extern void *page_frag_alloc_align(struct page_frag_cache __percpu *frag_cache, + size_t fragsz, gfp_t gfp, + unsigned long align_mask); +extern void *page_frag_memdup(struct page_frag_cache __percpu *frag_cache, + const void *p, size_t fragsz, gfp_t gfp, + unsigned long align_mask); + +static inline void *page_frag_alloc(struct page_frag_cache __percpu *frag_cache, + size_t fragsz, gfp_t gfp) { - return page_frag_alloc_align(nc, fragsz, gfp_mask, ~0u); + return page_frag_alloc_align(frag_cache, fragsz, gfp, ULONG_MAX); } -void page_frag_cache_clear(struct page_frag_cache *nc); extern void page_frag_free(void *addr); diff --git a/mm/page_frag_alloc.c b/mm/page_frag_alloc.c index 9b138cb0e3a4..7844398afe26 100644 --- a/mm/page_frag_alloc.c +++ b/mm/page_frag_alloc.c @@ -16,25 +16,23 @@ #include #include +static DEFINE_PER_CPU(struct page_frag_cache, page_frag_default_allocator); + /* * Allocate a new folio for the frag cache. 
*/ -static struct folio *page_frag_cache_refill(struct page_frag_cache *nc, - gfp_t gfp_mask) +static struct folio *page_frag_cache_refill(gfp_t gfp) { - struct folio *folio = NULL; - gfp_t gfp = gfp_mask; + struct folio *folio; #if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE) - gfp_mask |= __GFP_NOWARN | __GFP_NORETRY | __GFP_NOMEMALLOC; - folio = folio_alloc(gfp_mask, PAGE_FRAG_CACHE_MAX_ORDER); + folio = folio_alloc(gfp | __GFP_NOWARN | __GFP_NORETRY | __GFP_NOMEMALLOC, + PAGE_FRAG_CACHE_MAX_ORDER); + if (folio) + return folio; #endif - if (unlikely(!folio)) - folio = folio_alloc(gfp, 0); - if (folio) - nc->folio = folio; - return folio; + return folio_alloc(gfp, 0); } void __page_frag_cache_drain(struct page *page, unsigned int count) @@ -47,54 +45,68 @@ void __page_frag_cache_drain(struct page *page, unsigned int count) } EXPORT_SYMBOL(__page_frag_cache_drain); -void page_frag_cache_clear(struct page_frag_cache *nc) -{ - struct folio *folio = nc->folio; - - if (folio) { - VM_BUG_ON_FOLIO(folio_ref_count(folio) == 0, folio); - folio_put_refs(folio, nc->pagecnt_bias); - nc->folio = NULL; - } - -} -EXPORT_SYMBOL(page_frag_cache_clear); - -void *page_frag_alloc_align(struct page_frag_cache *nc, - unsigned int fragsz, gfp_t gfp_mask, - unsigned int align_mask) +/** + * page_frag_alloc_align - Allocate some memory for use in zerocopy + * @frag_cache: The frag cache to use (or NULL for the default) + * @fragsz: The size of the fragment desired + * @gfp: Allocation flags under which to make an allocation + * @align_mask: The required alignment + * + * Allocate some memory for use with zerocopy where protocol bits have to be + * mixed in with spliced/zerocopied data. Unlike memory allocated from the + * slab, this memory's lifetime is purely dependent on the folio's refcount. + * + * The way it works is that a folio is allocated and fragments are broken off + * sequentially and returned to the caller with a ref until the folio no longer + * has enough spare space - at which point the allocator's ref is dropped and a + * new folio is allocated. The folio remains in existence until the last ref + * held by, say, an sk_buff is discarded and then the page is returned to the + * page allocator. + * + * Returns a pointer to the memory on success and -ENOMEM on allocation + * failure. + * + * The allocated memory should be disposed of with folio_put(). + */ +void *page_frag_alloc_align(struct page_frag_cache __percpu *frag_cache, + size_t fragsz, gfp_t gfp, unsigned long align_mask) { - struct folio *folio = nc->folio; + struct page_frag_cache *nc; + struct folio *folio, *spare = NULL; size_t offset; + void *p; - if (unlikely(!folio)) { -refill: - folio = page_frag_cache_refill(nc, gfp_mask); - if (!folio) - return NULL; + if (!frag_cache) + frag_cache = &page_frag_default_allocator; + if (WARN_ON_ONCE(fragsz == 0)) + fragsz = 1; + align_mask &= ~3UL; - /* Even if we own the page, we do not use atomic_set(). - * This would break get_page_unless_zero() users. - */ - folio_ref_add(folio, PAGE_FRAG_CACHE_MAX_SIZE); - - /* reset page count bias and offset to start of new frag */ - nc->pfmemalloc = folio_is_pfmemalloc(folio); - nc->pagecnt_bias = PAGE_FRAG_CACHE_MAX_SIZE + 1; - nc->offset = folio_size(folio); + nc = get_cpu_ptr(frag_cache); +reload: + folio = nc->folio; + offset = nc->offset; +try_again: + + /* Make the allocation if there's sufficient space. 
*/ + if (fragsz <= offset) { + nc->pagecnt_bias--; + offset = (offset - fragsz) & align_mask; + nc->offset = offset; + p = folio_address(folio) + offset; + put_cpu_ptr(frag_cache); + if (spare) + folio_put(spare); + return p; } - offset = nc->offset; - if (unlikely(fragsz > offset)) { - /* Reuse the folio if everyone we gave it to has finished with it. */ - if (!folio_ref_sub_and_test(folio, nc->pagecnt_bias)) { - nc->folio = NULL; + /* Insufficient space - see if we can refurbish the current folio. */ + if (folio) { + if (!folio_ref_sub_and_test(folio, nc->pagecnt_bias)) goto refill; - } if (unlikely(nc->pfmemalloc)) { __folio_put(folio); - nc->folio = NULL; goto refill; } @@ -104,27 +116,56 @@ void *page_frag_alloc_align(struct page_frag_cache *nc, /* reset page count bias and offset to start of new frag */ nc->pagecnt_bias = PAGE_FRAG_CACHE_MAX_SIZE + 1; offset = folio_size(folio); - if (unlikely(fragsz > offset)) { - /* - * The caller is trying to allocate a fragment - * with fragsz > PAGE_SIZE but the cache isn't big - * enough to satisfy the request, this may - * happen in low memory conditions. - * We don't release the cache page because - * it could make memory pressure worse - * so we simply return NULL here. - */ - nc->offset = offset; + if (unlikely(fragsz > offset)) + goto frag_too_big; + goto try_again; + } + +refill: + if (!spare) { + nc->folio = NULL; + put_cpu_ptr(frag_cache); + + spare = page_frag_cache_refill(gfp); + if (!spare) return NULL; - } + + nc = get_cpu_ptr(frag_cache); + /* We may now be on a different cpu and/or someone else may + * have refilled it + */ + nc->pfmemalloc = folio_is_pfmemalloc(spare); + if (nc->folio) + goto reload; } - nc->pagecnt_bias--; - offset -= fragsz; - offset &= align_mask; + nc->folio = spare; + folio = spare; + spare = NULL; + + /* Even if we own the page, we do not use atomic_set(). This would + * break get_page_unless_zero() users. + */ + folio_ref_add(folio, PAGE_FRAG_CACHE_MAX_SIZE); + + /* Reset page count bias and offset to start of new frag */ + nc->pagecnt_bias = PAGE_FRAG_CACHE_MAX_SIZE + 1; + offset = folio_size(folio); + goto try_again; + +frag_too_big: + /* + * The caller is trying to allocate a fragment with fragsz > PAGE_SIZE + * but the cache isn't big enough to satisfy the request, this may + * happen in low memory conditions. We don't release the cache page + * because it could make memory pressure worse so we simply return NULL + * here. + */ nc->offset = offset; - - return folio_address(folio) + offset; + put_cpu_ptr(frag_cache); + if (spare) + folio_put(spare); + return NULL; } EXPORT_SYMBOL(page_frag_alloc_align); @@ -136,3 +177,25 @@ void page_frag_free(void *addr) folio_put(virt_to_folio(addr)); } EXPORT_SYMBOL(page_frag_free); + +/** + * page_frag_memdup - Allocate a page fragment and duplicate some data into it + * @frag_cache: The frag cache to use (or NULL for the default) + * @fragsz: The amount of memory to copy (maximum 1/2 page). 
+ * @p: The source data to copy + * @gfp: Allocation flags under which to make an allocation + * @align_mask: The required alignment + */ +void *page_frag_memdup(struct page_frag_cache __percpu *frag_cache, + const void *p, size_t fragsz, gfp_t gfp, + unsigned long align_mask) +{ + void *q; + + q = page_frag_alloc_align(frag_cache, fragsz, gfp, align_mask); + if (!q) + return q; + + return memcpy(q, p, fragsz); +} +EXPORT_SYMBOL(page_frag_memdup); diff --git a/net/core/skbuff.c b/net/core/skbuff.c index 050a875d09c5..3d05ed64b606 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -222,13 +222,13 @@ static void *page_frag_alloc_1k(struct page_frag_1k *nc, gfp_t gfp_mask) #endif struct napi_alloc_cache { - struct page_frag_cache page; struct page_frag_1k page_small; unsigned int skb_count; void *skb_cache[NAPI_SKB_CACHE_SIZE]; }; static DEFINE_PER_CPU(struct page_frag_cache, netdev_alloc_cache); +static DEFINE_PER_CPU(struct page_frag_cache, napi_frag_cache); static DEFINE_PER_CPU(struct napi_alloc_cache, napi_alloc_cache); /* Double check that napi_get_frags() allocates skbs with @@ -250,11 +250,9 @@ void napi_get_frags_check(struct napi_struct *napi) void *__napi_alloc_frag_align(unsigned int fragsz, unsigned int align_mask) { - struct napi_alloc_cache *nc = this_cpu_ptr(&napi_alloc_cache); - fragsz = SKB_DATA_ALIGN(fragsz); - return page_frag_alloc_align(&nc->page, fragsz, GFP_ATOMIC, align_mask); + return page_frag_alloc_align(&napi_frag_cache, fragsz, GFP_ATOMIC, align_mask); } EXPORT_SYMBOL(__napi_alloc_frag_align); @@ -264,15 +262,12 @@ void *__netdev_alloc_frag_align(unsigned int fragsz, unsigned int align_mask) fragsz = SKB_DATA_ALIGN(fragsz); if (in_hardirq() || irqs_disabled()) { - struct page_frag_cache *nc = this_cpu_ptr(&netdev_alloc_cache); - - data = page_frag_alloc_align(nc, fragsz, GFP_ATOMIC, align_mask); + data = page_frag_alloc_align(&netdev_alloc_cache, + fragsz, GFP_ATOMIC, align_mask); } else { - struct napi_alloc_cache *nc; - local_bh_disable(); - nc = this_cpu_ptr(&napi_alloc_cache); - data = page_frag_alloc_align(&nc->page, fragsz, GFP_ATOMIC, align_mask); + data = page_frag_alloc_align(&napi_frag_cache, + fragsz, GFP_ATOMIC, align_mask); local_bh_enable(); } return data; @@ -652,7 +647,6 @@ EXPORT_SYMBOL(__alloc_skb); struct sk_buff *__netdev_alloc_skb(struct net_device *dev, unsigned int len, gfp_t gfp_mask) { - struct page_frag_cache *nc; struct sk_buff *skb; bool pfmemalloc; void *data; @@ -677,14 +671,12 @@ struct sk_buff *__netdev_alloc_skb(struct net_device *dev, unsigned int len, gfp_mask |= __GFP_MEMALLOC; if (in_hardirq() || irqs_disabled()) { - nc = this_cpu_ptr(&netdev_alloc_cache); - data = page_frag_alloc(nc, len, gfp_mask); - pfmemalloc = nc->pfmemalloc; + data = page_frag_alloc(&netdev_alloc_cache, len, gfp_mask); + pfmemalloc = folio_is_pfmemalloc(virt_to_folio(data)); } else { local_bh_disable(); - nc = this_cpu_ptr(&napi_alloc_cache.page); - data = page_frag_alloc(nc, len, gfp_mask); - pfmemalloc = nc->pfmemalloc; + data = page_frag_alloc(&napi_frag_cache, len, gfp_mask); + pfmemalloc = folio_is_pfmemalloc(virt_to_folio(data)); local_bh_enable(); } @@ -772,8 +764,8 @@ struct sk_buff *__napi_alloc_skb(struct napi_struct *napi, unsigned int len, } else { len = SKB_HEAD_ALIGN(len); - data = page_frag_alloc(&nc->page, len, gfp_mask); - pfmemalloc = nc->page.pfmemalloc; + data = page_frag_alloc(&napi_frag_cache, len, gfp_mask); + pfmemalloc = folio_is_pfmemalloc(virt_to_folio(data)); } if (unlikely(!data)) From patchwork Thu Apr 6 09:42:31 2023 
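For clarity, a usage sketch of the revised allocator API follows.  This is
illustrative only and not part of the patch: the my_* helper names and the
PDU layout are made up, and process context is assumed.

struct my_pdu {
	u8 hdr[24];
	u8 payload[64];
};

/* Passing NULL as the cache selects the per-cpu default allocator
 * (page_frag_default_allocator above).
 */
static struct my_pdu *my_alloc_pdu(void)
{
	return page_frag_alloc(NULL, sizeof(struct my_pdu),
			       GFP_KERNEL | __GFP_ZERO);
}

/* Copy an existing header into a fragment, e.g. so that it can be
 * attached to an sk_buff without pinning slab memory.
 */
static void *my_dup_hdr(const void *hdr, size_t len)
{
	return page_frag_memdup(NULL, hdr, len, GFP_KERNEL, ULONG_MAX);
}

/* The fragment pins one folio reference; page_frag_free() drops it
 * (it is just folio_put(virt_to_folio(addr))).
 */
static void my_free_pdu(struct my_pdu *pdu)
{
	page_frag_free(pdu);
}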
From patchwork Thu Apr 6 09:42:31 2023
X-Patchwork-Submitter: David Howells
X-Patchwork-Id: 80153

Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , Willem de Bruijn , Matthew Wilcox , Al Viro , Christoph Hellwig , Jens Axboe , Jeff Layton , Christian Brauner , Chuck Lever III , Linus Torvalds , linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, David Ahern Subject: [PATCH net-next v5 05/19] tcp: Support MSG_SPLICE_PAGES Date: Thu, 6 Apr 2023 10:42:31 +0100 Message-Id: <20230406094245.3633290-6-dhowells@redhat.com> In-Reply-To: <20230406094245.3633290-1-dhowells@redhat.com> References: <20230406094245.3633290-1-dhowells@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.7 X-Spam-Status: No, score=-0.2 required=5.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_NONE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1762419657438235829?= X-GMAIL-MSGID: =?utf-8?q?1762419657438235829?= Make TCP's sendmsg() support MSG_SPLICE_PAGES. This causes pages to be spliced from the source iterator. This allows ->sendpage() to be replaced by something that can handle multiple multipage folios in a single transaction. Signed-off-by: David Howells cc: Eric Dumazet cc: "David S. Miller" cc: David Ahern cc: Jakub Kicinski cc: Paolo Abeni cc: Jens Axboe cc: Matthew Wilcox cc: netdev@vger.kernel.org --- net/ipv4/tcp.c | 67 ++++++++++++++++++++++++++++++++++++++++++++------ 1 file changed, 60 insertions(+), 7 deletions(-) diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index fd68d49490f2..510bacc7ce7b 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -1221,7 +1221,7 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size) int flags, err, copied = 0; int mss_now = 0, size_goal, copied_syn = 0; int process_backlog = 0; - bool zc = false; + int zc = 0; long timeo; flags = msg->msg_flags; @@ -1232,17 +1232,22 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size) if (msg->msg_ubuf) { uarg = msg->msg_ubuf; net_zcopy_get(uarg); - zc = sk->sk_route_caps & NETIF_F_SG; + if (sk->sk_route_caps & NETIF_F_SG) + zc = 1; } else if (sock_flag(sk, SOCK_ZEROCOPY)) { uarg = msg_zerocopy_realloc(sk, size, skb_zcopy(skb)); if (!uarg) { err = -ENOBUFS; goto out_err; } - zc = sk->sk_route_caps & NETIF_F_SG; - if (!zc) + if (sk->sk_route_caps & NETIF_F_SG) + zc = 1; + else uarg_to_msgzc(uarg)->zerocopy = 0; } + } else if (unlikely(msg->msg_flags & MSG_SPLICE_PAGES) && size) { + if (sk->sk_route_caps & NETIF_F_SG) + zc = 2; } if (unlikely(flags & MSG_FASTOPEN || inet_sk(sk)->defer_connect) && @@ -1305,7 +1310,7 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size) goto do_error; while (msg_data_left(msg)) { - int copy = 0; + ssize_t copy = 0; skb = tcp_write_queue_tail(sk); if (skb) @@ -1346,7 +1351,7 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size) if (copy > msg_data_left(msg)) copy = msg_data_left(msg); - if (!zc) { + if (zc == 0) { bool merge = true; int i = skb_shinfo(skb)->nr_frags; struct page_frag *pfrag = sk_page_frag(sk); @@ -1391,7 +1396,7 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size) page_ref_inc(pfrag->page); } pfrag->offset += copy; - } else { + } else if (zc == 1) { /* First append to a fragless skb builds initial * pure zerocopy 
skb */ @@ -1412,6 +1417,54 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size) if (err < 0) goto do_error; copy = err; + } else if (zc == 2) { + /* Splice in data. */ + struct page *page = NULL, **pages = &page; + size_t off = 0, part; + bool can_coalesce; + int i = skb_shinfo(skb)->nr_frags; + + copy = iov_iter_extract_pages(&msg->msg_iter, &pages, + copy, 1, 0, &off); + if (copy <= 0) { + err = copy ?: -EIO; + goto do_error; + } + + can_coalesce = skb_can_coalesce(skb, i, page, off); + if (!can_coalesce && i >= READ_ONCE(sysctl_max_skb_frags)) { + tcp_mark_push(tp, skb); + iov_iter_revert(&msg->msg_iter, copy); + goto new_segment; + } + if (tcp_downgrade_zcopy_pure(sk, skb)) { + iov_iter_revert(&msg->msg_iter, copy); + goto wait_for_space; + } + + part = tcp_wmem_schedule(sk, copy); + iov_iter_revert(&msg->msg_iter, copy - part); + if (!part) + goto wait_for_space; + copy = part; + + if (can_coalesce) { + skb_frag_size_add(&skb_shinfo(skb)->frags[i - 1], copy); + } else { + get_page(page); + skb_fill_page_desc_noacc(skb, i, page, off, copy); + } + page = NULL; + + if (!(flags & MSG_NO_SHARED_FRAGS)) + skb_shinfo(skb)->flags |= SKBFL_SHARED_FRAG; + + skb->len += copy; + skb->data_len += copy; + skb->truesize += copy; + sk_wmem_queued_add(sk, copy); + sk_mem_charge(sk, copy); + } if (!copied) From patchwork Thu Apr 6 09:42:32 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 80157 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:b0ea:0:b0:3b6:4342:cba0 with SMTP id b10csp895219vqo; Thu, 6 Apr 2023 02:46:30 -0700 (PDT) X-Google-Smtp-Source: AKy350Y7J6+ER/mm+Ayz8opFIHhqYnjzlEt+ayNiKLiY4n2Ro559qPCRf+EBxoInZ0Qp6QFuGjcE X-Received: by 2002:a62:4e49:0:b0:627:ebf3:7a97 with SMTP id c70-20020a624e49000000b00627ebf37a97mr9504795pfb.14.1680774389963; Thu, 06 Apr 2023 02:46:29 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1680774389; cv=none; d=google.com; s=arc-20160816; b=mkvpB4SZgDvyM6M7Jr8TEhE01/yaVAQbKccaxVt1s2v4iSWG05xgFBiNxZUjZDhoFu as4Tf9cm2laPzNDViOewSSUKi0SVtOP0ECJRj5LUWZ+mqpz9LjFy7ArJrPdCQleSqQk9 kVso8uK8hP9abwsqOskYdE5DXjH5RLv8q7Vz9L7RfpTOh8vqzuylbFrR37KJj72Hv2rI 88QDD1bNmgxuNR+hC2getmMeynziJX+QYe+0SDJcml9DoR9cGOOpVsbVUF6jmJ/Jn+fi /HIerpaKKKBtD3+19CGh3NqFNkGrYuL4zjbZQdaxG4gdN0hyTHdNGzDgQPUni75Uwzlp RUsA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=srNjm2pDsZ7bySZjGb738UmToW4CeRBJkACWB8iZ9Oo=; b=wJwH/IodR6dk1edIsBPMKoWaDf5Eq/98j1rxxdzZGWb5evFSAlX5Z7YI7jv0J37/ae cDOMwgfldYaAsb8PE6Eud6Gdmu/E/Vk+SoX2KbyF+TESSCeKZ63S0JGg1tYoNOwgS9XB 9tvXg0MgOfH74x3BVPFptwolFc6VNO7P8YPPUYzyHES3jSHAUn2bzxgEwoLGxx57/Bvt 1CEBTuZV5H34Gv+tQxlwE/wyCGnHRae9Hj9zZJKKHTpQbI4lsBNAkLm6FFXl+8EaDtej cPmajN7DbuzIT6WjV1aKKsKT/mtFWrHhk4t4TQX1Xyq/+r40SyUMYpqEui1q36No+ma3 D7DA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=DZdnQbDH; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: from out1.vger.email (out1.vger.email. 
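For reference, the caller-side pattern this enables looks like the following
sketch.  The helper name is hypothetical; kernel context, a connected TCP
socket and a page the caller already holds a reference on are assumed.

static int splice_one_page(struct socket *sock, struct page *page,
			   int offset, size_t len)
{
	struct bio_vec bvec;
	struct msghdr msg = { .msg_flags = MSG_SPLICE_PAGES, };

	bvec_set_page(&bvec, page, len, offset);
	iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, len);

	/* The TCP stack takes its own page refs on the spliced frags. */
	return sock_sendmsg(sock, &msg);
}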
From patchwork Thu Apr 6 09:42:32 2023
X-Patchwork-Submitter: David Howells
X-Patchwork-Id: 80157

Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , Willem de Bruijn , Matthew Wilcox , Al Viro , Christoph Hellwig , Jens Axboe , Jeff Layton , Christian Brauner , Chuck Lever III , Linus Torvalds , linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, David Ahern Subject: [PATCH net-next v5 06/19] tcp: Make sendmsg(MSG_SPLICE_PAGES) copy unspliceable data Date: Thu, 6 Apr 2023 10:42:32 +0100 Message-Id: <20230406094245.3633290-7-dhowells@redhat.com> In-Reply-To: <20230406094245.3633290-1-dhowells@redhat.com> References: <20230406094245.3633290-1-dhowells@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.9 X-Spam-Status: No, score=-0.2 required=5.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_NONE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1762419686981169978?= X-GMAIL-MSGID: =?utf-8?q?1762419686981169978?= If sendmsg() with MSG_SPLICE_PAGES encounters a page that shouldn't be spliced - a slab page, for instance, or one with a zero count - make tcp_sendmsg() copy it. Signed-off-by: David Howells cc: Eric Dumazet cc: "David S. Miller" cc: David Ahern cc: Jakub Kicinski cc: Paolo Abeni cc: Jens Axboe cc: Matthew Wilcox cc: netdev@vger.kernel.org --- net/ipv4/tcp.c | 28 +++++++++++++++++++++++++--- 1 file changed, 25 insertions(+), 3 deletions(-) diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index 510bacc7ce7b..238a8ad6527c 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -1418,10 +1418,10 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size) goto do_error; copy = err; } else if (zc == 2) { - /* Splice in data. */ + /* Splice in data if we can; copy if we can't. 
*/ struct page *page = NULL, **pages = &page; size_t off = 0, part; - bool can_coalesce; + bool can_coalesce, put = false; int i = skb_shinfo(skb)->nr_frags; copy = iov_iter_extract_pages(&msg->msg_iter, &pages, @@ -1448,12 +1448,34 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size) goto wait_for_space; copy = part; + if (!sendpage_ok(page)) { + const void *p = kmap_local_page(page); + void *q; + + q = page_frag_memdup(NULL, p + off, copy, + sk->sk_allocation, ULONG_MAX); + kunmap_local(p); + if (!q) { + iov_iter_revert(&msg->msg_iter, copy); + err = copy ?: -ENOMEM; + goto do_error; + } + page = virt_to_page(q); + off = offset_in_page(q); + put = true; + can_coalesce = false; + } + if (can_coalesce) { skb_frag_size_add(&skb_shinfo(skb)->frags[i - 1], copy); } else { - get_page(page); + if (!put) + get_page(page); + put = false; skb_fill_page_desc_noacc(skb, i, page, off, copy); } + if (put) + put_page(page); page = NULL; if (!(flags & MSG_NO_SHARED_FRAGS)) From patchwork Thu Apr 6 09:42:33 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 80167 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:b0ea:0:b0:3b6:4342:cba0 with SMTP id b10csp897553vqo; Thu, 6 Apr 2023 02:52:19 -0700 (PDT) X-Google-Smtp-Source: AKy350a8DkiAxvc0vcavjeSquDyY59YQEfwaxGtdTvrWQvLGBWHJsEin/MeCYfwuvAiV9BA2wK6w X-Received: by 2002:a17:906:854c:b0:931:f9f8:d4ea with SMTP id h12-20020a170906854c00b00931f9f8d4eamr5402944ejy.53.1680774739344; Thu, 06 Apr 2023 02:52:19 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1680774739; cv=none; d=google.com; s=arc-20160816; b=FmXEWGiKy611PBfgDcw2H+1euGqpiMGIGoBmWkolLlctiXoVgLIFatb7u8+JSbIDE7 oJ6K8lfCnq2RBUFVjMaYjoy4lTrRafHyDmGtK/W7TaUNcNIKoMbppvxSPPgTRtOrMp/P zyGMmYEPXP9ssVM+zW8b9nmdsC/fSIuijj0vKltad7W9Os2BoC6usTsq58k+3f+gpRQM Ip+msdhX1q0YoiwXyk8I9d4LyXJXAmu2lU7icNBsu4Ul+XV6P3mDgMNX22d4UoSx2Pqq C+8hIpMhwou/PP+cVdwWj/uFABybc62oG78NeL8pb9y5OJaZ8pl8zm5Jjl2Xe29LPNPZ tAtw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=+Ie/FfVlAXyyE3lC0L+B6Yd0gloGht93FV/bBiMRDfo=; b=M5XWqGuzdpGORNIhBzXR5WTD60VpDHfoT+O7jxXYys3a7L52/UB/3c8TqwqPWX6+sF RtuopHAeCJz2MH1ZP9zuamglMn3HcDYYKI1KKDkH4Q6Cih2lEBdAMPL7rm83IuGi19Yu EaFs00PtbWww/xn4friyLvY2m7TGhdyzJ5aJy6a5nvSe/oxX24KaVc4tb+vcJkhzOx3Q rmr5v+lk099VnU4fEY2CFCfWMQZ16N5/RRZJSmEEhD+A1fKbNIspbyZOmn85OFSStOhV /dx4HhxF6n/7Ml6TZzQnkxhZh3n3P/57ub+qSb5qkmJkMG4wYEqH7LiDnhPpJuPoQVCm 8CIA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=Wmx4DUDm; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: from out1.vger.email (out1.vger.email. 
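Viewed in isolation, the fallback amounts to something like the sketch
below.  It is simplified from the hunk above; steal_or_copy() is a made-up
name and the caller's unwinding on failure is elided.

static struct page *steal_or_copy(struct page *page, size_t *off,
				  size_t len, gfp_t gfp, bool *put)
{
	const void *p;
	void *q;

	*put = false;
	if (sendpage_ok(page))
		return page;	/* refcounted, non-slab: splice as-is */

	/* Unspliceable: duplicate the bytes into a page fragment, which
	 * carries its own folio reference.
	 */
	p = kmap_local_page(page);
	q = page_frag_memdup(NULL, p + *off, len, gfp, ULONG_MAX);
	kunmap_local(p);
	if (!q)
		return NULL;

	*off = offset_in_page(q);
	*put = true;		/* caller put_page()s after attaching */
	return virt_to_page(q);
}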
From patchwork Thu Apr 6 09:42:33 2023
X-Patchwork-Submitter: David Howells
X-Patchwork-Id: 80167

Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , Willem de Bruijn , Matthew Wilcox , Al Viro , Christoph Hellwig , Jens Axboe , Jeff Layton , Christian Brauner , Chuck Lever III , Linus Torvalds , linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, David Ahern Subject: [PATCH net-next v5 07/19] tcp: Convert do_tcp_sendpages() to use MSG_SPLICE_PAGES Date: Thu, 6 Apr 2023 10:42:33 +0100 Message-Id: <20230406094245.3633290-8-dhowells@redhat.com> In-Reply-To: <20230406094245.3633290-1-dhowells@redhat.com> References: <20230406094245.3633290-1-dhowells@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.6 X-Spam-Status: No, score=-0.2 required=5.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_NONE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1762420052878596867?= X-GMAIL-MSGID: =?utf-8?q?1762420052878596867?= Convert do_tcp_sendpages() to use sendmsg() with MSG_SPLICE_PAGES rather than directly splicing in the pages itself. do_tcp_sendpages() can then be inlined in subsequent patches into its callers. This allows ->sendpage() to be replaced by something that can handle multiple multipage folios in a single transaction. Signed-off-by: David Howells cc: Eric Dumazet cc: "David S. Miller" cc: David Ahern cc: Jakub Kicinski cc: Paolo Abeni cc: Jens Axboe cc: Matthew Wilcox cc: netdev@vger.kernel.org --- net/ipv4/tcp.c | 158 +++---------------------------------------------- 1 file changed, 7 insertions(+), 151 deletions(-) diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index 238a8ad6527c..a8a4ace8b3da 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -972,163 +972,19 @@ static int tcp_wmem_schedule(struct sock *sk, int copy) return min(copy, sk->sk_forward_alloc); } -static struct sk_buff *tcp_build_frag(struct sock *sk, int size_goal, int flags, - struct page *page, int offset, size_t *size) -{ - struct sk_buff *skb = tcp_write_queue_tail(sk); - struct tcp_sock *tp = tcp_sk(sk); - bool can_coalesce; - int copy, i; - - if (!skb || (copy = size_goal - skb->len) <= 0 || - !tcp_skb_can_collapse_to(skb)) { -new_segment: - if (!sk_stream_memory_free(sk)) - return NULL; - - skb = tcp_stream_alloc_skb(sk, 0, sk->sk_allocation, - tcp_rtx_and_write_queues_empty(sk)); - if (!skb) - return NULL; - -#ifdef CONFIG_TLS_DEVICE - skb->decrypted = !!(flags & MSG_SENDPAGE_DECRYPTED); -#endif - tcp_skb_entail(sk, skb); - copy = size_goal; - } - - if (copy > *size) - copy = *size; - - i = skb_shinfo(skb)->nr_frags; - can_coalesce = skb_can_coalesce(skb, i, page, offset); - if (!can_coalesce && i >= READ_ONCE(sysctl_max_skb_frags)) { - tcp_mark_push(tp, skb); - goto new_segment; - } - if (tcp_downgrade_zcopy_pure(sk, skb)) - return NULL; - - copy = tcp_wmem_schedule(sk, copy); - if (!copy) - return NULL; - - if (can_coalesce) { - skb_frag_size_add(&skb_shinfo(skb)->frags[i - 1], copy); - } else { - get_page(page); - skb_fill_page_desc_noacc(skb, i, page, offset, copy); - } - - if (!(flags & MSG_NO_SHARED_FRAGS)) - skb_shinfo(skb)->flags |= SKBFL_SHARED_FRAG; - - skb->len += copy; - skb->data_len += copy; - skb->truesize += copy; - sk_wmem_queued_add(sk, copy); - sk_mem_charge(sk, copy); - WRITE_ONCE(tp->write_seq, tp->write_seq + copy); - 
TCP_SKB_CB(skb)->end_seq += copy; - tcp_skb_pcount_set(skb, 0); - - *size = copy; - return skb; -} - ssize_t do_tcp_sendpages(struct sock *sk, struct page *page, int offset, size_t size, int flags) { - struct tcp_sock *tp = tcp_sk(sk); - int mss_now, size_goal; - int err; - ssize_t copied; - long timeo = sock_sndtimeo(sk, flags & MSG_DONTWAIT); - - if (IS_ENABLED(CONFIG_DEBUG_VM) && - WARN_ONCE(!sendpage_ok(page), - "page must not be a Slab one and have page_count > 0")) - return -EINVAL; - - /* Wait for a connection to finish. One exception is TCP Fast Open - * (passive side) where data is allowed to be sent before a connection - * is fully established. - */ - if (((1 << sk->sk_state) & ~(TCPF_ESTABLISHED | TCPF_CLOSE_WAIT)) && - !tcp_passive_fastopen(sk)) { - err = sk_stream_wait_connect(sk, &timeo); - if (err != 0) - goto out_err; - } + struct bio_vec bvec; + struct msghdr msg = { .msg_flags = flags | MSG_SPLICE_PAGES, }; - sk_clear_bit(SOCKWQ_ASYNC_NOSPACE, sk); + bvec_set_page(&bvec, page, size, offset); + iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, size); - mss_now = tcp_send_mss(sk, &size_goal, flags); - copied = 0; + if (flags & MSG_SENDPAGE_NOTLAST) + msg.msg_flags |= MSG_MORE; - err = -EPIPE; - if (sk->sk_err || (sk->sk_shutdown & SEND_SHUTDOWN)) - goto out_err; - - while (size > 0) { - struct sk_buff *skb; - size_t copy = size; - - skb = tcp_build_frag(sk, size_goal, flags, page, offset, ©); - if (!skb) - goto wait_for_space; - - if (!copied) - TCP_SKB_CB(skb)->tcp_flags &= ~TCPHDR_PSH; - - copied += copy; - offset += copy; - size -= copy; - if (!size) - goto out; - - if (skb->len < size_goal || (flags & MSG_OOB)) - continue; - - if (forced_push(tp)) { - tcp_mark_push(tp, skb); - __tcp_push_pending_frames(sk, mss_now, TCP_NAGLE_PUSH); - } else if (skb == tcp_send_head(sk)) - tcp_push_one(sk, mss_now); - continue; - -wait_for_space: - set_bit(SOCK_NOSPACE, &sk->sk_socket->flags); - tcp_push(sk, flags & ~MSG_MORE, mss_now, - TCP_NAGLE_PUSH, size_goal); - - err = sk_stream_wait_memory(sk, &timeo); - if (err != 0) - goto do_error; - - mss_now = tcp_send_mss(sk, &size_goal, flags); - } - -out: - if (copied) { - tcp_tx_timestamp(sk, sk->sk_tsflags); - if (!(flags & MSG_SENDPAGE_NOTLAST)) - tcp_push(sk, flags, mss_now, tp->nonagle, size_goal); - } - return copied; - -do_error: - tcp_remove_empty_skb(sk); - if (copied) - goto out; -out_err: - /* make sure we wake any epoll edge trigger waiter */ - if (unlikely(tcp_rtx_and_write_queues_empty(sk) && err == -EAGAIN)) { - sk->sk_write_space(sk); - tcp_chrono_stop(sk, TCP_CHRONO_SNDBUF_LIMITED); - } - return sk_stream_error(sk, flags, err); + return tcp_sendmsg_locked(sk, &msg, size); } EXPORT_SYMBOL_GPL(do_tcp_sendpages); From patchwork Thu Apr 6 09:42:34 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 80155 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:b0ea:0:b0:3b6:4342:cba0 with SMTP id b10csp895047vqo; Thu, 6 Apr 2023 02:46:06 -0700 (PDT) X-Google-Smtp-Source: AKy350bncmufyAOZy3rbxx5jlgLFIHw385lqXK6ImdBcJ5/v+rdb2zBG6OmK74vSdVqLsgLLB2S+ X-Received: by 2002:a17:90b:1803:b0:237:c209:5b14 with SMTP id lw3-20020a17090b180300b00237c2095b14mr10572034pjb.22.1680774366640; Thu, 06 Apr 2023 02:46:06 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1680774366; cv=none; d=google.com; s=arc-20160816; b=fSSyqLmcV8LKXkQpmE5FOi8ahtpJ6jKEaXHCpu3dMWw6kZk7Bqrf2IbQSGRa43mm5b 
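It is worth noting what the sendmsg() form buys over ->sendpage(): a caller
is no longer limited to one page per call.  A sketch follows (the helper is
hypothetical; kernel context assumed) of pushing a whole bio_vec array in a
single transaction:

static int splice_pages(struct socket *sock, struct bio_vec *bvecs,
			unsigned int nr_segs, size_t total)
{
	struct msghdr msg = { .msg_flags = MSG_SPLICE_PAGES, };

	/* One iterator can carry many page fragments at once. */
	iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, bvecs, nr_segs, total);
	return sock_sendmsg(sock, &msg);
}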
From patchwork Thu Apr 6 09:42:34 2023
X-Patchwork-Submitter: David Howells
X-Patchwork-Id: 80155

From: David Howells
To: netdev@vger.kernel.org
Cc: David Howells, "David S. Miller", Eric Dumazet, Jakub Kicinski,
    Paolo Abeni, Willem de Bruijn, Matthew Wilcox, Al Viro,
    Christoph Hellwig, Jens Axboe, Jeff Layton, Christian Brauner,
    Chuck Lever III, Linus Torvalds, linux-fsdevel@vger.kernel.org,
    linux-kernel@vger.kernel.org, linux-mm@kvack.org, John Fastabend,
    Jakub Sitnicki, David Ahern, bpf@vger.kernel.org
Subject: [PATCH net-next v5 08/19] tcp_bpf: Inline do_tcp_sendpages as it's now a wrapper around tcp_sendmsg
Date: Thu, 6 Apr 2023 10:42:34 +0100
Message-Id: <20230406094245.3633290-9-dhowells@redhat.com>
In-Reply-To: <20230406094245.3633290-1-dhowells@redhat.com>
References: <20230406094245.3633290-1-dhowells@redhat.com>

do_tcp_sendpages() is now just a small wrapper around tcp_sendmsg_locked(),
so inline it.  This is part of replacing ->sendpage() with a call to
sendmsg() with MSG_SPLICE_PAGES set.

Signed-off-by: David Howells
cc: John Fastabend
cc: Jakub Sitnicki
cc: "David S. Miller"
cc: Eric Dumazet
cc: David Ahern
cc: Jakub Kicinski
cc: Paolo Abeni
cc: Jens Axboe
cc: Matthew Wilcox
cc: netdev@vger.kernel.org
cc: bpf@vger.kernel.org
---
 net/ipv4/tcp_bpf.c | 20 ++++++++++++--------
 1 file changed, 12 insertions(+), 8 deletions(-)

diff --git a/net/ipv4/tcp_bpf.c b/net/ipv4/tcp_bpf.c
index ebf917511937..24bfb885777e 100644
--- a/net/ipv4/tcp_bpf.c
+++ b/net/ipv4/tcp_bpf.c
@@ -72,11 +72,13 @@ static int tcp_bpf_push(struct sock *sk, struct sk_msg *msg, u32 apply_bytes,
 {
 	bool apply = apply_bytes;
 	struct scatterlist *sge;
+	struct msghdr msghdr = { .msg_flags = flags | MSG_SPLICE_PAGES, };
 	struct page *page;
 	int size, ret = 0;
 	u32 off;
 
 	while (1) {
+		struct bio_vec bvec;
 		bool has_tx_ulp;
 
 		sge = sk_msg_elem(msg, msg->sg.start);
@@ -88,16 +90,18 @@ static int tcp_bpf_push(struct sock *sk, struct sk_msg *msg, u32 apply_bytes,
 		tcp_rate_check_app_limited(sk);
retry:
 		has_tx_ulp = tls_sw_has_ctx_tx(sk);
-		if (has_tx_ulp) {
-			flags |= MSG_SENDPAGE_NOPOLICY;
-			ret = kernel_sendpage_locked(sk,
-						     page, off, size, flags);
-		} else {
-			ret = do_tcp_sendpages(sk, page, off, size, flags);
-		}
+		if (has_tx_ulp)
+			msghdr.msg_flags |= MSG_SENDPAGE_NOPOLICY;
+		if (flags & MSG_SENDPAGE_NOTLAST)
+			msghdr.msg_flags |= MSG_MORE;
 
+		bvec_set_page(&bvec, page, size, off);
+		iov_iter_bvec(&msghdr.msg_iter, ITER_SOURCE, &bvec, 1, size);
+		ret = tcp_sendmsg_locked(sk, &msghdr, size);
 		if (ret <= 0)
 			return ret;
+
 		if (apply)
 			apply_bytes -= ret;
 		msg->sg.size -= ret;
@@ -404,7 +408,7 @@ static int tcp_bpf_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
 	long timeo;
 	int flags;
 
-	/* Don't let internal do_tcp_sendpages() flags through */
+	/* Don't let internal sendpage flags through */
 	flags = (msg->msg_flags & ~MSG_SENDPAGE_DECRYPTED);
 	flags |= MSG_NO_SHARED_FRAGS;
From patchwork Thu Apr 6 09:42:35 2023
X-Patchwork-Submitter: David Howells
X-Patchwork-Id: 80160

Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , Willem de Bruijn , Matthew Wilcox , Al Viro , Christoph Hellwig , Jens Axboe , Jeff Layton , Christian Brauner , Chuck Lever III , Linus Torvalds , linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Steffen Klassert , Herbert Xu , David Ahern Subject: [PATCH net-next v5 09/19] espintcp: Inline do_tcp_sendpages() Date: Thu, 6 Apr 2023 10:42:35 +0100 Message-Id: <20230406094245.3633290-10-dhowells@redhat.com> In-Reply-To: <20230406094245.3633290-1-dhowells@redhat.com> References: <20230406094245.3633290-1-dhowells@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.6 X-Spam-Status: No, score=-0.2 required=5.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_NONE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1762419714265950324?= X-GMAIL-MSGID: =?utf-8?q?1762419714265950324?= do_tcp_sendpages() is now just a small wrapper around tcp_sendmsg_locked(), so inline it, allowing do_tcp_sendpages() to be removed. This is part of replacing ->sendpage() with a call to sendmsg() with MSG_SPLICE_PAGES set. Signed-off-by: David Howells cc: Steffen Klassert cc: Herbert Xu cc: Eric Dumazet cc: "David S. Miller" cc: David Ahern cc: Jakub Kicinski cc: Paolo Abeni cc: Jens Axboe cc: Matthew Wilcox cc: netdev@vger.kernel.org --- net/xfrm/espintcp.c | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-) diff --git a/net/xfrm/espintcp.c b/net/xfrm/espintcp.c index 872b80188e83..3504925babdb 100644 --- a/net/xfrm/espintcp.c +++ b/net/xfrm/espintcp.c @@ -205,14 +205,16 @@ static int espintcp_sendskb_locked(struct sock *sk, struct espintcp_msg *emsg, static int espintcp_sendskmsg_locked(struct sock *sk, struct espintcp_msg *emsg, int flags) { + struct msghdr msghdr = { .msg_flags = flags | MSG_SPLICE_PAGES, }; struct sk_msg *skmsg = &emsg->skmsg; struct scatterlist *sg; int done = 0; int ret; - flags |= MSG_SENDPAGE_NOTLAST; + msghdr.msg_flags |= MSG_SENDPAGE_NOTLAST; sg = &skmsg->sg.data[skmsg->sg.start]; do { + struct bio_vec bvec; size_t size = sg->length - emsg->offset; int offset = sg->offset + emsg->offset; struct page *p; @@ -220,11 +222,13 @@ static int espintcp_sendskmsg_locked(struct sock *sk, emsg->offset = 0; if (sg_is_last(sg)) - flags &= ~MSG_SENDPAGE_NOTLAST; + msghdr.msg_flags &= ~MSG_SENDPAGE_NOTLAST; p = sg_page(sg); retry: - ret = do_tcp_sendpages(sk, p, offset, size, flags); + bvec_set_page(&bvec, p, size, offset); + iov_iter_bvec(&msghdr.msg_iter, ITER_SOURCE, &bvec, 1, size); + ret = tcp_sendmsg_locked(sk, &msghdr, size); if (ret < 0) { emsg->offset = offset - sg->offset; skmsg->sg.start += done; From patchwork Thu Apr 6 09:42:36 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 80163 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:b0ea:0:b0:3b6:4342:cba0 with SMTP id b10csp895873vqo; Thu, 6 Apr 2023 02:48:06 -0700 (PDT) X-Google-Smtp-Source: AKy350YdMDltDB0xIa809YW2JwYjjYQ1aPN3MDYWz7kZR7im1pWT9sPc0EK377/mUhNjlkQ9g+cx X-Received: by 2002:a62:53c7:0:b0:628:a71:77a0 with SMTP id h190-20020a6253c7000000b006280a7177a0mr7997386pfb.7.1680774485997; 
From patchwork Thu Apr 6 09:42:36 2023
X-Patchwork-Submitter: David Howells
X-Patchwork-Id: 80163

Db4pioF3PMmf8aN9sn6_TA-1 Received: from smtp.corp.redhat.com (int-mx06.intmail.prod.int.rdu2.redhat.com [10.11.54.6]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 633491C041BB; Thu, 6 Apr 2023 09:43:22 +0000 (UTC) Received: from warthog.procyon.org.uk (unknown [10.33.36.18]) by smtp.corp.redhat.com (Postfix) with ESMTP id 3BBE92166B29; Thu, 6 Apr 2023 09:43:20 +0000 (UTC) From: David Howells To: netdev@vger.kernel.org Cc: David Howells , "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , Willem de Bruijn , Matthew Wilcox , Al Viro , Christoph Hellwig , Jens Axboe , Jeff Layton , Christian Brauner , Chuck Lever III , Linus Torvalds , linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Boris Pismenny , John Fastabend Subject: [PATCH net-next v5 10/19] tls: Inline do_tcp_sendpages() Date: Thu, 6 Apr 2023 10:42:36 +0100 Message-Id: <20230406094245.3633290-11-dhowells@redhat.com> In-Reply-To: <20230406094245.3633290-1-dhowells@redhat.com> References: <20230406094245.3633290-1-dhowells@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.6 X-Spam-Status: No, score=-0.2 required=5.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_NONE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1762419786988922784?= X-GMAIL-MSGID: =?utf-8?q?1762419786988922784?= do_tcp_sendpages() is now just a small wrapper around tcp_sendmsg_locked(), so inline it, allowing do_tcp_sendpages() to be removed. This is part of replacing ->sendpage() with a call to sendmsg() with MSG_SPLICE_PAGES set. Signed-off-by: David Howells cc: Boris Pismenny cc: John Fastabend cc: Jakub Kicinski cc: "David S. Miller" cc: Eric Dumazet cc: Paolo Abeni cc: Jens Axboe cc: Matthew Wilcox cc: netdev@vger.kernel.org --- include/net/tls.h | 2 +- net/tls/tls_main.c | 24 +++++++++++++++--------- 2 files changed, 16 insertions(+), 10 deletions(-) diff --git a/include/net/tls.h b/include/net/tls.h index 154949c7b0c8..d31521c36a84 100644 --- a/include/net/tls.h +++ b/include/net/tls.h @@ -256,7 +256,7 @@ struct tls_context { struct scatterlist *partially_sent_record; u16 partially_sent_offset; - bool in_tcp_sendpages; + bool splicing_pages; bool pending_open_record_frags; struct mutex tx_lock; /* protects partially_sent_* fields and diff --git a/net/tls/tls_main.c b/net/tls/tls_main.c index b32c112984dd..1d0e318d7977 100644 --- a/net/tls/tls_main.c +++ b/net/tls/tls_main.c @@ -124,7 +124,10 @@ int tls_push_sg(struct sock *sk, u16 first_offset, int flags) { - int sendpage_flags = flags | MSG_SENDPAGE_NOTLAST; + struct bio_vec bvec; + struct msghdr msg = { + .msg_flags = MSG_SENDPAGE_NOTLAST | MSG_SPLICE_PAGES | flags, + }; int ret = 0; struct page *p; size_t size; @@ -133,16 +136,19 @@ int tls_push_sg(struct sock *sk, size = sg->length - offset; offset += sg->offset; - ctx->in_tcp_sendpages = true; + ctx->splicing_pages = true; while (1) { if (sg_is_last(sg)) - sendpage_flags = flags; + msg.msg_flags = flags; /* is sending application-limited? 
 		tcp_rate_check_app_limited(sk);
 		p = sg_page(sg);
retry:
-		ret = do_tcp_sendpages(sk, p, offset, size, sendpage_flags);
+		bvec_set_page(&bvec, p, size, offset);
+		iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, size);
+
+		ret = tcp_sendmsg_locked(sk, &msg, size);
 
 		if (ret != size) {
 			if (ret > 0) {
@@ -154,7 +160,7 @@ int tls_push_sg(struct sock *sk,
 				offset -= sg->offset;
 				ctx->partially_sent_offset = offset;
 				ctx->partially_sent_record = (void *)sg;
-				ctx->in_tcp_sendpages = false;
+				ctx->splicing_pages = false;
 				return ret;
 			}
 
@@ -168,7 +174,7 @@ int tls_push_sg(struct sock *sk,
 		size = sg->length;
 	}
 
-	ctx->in_tcp_sendpages = false;
+	ctx->splicing_pages = false;
 
 	return 0;
 }
@@ -246,11 +252,11 @@ static void tls_write_space(struct sock *sk)
 {
 	struct tls_context *ctx = tls_get_ctx(sk);
 
-	/* If in_tcp_sendpages call lower protocol write space handler
+	/* If splicing_pages call lower protocol write space handler
 	 * to ensure we wake up any waiting operations there. For example
-	 * if do_tcp_sendpages where to call sk_wait_event.
+	 * if splicing pages where to call sk_wait_event.
 	 */
-	if (ctx->in_tcp_sendpages) {
+	if (ctx->splicing_pages) {
 		ctx->sk_write_space(sk);
 		return;
 	}
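The change above is the conversion pattern that recurs throughout this
series. Boiled down to its essentials, it looks like the following sketch
(illustrative only, not part of the patch; the helper name is hypothetical,
the calls are the ones used above, and the caller is assumed to hold the
socket lock, as tls_push_sg() does):

	static int splice_one_page(struct sock *sk, struct page *page,
				   size_t offset, size_t len, int flags)
	{
		struct bio_vec bvec;
		struct msghdr msg = { .msg_flags = flags | MSG_SPLICE_PAGES, };

		/* Describe the page with a single-element bio_vec... */
		bvec_set_page(&bvec, page, len, offset);
		/* ...point the message's iterator at it as the data source... */
		iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, len);
		/* ...and hand it to the ordinary locked sendmsg path. */
		return tcp_sendmsg_locked(sk, &msg, len);
	}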
From patchwork Thu Apr 6 09:42:37 2023
X-Patchwork-Submitter: David Howells
X-Patchwork-Id: 80159
From: David Howells
To: netdev@vger.kernel.org
Cc: David Howells, "David S. Miller", Eric Dumazet, Jakub Kicinski,
    Paolo Abeni, Willem de Bruijn, Matthew Wilcox, Al Viro,
    Christoph Hellwig, Jens Axboe, Jeff Layton, Christian Brauner,
    Chuck Lever III, Linus Torvalds, linux-fsdevel@vger.kernel.org,
    linux-kernel@vger.kernel.org, linux-mm@kvack.org, Bernard Metzler,
    Jason Gunthorpe, Leon Romanovsky, Tom Talpey, linux-rdma@vger.kernel.org
Subject: [PATCH net-next v5 11/19] siw: Inline do_tcp_sendpages()
Date: Thu, 6 Apr 2023 10:42:37 +0100
Message-Id: <20230406094245.3633290-12-dhowells@redhat.com>
In-Reply-To: <20230406094245.3633290-1-dhowells@redhat.com>
References: <20230406094245.3633290-1-dhowells@redhat.com>

do_tcp_sendpages() is now just a small wrapper around tcp_sendmsg_locked(),
so inline it, allowing do_tcp_sendpages() to be removed.

This is part of replacing ->sendpage() with a call to sendmsg() with
MSG_SPLICE_PAGES set.

Signed-off-by: David Howells
cc: Bernard Metzler
cc: Jason Gunthorpe
cc: Leon Romanovsky
cc: Tom Talpey
cc: "David S. Miller"
cc: Eric Dumazet
cc: Jakub Kicinski
cc: Paolo Abeni
cc: Jens Axboe
cc: Matthew Wilcox
cc: linux-rdma@vger.kernel.org
cc: netdev@vger.kernel.org
---
 drivers/infiniband/sw/siw/siw_qp_tx.c | 17 ++++++++++++-----
 1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/drivers/infiniband/sw/siw/siw_qp_tx.c b/drivers/infiniband/sw/siw/siw_qp_tx.c
index 05052b49107f..fa5de40d85d5 100644
--- a/drivers/infiniband/sw/siw/siw_qp_tx.c
+++ b/drivers/infiniband/sw/siw/siw_qp_tx.c
@@ -313,7 +313,7 @@ static int siw_tx_ctrl(struct siw_iwarp_tx *c_tx, struct socket *s,
 }
 
 /*
- * 0copy TCP transmit interface: Use do_tcp_sendpages.
+ * 0copy TCP transmit interface: Use MSG_SPLICE_PAGES.
  *
  * Using sendpage to push page by page appears to be less efficient
  * than using sendmsg, even if data are copied.
@@ -324,20 +324,27 @@ static int siw_tx_ctrl(struct siw_iwarp_tx *c_tx, struct socket *s,
 static int siw_tcp_sendpages(struct socket *s, struct page **page, int offset,
 			     size_t size)
 {
+	struct bio_vec bvec;
+	struct msghdr msg = {
+		.msg_flags = (MSG_MORE | MSG_DONTWAIT | MSG_SENDPAGE_NOTLAST |
+			      MSG_SPLICE_PAGES),
+	};
 	struct sock *sk = s->sk;
-	int i = 0, rv = 0, sent = 0,
-	    flags = MSG_MORE | MSG_DONTWAIT | MSG_SENDPAGE_NOTLAST;
+	int i = 0, rv = 0, sent = 0;
 
 	while (size) {
 		size_t bytes = min_t(size_t, PAGE_SIZE - offset, size);
 
 		if (size + offset <= PAGE_SIZE)
-			flags = MSG_MORE | MSG_DONTWAIT;
+			msg.msg_flags = MSG_MORE | MSG_DONTWAIT;
 
 		tcp_rate_check_app_limited(sk);
+		bvec_set_page(&bvec, page[i], bytes, offset);
+		iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, size);
+
try_page_again:
 		lock_sock(sk);
-		rv = do_tcp_sendpages(sk, page[i], offset, bytes, flags);
+		rv = tcp_sendmsg_locked(sk, &msg, size);
 		release_sock(sk);
 
 		if (rv > 0) {
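Both tls_push_sg() and siw_tcp_sendpages() keep MSG_SENDPAGE_NOTLAST set
while further chunks follow and replace the flags wholesale on the final
chunk (note that, as posted, the final chunk's flags also drop
MSG_SPLICE_PAGES, so the last piece goes down the copying path). A
condensed sketch of that control flow, with the retry and partial-send
handling of the real code reduced to a bare minimum (hypothetical helper,
same calls as the patch):

	static ssize_t splice_pages_chunked(struct sock *sk, struct page **pages,
					    int offset, size_t size)
	{
		struct bio_vec bvec;
		struct msghdr msg = {
			.msg_flags = MSG_MORE | MSG_DONTWAIT |
				     MSG_SENDPAGE_NOTLAST | MSG_SPLICE_PAGES,
		};
		ssize_t rv, sent = 0;
		int i = 0;

		while (size) {
			size_t bytes = min_t(size_t, PAGE_SIZE - offset, size);

			/* Final chunk: clear NOTLAST so the data can go out. */
			if (size + offset <= PAGE_SIZE)
				msg.msg_flags = MSG_MORE | MSG_DONTWAIT;

			bvec_set_page(&bvec, pages[i], bytes, offset);
			iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, bytes);

			lock_sock(sk);
			rv = tcp_sendmsg_locked(sk, &msg, bytes);
			release_sock(sk);
			if (rv < 0)
				return rv;
			/* (A short send would need the retry logic elided here.) */
			sent += rv;
			size -= rv;
			offset = 0;
			i++;
		}
		return sent;
	}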
From patchwork Thu Apr 6 09:42:38 2023
X-Patchwork-Submitter: David Howells
X-Patchwork-Id: 80158
From: David Howells
To: netdev@vger.kernel.org
Cc: David Howells, "David S. Miller", Eric Dumazet, Jakub Kicinski,
    Paolo Abeni, Willem de Bruijn, Matthew Wilcox, Al Viro,
    Christoph Hellwig, Jens Axboe, Jeff Layton, Christian Brauner,
    Chuck Lever III, Linus Torvalds, linux-fsdevel@vger.kernel.org,
    linux-kernel@vger.kernel.org, linux-mm@kvack.org, David Ahern
Subject: [PATCH net-next v5 12/19] tcp: Fold do_tcp_sendpages() into tcp_sendpage_locked()
Date: Thu, 6 Apr 2023 10:42:38 +0100
Message-Id: <20230406094245.3633290-13-dhowells@redhat.com>
In-Reply-To: <20230406094245.3633290-1-dhowells@redhat.com>
References: <20230406094245.3633290-1-dhowells@redhat.com>

Fold do_tcp_sendpages() into its last remaining caller,
tcp_sendpage_locked().

Signed-off-by: David Howells
cc: Eric Dumazet
cc: David Ahern
cc: "David S. Miller"
cc: Jakub Kicinski
cc: Paolo Abeni
cc: Jens Axboe
cc: Matthew Wilcox
cc: netdev@vger.kernel.org
---
 include/net/tcp.h |  2 --
 net/ipv4/tcp.c    | 21 +++++++--------------
 2 files changed, 7 insertions(+), 16 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index a0a91a988272..11c62d37f3d5 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -333,8 +333,6 @@ int tcp_sendpage(struct sock *sk, struct page *page, int offset, size_t size,
 		 int flags);
 int tcp_sendpage_locked(struct sock *sk, struct page *page, int offset,
 			size_t size, int flags);
-ssize_t do_tcp_sendpages(struct sock *sk, struct page *page, int offset,
-			 size_t size, int flags);
 int tcp_send_mss(struct sock *sk, int *size_goal, int flags);
 void tcp_push(struct sock *sk, int flags, int mss_now, int nonagle,
 	      int size_goal);
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index a8a4ace8b3da..c7240beedfed 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -972,12 +972,17 @@ static int tcp_wmem_schedule(struct sock *sk, int copy)
 	return min(copy, sk->sk_forward_alloc);
 }
 
-ssize_t do_tcp_sendpages(struct sock *sk, struct page *page, int offset,
-			 size_t size, int flags)
+int tcp_sendpage_locked(struct sock *sk, struct page *page, int offset,
+			size_t size, int flags)
 {
 	struct bio_vec bvec;
 	struct msghdr msg = { .msg_flags = flags | MSG_SPLICE_PAGES, };
 
+	if (!(sk->sk_route_caps & NETIF_F_SG))
+		return sock_no_sendpage_locked(sk, page, offset, size, flags);
+
+	tcp_rate_check_app_limited(sk); /* is sending application-limited? */
+
 	bvec_set_page(&bvec, page, size, offset);
 	iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, size);
 
@@ -986,18 +991,6 @@ ssize_t do_tcp_sendpages(struct sock *sk, struct page *page, int offset,
 
 	return tcp_sendmsg_locked(sk, &msg, size);
 }
-EXPORT_SYMBOL_GPL(do_tcp_sendpages);
-
-int tcp_sendpage_locked(struct sock *sk, struct page *page, int offset,
-			size_t size, int flags)
-{
-	if (!(sk->sk_route_caps & NETIF_F_SG))
-		return sock_no_sendpage_locked(sk, page, offset, size, flags);
-
-	tcp_rate_check_app_limited(sk);	/* is sending application-limited? */
-
-	return do_tcp_sendpages(sk, page, offset, size, flags);
-}
 EXPORT_SYMBOL_GPL(tcp_sendpage_locked);
 
 int tcp_sendpage(struct sock *sk, struct page *page, int offset,
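For context, tcp_sendpage_locked() is normally reached via tcp_sendpage(),
which is untouched by this patch. Quoted from memory of net/ipv4/tcp.c of
this era, for reference only, it simply brackets the locked variant with
the socket lock:

	int tcp_sendpage(struct sock *sk, struct page *page, int offset,
			 size_t size, int flags)
	{
		int ret;

		lock_sock(sk);
		ret = tcp_sendpage_locked(sk, page, offset, size, flags);
		release_sock(sk);

		return ret;
	}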
From patchwork Thu Apr 6 09:42:39 2023
X-Patchwork-Submitter: David Howells
X-Patchwork-Id: 80162
From: David Howells
To: netdev@vger.kernel.org
Cc: David Howells, "David S. Miller", Eric Dumazet, Jakub Kicinski,
    Paolo Abeni, Willem de Bruijn, Matthew Wilcox, Al Viro,
    Christoph Hellwig, Jens Axboe, Jeff Layton, Christian Brauner,
    Chuck Lever III, Linus Torvalds, linux-fsdevel@vger.kernel.org,
    linux-kernel@vger.kernel.org, linux-mm@kvack.org, David Ahern
Subject: [PATCH net-next v5 13/19] ip, udp: Support MSG_SPLICE_PAGES
Date: Thu, 6 Apr 2023 10:42:39 +0100
Message-Id: <20230406094245.3633290-14-dhowells@redhat.com>
In-Reply-To: <20230406094245.3633290-1-dhowells@redhat.com>
References: <20230406094245.3633290-1-dhowells@redhat.com>

Make IP/UDP sendmsg() support MSG_SPLICE_PAGES. This causes pages to be
spliced from the source iterator.

This allows ->sendpage() to be replaced by something that can handle
multiple multipage folios in a single transaction.

Signed-off-by: David Howells
cc: Willem de Bruijn
cc: David Ahern
cc: "David S. Miller"
cc: Eric Dumazet
cc: Jakub Kicinski
cc: Paolo Abeni
cc: Jens Axboe
cc: Matthew Wilcox
cc: netdev@vger.kernel.org
---
 net/ipv4/ip_output.c | 47 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 47 insertions(+)

diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index 22a90a9392eb..c6c318eb3d37 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -957,6 +957,41 @@ csum_page(struct page *page, int offset, int copy)
 	return csum;
 }
 
+/*
+ * Add (or copy) data pages for MSG_SPLICE_PAGES.
+ */
+static int __ip_splice_pages(struct sock *sk, struct sk_buff *skb,
+			     void *from, int *pcopy)
+{
+	struct msghdr *msg = from;
+	struct page *page = NULL, **pages = &page;
+	ssize_t copy = *pcopy;
+	size_t off;
+	int err;
+
+	copy = iov_iter_extract_pages(&msg->msg_iter, &pages, copy, 1, 0, &off);
+	if (copy <= 0)
+		return copy ?: -EIO;
+
+	err = skb_append_pagefrags(skb, page, off, copy);
+	if (err < 0) {
+		iov_iter_revert(&msg->msg_iter, copy);
+		return err;
+	}
+
+	if (skb->ip_summed == CHECKSUM_NONE) {
+		__wsum csum;
+
+		csum = csum_page(page, off, copy);
+		skb->csum = csum_block_add(skb->csum, csum, skb->len);
+	}
+
+	skb_len_add(skb, copy);
+	refcount_add(copy, &sk->sk_wmem_alloc);
+	*pcopy = copy;
+	return 0;
+}
+
 static int __ip_append_data(struct sock *sk,
 			    struct flowi4 *fl4,
 			    struct sk_buff_head *queue,
@@ -1048,6 +1083,14 @@ static int __ip_append_data(struct sock *sk,
 			skb_zcopy_set(skb, uarg, &extra_uref);
 		}
+	} else if ((flags & MSG_SPLICE_PAGES) && length) {
+		if (inet->hdrincl)
+			return -EPERM;
+		if (rt->dst.dev->features & NETIF_F_SG)
+			/* We need an empty buffer to attach stuff to */
+			paged = true;
+		else
+			flags &= ~MSG_SPLICE_PAGES;
 	}
 
 	cork->length += length;
@@ -1207,6 +1250,10 @@ static int __ip_append_data(struct sock *sk,
 				err = -EFAULT;
 				goto error;
 			}
+		} else if (flags & MSG_SPLICE_PAGES) {
+			err = __ip_splice_pages(sk, skb, from, &copy);
+			if (err < 0)
+				goto error;
 		} else if (!zc) {
 			int i = skb_shinfo(skb)->nr_frags;
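The core of __ip_splice_pages() is an extract/append/revert contract: take
a reference on the next page in the sender's iterator, try to hang it off
the skb, and wind the iterator back if that fails so the caller sees the
data untouched. Reduced to a sketch (hypothetical wrapper; the calls are
the ones in the patch, with the checksum and accounting steps omitted):

	static int splice_one_frag(struct sock *sk, struct sk_buff *skb,
				   struct msghdr *msg, size_t maxlen)
	{
		struct page *page = NULL, **pages = &page;
		ssize_t len;
		size_t off;
		int err;

		/* Take a reference on the next page of the source iterator. */
		len = iov_iter_extract_pages(&msg->msg_iter, &pages, maxlen,
					     1, 0, &off);
		if (len <= 0)
			return len ?: -EIO;

		err = skb_append_pagefrags(skb, page, off, len);
		if (err < 0)
			/* Undo the extraction so the data can be retried or copied. */
			iov_iter_revert(&msg->msg_iter, len);
		return err;
	}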
From patchwork Thu Apr 6 09:42:40 2023
X-Patchwork-Submitter: David Howells
X-Patchwork-Id: 80171
From: David Howells
To: netdev@vger.kernel.org
Cc: David Howells, "David S. Miller", Eric Dumazet, Jakub Kicinski,
    Paolo Abeni, Willem de Bruijn, Matthew Wilcox, Al Viro,
    Christoph Hellwig, Jens Axboe, Jeff Layton, Christian Brauner,
    Chuck Lever III, Linus Torvalds, linux-fsdevel@vger.kernel.org,
    linux-kernel@vger.kernel.org, linux-mm@kvack.org, David Ahern
Subject: [PATCH net-next v5 14/19] ip, udp: Make sendmsg(MSG_SPLICE_PAGES) copy unspliceable data
Date: Thu, 6 Apr 2023 10:42:40 +0100
Message-Id: <20230406094245.3633290-15-dhowells@redhat.com>
In-Reply-To: <20230406094245.3633290-1-dhowells@redhat.com>
References: <20230406094245.3633290-1-dhowells@redhat.com>

If sendmsg() with MSG_SPLICE_PAGES encounters a page that shouldn't be
spliced - a slab page, for instance, or one with a zero count - make
__ip_append_data() copy it.

Signed-off-by: David Howells
cc: Willem de Bruijn
cc: David Ahern
cc: "David S. Miller"
cc: Eric Dumazet
cc: Jakub Kicinski
cc: Paolo Abeni
cc: Jens Axboe
cc: Matthew Wilcox
cc: netdev@vger.kernel.org
---
 net/ipv4/ip_output.c | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index c6c318eb3d37..48db7bf475df 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -967,13 +967,32 @@ static int __ip_splice_pages(struct sock *sk, struct sk_buff *skb,
 	struct page *page = NULL, **pages = &page;
 	ssize_t copy = *pcopy;
 	size_t off;
+	bool put = false;
 	int err;
 
 	copy = iov_iter_extract_pages(&msg->msg_iter, &pages, copy, 1, 0, &off);
 	if (copy <= 0)
 		return copy ?: -EIO;
 
+	if (!sendpage_ok(page)) {
+		const void *p = kmap_local_page(page);
+		void *q;
+
+		q = page_frag_memdup(NULL, p + off, copy,
+				     sk->sk_allocation, ULONG_MAX);
+		kunmap_local(p);
+		if (!q) {
+			iov_iter_revert(&msg->msg_iter, copy);
+			return -ENOMEM;
+		}
+		page = virt_to_page(q);
+		off = offset_in_page(q);
+		put = true;
+	}
+
 	err = skb_append_pagefrags(skb, page, off, copy);
+	if (put)
+		put_page(page);
 	if (err < 0) {
 		iov_iter_revert(&msg->msg_iter, copy);
 		return err;
 	}
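The "shouldn't be spliced" test here is sendpage_ok(). At the time of this
series it lives in include/linux/net.h and, quoted approximately from
memory for reference, amounts to the following; a slab page or a page with
a zero refcount cannot safely have a long-lived reference taken on it, so
such data is copied into a page fragment instead:

	/* From include/linux/net.h (for reference; not part of this patch) */
	static inline bool sendpage_ok(struct page *page)
	{
		return !PageSlab(page) && page_count(page) >= 1;
	}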
From patchwork Thu Apr 6 09:42:41 2023
X-Patchwork-Submitter: David Howells
X-Patchwork-Id: 80166
From: David Howells
To: netdev@vger.kernel.org
Cc: David Howells, "David S. Miller", Eric Dumazet, Jakub Kicinski,
    Paolo Abeni, Willem de Bruijn, Matthew Wilcox, Al Viro,
    Christoph Hellwig, Jens Axboe, Jeff Layton, Christian Brauner,
    Chuck Lever III, Linus Torvalds, linux-fsdevel@vger.kernel.org,
    linux-kernel@vger.kernel.org, linux-mm@kvack.org, David Ahern
Subject: [PATCH net-next v5 15/19] ip6, udp6: Support MSG_SPLICE_PAGES
Date: Thu, 6 Apr 2023 10:42:41 +0100
Message-Id: <20230406094245.3633290-16-dhowells@redhat.com>
In-Reply-To: <20230406094245.3633290-1-dhowells@redhat.com>
References: <20230406094245.3633290-1-dhowells@redhat.com>

Make IP6/UDP6 sendmsg() support MSG_SPLICE_PAGES. This causes pages to be
spliced from the source iterator if possible, copying the data if not.

This allows ->sendpage() to be replaced by something that can handle
multiple multipage folios in a single transaction.

Signed-off-by: David Howells
cc: Willem de Bruijn
cc: David Ahern
cc: "David S. Miller"
cc: Eric Dumazet
cc: Jakub Kicinski
cc: Paolo Abeni
cc: Jens Axboe
cc: Matthew Wilcox
cc: netdev@vger.kernel.org
---
 include/net/ip.h      |  1 +
 net/ipv4/ip_output.c  |  4 ++--
 net/ipv6/ip6_output.c | 12 ++++++++++++
 3 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/include/net/ip.h b/include/net/ip.h
index c3fffaa92d6e..dcbedeffab60 100644
--- a/include/net/ip.h
+++ b/include/net/ip.h
@@ -211,6 +211,7 @@ int ip_local_out(struct net *net, struct sock *sk, struct sk_buff *skb);
 int __ip_queue_xmit(struct sock *sk, struct sk_buff *skb, struct flowi *fl,
 		    __u8 tos);
 void ip_init(void);
+int __ip_splice_pages(struct sock *sk, struct sk_buff *skb, void *from, int *pcopy);
 int ip_append_data(struct sock *sk, struct flowi4 *fl4,
 		   int getfrag(void *from, char *to, int offset, int len,
 			       int odd, struct sk_buff *skb),
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index 48db7bf475df..5b66c28c2e41 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -960,8 +960,7 @@ csum_page(struct page *page, int offset, int copy)
 /*
  * Add (or copy) data pages for MSG_SPLICE_PAGES.
  */
-static int __ip_splice_pages(struct sock *sk, struct sk_buff *skb,
-			     void *from, int *pcopy)
+int __ip_splice_pages(struct sock *sk, struct sk_buff *skb, void *from, int *pcopy)
 {
 	struct msghdr *msg = from;
 	struct page *page = NULL, **pages = &page;
@@ -1010,6 +1009,7 @@ static int __ip_splice_pages(struct sock *sk, struct sk_buff *skb,
 	*pcopy = copy;
 	return 0;
 }
+EXPORT_SYMBOL_GPL(__ip_splice_pages);
 
 static int __ip_append_data(struct sock *sk,
 			    struct flowi4 *fl4,
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 0b6140f0179d..82846d18cf22 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -1589,6 +1589,14 @@ static int __ip6_append_data(struct sock *sk,
 			skb_zcopy_set(skb, uarg, &extra_uref);
 		}
+	} else if ((flags & MSG_SPLICE_PAGES) && length) {
+		if (inet_sk(sk)->hdrincl)
+			return -EPERM;
+		if (rt->dst.dev->features & NETIF_F_SG)
+			/* We need an empty buffer to attach stuff to */
+			paged = true;
+		else
+			flags &= ~MSG_SPLICE_PAGES;
 	}
 
 	/*
@@ -1778,6 +1786,10 @@ static int __ip6_append_data(struct sock *sk,
 				err = -EFAULT;
 				goto error;
 			}
+		} else if (flags & MSG_SPLICE_PAGES) {
+			err = __ip_splice_pages(sk, skb, from, &copy);
+			if (err < 0)
+				goto error;
 		} else if (!zc) {
 			int i = skb_shinfo(skb)->nr_frags;
From patchwork Thu Apr 6 09:42:42 2023
X-Patchwork-Submitter: David Howells
X-Patchwork-Id: 80170
From: David Howells
To: netdev@vger.kernel.org
Cc: David Howells, "David S. Miller", Eric Dumazet, Jakub Kicinski,
    Paolo Abeni, Willem de Bruijn, Matthew Wilcox, Al Viro,
    Christoph Hellwig, Jens Axboe, Jeff Layton, Christian Brauner,
    Chuck Lever III, Linus Torvalds, linux-fsdevel@vger.kernel.org,
    linux-kernel@vger.kernel.org, linux-mm@kvack.org, David Ahern
Subject: [PATCH net-next v5 16/19] udp: Convert udp_sendpage() to use MSG_SPLICE_PAGES
Date: Thu, 6 Apr 2023 10:42:42 +0100
Message-Id: <20230406094245.3633290-17-dhowells@redhat.com>
In-Reply-To: <20230406094245.3633290-1-dhowells@redhat.com>
References: <20230406094245.3633290-1-dhowells@redhat.com>

Convert udp_sendpage() to use sendmsg() with MSG_SPLICE_PAGES rather than
directly splicing in the pages itself.

This allows ->sendpage() to be replaced by something that can handle
multiple multipage folios in a single transaction.

Signed-off-by: David Howells
cc: Willem de Bruijn
cc: David Ahern
cc: "David S. Miller"
cc: Eric Dumazet
cc: Jakub Kicinski
cc: Paolo Abeni
cc: Jens Axboe
cc: Matthew Wilcox
cc: netdev@vger.kernel.org
---
 net/ipv4/udp.c | 50 +++++++++----------------------------------------
 1 file changed, 9 insertions(+), 41 deletions(-)

diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index aa32afd871ee..0d3e78a65f51 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1332,52 +1332,20 @@ EXPORT_SYMBOL(udp_sendmsg);
 int udp_sendpage(struct sock *sk, struct page *page, int offset,
 		 size_t size, int flags)
 {
-	struct inet_sock *inet = inet_sk(sk);
-	struct udp_sock *up = udp_sk(sk);
+	struct bio_vec bvec;
+	struct msghdr msg = {
+		.msg_flags = flags | MSG_SPLICE_PAGES | MSG_MORE
+	};
 	int ret;
 
-	if (flags & MSG_SENDPAGE_NOTLAST)
-		flags |= MSG_MORE;
+	bvec_set_page(&bvec, page, size, offset);
+	iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, size);
 
-	if (!up->pending) {
-		struct msghdr msg = { .msg_flags = flags|MSG_MORE };
-
-		/* Call udp_sendmsg to specify destination address which
-		 * sendpage interface can't pass.
-		 * This will succeed only when the socket is connected.
-		 */
-		ret = udp_sendmsg(sk, &msg, 0);
-		if (ret < 0)
-			return ret;
-	}
+	if (flags & MSG_SENDPAGE_NOTLAST)
+		msg.msg_flags |= MSG_MORE;
 
 	lock_sock(sk);
-
-	if (unlikely(!up->pending)) {
-		release_sock(sk);
-
-		net_dbg_ratelimited("cork failed\n");
-		return -EINVAL;
-	}
-
-	ret = ip_append_page(sk, &inet->cork.fl.u.ip4,
-			     page, offset, size, flags);
-	if (ret == -EOPNOTSUPP) {
-		release_sock(sk);
-		return sock_no_sendpage(sk->sk_socket, page, offset,
-					size, flags);
-	}
-	if (ret < 0) {
-		udp_flush_pending_frames(sk);
-		goto out;
-	}
-
-	up->len += size;
-	if (!(READ_ONCE(up->corkflag) || (flags&MSG_MORE)))
-		ret = udp_push_pending_frames(sk);
-	if (!ret)
-		ret = size;
-out:
+	ret = udp_sendmsg(sk, &msg, size);
 	release_sock(sk);
 	return ret;
 }
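What the conversion buys becomes visible once a caller stops going page by
page: with MSG_SPLICE_PAGES, several pages can be described in one bio_vec
array and submitted as a single sendmsg() transaction. A hypothetical
in-kernel caller, sketched under the assumption that the pages pass
sendpage_ok() or will be copied as per the previous patch (sock_sendmsg()
dispatches to udp_sendmsg() through the usual ->sendmsg path):

	static int send_two_pages(struct socket *sock,
				  struct page *p0, size_t len0,
				  struct page *p1, size_t len1)
	{
		struct bio_vec bv[2];
		struct msghdr msg = { .msg_flags = MSG_SPLICE_PAGES };

		bvec_set_page(&bv[0], p0, len0, 0);
		bvec_set_page(&bv[1], p1, len1, 0);
		/* Both pages form one source iterator, hence one datagram. */
		iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, bv, 2, len0 + len1);

		return sock_sendmsg(sock, &msg);
	}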
From patchwork Thu Apr 6 09:42:43 2023
X-Patchwork-Submitter: David Howells
X-Patchwork-Id: 80164
From: David Howells
To: netdev@vger.kernel.org
Cc: David Howells, "David S. Miller", Eric Dumazet, Jakub Kicinski,
    Paolo Abeni, Willem de Bruijn, Matthew Wilcox, Al Viro,
    Christoph Hellwig, Jens Axboe, Jeff Layton, Christian Brauner,
    Chuck Lever III, Linus Torvalds, linux-fsdevel@vger.kernel.org,
    linux-kernel@vger.kernel.org, linux-mm@kvack.org, David Ahern
Subject: [PATCH net-next v5 17/19] ip: Remove ip_append_page()
Date: Thu, 6 Apr 2023 10:42:43 +0100
Message-Id: <20230406094245.3633290-18-dhowells@redhat.com>
In-Reply-To: <20230406094245.3633290-1-dhowells@redhat.com>
References: <20230406094245.3633290-1-dhowells@redhat.com>

ip_append_page() is no longer used with the removal of udp_sendpage(), so
remove it.

Signed-off-by: David Howells
cc: Willem de Bruijn
cc: David Ahern
cc: "David S. Miller"
cc: Eric Dumazet
cc: Jakub Kicinski
cc: Paolo Abeni
cc: Jens Axboe
cc: Matthew Wilcox
cc: netdev@vger.kernel.org
---
 include/net/ip.h     |   2 -
 net/ipv4/ip_output.c | 136 ++-----------------------------------------
 2 files changed, 4 insertions(+), 134 deletions(-)

diff --git a/include/net/ip.h b/include/net/ip.h
index dcbedeffab60..8a50341007bf 100644
--- a/include/net/ip.h
+++ b/include/net/ip.h
@@ -221,8 +221,6 @@ int ip_append_data(struct sock *sk, struct flowi4 *fl4,
 		   unsigned int flags);
 int ip_generic_getfrag(void *from, char *to, int offset, int len, int odd,
 		       struct sk_buff *skb);
-ssize_t ip_append_page(struct sock *sk, struct flowi4 *fl4, struct page *page,
-		       int offset, size_t size, int flags);
 struct sk_buff *__ip_make_skb(struct sock *sk, struct flowi4 *fl4,
 			      struct sk_buff_head *queue,
 			      struct inet_cork *cork);
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index 5b66c28c2e41..241a78d82766 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -1376,10 +1376,10 @@ static int ip_setup_cork(struct sock *sk, struct inet_cork *cork,
 }
 
 /*
- * ip_append_data() and ip_append_page() can make one large IP datagram
- * from many pieces of data. Each pieces will be holded on the socket
- * until ip_push_pending_frames() is called. Each piece can be a page
- * or non-page data.
+ * ip_append_data() can make one large IP datagram from many pieces of
+ * data. Each piece will be held on the socket until
+ * ip_push_pending_frames() is called. Each piece can be a page or
+ * non-page data.
  *
  * Not only UDP, other transport protocols - e.g. raw sockets - can use
 * this interface potentially.
 include/net/ip.h     |   2 -
 net/ipv4/ip_output.c | 136 ++-----------------------------------------
 2 files changed, 4 insertions(+), 134 deletions(-)

diff --git a/include/net/ip.h b/include/net/ip.h
index dcbedeffab60..8a50341007bf 100644
--- a/include/net/ip.h
+++ b/include/net/ip.h
@@ -221,8 +221,6 @@ int ip_append_data(struct sock *sk, struct flowi4 *fl4,
 		   unsigned int flags);
 int ip_generic_getfrag(void *from, char *to, int offset, int len, int odd,
 		       struct sk_buff *skb);
-ssize_t ip_append_page(struct sock *sk, struct flowi4 *fl4, struct page *page,
-		       int offset, size_t size, int flags);
 struct sk_buff *__ip_make_skb(struct sock *sk, struct flowi4 *fl4,
 			      struct sk_buff_head *queue,
 			      struct inet_cork *cork);
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index 5b66c28c2e41..241a78d82766 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -1376,10 +1376,10 @@ static int ip_setup_cork(struct sock *sk, struct inet_cork *cork,
 }
 
 /*
- * ip_append_data() and ip_append_page() can make one large IP datagram
- * from many pieces of data. Each pieces will be holded on the socket
- * until ip_push_pending_frames() is called. Each piece can be a page
- * or non-page data.
+ * ip_append_data() can make one large IP datagram from many pieces of
+ * data. Each piece will be held on the socket until
+ * ip_push_pending_frames() is called. Each piece can be a page or
+ * non-page data.
  *
  * Not only UDP, other transport protocols - e.g. raw sockets - can use
  * this interface potentially.
@@ -1412,134 +1412,6 @@ int ip_append_data(struct sock *sk, struct flowi4 *fl4,
 				from, length, transhdrlen, flags);
 }
 
-ssize_t ip_append_page(struct sock *sk, struct flowi4 *fl4, struct page *page,
-		       int offset, size_t size, int flags)
-{
-	struct inet_sock *inet = inet_sk(sk);
-	struct sk_buff *skb;
-	struct rtable *rt;
-	struct ip_options *opt = NULL;
-	struct inet_cork *cork;
-	int hh_len;
-	int mtu;
-	int len;
-	int err;
-	unsigned int maxfraglen, fragheaderlen, fraggap, maxnonfragsize;
-
-	if (inet->hdrincl)
-		return -EPERM;
-
-	if (flags&MSG_PROBE)
-		return 0;
-
-	if (skb_queue_empty(&sk->sk_write_queue))
-		return -EINVAL;
-
-	cork = &inet->cork.base;
-	rt = (struct rtable *)cork->dst;
-	if (cork->flags & IPCORK_OPT)
-		opt = cork->opt;
-
-	if (!(rt->dst.dev->features & NETIF_F_SG))
-		return -EOPNOTSUPP;
-
-	hh_len = LL_RESERVED_SPACE(rt->dst.dev);
-	mtu = cork->gso_size ? IP_MAX_MTU : cork->fragsize;
-
-	fragheaderlen = sizeof(struct iphdr) + (opt ? opt->optlen : 0);
-	maxfraglen = ((mtu - fragheaderlen) & ~7) + fragheaderlen;
-	maxnonfragsize = ip_sk_ignore_df(sk) ? 0xFFFF : mtu;
-
-	if (cork->length + size > maxnonfragsize - fragheaderlen) {
-		ip_local_error(sk, EMSGSIZE, fl4->daddr, inet->inet_dport,
-			       mtu - (opt ? opt->optlen : 0));
-		return -EMSGSIZE;
-	}
-
-	skb = skb_peek_tail(&sk->sk_write_queue);
-	if (!skb)
-		return -EINVAL;
-
-	cork->length += size;
-
-	while (size > 0) {
-		/* Check if the remaining data fits into current packet. */
-		len = mtu - skb->len;
-		if (len < size)
-			len = maxfraglen - skb->len;
-
-		if (len <= 0) {
-			struct sk_buff *skb_prev;
-			int alloclen;
-
-			skb_prev = skb;
-			fraggap = skb_prev->len - maxfraglen;
-
-			alloclen = fragheaderlen + hh_len + fraggap + 15;
-			skb = sock_wmalloc(sk, alloclen, 1, sk->sk_allocation);
-			if (unlikely(!skb)) {
-				err = -ENOBUFS;
-				goto error;
-			}
-
-			/*
-			 * Fill in the control structures
-			 */
-			skb->ip_summed = CHECKSUM_NONE;
-			skb->csum = 0;
-			skb_reserve(skb, hh_len);
-
-			/*
-			 * Find where to start putting bytes.
-			 */
-			skb_put(skb, fragheaderlen + fraggap);
-			skb_reset_network_header(skb);
-			skb->transport_header = (skb->network_header +
-						 fragheaderlen);
-			if (fraggap) {
-				skb->csum = skb_copy_and_csum_bits(skb_prev,
-								   maxfraglen,
-						    skb_transport_header(skb),
-								   fraggap);
-				skb_prev->csum = csum_sub(skb_prev->csum,
-							  skb->csum);
-				pskb_trim_unique(skb_prev, maxfraglen);
-			}
-
-			/*
-			 * Put the packet on the pending queue.
-			 */
-			__skb_queue_tail(&sk->sk_write_queue, skb);
-			continue;
-		}
-
-		if (len > size)
-			len = size;
-
-		if (skb_append_pagefrags(skb, page, offset, len)) {
-			err = -EMSGSIZE;
-			goto error;
-		}
-
-		if (skb->ip_summed == CHECKSUM_NONE) {
-			__wsum csum;
-			csum = csum_page(page, offset, len);
-			skb->csum = csum_block_add(skb->csum, csum, skb->len);
-		}
-
-		skb_len_add(skb, len);
-		refcount_add(len, &sk->sk_wmem_alloc);
-		offset += len;
-		size -= len;
-	}
-	return 0;
-
-error:
-	cork->length -= size;
-	IP_INC_STATS(sock_net(sk), IPSTATS_MIB_OUTDISCARDS);
-	return err;
-}
-
 static void ip_cork_release(struct inet_cork *cork)
 {
 	cork->flags &= ~IPCORK_OPT;
From patchwork Thu Apr 6 09:42:44 2023
X-Patchwork-Submitter: David Howells
X-Patchwork-Id: 80169

From: David Howells
To: netdev@vger.kernel.org
Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , Willem de Bruijn , Matthew Wilcox , Al Viro , Christoph Hellwig , Jens Axboe , Jeff Layton , Christian Brauner , Chuck Lever III , Linus Torvalds , linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Kuniyuki Iwashima Subject: [PATCH net-next v5 18/19] af_unix: Support MSG_SPLICE_PAGES Date: Thu, 6 Apr 2023 10:42:44 +0100 Message-Id: <20230406094245.3633290-19-dhowells@redhat.com> In-Reply-To: <20230406094245.3633290-1-dhowells@redhat.com> References: <20230406094245.3633290-1-dhowells@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.6 X-Spam-Status: No, score=-0.2 required=5.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_NONE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1762420189369029581?= X-GMAIL-MSGID: =?utf-8?q?1762420189369029581?= Make AF_UNIX sendmsg() support MSG_SPLICE_PAGES, splicing in pages from the source iterator if possible and copying the data in otherwise. This allows ->sendpage() to be replaced by something that can handle multiple multipage folios in a single transaction. Signed-off-by: David Howells cc: "David S. Miller" cc: Eric Dumazet cc: Jakub Kicinski cc: Paolo Abeni cc: Kuniyuki Iwashima cc: Jens Axboe cc: Matthew Wilcox cc: netdev@vger.kernel.org --- net/unix/af_unix.c | 93 ++++++++++++++++++++++++++++++++++++++-------- 1 file changed, 77 insertions(+), 16 deletions(-) diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c index fb31e8a4409e..fee431a089d3 100644 --- a/net/unix/af_unix.c +++ b/net/unix/af_unix.c @@ -2157,6 +2157,53 @@ static int queue_oob(struct socket *sock, struct msghdr *msg, struct sock *other } #endif +/* + * Extract pages from an iterator and add them to the socket buffer. 
 net/unix/af_unix.c | 93 ++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 77 insertions(+), 16 deletions(-)

diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index fb31e8a4409e..fee431a089d3 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -2157,6 +2157,53 @@ static int queue_oob(struct socket *sock, struct msghdr *msg, struct sock *other
 }
 #endif
 
+/*
+ * Extract pages from an iterator and add them to the socket buffer.
+ */
+static ssize_t unix_extract_bvec_to_skb(struct sk_buff *skb,
+					struct iov_iter *iter, ssize_t maxsize)
+{
+	struct page *pages[8], **ppages = pages;
+	unsigned int i, nr;
+	ssize_t ret = 0;
+
+	while (iter->count > 0) {
+		size_t off, len;
+
+		nr = min_t(size_t, MAX_SKB_FRAGS - skb_shinfo(skb)->nr_frags,
+			   ARRAY_SIZE(pages));
+		if (nr == 0)
+			break;
+
+		len = iov_iter_extract_pages(iter, &ppages, maxsize, nr, 0, &off);
+		if (len <= 0) {
+			if (!ret)
+				ret = len ?: -EIO;
+			break;
+		}
+
+		i = 0;
+		do {
+			size_t part = min_t(size_t, PAGE_SIZE - off, len);
+
+			if (skb_append_pagefrags(skb, pages[i++], off, part) < 0) {
+				if (!ret)
+					ret = -EMSGSIZE;
+				goto out;
+			}
+			off = 0;
+			ret += part;
+			maxsize -= part;
+			len -= part;
+		} while (len > 0);
+		if (maxsize <= 0)
+			break;
+	}
+
+out:
+	return ret;
+}
+
 static int unix_stream_sendmsg(struct socket *sock, struct msghdr *msg,
 			       size_t len)
 {
@@ -2200,19 +2247,25 @@ static int unix_stream_sendmsg(struct socket *sock, struct msghdr *msg,
 	while (sent < len) {
 		size = len - sent;
 
-		/* Keep two messages in the pipe so it schedules better */
-		size = min_t(int, size, (sk->sk_sndbuf >> 1) - 64);
+		if (unlikely(msg->msg_flags & MSG_SPLICE_PAGES)) {
+			skb = sock_alloc_send_pskb(sk, 0, 0,
+						   msg->msg_flags & MSG_DONTWAIT,
+						   &err, 0);
+		} else {
+			/* Keep two messages in the pipe so it schedules better */
+			size = min_t(int, size, (sk->sk_sndbuf >> 1) - 64);
 
-		/* allow fallback to order-0 allocations */
-		size = min_t(int, size, SKB_MAX_HEAD(0) + UNIX_SKB_FRAGS_SZ);
+			/* allow fallback to order-0 allocations */
+			size = min_t(int, size, SKB_MAX_HEAD(0) + UNIX_SKB_FRAGS_SZ);
 
-		data_len = max_t(int, 0, size - SKB_MAX_HEAD(0));
+			data_len = max_t(int, 0, size - SKB_MAX_HEAD(0));
 
-		data_len = min_t(size_t, size, PAGE_ALIGN(data_len));
+			data_len = min_t(size_t, size, PAGE_ALIGN(data_len));
 
-		skb = sock_alloc_send_pskb(sk, size - data_len, data_len,
-					   msg->msg_flags & MSG_DONTWAIT, &err,
-					   get_order(UNIX_SKB_FRAGS_SZ));
+			skb = sock_alloc_send_pskb(sk, size - data_len, data_len,
+						   msg->msg_flags & MSG_DONTWAIT, &err,
+						   get_order(UNIX_SKB_FRAGS_SZ));
+		}
 		if (!skb)
 			goto out_err;
 
@@ -2224,13 +2277,21 @@ static int unix_stream_sendmsg(struct socket *sock, struct msghdr *msg,
 		}
 		fds_sent = true;
 
-		skb_put(skb, size - data_len);
-		skb->data_len = data_len;
-		skb->len = size;
-		err = skb_copy_datagram_from_iter(skb, 0, &msg->msg_iter, size);
-		if (err) {
-			kfree_skb(skb);
-			goto out_err;
+		if (unlikely(msg->msg_flags & MSG_SPLICE_PAGES)) {
+			size = unix_extract_bvec_to_skb(skb, &msg->msg_iter, size);
+			skb->data_len += size;
+			skb->len += size;
+			skb->truesize += size;
+			refcount_add(size, &sk->sk_wmem_alloc);
+		} else {
+			skb_put(skb, size - data_len);
+			skb->data_len = data_len;
+			skb->len = size;
+			err = skb_copy_datagram_from_iter(skb, 0, &msg->msg_iter, size);
+			if (err) {
+				kfree_skb(skb);
+				goto out_err;
+			}
 		}
 
 		unix_state_lock(other);
From patchwork Thu Apr 6 09:42:45 2023
X-Patchwork-Submitter: David Howells
X-Patchwork-Id: 80168
From: David Howells
To: netdev@vger.kernel.org
Cc: David Howells, "David S. Miller", Eric Dumazet, Jakub Kicinski,
    Paolo Abeni, Willem de Bruijn, Matthew Wilcox, Al Viro,
    Christoph Hellwig, Jens Axboe, Jeff Layton, Christian Brauner,
    Chuck Lever III, Linus Torvalds, linux-fsdevel@vger.kernel.org,
    linux-kernel@vger.kernel.org, linux-mm@kvack.org, Kuniyuki Iwashima
Subject: [PATCH net-next v5 19/19] af_unix: Make sendmsg(MSG_SPLICE_PAGES) copy unspliceable data
Date: Thu, 6 Apr 2023 10:42:45 +0100
Message-Id: <20230406094245.3633290-20-dhowells@redhat.com>
In-Reply-To: <20230406094245.3633290-1-dhowells@redhat.com>
References: <20230406094245.3633290-1-dhowells@redhat.com>

If sendmsg() with MSG_SPLICE_PAGES encounters a page that shouldn't be
spliced - a slab page, for instance, or one with a zero count - make
unix_extract_bvec_to_skb() copy it.

Signed-off-by: David Howells
cc: "David S. Miller"
cc: Eric Dumazet
cc: Jakub Kicinski
cc: Paolo Abeni
cc: Kuniyuki Iwashima
cc: Jens Axboe
cc: Matthew Wilcox
cc: netdev@vger.kernel.org
---
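A rough sketch - not the patch's code - of the copy fallback this patch
introduces: bytes in a page that fails sendpage_ok() (a slab page, or
one with a zero refcount) are duplicated into freshly allocated memory
before being attached to the skb. The real code uses page_frag_memdup();
the alloc_page()-based helper below is a simplification that assumes
part <= PAGE_SIZE, and its name is hypothetical:

/* Duplicate an unspliceable page's bytes into a page the network layer
 * may safely hold a reference on; returns NULL on allocation failure.
 */
static struct page *example_copy_unspliceable(struct page *src, size_t off,
					      size_t part, gfp_t gfp)
{
	struct page *copy = alloc_page(gfp);
	void *from, *to;

	if (!copy)
		return NULL;
	from = kmap_local_page(src);
	to = kmap_local_page(copy);
	memcpy(to, from + off, part);
	kunmap_local(to);	/* unmap in reverse order of mapping */
	kunmap_local(from);
	return copy;		/* caller appends to the skb, then put_page()s */
}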
 net/unix/af_unix.c | 44 +++++++++++++++++++++++++++++++++-----------
 1 file changed, 33 insertions(+), 11 deletions(-)

diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index fee431a089d3..6941be8dae7e 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -2160,12 +2160,12 @@ static int queue_oob(struct socket *sock, struct msghdr *msg, struct sock *other
 /*
  * Extract pages from an iterator and add them to the socket buffer.
  */
-static ssize_t unix_extract_bvec_to_skb(struct sk_buff *skb,
-					struct iov_iter *iter, ssize_t maxsize)
+static ssize_t unix_extract_bvec_to_skb(struct sk_buff *skb, struct iov_iter *iter,
+					ssize_t maxsize, gfp_t gfp)
 {
 	struct page *pages[8], **ppages = pages;
 	unsigned int i, nr;
-	ssize_t ret = 0;
+	ssize_t spliced = 0, ret = 0;
 
 	while (iter->count > 0) {
 		size_t off, len;
@@ -2177,31 +2177,52 @@ static ssize_t unix_extract_bvec_to_skb(struct sk_buff *skb,
 
 		len = iov_iter_extract_pages(iter, &ppages, maxsize, nr, 0, &off);
 		if (len <= 0) {
-			if (!ret)
-				ret = len ?: -EIO;
+			ret = len ?: -EIO;
 			break;
 		}
 
 		i = 0;
 		do {
+			struct page *page = pages[i++];
 			size_t part = min_t(size_t, PAGE_SIZE - off, len);
+			bool put = false;
+
+			if (!sendpage_ok(page)) {
+				const void *p = kmap_local_page(page);
+				void *q;
+
+				q = page_frag_memdup(NULL, p + off, part, gfp,
+						     ULONG_MAX);
+				kunmap_local(p);
+				if (!q) {
+					iov_iter_revert(iter, len);
+					ret = -ENOMEM;
+					goto out;
+				}
+				page = virt_to_page(q);
+				off = offset_in_page(q);
+				put = true;
+			}
 
-			if (skb_append_pagefrags(skb, pages[i++], off, part) < 0) {
-				if (!ret)
-					ret = -EMSGSIZE;
+			ret = skb_append_pagefrags(skb, page, off, part);
+			if (put)
+				put_page(page);
+			if (ret < 0) {
+				iov_iter_revert(iter, len);
 				goto out;
 			}
 			off = 0;
-			ret += part;
+			spliced += part;
 			maxsize -= part;
 			len -= part;
 		} while (len > 0);
+
 		if (maxsize <= 0)
 			break;
 	}
 
 out:
-	return ret;
+	return spliced ?: ret;
 }
 
 static int unix_stream_sendmsg(struct socket *sock, struct msghdr *msg,
@@ -2278,7 +2299,8 @@ static int unix_stream_sendmsg(struct socket *sock, struct msghdr *msg,
 		fds_sent = true;
 
 		if (unlikely(msg->msg_flags & MSG_SPLICE_PAGES)) {
-			size = unix_extract_bvec_to_skb(skb, &msg->msg_iter, size);
+			size = unix_extract_bvec_to_skb(skb, &msg->msg_iter, size,
+							sk->sk_allocation);
 			skb->data_len += size;
 			skb->len += size;
 			skb->truesize += size;