From patchwork Wed May 24 15:33:00 2023
X-Patchwork-Submitter: David Howells
X-Patchwork-Id: 98561
From: David Howells
To: netdev@vger.kernel.org
Cc: David Howells, "David S. Miller", Eric Dumazet, Jakub Kicinski,
    Paolo Abeni, Willem de Bruijn, David Ahern, Matthew Wilcox, Jens Axboe,
    linux-mm@kvack.org, linux-kernel@vger.kernel.org, Andrew Morton
Subject: [PATCH net-next 01/12] mm: Move the page fragment allocator from page_alloc.c into its own file
Date: Wed, 24 May 2023 16:33:00 +0100
Message-Id: <20230524153311.3625329-2-dhowells@redhat.com>
In-Reply-To: <20230524153311.3625329-1-dhowells@redhat.com>
References: <20230524153311.3625329-1-dhowells@redhat.com>

Move the page fragment allocator from page_alloc.c into its own file
preparatory to changing it.

Signed-off-by: David Howells
cc: Andrew Morton
cc: "David S. Miller"
cc: Eric Dumazet
cc: Jakub Kicinski
cc: Paolo Abeni
cc: Jens Axboe
cc: Matthew Wilcox
cc: linux-mm@kvack.org
cc: netdev@vger.kernel.org
---
 mm/Makefile          |   2 +-
 mm/page_alloc.c      | 126 ------------------------------------------
 mm/page_frag_alloc.c | 131 ++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 132 insertions(+), 127 deletions(-)
 create mode 100644 mm/page_frag_alloc.c

diff --git a/mm/Makefile b/mm/Makefile
index e29afc890cde..0daa4c6f4552 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -51,7 +51,7 @@
 obj-y := filemap.o mempool.o oom_kill.o fadvise.o \
			   readahead.o swap.o truncate.o vmscan.o shmem.o \
			   util.o mmzone.o vmstat.o backing-dev.o \
			   mm_init.o percpu.o slab_common.o \
-			   compaction.o \
+			   compaction.o page_frag_alloc.o \
			   interval_tree.o list_lru.o workingset.o \
			   debug.o gup.o mmap_lock.o $(mmu-y)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 47421bedc12b..29dc79dbeb22 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4871,132 +4871,6 @@ void free_pages(unsigned long addr, unsigned int order)
 EXPORT_SYMBOL(free_pages);
 
-/*
- * Page Fragment:
- *  An arbitrary-length arbitrary-offset area of memory which resides
- *  within a 0 or higher order page.  Multiple fragments within that page
- *  are individually refcounted, in the page's reference counter.
- *
- * The page_frag functions below provide a simple allocation framework for
- * page fragments.  This is used by the network stack and network device
- * drivers to provide a backing region of memory for use as either an
- * sk_buff->head, or to be used in the "frags" portion of skb_shared_info.
- */
-static struct page *__page_frag_cache_refill(struct page_frag_cache *nc,
-					     gfp_t gfp_mask)
-{
-	struct page *page = NULL;
-	gfp_t gfp = gfp_mask;
-
-#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
-	gfp_mask |= __GFP_COMP | __GFP_NOWARN | __GFP_NORETRY |
-		    __GFP_NOMEMALLOC;
-	page = alloc_pages_node(NUMA_NO_NODE, gfp_mask,
-				PAGE_FRAG_CACHE_MAX_ORDER);
-	nc->size = page ? PAGE_FRAG_CACHE_MAX_SIZE : PAGE_SIZE;
-#endif
-	if (unlikely(!page))
-		page = alloc_pages_node(NUMA_NO_NODE, gfp, 0);
-
-	nc->va = page ? page_address(page) : NULL;
-
-	return page;
-}
-
-void __page_frag_cache_drain(struct page *page, unsigned int count)
-{
-	VM_BUG_ON_PAGE(page_ref_count(page) == 0, page);
-
-	if (page_ref_sub_and_test(page, count))
-		free_the_page(page, compound_order(page));
-}
-EXPORT_SYMBOL(__page_frag_cache_drain);
-
-void *page_frag_alloc_align(struct page_frag_cache *nc,
-			    unsigned int fragsz, gfp_t gfp_mask,
-			    unsigned int align_mask)
-{
-	unsigned int size = PAGE_SIZE;
-	struct page *page;
-	int offset;
-
-	if (unlikely(!nc->va)) {
-refill:
-		page = __page_frag_cache_refill(nc, gfp_mask);
-		if (!page)
-			return NULL;
-
-#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
-		/* if size can vary use size else just use PAGE_SIZE */
-		size = nc->size;
-#endif
-		/* Even if we own the page, we do not use atomic_set().
-		 * This would break get_page_unless_zero() users.
-		 */
-		page_ref_add(page, PAGE_FRAG_CACHE_MAX_SIZE);
-
-		/* reset page count bias and offset to start of new frag */
-		nc->pfmemalloc = page_is_pfmemalloc(page);
-		nc->pagecnt_bias = PAGE_FRAG_CACHE_MAX_SIZE + 1;
-		nc->offset = size;
-	}
-
-	offset = nc->offset - fragsz;
-	if (unlikely(offset < 0)) {
-		page = virt_to_page(nc->va);
-
-		if (!page_ref_sub_and_test(page, nc->pagecnt_bias))
-			goto refill;
-
-		if (unlikely(nc->pfmemalloc)) {
-			free_the_page(page, compound_order(page));
-			goto refill;
-		}
-
-#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
-		/* if size can vary use size else just use PAGE_SIZE */
-		size = nc->size;
-#endif
-		/* OK, page count is 0, we can safely set it */
-		set_page_count(page, PAGE_FRAG_CACHE_MAX_SIZE + 1);
-
-		/* reset page count bias and offset to start of new frag */
-		nc->pagecnt_bias = PAGE_FRAG_CACHE_MAX_SIZE + 1;
-		offset = size - fragsz;
-		if (unlikely(offset < 0)) {
-			/*
-			 * The caller is trying to allocate a fragment
-			 * with fragsz > PAGE_SIZE but the cache isn't big
-			 * enough to satisfy the request, this may
-			 * happen in low memory conditions.
-			 * We don't release the cache page because
-			 * it could make memory pressure worse
-			 * so we simply return NULL here.
-			 */
-			return NULL;
-		}
-	}
-
-	nc->pagecnt_bias--;
-	offset &= align_mask;
-	nc->offset = offset;
-
-	return nc->va + offset;
-}
-EXPORT_SYMBOL(page_frag_alloc_align);
-
-/*
- * Frees a page fragment allocated out of either a compound or order 0 page.
- */
-void page_frag_free(void *addr)
-{
-	struct page *page = virt_to_head_page(addr);
-
-	if (unlikely(put_page_testzero(page)))
-		free_the_page(page, compound_order(page));
-}
-EXPORT_SYMBOL(page_frag_free);
-
 static void *make_alloc_exact(unsigned long addr, unsigned int order,
		size_t size)
 {
diff --git a/mm/page_frag_alloc.c b/mm/page_frag_alloc.c
new file mode 100644
index 000000000000..bee95824ef8f
--- /dev/null
+++ b/mm/page_frag_alloc.c
@@ -0,0 +1,131 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Page fragment allocator
+ *
+ * Page Fragment:
+ *  An arbitrary-length arbitrary-offset area of memory which resides within a
+ *  0 or higher order page.  Multiple fragments within that page are
+ *  individually refcounted, in the page's reference counter.
+ *
+ * The page_frag functions provide a simple allocation framework for page
+ * fragments.  This is used by the network stack and network device drivers to
+ * provide a backing region of memory for use as either an sk_buff->head, or to
+ * be used in the "frags" portion of skb_shared_info.
+ */
+
+#include
+#include
+#include
+
+static struct page *__page_frag_cache_refill(struct page_frag_cache *nc,
+					     gfp_t gfp_mask)
+{
+	struct page *page = NULL;
+	gfp_t gfp = gfp_mask;
+
+#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
+	gfp_mask |= __GFP_COMP | __GFP_NOWARN | __GFP_NORETRY |
+		    __GFP_NOMEMALLOC;
+	page = alloc_pages_node(NUMA_NO_NODE, gfp_mask,
+				PAGE_FRAG_CACHE_MAX_ORDER);
+	nc->size = page ? PAGE_FRAG_CACHE_MAX_SIZE : PAGE_SIZE;
+#endif
+	if (unlikely(!page))
+		page = alloc_pages_node(NUMA_NO_NODE, gfp, 0);
+
+	nc->va = page ? page_address(page) : NULL;
+
+	return page;
+}
+
+void __page_frag_cache_drain(struct page *page, unsigned int count)
+{
+	VM_BUG_ON_PAGE(page_ref_count(page) == 0, page);
+
+	if (page_ref_sub_and_test(page, count - 1))
+		__free_pages(page, compound_order(page));
+}
+EXPORT_SYMBOL(__page_frag_cache_drain);
+
+void *page_frag_alloc_align(struct page_frag_cache *nc,
+			    unsigned int fragsz, gfp_t gfp_mask,
+			    unsigned int align_mask)
+{
+	unsigned int size = PAGE_SIZE;
+	struct page *page;
+	int offset;
+
+	if (unlikely(!nc->va)) {
+refill:
+		page = __page_frag_cache_refill(nc, gfp_mask);
+		if (!page)
+			return NULL;
+
+#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
+		/* if size can vary use size else just use PAGE_SIZE */
+		size = nc->size;
+#endif
+		/* Even if we own the page, we do not use atomic_set().
+		 * This would break get_page_unless_zero() users.
+		 */
+		page_ref_add(page, PAGE_FRAG_CACHE_MAX_SIZE);
+
+		/* reset page count bias and offset to start of new frag */
+		nc->pfmemalloc = page_is_pfmemalloc(page);
+		nc->pagecnt_bias = PAGE_FRAG_CACHE_MAX_SIZE + 1;
+		nc->offset = size;
+	}
+
+	offset = nc->offset - fragsz;
+	if (unlikely(offset < 0)) {
+		page = virt_to_page(nc->va);
+
+		if (page_ref_count(page) != nc->pagecnt_bias)
+			goto refill;
+		if (unlikely(nc->pfmemalloc)) {
+			page_ref_sub(page, nc->pagecnt_bias - 1);
+			__free_pages(page, compound_order(page));
+			goto refill;
+		}
+
+#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
+		/* if size can vary use size else just use PAGE_SIZE */
+		size = nc->size;
+#endif
+		/* OK, page count is 0, we can safely set it */
+		set_page_count(page, PAGE_FRAG_CACHE_MAX_SIZE + 1);
+
+		/* reset page count bias and offset to start of new frag */
+		nc->pagecnt_bias = PAGE_FRAG_CACHE_MAX_SIZE + 1;
+		offset = size - fragsz;
+		if (unlikely(offset < 0)) {
+			/*
+			 * The caller is trying to allocate a fragment
+			 * with fragsz > PAGE_SIZE but the cache isn't big
+			 * enough to satisfy the request, this may
+			 * happen in low memory conditions.
+			 * We don't release the cache page because
+			 * it could make memory pressure worse
+			 * so we simply return NULL here.
+			 */
+			return NULL;
+		}
+	}
+
+	nc->pagecnt_bias--;
+	offset &= align_mask;
+	nc->offset = offset;
+
+	return nc->va + offset;
+}
+EXPORT_SYMBOL(page_frag_alloc_align);
+
+/*
+ * Frees a page fragment allocated out of either a compound or order 0 page.
+ */
+void page_frag_free(void *addr)
+{
+	struct page *page = virt_to_head_page(addr);
+
+	__free_pages(page, compound_order(page));
+}
+EXPORT_SYMBOL(page_frag_free);
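For context, a minimal usage sketch of the page_frag_cache API being moved
above (this is not part of the patch; my_cache, my_rx_alloc and my_rx_free
are made-up names for illustration):

	/* A page_frag_cache hands out small, individually refcounted
	 * fragments carved out of a cached 0-or-higher order page.
	 */
	static struct page_frag_cache my_cache;	/* usually per-queue or per-CPU */

	static void *my_rx_alloc(unsigned int len, gfp_t gfp)
	{
		/* Each fragment pins one reference on the backing page. */
		return page_frag_alloc(&my_cache, len, gfp);
	}

	static void my_rx_free(void *buf)
	{
		/* Drop the fragment's reference; the backing page is only
		 * freed once the cache and all fragments have let go of it.
		 */
		page_frag_free(buf);
	}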
From patchwork Wed May 24 15:33:01 2023
X-Patchwork-Submitter: David Howells
X-Patchwork-Id: 98559
From: David Howells
To: netdev@vger.kernel.org
Cc: David Howells, "David S. Miller", Eric Dumazet, Jakub Kicinski,
    Paolo Abeni, Willem de Bruijn, David Ahern, Matthew Wilcox, Jens Axboe,
    linux-mm@kvack.org, linux-kernel@vger.kernel.org, Jeroen de Borst,
    Catherine Sullivan, Shailend Chand, Felix Fietkau, John Crispin,
    Sean Wang, Mark Lee, Lorenzo Bianconi, Matthias Brugger,
    AngeloGioacchino Del Regno, Keith Busch, Jens Axboe, Christoph Hellwig,
    Sagi Grimberg, Chaitanya Kulkarni, Andrew Morton,
    linux-arm-kernel@lists.infradead.org, linux-mediatek@lists.infradead.org,
    linux-nvme@lists.infradead.org
Subject: [PATCH net-next 02/12] mm: Provide a page_frag_cache allocator cleanup function
Date: Wed, 24 May 2023 16:33:01 +0100
Message-Id: <20230524153311.3625329-3-dhowells@redhat.com>
In-Reply-To: <20230524153311.3625329-1-dhowells@redhat.com>
References: <20230524153311.3625329-1-dhowells@redhat.com>

Provide a function to clean up a page_frag_cache allocator rather than
doing it manually each time.

Signed-off-by: David Howells
cc: "David S. Miller"
cc: Eric Dumazet
cc: Jakub Kicinski
cc: Paolo Abeni
cc: Jens Axboe
cc: Jeroen de Borst
cc: Catherine Sullivan
cc: Shailend Chand
cc: Felix Fietkau
cc: John Crispin
cc: Sean Wang
cc: Mark Lee
cc: Lorenzo Bianconi
cc: Matthias Brugger
cc: AngeloGioacchino Del Regno
cc: Keith Busch
cc: Jens Axboe
cc: Christoph Hellwig
cc: Sagi Grimberg
cc: Chaitanya Kulkarni
cc: Andrew Morton
cc: Matthew Wilcox
cc: netdev@vger.kernel.org
cc: linux-arm-kernel@lists.infradead.org
cc: linux-mediatek@lists.infradead.org
cc: linux-nvme@lists.infradead.org
cc: linux-mm@kvack.org
---
 drivers/net/ethernet/google/gve/gve_main.c | 11 ++---------
 drivers/net/ethernet/mediatek/mtk_wed_wo.c | 17 ++---------------
 drivers/nvme/host/tcp.c                    |  8 +-------
 drivers/nvme/target/tcp.c                  |  5 +----
 include/linux/gfp.h                        |  2 ++
 mm/page_frag_alloc.c                       | 17 +++++++++++++++++
 6 files changed, 25 insertions(+), 35 deletions(-)

diff --git a/drivers/net/ethernet/google/gve/gve_main.c b/drivers/net/ethernet/google/gve/gve_main.c
index 8fb70db63b8b..55feab29bed9 100644
--- a/drivers/net/ethernet/google/gve/gve_main.c
+++ b/drivers/net/ethernet/google/gve/gve_main.c
@@ -1251,17 +1251,10 @@ static void gve_unreg_xdp_info(struct gve_priv *priv)
 
 static void gve_drain_page_cache(struct gve_priv *priv)
 {
-	struct page_frag_cache *nc;
 	int i;
 
-	for (i = 0; i < priv->rx_cfg.num_queues; i++) {
-		nc = &priv->rx[i].page_cache;
-		if (nc->va) {
-			__page_frag_cache_drain(virt_to_page(nc->va),
-						nc->pagecnt_bias);
-			nc->va = NULL;
-		}
-	}
+	for (i = 0; i < priv->rx_cfg.num_queues; i++)
+		page_frag_cache_clear(&priv->rx[i].page_cache);
 }
 
 static int gve_open(struct net_device *dev)
diff --git a/drivers/net/ethernet/mediatek/mtk_wed_wo.c b/drivers/net/ethernet/mediatek/mtk_wed_wo.c
index 69fba29055e9..d90fea2c7d04 100644
--- a/drivers/net/ethernet/mediatek/mtk_wed_wo.c
+++ b/drivers/net/ethernet/mediatek/mtk_wed_wo.c
@@ -286,7 +286,6 @@ mtk_wed_wo_queue_free(struct mtk_wed_wo *wo, struct mtk_wed_wo_queue *q)
 static void
 mtk_wed_wo_queue_tx_clean(struct mtk_wed_wo *wo, struct mtk_wed_wo_queue *q)
 {
-	struct page *page;
 	int i;
 
 	for (i = 0; i < q->n_desc; i++) {
@@ -298,19 +297,12 @@ mtk_wed_wo_queue_tx_clean(struct mtk_wed_wo *wo, struct mtk_wed_wo_queue *q)
 		entry->buf = NULL;
 	}
 
-	if (!q->cache.va)
-		return;
-
-	page = virt_to_page(q->cache.va);
-	__page_frag_cache_drain(page, q->cache.pagecnt_bias);
-	memset(&q->cache, 0, sizeof(q->cache));
+	page_frag_cache_clear(&q->cache);
 }
 
 static void
 mtk_wed_wo_queue_rx_clean(struct mtk_wed_wo *wo, struct mtk_wed_wo_queue *q)
 {
-	struct page *page;
-
 	for (;;) {
 		void *buf = mtk_wed_wo_dequeue(wo, q, NULL, true);
 
@@ -320,12 +312,7 @@ mtk_wed_wo_queue_rx_clean(struct mtk_wed_wo *wo, struct mtk_wed_wo_queue *q)
 		skb_free_frag(buf);
 	}
 
-	if (!q->cache.va)
-		return;
-
-	page = virt_to_page(q->cache.va);
-	__page_frag_cache_drain(page, q->cache.pagecnt_bias);
-	memset(&q->cache, 0, sizeof(q->cache));
+	page_frag_cache_clear(&q->cache);
 }
 
 static void
diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index bf0230442d57..dcc35f6bff8c 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -1315,7 +1315,6 @@ static int nvme_tcp_alloc_async_req(struct nvme_tcp_ctrl *ctrl)
 
 static void nvme_tcp_free_queue(struct nvme_ctrl *nctrl, int qid)
 {
-	struct page *page;
 	struct nvme_tcp_ctrl *ctrl = to_tcp_ctrl(nctrl);
 	struct nvme_tcp_queue *queue = &ctrl->queues[qid];
 	unsigned int noreclaim_flag;
@@ -1326,12 +1325,7 @@ static void nvme_tcp_free_queue(struct nvme_ctrl *nctrl, int qid)
 	if (queue->hdr_digest || queue->data_digest)
 		nvme_tcp_free_crypto(queue);
 
-	if (queue->pf_cache.va) {
-		page = virt_to_head_page(queue->pf_cache.va);
-		__page_frag_cache_drain(page, queue->pf_cache.pagecnt_bias);
-		queue->pf_cache.va = NULL;
-	}
-
+	page_frag_cache_clear(&queue->pf_cache);
 	noreclaim_flag = memalloc_noreclaim_save();
 	sock_release(queue->sock);
 	memalloc_noreclaim_restore(noreclaim_flag);
diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c
index ed98df72c76b..984e6ce85dcd 100644
--- a/drivers/nvme/target/tcp.c
+++ b/drivers/nvme/target/tcp.c
@@ -1464,7 +1464,6 @@ static void nvmet_tcp_free_cmd_data_in_buffers(struct nvmet_tcp_queue *queue)
 
 static void nvmet_tcp_release_queue_work(struct work_struct *w)
 {
-	struct page *page;
 	struct nvmet_tcp_queue *queue =
		container_of(w, struct nvmet_tcp_queue, release_work);
 
@@ -1486,9 +1485,7 @@ static void nvmet_tcp_release_queue_work(struct work_struct *w)
 	if (queue->hdr_digest || queue->data_digest)
		nvmet_tcp_free_crypto(queue);
 	ida_free(&nvmet_tcp_queue_ida, queue->idx);
-
-	page = virt_to_head_page(queue->pf_cache.va);
-	__page_frag_cache_drain(page, queue->pf_cache.pagecnt_bias);
+	page_frag_cache_clear(&queue->pf_cache);
 	kfree(queue);
 }
 
diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index ed8cb537c6a7..03504beb51e4 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -314,6 +314,8 @@ static inline void *page_frag_alloc(struct page_frag_cache *nc,
 	return page_frag_alloc_align(nc, fragsz, gfp_mask, ~0u);
 }
 
+void page_frag_cache_clear(struct page_frag_cache *nc);
+
 extern void page_frag_free(void *addr);
 
 #define __free_page(page) __free_pages((page), 0)
diff --git a/mm/page_frag_alloc.c b/mm/page_frag_alloc.c
index bee95824ef8f..e02b81d68dc4 100644
--- a/mm/page_frag_alloc.c
+++ b/mm/page_frag_alloc.c
@@ -46,6 +46,23 @@ void __page_frag_cache_drain(struct page *page, unsigned int count)
 }
 EXPORT_SYMBOL(__page_frag_cache_drain);
 
+/**
+ * page_frag_cache_clear - Clear out a page fragment cache
+ * @nc: The cache to clear
+ *
+ * Discard any pages still cached in a page fragment cache.
+ */
+void page_frag_cache_clear(struct page_frag_cache *nc)
+{
+	if (nc->va) {
+		struct page *page = virt_to_head_page(nc->va);
+
+		__page_frag_cache_drain(page, nc->pagecnt_bias);
+		nc->va = NULL;
+	}
+}
+EXPORT_SYMBOL(page_frag_cache_clear);
+
 void *page_frag_alloc_align(struct page_frag_cache *nc,
			    unsigned int fragsz, gfp_t gfp_mask,
			    unsigned int align_mask)
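For context, the before/after shape of a driver teardown path that this
helper replaces (a sketch only, mirroring the gve and mtk_wed hunks above;
"q" is a hypothetical queue with an embedded page_frag_cache):

	/* Before: every driver open-coded the drain. */
	if (q->cache.va) {
		__page_frag_cache_drain(virt_to_page(q->cache.va),
					q->cache.pagecnt_bias);
		q->cache.va = NULL;
	}

	/* After: one call, which is a no-op on an empty cache. */
	page_frag_cache_clear(&q->cache);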
From patchwork Wed May 24 15:33:02 2023
X-Patchwork-Submitter: David Howells
X-Patchwork-Id: 98574
From: David Howells
To: netdev@vger.kernel.org
Cc: David Howells, "David S. Miller", Eric Dumazet, Jakub Kicinski,
    Paolo Abeni, Willem de Bruijn, David Ahern, Matthew Wilcox, Jens Axboe,
    linux-mm@kvack.org, linux-kernel@vger.kernel.org, Jeroen de Borst,
    Catherine Sullivan, Shailend Chand, Felix Fietkau, John Crispin,
    Sean Wang, Mark Lee, Lorenzo Bianconi, Matthias Brugger,
    AngeloGioacchino Del Regno, Keith Busch, Jens Axboe, Christoph Hellwig,
    Sagi Grimberg, Chaitanya Kulkarni, Andrew Morton,
    linux-arm-kernel@lists.infradead.org, linux-mediatek@lists.infradead.org,
    linux-nvme@lists.infradead.org
Subject: [PATCH net-next 03/12] mm: Make the page_frag_cache allocator alignment param a pow-of-2
Date: Wed, 24 May 2023 16:33:02 +0100
Message-Id: <20230524153311.3625329-4-dhowells@redhat.com>
In-Reply-To: <20230524153311.3625329-1-dhowells@redhat.com>
References: <20230524153311.3625329-1-dhowells@redhat.com>

Make the page_frag_cache allocator's alignment parameter a power of 2
rather than a mask and give a warning if it isn't.

This means that it's consistent with {napi,netdev}_alloc_frag_align() and
allows __{napi,netdev}_alloc_frag_align() to be removed.

Signed-off-by: David Howells
cc: "David S. Miller"
cc: Eric Dumazet
cc: Jakub Kicinski
cc: Paolo Abeni
cc: Jens Axboe
cc: Jeroen de Borst
cc: Catherine Sullivan
cc: Shailend Chand
cc: Felix Fietkau
cc: John Crispin
cc: Sean Wang
cc: Mark Lee
cc: Lorenzo Bianconi
cc: Matthias Brugger
cc: AngeloGioacchino Del Regno
cc: Keith Busch
cc: Jens Axboe
cc: Christoph Hellwig
cc: Sagi Grimberg
cc: Chaitanya Kulkarni
cc: Andrew Morton
cc: Matthew Wilcox
cc: netdev@vger.kernel.org
cc: linux-arm-kernel@lists.infradead.org
cc: linux-mediatek@lists.infradead.org
cc: linux-nvme@lists.infradead.org
cc: linux-mm@kvack.org
---
 include/linux/gfp.h    |  4 ++--
 include/linux/skbuff.h | 22 ++++------------------
 mm/page_frag_alloc.c   |  8 +++++---
 net/core/skbuff.c      | 14 +++++++-------
 4 files changed, 18 insertions(+), 30 deletions(-)

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 03504beb51e4..fa30100f46ad 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -306,12 +306,12 @@ struct page_frag_cache;
 extern void __page_frag_cache_drain(struct page *page, unsigned int count);
 extern void *page_frag_alloc_align(struct page_frag_cache *nc,
				   unsigned int fragsz, gfp_t gfp_mask,
-				   unsigned int align_mask);
+				   unsigned int align);
 
 static inline void *page_frag_alloc(struct page_frag_cache *nc,
			     unsigned int fragsz, gfp_t gfp_mask)
 {
-	return page_frag_alloc_align(nc, fragsz, gfp_mask, ~0u);
+	return page_frag_alloc_align(nc, fragsz, gfp_mask, 1);
 }
 
 void page_frag_cache_clear(struct page_frag_cache *nc);
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 1b2ebf6113e0..41b63e72c6c3 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -3158,7 +3158,7 @@ void skb_queue_purge(struct sk_buff_head *list);
 
 unsigned int skb_rbtree_purge(struct rb_root *root);
 
-void *__netdev_alloc_frag_align(unsigned int fragsz, unsigned int align_mask);
+void *netdev_alloc_frag_align(unsigned int fragsz, unsigned int align);
 
 /**
  * netdev_alloc_frag - allocate a page fragment
@@ -3169,14 +3169,7 @@ void *__netdev_alloc_frag_align(unsigned int fragsz, unsigned int align_mask);
  */
 static inline void *netdev_alloc_frag(unsigned int fragsz)
 {
-	return __netdev_alloc_frag_align(fragsz, ~0u);
-}
-
-static inline void *netdev_alloc_frag_align(unsigned int fragsz,
-					    unsigned int align)
-{
-	WARN_ON_ONCE(!is_power_of_2(align));
-	return __netdev_alloc_frag_align(fragsz, -align);
+	return netdev_alloc_frag_align(fragsz, 1);
 }
 
 struct sk_buff *__netdev_alloc_skb(struct net_device *dev, unsigned int length,
@@ -3236,18 +3229,11 @@ static inline void skb_free_frag(void *addr)
 	page_frag_free(addr);
 }
 
-void *__napi_alloc_frag_align(unsigned int fragsz, unsigned int align_mask);
+void *napi_alloc_frag_align(unsigned int fragsz, unsigned int align);
 
 static inline void *napi_alloc_frag(unsigned int fragsz)
 {
-	return __napi_alloc_frag_align(fragsz, ~0u);
-}
-
-static inline void *napi_alloc_frag_align(unsigned int fragsz,
-					  unsigned int align)
-{
-	WARN_ON_ONCE(!is_power_of_2(align));
-	return __napi_alloc_frag_align(fragsz, -align);
+	return napi_alloc_frag_align(fragsz, 1);
 }
 
 struct sk_buff *__napi_alloc_skb(struct napi_struct *napi,
diff --git a/mm/page_frag_alloc.c b/mm/page_frag_alloc.c
index e02b81d68dc4..9d3f6fbd9a07 100644
--- a/mm/page_frag_alloc.c
+++ b/mm/page_frag_alloc.c
@@ -64,13 +64,15 @@ void page_frag_cache_clear(struct page_frag_cache *nc)
 EXPORT_SYMBOL(page_frag_cache_clear);
 
 void *page_frag_alloc_align(struct page_frag_cache *nc,
-		      unsigned int fragsz, gfp_t gfp_mask,
-		      unsigned int align_mask)
+			    unsigned int fragsz, gfp_t gfp_mask,
+			    unsigned int align)
 {
 	unsigned int size = PAGE_SIZE;
 	struct page *page;
 	int offset;
 
+	WARN_ON_ONCE(!is_power_of_2(align));
+
 	if (unlikely(!nc->va)) {
 refill:
		page = __page_frag_cache_refill(nc, gfp_mask);
@@ -129,7 +131,7 @@ void *page_frag_alloc_align(struct page_frag_cache *nc,
 	}
 
 	nc->pagecnt_bias--;
-	offset &= align_mask;
+	offset &= ~(align - 1);
 	nc->offset = offset;
 
 	return nc->va + offset;
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index f4a5b51aed22..cc507433b357 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -289,17 +289,17 @@ void napi_get_frags_check(struct napi_struct *napi)
 	local_bh_enable();
 }
 
-void *__napi_alloc_frag_align(unsigned int fragsz, unsigned int align_mask)
+void *napi_alloc_frag_align(unsigned int fragsz, unsigned int align)
 {
 	struct napi_alloc_cache *nc = this_cpu_ptr(&napi_alloc_cache);
 
 	fragsz = SKB_DATA_ALIGN(fragsz);
 
-	return page_frag_alloc_align(&nc->page, fragsz, GFP_ATOMIC, align_mask);
+	return page_frag_alloc_align(&nc->page, fragsz, GFP_ATOMIC, align);
 }
-EXPORT_SYMBOL(__napi_alloc_frag_align);
+EXPORT_SYMBOL(napi_alloc_frag_align);
 
-void *__netdev_alloc_frag_align(unsigned int fragsz, unsigned int align_mask)
+void *netdev_alloc_frag_align(unsigned int fragsz, unsigned int align)
 {
 	void *data;
 
@@ -307,18 +307,18 @@ void *__netdev_alloc_frag_align(unsigned int fragsz, unsigned int align_mask)
 	if (in_hardirq() || irqs_disabled()) {
		struct page_frag_cache *nc = this_cpu_ptr(&netdev_alloc_cache);
 
-		data = page_frag_alloc_align(nc, fragsz, GFP_ATOMIC, align_mask);
+		data = page_frag_alloc_align(nc, fragsz, GFP_ATOMIC, align);
 	} else {
		struct napi_alloc_cache *nc;
 
		local_bh_disable();
		nc = this_cpu_ptr(&napi_alloc_cache);
-		data = page_frag_alloc_align(&nc->page, fragsz, GFP_ATOMIC, align_mask);
+		data = page_frag_alloc_align(&nc->page, fragsz, GFP_ATOMIC, align);
		local_bh_enable();
 	}
 	return data;
 }
-EXPORT_SYMBOL(__netdev_alloc_frag_align);
+EXPORT_SYMBOL(netdev_alloc_frag_align);
 
 static struct sk_buff *napi_skb_cache_get(void)
 {
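For context, how a direct caller changes under this patch (a sketch only;
"cache", "len" and "buf" are hypothetical):

	/* Before: the alignment was passed as a mask, i.e. -align or
	 * ~(align - 1).
	 */
	buf = page_frag_alloc_align(&cache, len, GFP_ATOMIC,
				    ~(SMP_CACHE_BYTES - 1));

	/* After: the alignment itself is passed and must be a power of 2;
	 * the allocator does the ~(align - 1) masking internally and warns
	 * on a non-power-of-2 value.
	 */
	buf = page_frag_alloc_align(&cache, len, GFP_ATOMIC, SMP_CACHE_BYTES);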
From patchwork Wed May 24 15:33:03 2023
X-Patchwork-Submitter: David Howells
X-Patchwork-Id: 98568
From: David Howells
To: netdev@vger.kernel.org
Cc: David Howells, "David S. Miller", Eric Dumazet, Jakub Kicinski,
    Paolo Abeni, Willem de Bruijn, David Ahern, Matthew Wilcox, Jens Axboe,
    linux-mm@kvack.org, linux-kernel@vger.kernel.org, Jeroen de Borst,
    Catherine Sullivan, Shailend Chand, Felix Fietkau, John Crispin,
    Sean Wang, Mark Lee, Lorenzo Bianconi, Matthias Brugger,
    AngeloGioacchino Del Regno, Keith Busch, Jens Axboe, Christoph Hellwig,
    Sagi Grimberg, Chaitanya Kulkarni, Andrew Morton,
    linux-arm-kernel@lists.infradead.org, linux-mediatek@lists.infradead.org,
    linux-nvme@lists.infradead.org
Subject: [PATCH net-next 04/12] mm: Make the page_frag_cache allocator use multipage folios
Date: Wed, 24 May 2023 16:33:03 +0100
Message-Id: <20230524153311.3625329-5-dhowells@redhat.com>
In-Reply-To: <20230524153311.3625329-1-dhowells@redhat.com>
References: <20230524153311.3625329-1-dhowells@redhat.com>

Change the page_frag_cache allocator to use multipage folios rather than
groups of pages.  This reduces page_frag_free to just a folio_put() or
put_page().

Signed-off-by: David Howells
cc: "David S. Miller"
cc: Eric Dumazet
cc: Jakub Kicinski
cc: Paolo Abeni
cc: Jens Axboe
cc: Jeroen de Borst
cc: Catherine Sullivan
cc: Shailend Chand
cc: Felix Fietkau
cc: John Crispin
cc: Sean Wang
cc: Mark Lee
cc: Lorenzo Bianconi
cc: Matthias Brugger
cc: AngeloGioacchino Del Regno
cc: Keith Busch
cc: Jens Axboe
cc: Christoph Hellwig
cc: Sagi Grimberg
cc: Chaitanya Kulkarni
cc: Andrew Morton
cc: Matthew Wilcox
cc: netdev@vger.kernel.org
cc: linux-arm-kernel@lists.infradead.org
cc: linux-mediatek@lists.infradead.org
cc: linux-nvme@lists.infradead.org
cc: linux-mm@kvack.org
---
 include/linux/mm_types.h | 13 ++----
 mm/page_frag_alloc.c     | 99 +++++++++++++++++++---------------------
 2 files changed, 52 insertions(+), 60 deletions(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 306a3d1a0fa6..d7c52a5979cc 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -420,18 +420,13 @@ static inline void *folio_get_private(struct folio *folio)
 }
 
 struct page_frag_cache {
-	void * va;
-#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
-	__u16 offset;
-	__u16 size;
-#else
-	__u32 offset;
-#endif
+	struct folio	*folio;
+	unsigned int	offset;
 	/* we maintain a pagecount bias, so that we dont dirty cache line
	 * containing page->_refcount every time we allocate a fragment.
	 */
-	unsigned int		pagecnt_bias;
-	bool pfmemalloc;
+	unsigned int	pagecnt_bias;
+	bool		pfmemalloc;
 };
 
 typedef unsigned long vm_flags_t;
diff --git a/mm/page_frag_alloc.c b/mm/page_frag_alloc.c
index 9d3f6fbd9a07..ffd68bfb677d 100644
--- a/mm/page_frag_alloc.c
+++ b/mm/page_frag_alloc.c
@@ -16,33 +16,34 @@
 #include
 #include
 
-static struct page *__page_frag_cache_refill(struct page_frag_cache *nc,
-					     gfp_t gfp_mask)
+/*
+ * Allocate a new folio for the frag cache.
+ */
+static struct folio *page_frag_cache_refill(struct page_frag_cache *nc,
+					    gfp_t gfp_mask)
 {
-	struct page *page = NULL;
+	struct folio *folio = NULL;
 	gfp_t gfp = gfp_mask;
 
 #if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
-	gfp_mask |= __GFP_COMP | __GFP_NOWARN | __GFP_NORETRY |
-		    __GFP_NOMEMALLOC;
-	page = alloc_pages_node(NUMA_NO_NODE, gfp_mask,
-				PAGE_FRAG_CACHE_MAX_ORDER);
-	nc->size = page ? PAGE_FRAG_CACHE_MAX_SIZE : PAGE_SIZE;
+	gfp_mask |= __GFP_NOWARN | __GFP_NORETRY | __GFP_NOMEMALLOC;
+	folio = folio_alloc(gfp_mask, PAGE_FRAG_CACHE_MAX_ORDER);
 #endif
-	if (unlikely(!page))
-		page = alloc_pages_node(NUMA_NO_NODE, gfp, 0);
+	if (unlikely(!folio))
+		folio = folio_alloc(gfp, 0);
 
-	nc->va = page ? page_address(page) : NULL;
-
-	return page;
+	if (folio)
+		nc->folio = folio;
+	return folio;
 }
 
 void __page_frag_cache_drain(struct page *page, unsigned int count)
 {
-	VM_BUG_ON_PAGE(page_ref_count(page) == 0, page);
+	struct folio *folio = page_folio(page);
+
+	VM_BUG_ON_FOLIO(folio_ref_count(folio) == 0, folio);
 
-	if (page_ref_sub_and_test(page, count - 1))
-		__free_pages(page, compound_order(page));
+	folio_put_refs(folio, count);
 }
 EXPORT_SYMBOL(__page_frag_cache_drain);
 
@@ -54,11 +55,12 @@ EXPORT_SYMBOL(__page_frag_cache_drain);
  */
 void page_frag_cache_clear(struct page_frag_cache *nc)
 {
-	if (nc->va) {
-		struct page *page = virt_to_head_page(nc->va);
+	struct folio *folio = nc->folio;
 
-		__page_frag_cache_drain(page, nc->pagecnt_bias);
-		nc->va = NULL;
+	if (folio) {
+		VM_BUG_ON_FOLIO(folio_ref_count(folio) == 0, folio);
+		folio_put_refs(folio, nc->pagecnt_bias);
+		nc->folio = NULL;
 	}
 }
 EXPORT_SYMBOL(page_frag_cache_clear);
@@ -67,56 +69,51 @@ void *page_frag_alloc_align(struct page_frag_cache *nc,
			    unsigned int fragsz, gfp_t gfp_mask,
			    unsigned int align)
 {
-	unsigned int size = PAGE_SIZE;
-	struct page *page;
-	int offset;
+	struct folio *folio = nc->folio;
+	size_t offset;
 
 	WARN_ON_ONCE(!is_power_of_2(align));
 
-	if (unlikely(!nc->va)) {
+	if (unlikely(!folio)) {
 refill:
-		page = __page_frag_cache_refill(nc, gfp_mask);
-		if (!page)
+		folio = page_frag_cache_refill(nc, gfp_mask);
+		if (!folio)
			return NULL;
 
-#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
-		/* if size can vary use size else just use PAGE_SIZE */
-		size = nc->size;
-#endif
		/* Even if we own the page, we do not use atomic_set().
		 * This would break get_page_unless_zero() users.
		 */
-		page_ref_add(page, PAGE_FRAG_CACHE_MAX_SIZE);
+		folio_ref_add(folio, PAGE_FRAG_CACHE_MAX_SIZE);
 
		/* reset page count bias and offset to start of new frag */
-		nc->pfmemalloc = page_is_pfmemalloc(page);
+		nc->pfmemalloc = folio_is_pfmemalloc(folio);
		nc->pagecnt_bias = PAGE_FRAG_CACHE_MAX_SIZE + 1;
-		nc->offset = size;
+		nc->offset = folio_size(folio);
 	}
 
-	offset = nc->offset - fragsz;
-	if (unlikely(offset < 0)) {
-		page = virt_to_page(nc->va);
-
-		if (page_ref_count(page) != nc->pagecnt_bias)
+	offset = nc->offset;
+	if (unlikely(fragsz > offset)) {
+		/* Reuse the folio if everyone we gave it to has finished with
+		 * it.
+		 */
+		if (!folio_ref_sub_and_test(folio, nc->pagecnt_bias)) {
+			nc->folio = NULL;
			goto refill;
+		}
+
		if (unlikely(nc->pfmemalloc)) {
-			page_ref_sub(page, nc->pagecnt_bias - 1);
-			__free_pages(page, compound_order(page));
+			__folio_put(folio);
+			nc->folio = NULL;
			goto refill;
		}
 
-#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
-		/* if size can vary use size else just use PAGE_SIZE */
-		size = nc->size;
-#endif
		/* OK, page count is 0, we can safely set it */
-		set_page_count(page, PAGE_FRAG_CACHE_MAX_SIZE + 1);
+		folio_set_count(folio, PAGE_FRAG_CACHE_MAX_SIZE + 1);
 
		/* reset page count bias and offset to start of new frag */
		nc->pagecnt_bias = PAGE_FRAG_CACHE_MAX_SIZE + 1;
-		offset = size - fragsz;
-		if (unlikely(offset < 0)) {
+		offset = folio_size(folio);
+		if (unlikely(fragsz > offset)) {
			/*
			 * The caller is trying to allocate a fragment
			 * with fragsz > PAGE_SIZE but the cache isn't big
@@ -126,15 +123,17 @@ void *page_frag_alloc_align(struct page_frag_cache *nc,
			 * it could make memory pressure worse
			 * so we simply return NULL here.
			 */
+			nc->offset = offset;
			return NULL;
		}
 	}
 
 	nc->pagecnt_bias--;
+	offset -= fragsz;
 	offset &= ~(align - 1);
 	nc->offset = offset;
 
-	return nc->va + offset;
+	return folio_address(folio) + offset;
 }
 EXPORT_SYMBOL(page_frag_alloc_align);
 
@@ -143,8 +142,6 @@ EXPORT_SYMBOL(page_frag_alloc_align);
  */
 void page_frag_free(void *addr)
 {
-	struct page *page = virt_to_head_page(addr);
-
-	__free_pages(page, compound_order(page));
+	folio_put(virt_to_folio(addr));
 }
 EXPORT_SYMBOL(page_frag_free);
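For context, a sketch of how the cache behaves after the folio conversion;
the caller-visible API is unchanged (hypothetical function and sizes, not
part of the patch):

	static struct page_frag_cache cache;	/* .folio == NULL initially */

	static void example(void)
	{
		/* The first allocation populates nc->folio and sets
		 * nc->offset to folio_size(); fragments are then carved
		 * downwards from the end of the folio.
		 */
		void *a = page_frag_alloc(&cache, 2048, GFP_ATOMIC);
		void *b = page_frag_alloc(&cache, 2048, GFP_ATOMIC);

		page_frag_free(a);	/* now just folio_put(virt_to_folio(a)) */
		page_frag_free(b);

		/* Drops the cache's remaining pagecnt_bias references. */
		page_frag_cache_clear(&cache);
	}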
From patchwork Wed May 24 15:33:04 2023
X-Patchwork-Submitter: David Howells
X-Patchwork-Id: 98572
From: David Howells
To: netdev@vger.kernel.org
Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , Willem de Bruijn , David Ahern , Matthew Wilcox , Jens Axboe , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Jeroen de Borst , Catherine Sullivan , Shailend Chand , Felix Fietkau , John Crispin , Sean Wang , Mark Lee , Lorenzo Bianconi , Matthias Brugger , AngeloGioacchino Del Regno , Keith Busch , Jens Axboe , Christoph Hellwig , Sagi Grimberg , Chaitanya Kulkarni , Andrew Morton , linux-arm-kernel@lists.infradead.org, linux-mediatek@lists.infradead.org, linux-nvme@lists.infradead.org Subject: [PATCH net-next 05/12] mm: Make the page_frag_cache allocator handle __GFP_ZERO itself Date: Wed, 24 May 2023 16:33:04 +0100 Message-Id: <20230524153311.3625329-6-dhowells@redhat.com> In-Reply-To: <20230524153311.3625329-1-dhowells@redhat.com> References: <20230524153311.3625329-1-dhowells@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.10 X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_NONE,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1766791104845936375?= X-GMAIL-MSGID: =?utf-8?q?1766791104845936375?= Make the page_frag_cache allocator handle __GFP_ZERO itself rather than passing it off to the page allocator. There may be a mix of callers, some specifying __GFP_ZERO and some not - and even if all specify __GFP_ZERO, we might refurbish the page, in which case the returned memory doesn't get cleared. This is a potential bug in the nvme over TCP driver. Signed-off-by: David Howells cc: "David S. 
Miller" cc: Eric Dumazet cc: Jakub Kicinski cc: Paolo Abeni cc: Jens Axboe cc: Jeroen de Borst cc: Catherine Sullivan cc: Shailend Chand cc: Felix Fietkau cc: John Crispin cc: Sean Wang cc: Mark Lee cc: Lorenzo Bianconi cc: Matthias Brugger cc: AngeloGioacchino Del Regno cc: Keith Busch cc: Jens Axboe cc: Christoph Hellwig cc: Sagi Grimberg cc: Chaitanya Kulkarni cc: Andrew Morton cc: Matthew Wilcox cc: netdev@vger.kernel.org cc: linux-arm-kernel@lists.infradead.org cc: linux-mediatek@lists.infradead.org cc: linux-nvme@lists.infradead.org cc: linux-mm@kvack.org --- mm/page_frag_alloc.c | 11 +++++++++-- 1 file changed, 9 insertions(+), 2 deletions(-) diff --git a/mm/page_frag_alloc.c b/mm/page_frag_alloc.c index ffd68bfb677d..2b73c7f5d9a9 100644 --- a/mm/page_frag_alloc.c +++ b/mm/page_frag_alloc.c @@ -23,7 +23,10 @@ static struct folio *page_frag_cache_refill(struct page_frag_cache *nc, gfp_t gfp_mask) { struct folio *folio = NULL; - gfp_t gfp = gfp_mask; + gfp_t gfp; + + gfp_mask &= ~__GFP_ZERO; + gfp = gfp_mask; #if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE) gfp_mask |= __GFP_NOWARN | __GFP_NORETRY | __GFP_NOMEMALLOC; @@ -71,6 +74,7 @@ void *page_frag_alloc_align(struct page_frag_cache *nc, { struct folio *folio = nc->folio; size_t offset; + void *p; WARN_ON_ONCE(!is_power_of_2(align)); @@ -133,7 +137,10 @@ void *page_frag_alloc_align(struct page_frag_cache *nc, offset &= ~(align - 1); nc->offset = offset; - return folio_address(folio) + offset; + p = folio_address(folio) + offset; + if (gfp_mask & __GFP_ZERO) + return memset(p, 0, fragsz); + return p; } EXPORT_SYMBOL(page_frag_alloc_align); From patchwork Wed May 24 15:33:05 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 98578 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a05:6358:3046:b0:115:7a1d:dabb with SMTP id p6csp2920125rwl; Wed, 24 May 2023 08:52:08 -0700 (PDT) X-Google-Smtp-Source: ACHHUZ4NqYXUS04FlGCydUdIuq71FqifISDYZH3igNVv4Pr2h+c6qL/lktuJ9+F/S9u5PnVQm9Pt X-Received: by 2002:a05:6a00:a1a:b0:63f:2959:a271 with SMTP id p26-20020a056a000a1a00b0063f2959a271mr3638879pfh.6.1684943527922; Wed, 24 May 2023 08:52:07 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1684943527; cv=none; d=google.com; s=arc-20160816; b=cfdkKNWOpRi+u/bXOr1sb3EdZe6jqH7j/hL2hjjlO/Io4KAfwRJV/PhLnE9pegX949 IItu2Q31aC23LMXQMnu+kfRuo1NOLE2KIrkcip66iB0ifirJyOgmJqCdzWqQ+moD2vg7 GDo6lD1dC+DHk73LdKaK25G+zzP9s91SFZDOotlaYonYeq3Yjj2RGcyLIQTUMksgYPGY 5GVmxob/cnFPQRcBFI8yUk+VsVNlN25aYWKV2RWoZc2q6RujCGxG3mDYrixOwMhRAadI afI1Sa7vcZY+M9zi5qpKxSXrWc+zXraIDeZSNqGcYuBam2GDIRePIjh2b7qcaIttRNqb aPbA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=p71nalHI+W3yFqMm+4MkrrRPKSqs6pNOJ/pHSBSTc4w=; b=VLYYaoKT9GJFA9CalRRVoTD1O0vHDTsJLJboHwDxHAfZWVzUzQvqi513QdqgYyRku5 WOcGJ2wOpPlH9NciikbrwN+WNlX6rPgsDR4IEa3FVSm+4cv1ALYAyKY9lMqY8cE3ofkb 0TtaWjG0eTjGzIFebKFWZPdyoVm2vKosgC7O3VCa/vJ0Qkb+iBd5i12Ttp4W68qmmGkd pxODMz34ULHJ/GD6u/nDWiKjlNur0o7Rc80QehcMWKF+xvcf6jXlMh2KQxmugBPKH7uy UIyQao/j7n5r3FoTkP/SfiyXSXLLj2BuevU3W9YdCghB5BtJ5H4oRHeBiJrKkmCRGTQW BUVQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=C5AVIpH7; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) 
From: David Howells To: netdev@vger.kernel.org Cc: David Howells , "David S.
Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , Willem de Bruijn , David Ahern , Matthew Wilcox , Jens Axboe , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Jeroen de Borst , Catherine Sullivan , Shailend Chand , Felix Fietkau , John Crispin , Sean Wang , Mark Lee , Lorenzo Bianconi , Matthias Brugger , AngeloGioacchino Del Regno , Keith Busch , Jens Axboe , Christoph Hellwig , Sagi Grimberg , Chaitanya Kulkarni , Andrew Morton , linux-arm-kernel@lists.infradead.org, linux-mediatek@lists.infradead.org, linux-nvme@lists.infradead.org Subject: [PATCH net-next 06/12] mm: Make the page_frag_cache allocator use per-cpu Date: Wed, 24 May 2023 16:33:05 +0100 Message-Id: <20230524153311.3625329-7-dhowells@redhat.com> In-Reply-To: <20230524153311.3625329-1-dhowells@redhat.com> References: <20230524153311.3625329-1-dhowells@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.2 X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_NONE,T_SCC_BODY_TEXT_LINE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1766791344327744173?= X-GMAIL-MSGID: =?utf-8?q?1766791344327744173?= Make the page_frag_cache allocator have a separate allocation bucket for each cpu to avoid racing. This means that no lock is required, other than preempt disablement, to allocate from it, though if a softirq wants to access it, then softirq disablement will need to be added. Make the NVMe, mediatek and GVE drivers pass in NULL to page_frag_cache() and use the default allocation buckets rather than defining their own. Signed-off-by: David Howells cc: "David S. 
Miller" cc: Eric Dumazet cc: Jakub Kicinski cc: Paolo Abeni cc: Jens Axboe cc: Jeroen de Borst cc: Catherine Sullivan cc: Shailend Chand cc: Felix Fietkau cc: John Crispin cc: Sean Wang cc: Mark Lee cc: Lorenzo Bianconi cc: Matthias Brugger cc: AngeloGioacchino Del Regno cc: Keith Busch cc: Jens Axboe cc: Christoph Hellwig cc: Sagi Grimberg cc: Chaitanya Kulkarni cc: Andrew Morton cc: Matthew Wilcox cc: netdev@vger.kernel.org cc: linux-arm-kernel@lists.infradead.org cc: linux-mediatek@lists.infradead.org cc: linux-nvme@lists.infradead.org cc: linux-mm@kvack.org --- drivers/net/ethernet/google/gve/gve.h | 1 - drivers/net/ethernet/google/gve/gve_main.c | 9 - drivers/net/ethernet/google/gve/gve_rx.c | 2 +- drivers/net/ethernet/mediatek/mtk_wed_wo.c | 6 +- drivers/net/ethernet/mediatek/mtk_wed_wo.h | 2 - drivers/nvme/host/tcp.c | 13 +- drivers/nvme/target/tcp.c | 19 +- include/linux/gfp.h | 19 +- mm/page_frag_alloc.c | 202 +++++++++++++-------- net/core/skbuff.c | 32 ++-- 10 files changed, 163 insertions(+), 142 deletions(-) diff --git a/drivers/net/ethernet/google/gve/gve.h b/drivers/net/ethernet/google/gve/gve.h index 98eb78d98e9f..87244ab911bd 100644 --- a/drivers/net/ethernet/google/gve/gve.h +++ b/drivers/net/ethernet/google/gve/gve.h @@ -250,7 +250,6 @@ struct gve_rx_ring { struct xdp_rxq_info xdp_rxq; struct xdp_rxq_info xsk_rxq; struct xsk_buff_pool *xsk_pool; - struct page_frag_cache page_cache; /* Page cache to allocate XDP frames */ }; /* A TX desc ring entry */ diff --git a/drivers/net/ethernet/google/gve/gve_main.c b/drivers/net/ethernet/google/gve/gve_main.c index 55feab29bed9..9f0fb986d61e 100644 --- a/drivers/net/ethernet/google/gve/gve_main.c +++ b/drivers/net/ethernet/google/gve/gve_main.c @@ -1249,14 +1249,6 @@ static void gve_unreg_xdp_info(struct gve_priv *priv) } } -static void gve_drain_page_cache(struct gve_priv *priv) -{ - int i; - - for (i = 0; i < priv->rx_cfg.num_queues; i++) - page_frag_cache_clear(&priv->rx[i].page_cache); -} - static int gve_open(struct net_device *dev) { struct gve_priv *priv = netdev_priv(dev); @@ -1340,7 +1332,6 @@ static int gve_close(struct net_device *dev) netif_carrier_off(dev); if (gve_get_device_rings_ok(priv)) { gve_turndown(priv); - gve_drain_page_cache(priv); err = gve_destroy_rings(priv); if (err) goto err; diff --git a/drivers/net/ethernet/google/gve/gve_rx.c b/drivers/net/ethernet/google/gve/gve_rx.c index d1da7413dc4d..7ae8377c394f 100644 --- a/drivers/net/ethernet/google/gve/gve_rx.c +++ b/drivers/net/ethernet/google/gve/gve_rx.c @@ -634,7 +634,7 @@ static int gve_xdp_redirect(struct net_device *dev, struct gve_rx_ring *rx, total_len = headroom + SKB_DATA_ALIGN(len) + SKB_DATA_ALIGN(sizeof(struct skb_shared_info)); - frame = page_frag_alloc(&rx->page_cache, total_len, GFP_ATOMIC); + frame = page_frag_alloc(NULL, total_len, GFP_ATOMIC); if (!frame) { u64_stats_update_begin(&rx->statss); rx->xdp_alloc_fails++; diff --git a/drivers/net/ethernet/mediatek/mtk_wed_wo.c b/drivers/net/ethernet/mediatek/mtk_wed_wo.c index d90fea2c7d04..859f34447f2f 100644 --- a/drivers/net/ethernet/mediatek/mtk_wed_wo.c +++ b/drivers/net/ethernet/mediatek/mtk_wed_wo.c @@ -143,7 +143,7 @@ mtk_wed_wo_queue_refill(struct mtk_wed_wo *wo, struct mtk_wed_wo_queue *q, dma_addr_t addr; void *buf; - buf = page_frag_alloc(&q->cache, q->buf_size, GFP_ATOMIC); + buf = page_frag_alloc(NULL, q->buf_size, GFP_ATOMIC); if (!buf) break; @@ -296,8 +296,6 @@ mtk_wed_wo_queue_tx_clean(struct mtk_wed_wo *wo, struct mtk_wed_wo_queue *q) skb_free_frag(entry->buf); entry->buf = 
NULL; } - - page_frag_cache_clear(&q->cache); } static void @@ -311,8 +309,6 @@ mtk_wed_wo_queue_rx_clean(struct mtk_wed_wo *wo, struct mtk_wed_wo_queue *q) skb_free_frag(buf); } - - page_frag_cache_clear(&q->cache); } static void diff --git a/drivers/net/ethernet/mediatek/mtk_wed_wo.h b/drivers/net/ethernet/mediatek/mtk_wed_wo.h index 7a1a2a28f1ac..f69bd83dc486 100644 --- a/drivers/net/ethernet/mediatek/mtk_wed_wo.h +++ b/drivers/net/ethernet/mediatek/mtk_wed_wo.h @@ -211,8 +211,6 @@ struct mtk_wed_wo_queue_entry { struct mtk_wed_wo_queue { struct mtk_wed_wo_queue_regs regs; - struct page_frag_cache cache; - struct mtk_wed_wo_queue_desc *desc; dma_addr_t desc_dma; diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c index dcc35f6bff8c..145cf6186509 100644 --- a/drivers/nvme/host/tcp.c +++ b/drivers/nvme/host/tcp.c @@ -147,8 +147,6 @@ struct nvme_tcp_queue { __le32 exp_ddgst; __le32 recv_ddgst; - struct page_frag_cache pf_cache; - void (*state_change)(struct sock *); void (*data_ready)(struct sock *); void (*write_space)(struct sock *); @@ -482,9 +480,8 @@ static int nvme_tcp_init_request(struct blk_mq_tag_set *set, struct nvme_tcp_queue *queue = &ctrl->queues[queue_idx]; u8 hdgst = nvme_tcp_hdgst_len(queue); - req->pdu = page_frag_alloc(&queue->pf_cache, - sizeof(struct nvme_tcp_cmd_pdu) + hdgst, - GFP_KERNEL | __GFP_ZERO); + req->pdu = page_frag_alloc(NULL, sizeof(struct nvme_tcp_cmd_pdu) + hdgst, + GFP_KERNEL | __GFP_ZERO); if (!req->pdu) return -ENOMEM; @@ -1303,9 +1300,8 @@ static int nvme_tcp_alloc_async_req(struct nvme_tcp_ctrl *ctrl) struct nvme_tcp_request *async = &ctrl->async_req; u8 hdgst = nvme_tcp_hdgst_len(queue); - async->pdu = page_frag_alloc(&queue->pf_cache, - sizeof(struct nvme_tcp_cmd_pdu) + hdgst, - GFP_KERNEL | __GFP_ZERO); + async->pdu = page_frag_alloc(NULL, sizeof(struct nvme_tcp_cmd_pdu) + hdgst, + GFP_KERNEL | __GFP_ZERO); if (!async->pdu) return -ENOMEM; @@ -1325,7 +1321,6 @@ static void nvme_tcp_free_queue(struct nvme_ctrl *nctrl, int qid) if (queue->hdr_digest || queue->data_digest) nvme_tcp_free_crypto(queue); - page_frag_cache_clear(&queue->pf_cache); noreclaim_flag = memalloc_noreclaim_save(); sock_release(queue->sock); memalloc_noreclaim_restore(noreclaim_flag); diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c index 984e6ce85dcd..cb352f5d2bbf 100644 --- a/drivers/nvme/target/tcp.c +++ b/drivers/nvme/target/tcp.c @@ -169,8 +169,6 @@ struct nvmet_tcp_queue { struct nvmet_tcp_cmd connect; - struct page_frag_cache pf_cache; - void (*data_ready)(struct sock *); void (*state_change)(struct sock *); void (*write_space)(struct sock *); @@ -1338,25 +1336,25 @@ static int nvmet_tcp_alloc_cmd(struct nvmet_tcp_queue *queue, c->queue = queue; c->req.port = queue->port->nport; - c->cmd_pdu = page_frag_alloc(&queue->pf_cache, - sizeof(*c->cmd_pdu) + hdgst, GFP_KERNEL | __GFP_ZERO); + c->cmd_pdu = page_frag_alloc(NULL, sizeof(*c->cmd_pdu) + hdgst, + GFP_KERNEL | __GFP_ZERO); if (!c->cmd_pdu) return -ENOMEM; c->req.cmd = &c->cmd_pdu->cmd; - c->rsp_pdu = page_frag_alloc(&queue->pf_cache, - sizeof(*c->rsp_pdu) + hdgst, GFP_KERNEL | __GFP_ZERO); + c->rsp_pdu = page_frag_alloc(NULL, sizeof(*c->rsp_pdu) + hdgst, + GFP_KERNEL | __GFP_ZERO); if (!c->rsp_pdu) goto out_free_cmd; c->req.cqe = &c->rsp_pdu->cqe; - c->data_pdu = page_frag_alloc(&queue->pf_cache, - sizeof(*c->data_pdu) + hdgst, GFP_KERNEL | __GFP_ZERO); + c->data_pdu = page_frag_alloc(NULL, sizeof(*c->data_pdu) + hdgst, + GFP_KERNEL | __GFP_ZERO); if (!c->data_pdu) goto out_free_rsp; - 
c->r2t_pdu = page_frag_alloc(&queue->pf_cache, - sizeof(*c->r2t_pdu) + hdgst, GFP_KERNEL | __GFP_ZERO); + c->r2t_pdu = page_frag_alloc(NULL, sizeof(*c->r2t_pdu) + hdgst, + GFP_KERNEL | __GFP_ZERO); if (!c->r2t_pdu) goto out_free_data; @@ -1485,7 +1483,6 @@ static void nvmet_tcp_release_queue_work(struct work_struct *w) if (queue->hdr_digest || queue->data_digest) nvmet_tcp_free_crypto(queue); ida_free(&nvmet_tcp_queue_ida, queue->idx); - page_frag_cache_clear(&queue->pf_cache); kfree(queue); } diff --git a/include/linux/gfp.h b/include/linux/gfp.h index fa30100f46ad..baa25a00d9e3 100644 --- a/include/linux/gfp.h +++ b/include/linux/gfp.h @@ -304,18 +304,19 @@ extern void free_pages(unsigned long addr, unsigned int order); struct page_frag_cache; extern void __page_frag_cache_drain(struct page *page, unsigned int count); -extern void *page_frag_alloc_align(struct page_frag_cache *nc, - unsigned int fragsz, gfp_t gfp_mask, - unsigned int align); - -static inline void *page_frag_alloc(struct page_frag_cache *nc, - unsigned int fragsz, gfp_t gfp_mask) +extern void *page_frag_alloc_align(struct page_frag_cache __percpu *frag_cache, + size_t fragsz, gfp_t gfp, + unsigned long align); +extern void *page_frag_memdup(struct page_frag_cache __percpu *frag_cache, + const void *p, size_t fragsz, gfp_t gfp, + unsigned long align); + +static inline void *page_frag_alloc(struct page_frag_cache __percpu *frag_cache, + size_t fragsz, gfp_t gfp) { - return page_frag_alloc_align(nc, fragsz, gfp_mask, 1); + return page_frag_alloc_align(frag_cache, fragsz, gfp, 1); } -void page_frag_cache_clear(struct page_frag_cache *nc); - extern void page_frag_free(void *addr); #define __free_page(page) __free_pages((page), 0) diff --git a/mm/page_frag_alloc.c b/mm/page_frag_alloc.c index 2b73c7f5d9a9..b035bbb34fac 100644 --- a/mm/page_frag_alloc.c +++ b/mm/page_frag_alloc.c @@ -16,28 +16,25 @@ #include #include +static DEFINE_PER_CPU(struct page_frag_cache, page_frag_default_allocator); + /* * Allocate a new folio for the frag cache. */ -static struct folio *page_frag_cache_refill(struct page_frag_cache *nc, - gfp_t gfp_mask) +static struct folio *page_frag_cache_refill(gfp_t gfp) { - struct folio *folio = NULL; - gfp_t gfp; + struct folio *folio; - gfp_mask &= ~__GFP_ZERO; - gfp = gfp_mask; + gfp &= ~__GFP_ZERO; #if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE) - gfp_mask |= __GFP_NOWARN | __GFP_NORETRY | __GFP_NOMEMALLOC; - folio = folio_alloc(gfp_mask, PAGE_FRAG_CACHE_MAX_ORDER); + folio = folio_alloc(gfp | __GFP_NOWARN | __GFP_NORETRY | __GFP_NOMEMALLOC, + PAGE_FRAG_CACHE_MAX_ORDER); + if (folio) + return folio; #endif - if (unlikely(!folio)) - folio = folio_alloc(gfp, 0); - if (folio) - nc->folio = folio; - return folio; + return folio_alloc(gfp, 0); } void __page_frag_cache_drain(struct page *page, unsigned int count) @@ -51,63 +48,70 @@ void __page_frag_cache_drain(struct page *page, unsigned int count) EXPORT_SYMBOL(__page_frag_cache_drain); /** - * page_frag_cache_clear - Clear out a page fragment cache - * @nc: The cache to clear + * page_frag_alloc_align - Allocate some memory for use in zerocopy + * @frag_cache: The frag cache to use (or NULL for the default) + * @fragsz: The size of the fragment desired + * @gfp: Allocation flags under which to make an allocation + * @align: The required alignment + * + * Allocate some memory for use with zerocopy where protocol bits have to be + * mixed in with spliced/zerocopied data. 
Unlike memory allocated from the + * slab, this memory's lifetime is purely dependent on the folio's refcount. + * + * The way it works is that a folio is allocated and fragments are broken off + * sequentially and returned to the caller with a ref until the folio no longer + * has enough spare space - at which point the allocator's ref is dropped and a + * new folio is allocated. The folio remains in existence until the last ref + * held by, say, an sk_buff is discarded and then the page is returned to the + * page allocator. * - * Discard any pages still cached in a page fragment cache. + * Returns a pointer to the memory on success and -ENOMEM on allocation + * failure. + * + * The allocated memory should be disposed of with folio_put(). */ -void page_frag_cache_clear(struct page_frag_cache *nc) -{ - struct folio *folio = nc->folio; - - if (folio) { - VM_BUG_ON_FOLIO(folio_ref_count(folio) == 0, folio); - folio_put_refs(folio, nc->pagecnt_bias); - nc->folio = NULL; - } -} -EXPORT_SYMBOL(page_frag_cache_clear); - -void *page_frag_alloc_align(struct page_frag_cache *nc, - unsigned int fragsz, gfp_t gfp_mask, - unsigned int align) +void *page_frag_alloc_align(struct page_frag_cache __percpu *frag_cache, + size_t fragsz, gfp_t gfp, unsigned long align) { - struct folio *folio = nc->folio; + struct page_frag_cache *nc; + struct folio *folio, *spare = NULL; size_t offset; void *p; WARN_ON_ONCE(!is_power_of_2(align)); - if (unlikely(!folio)) { -refill: - folio = page_frag_cache_refill(nc, gfp_mask); - if (!folio) - return NULL; - - /* Even if we own the page, we do not use atomic_set(). - * This would break get_page_unless_zero() users. - */ - folio_ref_add(folio, PAGE_FRAG_CACHE_MAX_SIZE); + if (!frag_cache) + frag_cache = &page_frag_default_allocator; + if (WARN_ON_ONCE(fragsz == 0)) + fragsz = 1; - /* reset page count bias and offset to start of new frag */ - nc->pfmemalloc = folio_is_pfmemalloc(folio); - nc->pagecnt_bias = PAGE_FRAG_CACHE_MAX_SIZE + 1; - nc->offset = folio_size(folio); + nc = get_cpu_ptr(frag_cache); +reload: + folio = nc->folio; + offset = nc->offset; +try_again: + + /* Make the allocation if there's sufficient space. */ + if (fragsz <= offset) { + nc->pagecnt_bias--; + offset = (offset - fragsz) & ~(align - 1); + nc->offset = offset; + p = folio_address(folio) + offset; + put_cpu_ptr(frag_cache); + if (spare) + folio_put(spare); + if (gfp & __GFP_ZERO) + return memset(p, 0, fragsz); + return p; } - offset = nc->offset; - if (unlikely(fragsz > offset)) { - /* Reuse the folio if everyone we gave it to has finished with - * it. - */ - if (!folio_ref_sub_and_test(folio, nc->pagecnt_bias)) { - nc->folio = NULL; + /* Insufficient space - see if we can refurbish the current folio. */ + if (folio) { + if (!folio_ref_sub_and_test(folio, nc->pagecnt_bias)) goto refill; - } if (unlikely(nc->pfmemalloc)) { __folio_put(folio); - nc->folio = NULL; goto refill; } @@ -117,30 +121,56 @@ void *page_frag_alloc_align(struct page_frag_cache *nc, /* reset page count bias and offset to start of new frag */ nc->pagecnt_bias = PAGE_FRAG_CACHE_MAX_SIZE + 1; offset = folio_size(folio); - if (unlikely(fragsz > offset)) { - /* - * The caller is trying to allocate a fragment - * with fragsz > PAGE_SIZE but the cache isn't big - * enough to satisfy the request, this may - * happen in low memory conditions. - * We don't release the cache page because - * it could make memory pressure worse - * so we simply return NULL here. 
- */ - nc->offset = offset; + if (unlikely(fragsz > offset)) + goto frag_too_big; + goto try_again; + } + +refill: + if (!spare) { + nc->folio = NULL; + put_cpu_ptr(frag_cache); + + spare = page_frag_cache_refill(gfp); + if (!spare) return NULL; - } + + nc = get_cpu_ptr(frag_cache); + /* We may now be on a different cpu and/or someone else may + * have refilled it + */ + nc->pfmemalloc = folio_is_pfmemalloc(spare); + if (nc->folio) + goto reload; } - nc->pagecnt_bias--; - offset -= fragsz; - offset &= ~(align - 1); + nc->folio = spare; + folio = spare; + spare = NULL; + + /* Even if we own the page, we do not use atomic_set(). This would + * break get_page_unless_zero() users. + */ + folio_ref_add(folio, PAGE_FRAG_CACHE_MAX_SIZE); + + /* Reset page count bias and offset to start of new frag */ + nc->pagecnt_bias = PAGE_FRAG_CACHE_MAX_SIZE + 1; + offset = folio_size(folio); + goto try_again; + +frag_too_big: + /* + * The caller is trying to allocate a fragment with fragsz > PAGE_SIZE + * but the cache isn't big enough to satisfy the request, this may + * happen in low memory conditions. We don't release the cache page + * because it could make memory pressure worse so we simply return NULL + * here. + */ nc->offset = offset; - - p = folio_address(folio) + offset; - if (gfp_mask & __GFP_ZERO) - return memset(p, 0, fragsz); - return p; + put_cpu_ptr(frag_cache); + if (spare) + folio_put(spare); + return NULL; } EXPORT_SYMBOL(page_frag_alloc_align); @@ -152,3 +182,25 @@ void page_frag_free(void *addr) folio_put(virt_to_folio(addr)); } EXPORT_SYMBOL(page_frag_free); + +/** + * page_frag_memdup - Allocate a page fragment and duplicate some data into it + * @frag_cache: The frag cache to use (or NULL for the default) + * @fragsz: The amount of memory to copy (maximum 1/2 page). 
+ * @p: The source data to copy + * @gfp: Allocation flags under which to make an allocation + * @align_mask: The required alignment + */ +void *page_frag_memdup(struct page_frag_cache __percpu *frag_cache, + const void *p, size_t fragsz, gfp_t gfp, + unsigned long align_mask) +{ + void *q; + + q = page_frag_alloc_align(frag_cache, fragsz, gfp, align_mask); + if (!q) + return q; + + return memcpy(q, p, fragsz); +} +EXPORT_SYMBOL(page_frag_memdup); diff --git a/net/core/skbuff.c b/net/core/skbuff.c index cc507433b357..225a16f3713f 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -263,13 +263,13 @@ static void *page_frag_alloc_1k(struct page_frag_1k *nc, gfp_t gfp_mask) #endif struct napi_alloc_cache { - struct page_frag_cache page; struct page_frag_1k page_small; unsigned int skb_count; void *skb_cache[NAPI_SKB_CACHE_SIZE]; }; static DEFINE_PER_CPU(struct page_frag_cache, netdev_alloc_cache); +static DEFINE_PER_CPU(struct page_frag_cache, napi_frag_cache); static DEFINE_PER_CPU(struct napi_alloc_cache, napi_alloc_cache); /* Double check that napi_get_frags() allocates skbs with @@ -291,11 +291,9 @@ void napi_get_frags_check(struct napi_struct *napi) void *napi_alloc_frag_align(unsigned int fragsz, unsigned int align) { - struct napi_alloc_cache *nc = this_cpu_ptr(&napi_alloc_cache); - fragsz = SKB_DATA_ALIGN(fragsz); - return page_frag_alloc_align(&nc->page, fragsz, GFP_ATOMIC, align); + return page_frag_alloc_align(&napi_frag_cache, fragsz, GFP_ATOMIC, align); } EXPORT_SYMBOL(napi_alloc_frag_align); @@ -305,15 +303,12 @@ void *netdev_alloc_frag_align(unsigned int fragsz, unsigned int align) fragsz = SKB_DATA_ALIGN(fragsz); if (in_hardirq() || irqs_disabled()) { - struct page_frag_cache *nc = this_cpu_ptr(&netdev_alloc_cache); - - data = page_frag_alloc_align(nc, fragsz, GFP_ATOMIC, align); + data = page_frag_alloc_align(&netdev_alloc_cache, + fragsz, GFP_ATOMIC, align); } else { - struct napi_alloc_cache *nc; - local_bh_disable(); - nc = this_cpu_ptr(&napi_alloc_cache); - data = page_frag_alloc_align(&nc->page, fragsz, GFP_ATOMIC, align); + data = page_frag_alloc_align(&napi_frag_cache, + fragsz, GFP_ATOMIC, align); local_bh_enable(); } return data; @@ -691,7 +686,6 @@ EXPORT_SYMBOL(__alloc_skb); struct sk_buff *__netdev_alloc_skb(struct net_device *dev, unsigned int len, gfp_t gfp_mask) { - struct page_frag_cache *nc; struct sk_buff *skb; bool pfmemalloc; void *data; @@ -716,14 +710,12 @@ struct sk_buff *__netdev_alloc_skb(struct net_device *dev, unsigned int len, gfp_mask |= __GFP_MEMALLOC; if (in_hardirq() || irqs_disabled()) { - nc = this_cpu_ptr(&netdev_alloc_cache); - data = page_frag_alloc(nc, len, gfp_mask); - pfmemalloc = nc->pfmemalloc; + data = page_frag_alloc(&netdev_alloc_cache, len, gfp_mask); + pfmemalloc = folio_is_pfmemalloc(virt_to_folio(data)); } else { local_bh_disable(); - nc = this_cpu_ptr(&napi_alloc_cache.page); - data = page_frag_alloc(nc, len, gfp_mask); - pfmemalloc = nc->pfmemalloc; + data = page_frag_alloc(&napi_frag_cache, len, gfp_mask); + pfmemalloc = folio_is_pfmemalloc(virt_to_folio(data)); local_bh_enable(); } @@ -811,8 +803,8 @@ struct sk_buff *__napi_alloc_skb(struct napi_struct *napi, unsigned int len, } else { len = SKB_HEAD_ALIGN(len); - data = page_frag_alloc(&nc->page, len, gfp_mask); - pfmemalloc = nc->page.pfmemalloc; + data = page_frag_alloc(&napi_frag_cache, len, gfp_mask); + pfmemalloc = folio_is_pfmemalloc(virt_to_folio(data)); } if (unlikely(!data)) From patchwork Wed May 24 15:33:06 2023 Content-Type: text/plain; charset="utf-8" 
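As a usage note for the per-CPU page_frag_cache conversion above: drivers that used to embed their own struct page_frag_cache can now pass NULL and share the default per-CPU bucket, with preemption handling done inside the allocator. A minimal sketch of the resulting pattern (names are illustrative; error handling trimmed):

        /* Allocate a fragment from the default per-CPU bucket. */
        buf = page_frag_alloc(NULL, buf_size, GFP_ATOMIC);
        if (!buf)
                return -ENOMEM;

        /* The fragment holds a ref on its folio; drop it when done. */
        page_frag_free(buf);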
MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 98566
From: David Howells To: netdev@vger.kernel.org Cc: David Howells , "David S.
Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , Willem de Bruijn , David Ahern , Matthew Wilcox , Jens Axboe , linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: [PATCH net-next 07/12] net: Clean up users of netdev_alloc_cache and napi_frag_cache Date: Wed, 24 May 2023 16:33:06 +0100 Message-Id: <20230524153311.3625329-8-dhowells@redhat.com> In-Reply-To: <20230524153311.3625329-1-dhowells@redhat.com> References: <20230524153311.3625329-1-dhowells@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.8 X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_NONE,T_SCC_BODY_TEXT_LINE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1766790923944550368?= X-GMAIL-MSGID: =?utf-8?q?1766790923944550368?= The users of netdev_alloc_cache and napi_frag_cache don't need to take the bh lock around access to these fragment caches any more as the percpu handling is now done in page_frag_alloc_align(). Signed-off-by: David Howells cc: "David S. Miller" cc: Eric Dumazet cc: Jakub Kicinski cc: Paolo Abeni cc: linux-mm@kvack.org --- include/linux/skbuff.h | 3 ++- net/core/skbuff.c | 29 +++++++++-------------------- 2 files changed, 11 insertions(+), 21 deletions(-) diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 41b63e72c6c3..e11a765fe7fa 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -252,7 +252,8 @@ /* Maximum value in skb->csum_level */ #define SKB_MAX_CSUM_LEVEL 3 -#define SKB_DATA_ALIGN(X) ALIGN(X, SMP_CACHE_BYTES) +#define SKB_DATA_ALIGNMENT SMP_CACHE_BYTES +#define SKB_DATA_ALIGN(X) ALIGN(X, SKB_DATA_ALIGNMENT) #define SKB_WITH_OVERHEAD(X) \ ((X) - SKB_DATA_ALIGN(sizeof(struct skb_shared_info))) diff --git a/net/core/skbuff.c b/net/core/skbuff.c index 225a16f3713f..c2840b0dcad9 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -291,27 +291,20 @@ void napi_get_frags_check(struct napi_struct *napi) void *napi_alloc_frag_align(unsigned int fragsz, unsigned int align) { - fragsz = SKB_DATA_ALIGN(fragsz); - + align = min_t(unsigned int, align, SKB_DATA_ALIGNMENT); return page_frag_alloc_align(&napi_frag_cache, fragsz, GFP_ATOMIC, align); } EXPORT_SYMBOL(napi_alloc_frag_align); void *netdev_alloc_frag_align(unsigned int fragsz, unsigned int align) { - void *data; - - fragsz = SKB_DATA_ALIGN(fragsz); - if (in_hardirq() || irqs_disabled()) { - data = page_frag_alloc_align(&netdev_alloc_cache, + align = min_t(unsigned int, align, SKB_DATA_ALIGNMENT); + if (in_hardirq() || irqs_disabled()) + return page_frag_alloc_align(&netdev_alloc_cache, fragsz, GFP_ATOMIC, align); - } else { - local_bh_disable(); - data = page_frag_alloc_align(&napi_frag_cache, + else + return page_frag_alloc_align(&napi_frag_cache, fragsz, GFP_ATOMIC, align); - local_bh_enable(); - } - return data; } EXPORT_SYMBOL(netdev_alloc_frag_align); @@ -709,15 +702,11 @@ struct sk_buff *__netdev_alloc_skb(struct net_device *dev, unsigned int len, if (sk_memalloc_socks()) gfp_mask |= __GFP_MEMALLOC; - if (in_hardirq() || irqs_disabled()) { + if (in_hardirq() || irqs_disabled()) data = page_frag_alloc(&netdev_alloc_cache, len, gfp_mask); - pfmemalloc = folio_is_pfmemalloc(virt_to_folio(data)); - } else { - 
local_bh_disable(); + else data = page_frag_alloc(&napi_frag_cache, len, gfp_mask); - pfmemalloc = folio_is_pfmemalloc(virt_to_folio(data)); - local_bh_enable(); - } + pfmemalloc = folio_is_pfmemalloc(virt_to_folio(data)); if (unlikely(!data)) return NULL; From patchwork Wed May 24 15:33:07 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 98558
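With the per-CPU handling moved inside page_frag_alloc_align(), callers no longer need to pin the per-CPU cache or mask bottom halves themselves. A condensed before/after view of the caller pattern, paraphrasing the diffs above rather than introducing any new API:

        /* Before: the caller looked up the per-CPU cache and masked BHs. */
        local_bh_disable();
        nc = this_cpu_ptr(&napi_alloc_cache);
        data = page_frag_alloc_align(&nc->page, fragsz, GFP_ATOMIC, align);
        local_bh_enable();

        /* After: the per-CPU lookup happens inside the allocator. */
        data = page_frag_alloc_align(&napi_frag_cache, fragsz, GFP_ATOMIC, align);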
From: David Howells To: netdev@vger.kernel.org Cc: David Howells , "David S.
Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , Willem de Bruijn , David Ahern , Matthew Wilcox , Jens Axboe , linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: [PATCH net-next 08/12] net: Copy slab data for sendmsg(MSG_SPLICE_PAGES) Date: Wed, 24 May 2023 16:33:07 +0100 Message-Id: <20230524153311.3625329-9-dhowells@redhat.com> In-Reply-To: <20230524153311.3625329-1-dhowells@redhat.com> References: <20230524153311.3625329-1-dhowells@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.2 X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_NONE,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1766790661052898355?= X-GMAIL-MSGID: =?utf-8?q?1766790661052898355?= If sendmsg() is passed MSG_SPLICE_PAGES and is given a buffer that contains some data that's resident in the slab, copy/coalesce it rather than returning EIO. Signed-off-by: David Howells cc: Eric Dumazet cc: "David S. Miller" cc: David Ahern cc: Jakub Kicinski cc: Paolo Abeni cc: Jens Axboe cc: Matthew Wilcox cc: netdev@vger.kernel.org --- include/linux/skbuff.h | 3 +++ net/core/skbuff.c | 33 ++++++++++++++++++++++++++++++--- 2 files changed, 33 insertions(+), 3 deletions(-) diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index e11a765fe7fa..11d98990f5f1 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -5084,6 +5084,9 @@ static inline void skb_mark_for_recycle(struct sk_buff *skb) #endif } +ssize_t skb_splice_from_iter(struct sk_buff *skb, struct iov_iter *iter, + ssize_t maxsize, gfp_t gfp); + ssize_t skb_splice_from_iter(struct sk_buff *skb, struct iov_iter *iter, ssize_t maxsize, gfp_t gfp); diff --git a/net/core/skbuff.c b/net/core/skbuff.c index c2840b0dcad9..a16499b9942b 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -6927,17 +6927,44 @@ ssize_t skb_splice_from_iter(struct sk_buff *skb, struct iov_iter *iter, break; } + if (space == 0 && + !skb_can_coalesce(skb, skb_shinfo(skb)->nr_frags, + pages[0], off)) { + iov_iter_revert(iter, len); + break; + } + i = 0; do { struct page *page = pages[i++]; size_t part = min_t(size_t, PAGE_SIZE - off, len); - - ret = -EIO; - if (WARN_ON_ONCE(!sendpage_ok(page))) + bool put = false; + + if (PageSlab(page)) { + const void *p; + void *q; + + p = kmap_local_page(page); + q = page_frag_memdup(NULL, p + off, part, gfp, + ULONG_MAX); + kunmap_local(p); + if (!q) { + iov_iter_revert(iter, len); + ret = -ENOMEM; + goto out; + } + page = virt_to_page(q); + off = offset_in_page(q); + put = true; + } else if (WARN_ON_ONCE(!sendpage_ok(page))) { + ret = -EIO; goto out; + } ret = skb_append_pagefrags(skb, page, off, part, frag_limit); + if (put) + put_page(page); if (ret < 0) { iov_iter_revert(iter, len); goto out; From patchwork Wed May 24 15:33:08 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 98576 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a05:6358:3046:b0:115:7a1d:dabb with SMTP id p6csp2919499rwl; Wed, 24 May 2023 08:51:04 -0700 (PDT) X-Google-Smtp-Source: 
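The slab-copy path added for MSG_SPLICE_PAGES above hinges on page_frag_memdup(): when a page extracted from the iterator is slab-backed and so fails sendpage_ok(), its data is duplicated into a page fragment whose lifetime is controlled by the folio refcount. A reduced sketch of that step, using the same calls as the diff (error handling and the later put_page() of the copy trimmed):

        if (PageSlab(page)) {
                const void *p = kmap_local_page(page);
                void *q = page_frag_memdup(NULL, p + off, part, gfp, ULONG_MAX);

                kunmap_local(p);
                if (!q)
                        return -ENOMEM;
                /* Splice the copy rather than the slab page. */
                page = virt_to_page(q);
                off = offset_in_page(q);
        }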
From: David Howells To: netdev@vger.kernel.org Cc: David Howells , "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , Willem de Bruijn , David Ahern , Matthew Wilcox , Jens Axboe , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Chuck Lever , Boris Pismenny , John Fastabend
Subject: [PATCH net-next 09/12] tls/sw: Support MSG_SPLICE_PAGES
Date: Wed, 24 May 2023 16:33:08 +0100
Message-Id: <20230524153311.3625329-10-dhowells@redhat.com>
In-Reply-To: <20230524153311.3625329-1-dhowells@redhat.com>
References: <20230524153311.3625329-1-dhowells@redhat.com>
MIME-Version: 1.0

Make TLS's sendmsg() support MSG_SPLICE_PAGES. This causes pages to be spliced from the source iterator if possible and the data to be copied if not. This allows ->sendpage() to be replaced by something that can handle multiple multipage folios in a single transaction.

Signed-off-by: David Howells cc: Chuck Lever cc: Boris Pismenny cc: John Fastabend cc: Jakub Kicinski cc: Eric Dumazet cc: "David S.
Miller" cc: Paolo Abeni cc: Jens Axboe cc: Matthew Wilcox cc: netdev@vger.kernel.org --- net/tls/tls_sw.c | 57 +++++++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 56 insertions(+), 1 deletion(-) diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c index 635b8bf6b937..0ccef8aa9951 100644 --- a/net/tls/tls_sw.c +++ b/net/tls/tls_sw.c @@ -929,6 +929,49 @@ static int tls_sw_push_pending_record(struct sock *sk, int flags) &copied, flags); } +static int rls_sw_sendmsg_splice(struct sock *sk, struct msghdr *msg, + struct sk_msg *msg_pl, size_t try_to_copy, + ssize_t *copied) +{ + struct page *page = NULL, **pages = &page; + + do { + ssize_t part; + size_t off; + bool put = false; + + part = iov_iter_extract_pages(&msg->msg_iter, &pages, + try_to_copy, 1, 0, &off); + if (part <= 0) + return part ?: -EIO; + + if (!sendpage_ok(page)) { + const void *p = kmap_local_page(page); + void *q; + + q = page_frag_memdup(NULL, p + off, part, + sk->sk_allocation, ULONG_MAX); + kunmap_local(p); + if (!q) { + iov_iter_revert(&msg->msg_iter, part); + return -ENOMEM; + } + page = virt_to_page(q); + off = offset_in_page(q); + put = true; + } + + sk_msg_page_add(msg_pl, page, part, off); + sk_mem_charge(sk, part); + if (put) + put_page(page); + *copied += part; + try_to_copy -= part; + } while (try_to_copy && !sk_msg_full(msg_pl)); + + return 0; +} + int tls_sw_sendmsg(struct sock *sk, struct msghdr *msg, size_t size) { long timeo = sock_sndtimeo(sk, msg->msg_flags & MSG_DONTWAIT); @@ -1018,6 +1061,17 @@ int tls_sw_sendmsg(struct sock *sk, struct msghdr *msg, size_t size) full_record = true; } + if (try_to_copy && (msg->msg_flags & MSG_SPLICE_PAGES)) { + ret = rls_sw_sendmsg_splice(sk, msg, msg_pl, + try_to_copy, &copied); + if (ret < 0) + goto send_end; + tls_ctx->pending_open_record_frags = true; + if (full_record || eor || sk_msg_full(msg_pl)) + goto copied; + continue; + } + if (!is_kvec && (full_record || eor) && !async_capable) { u32 first = msg_pl->sg.end; @@ -1080,8 +1134,9 @@ int tls_sw_sendmsg(struct sock *sk, struct msghdr *msg, size_t size) /* Open records defined only if successfully copied, otherwise * we would trim the sg but not reset the open record frags. 
*/ - tls_ctx->pending_open_record_frags = true; copied += try_to_copy; +copied: + tls_ctx->pending_open_record_frags = true; if (full_record || eor) { ret = bpf_exec_tx_verdict(msg_pl, sk, full_record, record_type, &copied, From patchwork Wed May 24 15:33:09 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 98583
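MSG_SPLICE_PAGES is an in-kernel flag: a caller points msg_iter at pages it already owns and the protocol, as in the TLS patch above, either splices those pages into the record or copies the data. A rough sketch of such a caller, assuming a single page and the bvec iterator helpers; the variable names are illustrative and this is not code from the series:

        struct bio_vec bvec;
        struct msghdr msg = { .msg_flags = MSG_SPLICE_PAGES | MSG_MORE };
        int ret;

        bvec_set_page(&bvec, page, len, offset);
        iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, len);
        ret = sock_sendmsg(sock, &msg); /* may splice or copy the page */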
From: David Howells To: netdev@vger.kernel.org Cc: David Howells , "David S.
Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , Willem de Bruijn , David Ahern , Matthew Wilcox , Jens Axboe , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Chuck Lever , Boris Pismenny , John Fastabend Subject: [PATCH net-next 10/12] tls/sw: Convert tls_sw_sendpage() to use MSG_SPLICE_PAGES Date: Wed, 24 May 2023 16:33:09 +0100 Message-Id: <20230524153311.3625329-11-dhowells@redhat.com> In-Reply-To: <20230524153311.3625329-1-dhowells@redhat.com> References: <20230524153311.3625329-1-dhowells@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.8 X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_NONE,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1766791513094694568?= X-GMAIL-MSGID: =?utf-8?q?1766791513094694568?= Convert tls_sw_sendpage() and tls_sw_sendpage_locked() to use sendmsg() with MSG_SPLICE_PAGES rather than directly splicing in the pages itself. [!] Note that tls_sw_sendpage_locked() appears to have the wrong locking upstream. I think the caller will only hold the socket lock, but it should hold tls_ctx->tx_lock too. This allows ->sendpage() to be replaced by something that can handle multiple multipage folios in a single transaction. Signed-off-by: David Howells cc: Chuck Lever cc: Boris Pismenny cc: John Fastabend cc: Jakub Kicinski cc: Eric Dumazet cc: "David S. Miller" cc: Paolo Abeni cc: Jens Axboe cc: Matthew Wilcox cc: netdev@vger.kernel.org --- net/tls/tls_sw.c | 164 +++++++++-------------------------------------- 1 file changed, 30 insertions(+), 134 deletions(-) diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c index 0ccef8aa9951..1a5926cc3e84 100644 --- a/net/tls/tls_sw.c +++ b/net/tls/tls_sw.c @@ -972,7 +972,7 @@ static int rls_sw_sendmsg_splice(struct sock *sk, struct msghdr *msg, return 0; } -int tls_sw_sendmsg(struct sock *sk, struct msghdr *msg, size_t size) +static int tls_sw_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size) { long timeo = sock_sndtimeo(sk, msg->msg_flags & MSG_DONTWAIT); struct tls_context *tls_ctx = tls_get_ctx(sk); @@ -995,15 +995,6 @@ int tls_sw_sendmsg(struct sock *sk, struct msghdr *msg, size_t size) int ret = 0; int pending; - if (msg->msg_flags & ~(MSG_MORE | MSG_DONTWAIT | MSG_NOSIGNAL | - MSG_CMSG_COMPAT)) - return -EOPNOTSUPP; - - ret = mutex_lock_interruptible(&tls_ctx->tx_lock); - if (ret) - return ret; - lock_sock(sk); - if (unlikely(msg->msg_controllen)) { ret = tls_process_cmsg(sk, msg, &record_type); if (ret) { @@ -1204,157 +1195,62 @@ int tls_sw_sendmsg(struct sock *sk, struct msghdr *msg, size_t size) send_end: ret = sk_stream_error(sk, msg->msg_flags, ret); - - release_sock(sk); - mutex_unlock(&tls_ctx->tx_lock); return copied > 0 ? 
copied : ret; } -static int tls_sw_do_sendpage(struct sock *sk, struct page *page, - int offset, size_t size, int flags) +int tls_sw_sendmsg(struct sock *sk, struct msghdr *msg, size_t size) { - long timeo = sock_sndtimeo(sk, flags & MSG_DONTWAIT); struct tls_context *tls_ctx = tls_get_ctx(sk); - struct tls_sw_context_tx *ctx = tls_sw_ctx_tx(tls_ctx); - struct tls_prot_info *prot = &tls_ctx->prot_info; - unsigned char record_type = TLS_RECORD_TYPE_DATA; - struct sk_msg *msg_pl; - struct tls_rec *rec; - int num_async = 0; - ssize_t copied = 0; - bool full_record; - int record_room; - int ret = 0; - bool eor; - - eor = !(flags & MSG_SENDPAGE_NOTLAST); - sk_clear_bit(SOCKWQ_ASYNC_NOSPACE, sk); - - /* Call the sk_stream functions to manage the sndbuf mem. */ - while (size > 0) { - size_t copy, required_size; - - if (sk->sk_err) { - ret = -sk->sk_err; - goto sendpage_end; - } - - if (ctx->open_rec) - rec = ctx->open_rec; - else - rec = ctx->open_rec = tls_get_rec(sk); - if (!rec) { - ret = -ENOMEM; - goto sendpage_end; - } - - msg_pl = &rec->msg_plaintext; - - full_record = false; - record_room = TLS_MAX_PAYLOAD_SIZE - msg_pl->sg.size; - copy = size; - if (copy >= record_room) { - copy = record_room; - full_record = true; - } - - required_size = msg_pl->sg.size + copy + prot->overhead_size; - - if (!sk_stream_memory_free(sk)) - goto wait_for_sndbuf; -alloc_payload: - ret = tls_alloc_encrypted_msg(sk, required_size); - if (ret) { - if (ret != -ENOSPC) - goto wait_for_memory; - - /* Adjust copy according to the amount that was - * actually allocated. The difference is due - * to max sg elements limit - */ - copy -= required_size - msg_pl->sg.size; - full_record = true; - } - - sk_msg_page_add(msg_pl, page, copy, offset); - sk_mem_charge(sk, copy); - - offset += copy; - size -= copy; - copied += copy; - - tls_ctx->pending_open_record_frags = true; - if (full_record || eor || sk_msg_full(msg_pl)) { - ret = bpf_exec_tx_verdict(msg_pl, sk, full_record, - record_type, &copied, flags); - if (ret) { - if (ret == -EINPROGRESS) - num_async++; - else if (ret == -ENOMEM) - goto wait_for_memory; - else if (ret != -EAGAIN) { - if (ret == -ENOSPC) - ret = 0; - goto sendpage_end; - } - } - } - continue; -wait_for_sndbuf: - set_bit(SOCK_NOSPACE, &sk->sk_socket->flags); -wait_for_memory: - ret = sk_stream_wait_memory(sk, &timeo); - if (ret) { - if (ctx->open_rec) - tls_trim_both_msgs(sk, msg_pl->sg.size); - goto sendpage_end; - } + int ret; - if (ctx->open_rec) - goto alloc_payload; - } + if (msg->msg_flags & ~(MSG_MORE | MSG_DONTWAIT | MSG_NOSIGNAL | + MSG_CMSG_COMPAT | MSG_SPLICE_PAGES | + MSG_SENDPAGE_NOTLAST | MSG_SENDPAGE_NOPOLICY)) + return -EOPNOTSUPP; - if (num_async) { - /* Transmit if any encryptions have completed */ - if (test_and_clear_bit(BIT_TX_SCHEDULED, &ctx->tx_bitmask)) { - cancel_delayed_work(&ctx->tx_work.work); - tls_tx_records(sk, flags); - } - } -sendpage_end: - ret = sk_stream_error(sk, flags, ret); - return copied > 0 ? 
+	ret = mutex_lock_interruptible(&tls_ctx->tx_lock);
+	if (ret)
+		return ret;
+	lock_sock(sk);
+	ret = tls_sw_sendmsg_locked(sk, msg, size);
+	release_sock(sk);
+	mutex_unlock(&tls_ctx->tx_lock);
+	return ret;
 }
 
 int tls_sw_sendpage_locked(struct sock *sk, struct page *page,
 			   int offset, size_t size, int flags)
 {
+	struct bio_vec bvec;
+	struct msghdr msg = { .msg_flags = flags | MSG_SPLICE_PAGES, };
+
 	if (flags & ~(MSG_MORE | MSG_DONTWAIT | MSG_NOSIGNAL |
 		      MSG_SENDPAGE_NOTLAST | MSG_SENDPAGE_NOPOLICY |
 		      MSG_NO_SHARED_FRAGS))
 		return -EOPNOTSUPP;
+	if (flags & MSG_SENDPAGE_NOTLAST)
+		msg.msg_flags |= MSG_MORE;
 
-	return tls_sw_do_sendpage(sk, page, offset, size, flags);
+	bvec_set_page(&bvec, page, size, offset);
+	iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, size);
+	return tls_sw_sendmsg_locked(sk, &msg, size);
 }
 
 int tls_sw_sendpage(struct sock *sk, struct page *page,
 		    int offset, size_t size, int flags)
 {
-	struct tls_context *tls_ctx = tls_get_ctx(sk);
-	int ret;
+	struct bio_vec bvec;
+	struct msghdr msg = { .msg_flags = flags | MSG_SPLICE_PAGES, };
 
 	if (flags & ~(MSG_MORE | MSG_DONTWAIT | MSG_NOSIGNAL |
 		      MSG_SENDPAGE_NOTLAST | MSG_SENDPAGE_NOPOLICY))
 		return -EOPNOTSUPP;
+	if (flags & MSG_SENDPAGE_NOTLAST)
+		msg.msg_flags |= MSG_MORE;
 
-	ret = mutex_lock_interruptible(&tls_ctx->tx_lock);
-	if (ret)
-		return ret;
-	lock_sock(sk);
-	ret = tls_sw_do_sendpage(sk, page, offset, size, flags);
-	release_sock(sk);
-	mutex_unlock(&tls_ctx->tx_lock);
-	return ret;
+	bvec_set_page(&bvec, page, size, offset);
+	iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, size);
+	return tls_sw_sendmsg(sk, &msg, size);
 }
 
 static int
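The conversion above reduces to one pattern that the rest of the series repeats: wrap the caller's page in a single bio_vec, point the msghdr's iterator at it, and let the protocol's own sendmsg() decide whether to splice or copy. The sketch below restates that shape outside of TLS; the myproto_* names are hypothetical, while bvec_set_page(), iov_iter_bvec(), ITER_SOURCE, MSG_SPLICE_PAGES and the MSG_SENDPAGE_NOTLAST to MSG_MORE mapping are the pieces the patch itself uses.

static int myproto_sendpage(struct sock *sk, struct page *page,
			    int offset, size_t size, int flags)
{
	struct bio_vec bvec;
	struct msghdr msg = { .msg_flags = flags | MSG_SPLICE_PAGES, };

	/* ->sendpage() callers signal "more data follows" with
	 * MSG_SENDPAGE_NOTLAST; map that onto MSG_MORE for sendmsg(). */
	if (flags & MSG_SENDPAGE_NOTLAST)
		msg.msg_flags |= MSG_MORE;

	/* Describe the single page as a one-element bvec iterator... */
	bvec_set_page(&bvec, page, size, offset);
	iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, size);

	/* ...and let the protocol's sendmsg() splice or copy it. */
	return myproto_sendmsg(sk, &msg, size);
}

(myproto_sendmsg() stands in for whatever locked sendmsg implementation the protocol already has; in the patch above that is tls_sw_sendmsg_locked().)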
From patchwork Wed May 24 15:33:10 2023
From: David Howells
To: netdev@vger.kernel.org
Subject: [PATCH net-next 11/12] tls/device: Support MSG_SPLICE_PAGES
Date: Wed, 24 May 2023 16:33:10 +0100
Message-Id: <20230524153311.3625329-12-dhowells@redhat.com>
In-Reply-To: <20230524153311.3625329-1-dhowells@redhat.com>

Make TLS's device sendmsg() support MSG_SPLICE_PAGES.  This causes pages to
be spliced from the source iterator if possible, and the data to be copied
if not.

This allows ->sendpage() to be replaced by something that can handle multiple
multipage folios in a single transaction.

Signed-off-by: David Howells
cc: Chuck Lever
cc: Boris Pismenny
cc: John Fastabend
cc: Jakub Kicinski
cc: Eric Dumazet
cc: "David S. Miller"
cc: Paolo Abeni
cc: Jens Axboe
cc: Matthew Wilcox
cc: netdev@vger.kernel.org
---
 net/tls/tls_device.c | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/net/tls/tls_device.c b/net/tls/tls_device.c
index daeff54bdbfa..ee07f6e67d52 100644
--- a/net/tls/tls_device.c
+++ b/net/tls/tls_device.c
@@ -508,7 +508,30 @@ static int tls_push_data(struct sock *sk,
 		tls_append_frag(record, &zc_pfrag, copy);
 
 		iter_offset.offset += copy;
+	} else if (copy && (flags & MSG_SPLICE_PAGES)) {
+		struct page_frag zc_pfrag;
+		struct page **pages = &zc_pfrag.page;
+		size_t off;
+
+		rc = iov_iter_extract_pages(iter_offset.msg_iter, &pages,
+					    copy, 1, 0, &off);
+		if (rc <= 0) {
+			if (rc == 0)
+				rc = -EIO;
+			goto handle_error;
+		}
+		copy = rc;
+
+		if (!sendpage_ok(zc_pfrag.page)) {
+			iov_iter_revert(iter_offset.msg_iter, copy);
+			goto no_zcopy_this_page;
+		}
+
+		zc_pfrag.offset = off;
+		zc_pfrag.size = copy;
+		tls_append_frag(record, &zc_pfrag, copy);
 	} else if (copy) {
+no_zcopy_this_page:
 		copy = min_t(size_t, copy, pfrag->size - pfrag->offset);
 		rc = tls_device_copy_data(page_address(pfrag->page) +
@@ -571,6 +594,9 @@ int tls_device_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
 	union tls_iter_offset iter;
 	int rc;
 
+	if (!tls_ctx->zerocopy_sendfile)
+		msg->msg_flags &= ~MSG_SPLICE_PAGES;
+
 	mutex_lock(&tls_ctx->tx_lock);
 	lock_sock(sk);
From patchwork Wed May 24 15:33:11 2023
From: David Howells
To: netdev@vger.kernel.org
Subject: [PATCH net-next 12/12] tls/device: Convert tls_device_sendpage() to use MSG_SPLICE_PAGES
Date: Wed, 24 May 2023 16:33:11 +0100
Message-Id: <20230524153311.3625329-13-dhowells@redhat.com>
In-Reply-To: <20230524153311.3625329-1-dhowells@redhat.com>

Convert tls_device_sendpage() to use sendmsg() with MSG_SPLICE_PAGES rather
than directly splicing the pages in itself.  With that, the tls_iter_offset
union is no longer necessary and can be replaced with an iov_iter pointer,
and the zc_page argument to tls_push_data() can also be removed.

This allows ->sendpage() to be replaced by something that can handle multiple
multipage folios in a single transaction.

Signed-off-by: David Howells
cc: Chuck Lever
cc: Boris Pismenny
cc: John Fastabend
cc: Jakub Kicinski
cc: Eric Dumazet
cc: "David S. Miller"
cc: Paolo Abeni
cc: Jens Axboe
cc: Matthew Wilcox
cc: netdev@vger.kernel.org
---
 net/tls/tls_device.c | 81 ++++++++++----------------------------------
 1 file changed, 18 insertions(+), 63 deletions(-)

diff --git a/net/tls/tls_device.c b/net/tls/tls_device.c
index ee07f6e67d52..f2c895009314 100644
--- a/net/tls/tls_device.c
+++ b/net/tls/tls_device.c
@@ -422,16 +422,10 @@ static int tls_device_copy_data(void *addr, size_t bytes, struct iov_iter *i)
 	return 0;
 }
 
-union tls_iter_offset {
-	struct iov_iter *msg_iter;
-	int offset;
-};
-
 static int tls_push_data(struct sock *sk,
-			 union tls_iter_offset iter_offset,
+			 struct iov_iter *iter,
 			 size_t size, int flags,
-			 unsigned char record_type,
-			 struct page *zc_page)
+			 unsigned char record_type)
 {
 	struct tls_context *tls_ctx = tls_get_ctx(sk);
 	struct tls_prot_info *prot = &tls_ctx->prot_info;
@@ -499,21 +493,12 @@ static int tls_push_data(struct sock *sk,
 		record = ctx->open_record;
 		copy = min_t(size_t, size, max_open_record_len - record->len);
 
-		if (copy && zc_page) {
-			struct page_frag zc_pfrag;
-
-			zc_pfrag.page = zc_page;
-			zc_pfrag.offset = iter_offset.offset;
-			zc_pfrag.size = copy;
-			tls_append_frag(record, &zc_pfrag, copy);
-
-			iter_offset.offset += copy;
-		} else if (copy && (flags & MSG_SPLICE_PAGES)) {
+		if (copy && (flags & MSG_SPLICE_PAGES)) {
 			struct page_frag zc_pfrag;
 			struct page **pages = &zc_pfrag.page;
 			size_t off;
 
-			rc = iov_iter_extract_pages(iter_offset.msg_iter, &pages,
+			rc = iov_iter_extract_pages(iter, &pages,
 						    copy, 1, 0, &off);
 			if (rc <= 0) {
 				if (rc == 0)
@@ -523,7 +508,7 @@ static int tls_push_data(struct sock *sk,
 			copy = rc;
 
 			if (!sendpage_ok(zc_pfrag.page)) {
-				iov_iter_revert(iter_offset.msg_iter, copy);
+				iov_iter_revert(iter, copy);
 				goto no_zcopy_this_page;
 			}
 
@@ -536,7 +521,7 @@ static int tls_push_data(struct sock *sk,
 			rc = tls_device_copy_data(page_address(pfrag->page) +
 						  pfrag->offset, copy,
-						  iter_offset.msg_iter);
+						  iter);
 			if (rc)
 				goto handle_error;
 			tls_append_frag(record, pfrag, copy);
@@ -591,7 +576,6 @@ int tls_device_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
 {
 	unsigned char record_type = TLS_RECORD_TYPE_DATA;
 	struct tls_context *tls_ctx = tls_get_ctx(sk);
-	union tls_iter_offset iter;
 	int rc;
 
 	if (!tls_ctx->zerocopy_sendfile)
@@ -606,8 +590,7 @@ int tls_device_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
 		goto out;
 	}
 
-	iter.msg_iter = &msg->msg_iter;
-	rc = tls_push_data(sk, iter, size, msg->msg_flags, record_type, NULL);
+	rc = tls_push_data(sk, &msg->msg_iter, size, msg->msg_flags, record_type);
 
 out:
 	release_sock(sk);
@@ -618,44 +601,18 @@ int tls_device_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
 int tls_device_sendpage(struct sock *sk, struct page *page,
 			int offset, size_t size, int flags)
 {
-	struct tls_context *tls_ctx = tls_get_ctx(sk);
-	union tls_iter_offset iter_offset;
-	struct iov_iter msg_iter;
-	char *kaddr;
-	struct kvec iov;
-	int rc;
+	struct bio_vec bvec;
+	struct msghdr msg = { .msg_flags = flags | MSG_SPLICE_PAGES, };
 
 	if (flags & MSG_SENDPAGE_NOTLAST)
-		flags |= MSG_MORE;
-
-	mutex_lock(&tls_ctx->tx_lock);
-	lock_sock(sk);
+		msg.msg_flags |= MSG_MORE;
 
-	if (flags & MSG_OOB) {
-		rc = -EOPNOTSUPP;
-		goto out;
-	}
-
-	if (tls_ctx->zerocopy_sendfile) {
-		iter_offset.offset = offset;
-		rc = tls_push_data(sk, iter_offset, size,
-				   flags, TLS_RECORD_TYPE_DATA, page);
-		goto out;
-	}
-
-	kaddr = kmap(page);
-	iov.iov_base = kaddr + offset;
-	iov.iov_len = size;
-	iov_iter_kvec(&msg_iter, ITER_SOURCE, &iov, 1, size);
-	iter_offset.msg_iter = &msg_iter;
-	rc = tls_push_data(sk, iter_offset, size, flags, TLS_RECORD_TYPE_DATA,
-			   NULL);
-	kunmap(page);
+	if (flags & MSG_OOB)
+		return -EOPNOTSUPP;
 
-out:
-	release_sock(sk);
-	mutex_unlock(&tls_ctx->tx_lock);
-	return rc;
+	bvec_set_page(&bvec, page, size, offset);
+	iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, size);
+	return tls_device_sendmsg(sk, &msg, size);
 }
 
 struct tls_record_info *tls_get_record(struct tls_offload_context_tx *context,
@@ -720,12 +677,10 @@ EXPORT_SYMBOL(tls_get_record);
 
 static int tls_device_push_pending_record(struct sock *sk, int flags)
 {
-	union tls_iter_offset iter;
-	struct iov_iter msg_iter;
+	struct iov_iter iter;
 
-	iov_iter_kvec(&msg_iter, ITER_SOURCE, NULL, 0, 0);
-	iter.msg_iter = &msg_iter;
-	return tls_push_data(sk, iter, 0, flags, TLS_RECORD_TYPE_DATA, NULL);
+	iov_iter_kvec(&iter, ITER_SOURCE, NULL, 0, 0);
+	return tls_push_data(sk, &iter, 0, flags, TLS_RECORD_TYPE_DATA);
 }
 
 void tls_device_write_space(struct sock *sk, struct tls_context *ctx)
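Each of these commit messages makes the same argument for the conversion: once a protocol's sendmsg() understands MSG_SPLICE_PAGES, callers no longer have to loop over ->sendpage() one page at a time. A hypothetical kernel-side caller that hands several pages to the TLS device path in one call might look like the sketch below; it is not part of the patch. Only tls_device_sendmsg(), bvec_set_page() and iov_iter_bvec() are real interfaces here, and the sketch assumes the caller keeps the pages stable until the data has been transmitted, as sendfile() does.

/* Hypothetical caller: submit up to 8 pages in one sendmsg() rather than
 * invoking ->sendpage() once per page. */
static ssize_t send_pages_once(struct sock *sk, struct page **pages,
			       size_t *lens, unsigned int nr, size_t total)
{
	struct bio_vec bv[8];
	struct msghdr msg = { .msg_flags = MSG_SPLICE_PAGES, };
	unsigned int i;

	if (nr > ARRAY_SIZE(bv))
		return -EINVAL;

	for (i = 0; i < nr; i++)
		bvec_set_page(&bv[i], pages[i], lens[i], 0);

	iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, bv, nr, total);
	return tls_device_sendmsg(sk, &msg, total);
}

Whether the pages are then spliced or copied is still the protocol's decision; for the device path, the zerocopy_sendfile opt-in from the previous patch is what keeps MSG_SPLICE_PAGES in effect.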