From patchwork Mon Jan 16 23:12:10 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: David Howells
X-Patchwork-Id: 44383
Organization: Red Hat UK Ltd.
Subject: [PATCH v6 34/34] net: [RFC][WIP] Make __zerocopy_sg_from_iter()
 correctly pin or leave pages unref'd
From: David Howells
To: Al Viro
Cc: "David S.
Miller", Eric Dumazet, Jakub Kicinski, Paolo Abeni,
 netdev@vger.kernel.org, dhowells@redhat.com, Christoph Hellwig,
 Matthew Wilcox, Jens Axboe, Jan Kara, Jeff Layton, Logan Gunthorpe,
 linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org,
 linux-kernel@vger.kernel.org
Date: Mon, 16 Jan 2023 23:12:10 +0000
Message-ID: <167391073019.2311931.11127613443740355536.stgit@warthog.procyon.org.uk>
In-Reply-To: <167391047703.2311931.8115712773222260073.stgit@warthog.procyon.org.uk>
References: <167391047703.2311931.8115712773222260073.stgit@warthog.procyon.org.uk>
User-Agent: StGit/1.5
X-Mailing-List: linux-kernel@vger.kernel.org

Make __zerocopy_sg_from_iter() call iov_iter_extract_pages() to get pages
that have been ref'd, pinned or left alone as appropriate.  As this is only
used for source buffers, pinning isn't an option, but being unref'd is.

The way __zerocopy_sg_from_iter() merges fragments is also altered so that
fragments must additionally match in cleanup mode before they can be
merged.

An extra helper and wrapper, folio_put_unpin_sub() and
page_put_unpin_sub(), are added to allow multiple refs to be put/unpinned.

Signed-off-by: David Howells
cc: "David S.
Miller"
cc: Eric Dumazet
cc: Jakub Kicinski
cc: Paolo Abeni
cc: netdev@vger.kernel.org
---
 include/linux/mm.h  |    2 ++
 mm/gup.c            |   25 +++++++++++++++++++++++++
 net/core/datagram.c |   23 +++++++++++++----------
 3 files changed, 40 insertions(+), 10 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index f14edb192394..e3923b89c75e 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1368,7 +1368,9 @@ static inline bool is_cow_mapping(vm_flags_t flags)
 #endif
 
 void folio_put_unpin(struct folio *folio, unsigned int flags);
+void folio_put_unpin_sub(struct folio *folio, unsigned int flags, unsigned int refs);
 void page_put_unpin(struct page *page, unsigned int flags);
+void page_put_unpin_sub(struct page *page, unsigned int flags, unsigned int refs);
 
 /*
  * The identification function is mainly used by the buddy allocator for
diff --git a/mm/gup.c b/mm/gup.c
index 3ee4b4c7e0cb..49dd27ba6c13 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -213,6 +213,31 @@ void page_put_unpin(struct page *page, unsigned int flags)
 }
 EXPORT_SYMBOL_GPL(page_put_unpin);
 
+/**
+ * folio_put_unpin_sub - Unpin/put a folio as appropriate
+ * @folio: The folio to release
+ * @flags: gup flags indicating the mode of release (FOLL_*)
+ * @refs: Number of refs/pins to drop
+ *
+ * Release a folio according to the flags.  If FOLL_GET is set, the folio
+ * has a ref dropped; if FOLL_PIN is set, it is unpinned; otherwise it is
+ * left unaltered.
+ */
+void folio_put_unpin_sub(struct folio *folio, unsigned int flags,
+			 unsigned int refs)
+{
+	if (flags & (FOLL_GET | FOLL_PIN))
+		gup_put_folio(folio, refs, flags);
+}
+EXPORT_SYMBOL_GPL(folio_put_unpin_sub);
+
+void page_put_unpin_sub(struct page *page, unsigned int flags,
+			unsigned int refs)
+{
+	folio_put_unpin_sub(page_folio(page), flags, refs);
+}
+EXPORT_SYMBOL_GPL(page_put_unpin_sub);
+
 /**
  * try_grab_page() - elevate a page's refcount by a flag-dependent amount
  * @page: pointer to page to be grabbed
diff --git a/net/core/datagram.c b/net/core/datagram.c
index 122bfb144d32..63ea1f8817e0 100644
--- a/net/core/datagram.c
+++ b/net/core/datagram.c
@@ -614,6 +614,7 @@ int __zerocopy_sg_from_iter(struct msghdr *msg, struct sock *sk,
 			    struct sk_buff *skb, struct iov_iter *from,
 			    size_t length)
 {
+	unsigned int cleanup_mode = iov_iter_extract_mode(from, FOLL_SOURCE_BUF);
 	int frag;
 
 	if (msg && msg->msg_ubuf && msg->sg_from_iter)
@@ -622,7 +623,7 @@ int __zerocopy_sg_from_iter(struct msghdr *msg, struct sock *sk,
 	frag = skb_shinfo(skb)->nr_frags;
 
 	while (length && iov_iter_count(from)) {
-		struct page *pages[MAX_SKB_FRAGS];
+		struct page *pages[MAX_SKB_FRAGS], **ppages = pages;
 		struct page *last_head = NULL;
 		size_t start;
 		ssize_t copied;
@@ -632,9 +633,9 @@ int __zerocopy_sg_from_iter(struct msghdr *msg, struct sock *sk,
 		if (frag == MAX_SKB_FRAGS)
 			return -EMSGSIZE;
 
-		copied = iov_iter_get_pages(from, pages, length,
-					    MAX_SKB_FRAGS - frag, &start,
-					    FOLL_SOURCE_BUF);
+		copied = iov_iter_extract_pages(from, &ppages, length,
+						MAX_SKB_FRAGS - frag,
+						FOLL_SOURCE_BUF, &start);
 		if (copied < 0)
 			return -EFAULT;
 
@@ -662,12 +663,14 @@ int __zerocopy_sg_from_iter(struct msghdr *msg, struct sock *sk,
 				skb_frag_t *last = &skb_shinfo(skb)->frags[frag - 1];
 
 				if (head == skb_frag_page(last) &&
+				    cleanup_mode == skb_frag_cleanup(last) &&
 				    start == skb_frag_off(last) + skb_frag_size(last)) {
 					skb_frag_size_add(last, size);
 					/* We combined this page, we need to release
-					 * a reference. Since compound pages refcount
-					 * is shared among many pages, batch the refcount
-					 * adjustments to limit false sharing.
+					 * a reference or a pin. Since compound pages
+					 * refcount is shared among many pages, batch
+					 * the refcount adjustments to limit false
+					 * sharing.
 					 */
 					last_head = head;
 					refs++;
@@ -675,14 +678,14 @@ int __zerocopy_sg_from_iter(struct msghdr *msg, struct sock *sk,
 					continue;
 				}
 			}
 			if (refs) {
-				page_ref_sub(last_head, refs);
+				page_put_unpin_sub(last_head, cleanup_mode, refs);
 				refs = 0;
 			}
 			skb_fill_page_desc_noacc(skb, frag++, head, start, size,
-						 FOLL_GET);
+						 cleanup_mode);
 		}
 		if (refs)
-			page_ref_sub(last_head, refs);
+			page_put_unpin_sub(last_head, cleanup_mode, refs);
 	}
 	return 0;
 }