From patchwork Wed Jan 11 14:28:12 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: David Howells
X-Patchwork-Id: 42046
Subject: [PATCH v5 5/9] netfs: Add a function to extract a UBUF or IOVEC into a BVEC iterator
From: David Howells
To: Al Viro
Cc: Jeff Layton, Steve French, Shyam Prasad N, Rohith Surabattula,
    linux-cachefs@redhat.com, linux-cifs@vger.kernel.org,
    linux-fsdevel@vger.kernel.org, dhowells@redhat.com,
    Christoph Hellwig, Matthew Wilcox, Jens Axboe, Jan Kara,
    Jeff Layton, Logan Gunthorpe, linux-fsdevel@vger.kernel.org,
    linux-block@vger.kernel.org, linux-kernel@vger.kernel.org
Date: Wed, 11 Jan 2023 14:28:12 +0000
Message-ID: <167344729278.2425628.3277966637577509831.stgit@warthog.procyon.org.uk>
In-Reply-To: <167344725490.2425628.13771289553670112965.stgit@warthog.procyon.org.uk>
References: <167344725490.2425628.13771289553670112965.stgit@warthog.procyon.org.uk>
User-Agent: StGit/1.5

Add a function to extract the pages from a user-space supplied iterator
(UBUF- or IOVEC-type) into a BVEC-type iterator, retaining the pages by
getting a ref on them (ITER_SOURCE, ie. WRITE) or pinning them (ITER_DEST,
ie. READ) as we go.

This is useful in three situations:

 (1) A userspace thread may have a sibling that unmaps or remaps the
     process's VM during the operation, changing the assignment of the
     pages and potentially causing an error.  Retaining the pages keeps
     some pages around, even if this occurs; further, we find out at the
     point of extraction if EFAULT is going to be incurred.

 (2) Pages might get swapped out/discarded if not retained, so we want to
     retain them to avoid the reload causing a deadlock due to a DIO
     from/to an mmapped region on the same file.

 (3) The iterator may get passed to sendmsg() by the filesystem.  If a
     fault occurs, we may get a short write to a TCP stream that's then
     tricky to recover from.

We don't deal with other types of iterator here, leaving it to other
mechanisms to retain the pages (eg. PG_locked, PG_writeback and the pipe
lock).

Changes:
========
ver #3)
 - Switch to using EXPORT_SYMBOL_GPL to prevent indirect 3rd-party access
   to get/pin_user_pages_fast()[1].
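
As an illustration of the intended use (a hypothetical sketch; the request
structure and helper names are invented for the example, only
netfs_extract_user_iter() itself comes from this patch), a filesystem's
direct-write path might do roughly the following:

	/* Hypothetical caller: pin down the user buffer behind a UBUF/IOVEC
	 * iterator before kicking off an asynchronous direct write.
	 */
	static ssize_t example_begin_dio_write(struct example_io_request *req,
					       struct iov_iter *user_iter,
					       size_t len)
	{
		ssize_t npages;

		/* Build a BVEC iterator over the same pages, taking refs
		 * (ITER_SOURCE) or pins (ITER_DEST) on them; user_iter is
		 * advanced by the amount extracted and may then be discarded.
		 */
		npages = netfs_extract_user_iter(user_iter, len, &req->iter,
						 &req->cleanup_mode);
		if (npages < 0)
			return npages;

		/* req->iter can now be handed to sendmsg() or split across
		 * subrequests without risking an EFAULT or short write mid
		 * transmission; the pages are released at completion time
		 * according to req->cleanup_mode (FOLL_GET, FOLL_PIN or 0).
		 */
		return example_submit_write(req);
	}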
Signed-off-by: David Howells
cc: Jeff Layton
cc: Steve French
cc: Shyam Prasad N
cc: Rohith Surabattula
cc: linux-cachefs@redhat.com
cc: linux-cifs@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/Y3zFzdWnWlEJ8X8/@infradead.org/ [1]
Link: https://lore.kernel.org/r/166697255265.61150.6289490555867717077.stgit@warthog.procyon.org.uk/ # rfc
Link: https://lore.kernel.org/r/166732026503.3186319.12020462741051772825.stgit@warthog.procyon.org.uk/ # rfc
Link: https://lore.kernel.org/r/166869690376.3723671.8813331570219190705.stgit@warthog.procyon.org.uk/ # rfc
Link: https://lore.kernel.org/r/166920904810.1461876.11603559311247187100.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/166997422579.9475.12101700945635692496.stgit@warthog.procyon.org.uk/ # v3
Link: https://lore.kernel.org/r/167305164634.1521586.12199658904363317567.stgit@warthog.procyon.org.uk/ # v4
---

 fs/netfs/Makefile     |    1 
 fs/netfs/iterator.c   |   99 +++++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/netfs.h |    3 +
 3 files changed, 103 insertions(+)
 create mode 100644 fs/netfs/iterator.c

diff --git a/fs/netfs/Makefile b/fs/netfs/Makefile
index f684c0cd1ec5..386d6fb92793 100644
--- a/fs/netfs/Makefile
+++ b/fs/netfs/Makefile
@@ -3,6 +3,7 @@
 netfs-y := \
 	buffered_read.o \
 	io.o \
+	iterator.o \
 	main.o \
 	objects.o
 
diff --git a/fs/netfs/iterator.c b/fs/netfs/iterator.c
new file mode 100644
index 000000000000..7d802d21b9c5
--- /dev/null
+++ b/fs/netfs/iterator.c
@@ -0,0 +1,99 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/* Iterator helpers.
+ *
+ * Copyright (C) 2022 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells (dhowells@redhat.com)
+ */
+
+#include <linux/export.h>
+#include <linux/slab.h>
+#include <linux/uio.h>
+#include <linux/netfs.h>
+#include "internal.h"
+
+/**
+ * netfs_extract_user_iter - Extract the pages from a user iterator into a bvec
+ * @orig: The original iterator
+ * @orig_len: The amount of iterator to copy
+ * @new: The iterator to be set up
+ * @cleanup_mode: Where to indicate the cleanup mode
+ *
+ * Extract the page fragments from the given amount of the source iterator and
+ * build up a second iterator that refers to all of those bits.  This allows
+ * the original iterator to be disposed of.
+ *
+ * On success, the number of elements in the bvec is returned, the original
+ * iterator will have been advanced by the amount extracted and @*cleanup_mode
+ * will have been set to FOLL_GET, FOLL_PIN or 0.
+ */
+ssize_t netfs_extract_user_iter(struct iov_iter *orig, size_t orig_len,
+				struct iov_iter *new, unsigned int *cleanup_mode)
+{
+	struct bio_vec *bv = NULL;
+	struct page **pages;
+	unsigned int cur_npages;
+	unsigned int max_pages;
+	unsigned int npages = 0;
+	unsigned int i;
+	ssize_t ret;
+	size_t count = orig_len, offset, len;
+	size_t bv_size, pg_size;
+
+	if (WARN_ON_ONCE(!iter_is_ubuf(orig) && !iter_is_iovec(orig)))
+		return -EIO;
+
+	max_pages = iov_iter_npages(orig, INT_MAX);
+	bv_size = array_size(max_pages, sizeof(*bv));
+	bv = kvmalloc(bv_size, GFP_KERNEL);
+	if (!bv)
+		return -ENOMEM;
+
+	*cleanup_mode = 0;
+
+	/* Put the page list at the end of the bvec list storage.  bvec
+	 * elements are larger than page pointers, so as long as we work
+	 * 0->last, we should be fine.
+	 */
+	pg_size = array_size(max_pages, sizeof(*pages));
+	pages = (void *)bv + bv_size - pg_size;
+
+	while (count && npages < max_pages) {
+		ret = iov_iter_extract_pages(orig, &pages, count,
+					     max_pages - npages, 0,
+					     &offset, cleanup_mode);
+		if (ret < 0) {
+			pr_err("Couldn't get user pages (rc=%zd)\n", ret);
+			break;
+		}
+
+		if (ret > count) {
+			pr_err("get_pages rc=%zd more than %zu\n", ret, count);
+			break;
+		}
+
+		count -= ret;
+		ret += offset;
+		cur_npages = DIV_ROUND_UP(ret, PAGE_SIZE);
+
+		if (npages + cur_npages > max_pages) {
+			pr_err("Out of bvec array capacity (%u vs %u)\n",
+			       npages + cur_npages, max_pages);
+			break;
+		}
+
+		for (i = 0; i < cur_npages; i++) {
+			len = ret > PAGE_SIZE ? PAGE_SIZE : ret;
+			bv[npages + i].bv_page = *pages++;
+			bv[npages + i].bv_offset = offset;
+			bv[npages + i].bv_len = len - offset;
+			ret -= len;
+			offset = 0;
+		}
+
+		npages += cur_npages;
+	}
+
+	iov_iter_bvec(new, iov_iter_rw(orig), bv, npages, orig_len - count);
+	return npages;
+}
+EXPORT_SYMBOL_GPL(netfs_extract_user_iter);
diff --git a/include/linux/netfs.h b/include/linux/netfs.h
index 4c76ddfb6a67..26fe3e6bafa1 100644
--- a/include/linux/netfs.h
+++ b/include/linux/netfs.h
@@ -296,6 +296,9 @@ void netfs_get_subrequest(struct netfs_io_subrequest *subreq,
 void netfs_put_subrequest(struct netfs_io_subrequest *subreq,
 			  bool was_async, enum netfs_sreq_ref_trace what);
 void netfs_stats_show(struct seq_file *);
+ssize_t netfs_extract_user_iter(struct iov_iter *orig, size_t orig_len,
+				struct iov_iter *new,
+				unsigned int *cleanup_mode);
 
 /**
  * netfs_inode - Get the netfs inode context from the inode
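
When the I/O completes, whoever consumed the bvec iterator has to drop the
references or pins that were taken, as indicated by the returned cleanup
mode.  A hypothetical cleanup loop might look like the following (the helper
name is invented for illustration; unpin_user_page(), put_page() and
kvfree() are existing kernel APIs), assuming the caller kept hold of the
bvec array and the element count returned above:

	/* Hypothetical cleanup: release the pages referenced by the bvec
	 * array built by netfs_extract_user_iter(), per its cleanup mode.
	 */
	static void example_release_extracted_pages(struct bio_vec *bv,
						    unsigned int npages,
						    unsigned int cleanup_mode)
	{
		unsigned int i;

		for (i = 0; i < npages; i++) {
			if (cleanup_mode & FOLL_PIN)
				unpin_user_page(bv[i].bv_page);
			else if (cleanup_mode & FOLL_GET)
				put_page(bv[i].bv_page);
		}

		/* The page pointer scratch list lives at the tail of the same
		 * kvmalloc'd buffer as the bvec array, so a single kvfree()
		 * covers both.
		 */
		kvfree(bv);
	}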