From patchwork Thu Nov 17 14:55:03 2022
X-Patchwork-Submitter: David Howells
X-Patchwork-Id: 21713
Organization: Red Hat UK Ltd. Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SI4 1TE, United Kingdom. Registered in England and Wales under Company Registration No. 3798903
Subject: [RFC PATCH 3/4] netfs: Add a function to extract a UBUF or IOVEC into a BVEC iterator
From: David Howells
To: Al Viro
Cc: Jeff Layton, Steve French, Shyam Prasad N, Rohith Surabattula, linux-cachefs@redhat.com, linux-cifs@vger.kernel.org, linux-fsdevel@vger.kernel.org, dhowells@redhat.com, Christoph Hellwig, Matthew Wilcox, Jeff Layton, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
Date: Thu, 17 Nov 2022 14:55:03 +0000
Message-ID: <166869690376.3723671.8813331570219190705.stgit@warthog.procyon.org.uk>
In-Reply-To: <166869687556.3723671.10061142538708346995.stgit@warthog.procyon.org.uk>
References: <166869687556.3723671.10061142538708346995.stgit@warthog.procyon.org.uk>
User-Agent: StGit/1.5
X-Mailing-List: linux-kernel@vger.kernel.org

Add a function to extract the pages from a user-space supplied iterator
(UBUF- or IOVEC-type) into a BVEC-type iterator, pinning the pages as we
go.

This is useful in three situations:

 (1) A userspace thread may have a sibling that unmaps or remaps the
     process's VM during the operation, changing the assignment of the
     pages and potentially causing an error.  Pinning the pages keeps
     them around even if this occurs; further, we find out at the point
     of extraction whether EFAULT is going to be incurred.

 (2) Pages might get swapped out/discarded if not pinned, so we want to pin
     them to avoid the reload causing a deadlock due to a DIO from/to an
     mmapped region on the same file.

 (3) The iterator may get passed to sendmsg() by the filesystem.  If a
     fault occurs, we may get a short write to a TCP stream that's then
     tricky to recover from.

We assume that other types of iterator (e.g. BVEC-, KVEC- and XARRAY-type)
are constructed only by kernel internals and that the pages are pinned in
those cases.

DISCARD- and PIPE-type iterators aren't DIO'able.

Signed-off-by: David Howells
cc: Jeff Layton
cc: Steve French
cc: Shyam Prasad N
cc: Rohith Surabattula
cc: linux-cachefs@redhat.com
cc: linux-cifs@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
---

 fs/netfs/Makefile     |    1 +
 fs/netfs/iterator.c   |   94 +++++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/netfs.h |    2 +
 3 files changed, 97 insertions(+)
 create mode 100644 fs/netfs/iterator.c

diff --git a/fs/netfs/Makefile b/fs/netfs/Makefile
index f684c0cd1ec5..386d6fb92793 100644
--- a/fs/netfs/Makefile
+++ b/fs/netfs/Makefile
@@ -3,6 +3,7 @@
 netfs-y := \
 	buffered_read.o \
 	io.o \
+	iterator.o \
 	main.o \
 	objects.o
 
diff --git a/fs/netfs/iterator.c b/fs/netfs/iterator.c
new file mode 100644
index 000000000000..c11d05a66a4a
--- /dev/null
+++ b/fs/netfs/iterator.c
@@ -0,0 +1,94 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/* Iterator helpers.
+ *
+ * Copyright (C) 2022 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells (dhowells@redhat.com)
+ */
+
+#include
+#include
+#include
+#include
+#include "internal.h"
+
+/**
+ * netfs_extract_user_iter - Extract the pages from a user iterator into a bvec
+ * @orig: The original iterator
+ * @orig_len: The amount of iterator to copy
+ * @new: The iterator to be set up
+ *
+ * Extract the page fragments from the given amount of the source iterator and
+ * build up a second iterator that refers to all of those bits.  This allows
+ * the original iterator to be disposed of.
+ *
+ * On success, the number of elements in the bvec is returned and the original
+ * iterator will have been advanced by the amount extracted.
+ */
+ssize_t netfs_extract_user_iter(struct iov_iter *orig, size_t orig_len,
+				struct iov_iter *new)
+{
+	struct bio_vec *bv = NULL;
+	struct page **pages;
+	unsigned int cur_npages;
+	unsigned int max_pages;
+	unsigned int npages = 0;
+	unsigned int i;
+	ssize_t ret;
+	size_t count = orig_len, offset, len;
+	size_t bv_size, pg_size;
+
+	if (WARN_ON_ONCE(!iter_is_ubuf(orig) && !iter_is_iovec(orig)))
+		return -EIO;
+
+	max_pages = iov_iter_npages(orig, INT_MAX);
+	bv_size = array_size(max_pages, sizeof(*bv));
+	bv = kvmalloc(bv_size, GFP_KERNEL);
+	if (!bv)
+		return -ENOMEM;
+
+	/* Put the page list at the end of the bvec list storage.  bvec
+	 * elements are larger than page pointers, so as long as we work
+	 * 0->last, we should be fine.
+	 */
+	pg_size = array_size(max_pages, sizeof(*pages));
+	pages = (void *)bv + bv_size - pg_size;
+
+	while (count && npages < max_pages) {
+		ret = iov_iter_extract_pages(orig, &pages, count,
+					     max_pages - npages, &offset);
+		if (ret < 0) {
+			pr_err("Couldn't get user pages (rc=%zd)\n", ret);
+			break;
+		}
+
+		if (ret > count) {
+			pr_err("get_pages rc=%zd more than %zu\n", ret, count);
+			break;
+		}
+
+		count -= ret;
+		ret += offset;
+		cur_npages = DIV_ROUND_UP(ret, PAGE_SIZE);
+
+		if (npages + cur_npages > max_pages) {
+			pr_err("Out of bvec array capacity (%u vs %u)\n",
+			       npages + cur_npages, max_pages);
+			break;
+		}
+
+		for (i = 0; i < cur_npages; i++) {
+			len = ret > PAGE_SIZE ? PAGE_SIZE : ret;
+			bv[npages + i].bv_page = *pages++;
+			bv[npages + i].bv_offset = offset;
+			bv[npages + i].bv_len = len - offset;
+			ret -= len;
+			offset = 0;
+		}
+
+		npages += cur_npages;
+	}
+
+	iov_iter_bvec(new, iov_iter_rw(orig), bv, npages, orig_len - count);
+	return npages;
+}
+EXPORT_SYMBOL(netfs_extract_user_iter);
diff --git a/include/linux/netfs.h b/include/linux/netfs.h
index f2402ddeafbf..5f6ad0246946 100644
--- a/include/linux/netfs.h
+++ b/include/linux/netfs.h
@@ -288,6 +288,8 @@ void netfs_get_subrequest(struct netfs_io_subrequest *subreq,
 void netfs_put_subrequest(struct netfs_io_subrequest *subreq,
 			  bool was_async, enum netfs_sreq_ref_trace what);
 void netfs_stats_show(struct seq_file *);
+ssize_t netfs_extract_user_iter(struct iov_iter *orig, size_t orig_len,
+				struct iov_iter *new);
 
 /**
  * netfs_inode - Get the netfs inode context from the inode
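
For illustration only: the inner for-loop of netfs_extract_user_iter() turns
one extracted extent (a starting offset into the first page plus a byte
count) into per-page bvec elements.  The following is a minimal userspace
sketch of that arithmetic, not kernel code.  struct model_seg, model_split()
and MODEL_PAGE_SIZE are hypothetical names invented here, and a 4 KiB page
size is assumed; the page pointers themselves are left out.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096u	/* assumption: 4 KiB pages */

/* Hypothetical stand-in for struct bio_vec, without the page pointer. */
struct model_seg {
	size_t offset;	/* byte offset into the page */
	size_t len;	/* bytes covered within the page */
};

/*
 * Split an extent that begins @offset bytes into its first page and spans
 * @bytes bytes into per-page segments, following the loop in the patch.
 * Returns the number of segments written to @segs.
 */
static size_t model_split(size_t offset, size_t bytes,
			  struct model_seg *segs, size_t max_segs)
{
	/* Corresponds to "ret += offset": measure from the page start. */
	size_t remaining = bytes + offset;
	/* Corresponds to DIV_ROUND_UP(ret, PAGE_SIZE). */
	size_t nsegs = (remaining + MODEL_PAGE_SIZE - 1) / MODEL_PAGE_SIZE;
	size_t i, len;

	assert(offset < MODEL_PAGE_SIZE);
	assert(nsegs <= max_segs);

	for (i = 0; i < nsegs; i++) {
		len = remaining > MODEL_PAGE_SIZE ? MODEL_PAGE_SIZE : remaining;
		segs[i].offset = offset;
		segs[i].len = len - offset;
		remaining -= len;
		offset = 0;	/* only the first page starts mid-page */
	}
	return nsegs;
}
```

With these assumptions, an extent of 5000 bytes starting 100 bytes into a
page comes out as two segments: 3996 bytes at offset 100, then 1004 bytes at
offset 0, which sum back to the 5000 bytes extracted.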