From patchwork Fri May 19 07:49:18 2023
X-Patchwork-Submitter: David Howells
X-Patchwork-Id: 96257
Organization: Red Hat UK Ltd.
From: David Howells
In-Reply-To: <20230519074047.1739879-1-dhowells@redhat.com>
References: <20230519074047.1739879-1-dhowells@redhat.com>
To: Jens Axboe, Al Viro, Christoph Hellwig
Cc: dhowells@redhat.com, Matthew Wilcox, Jan Kara, Jeff Layton,
    David Hildenbrand, Jason Gunthorpe, Logan Gunthorpe, Hillf Danton,
    Christian Brauner, Linus Torvalds, linux-fsdevel@vger.kernel.org,
    linux-block@vger.kernel.org, linux-kernel@vger.kernel.org,
    linux-mm@kvack.org
Subject: [PATCH] iov_iter: Add automatic-alloc for ITER_BVEC and use in direct_splice_read()
Date: Fri, 19 May 2023 08:49:18 +0100
Message-ID: <1740264.1684482558@warthog.procyon.org.uk>

If it's a problem that direct_splice_read() always allocates as much memory
as is asked for (capped at what will fit into the pipe), when less would
suffice if, say, an O_DIRECT read hits a hole and returns short, or a socket
returns less than was asked for, then something like the attached
modification to ITER_BVEC could be made.

David
---
iov_iter: Add automatic-alloc for ITER_BVEC and use in direct_splice_read()

Add a flag to struct iov_iter that tells anything that writes to, or allows
writing to, a BVEC-type iterator that it should allocate pages to fill in any
slots in the bio_vec array that have NULL page pointers. This allows the
buffer space in the bvec to be allocated on demand.

Iterators of this type are initialised with iov_iter_bvec_autoalloc() instead
of iov_iter_bvec(). Only destination (i.e. READ/ITER_DEST) iterators may be
used in this fashion.

An additional function, iov_iter_auto_alloc(), is provided to perform the
allocation when the caller wishes to make use of the bio_vec array directly;
the block layer is modified to use it.

direct_splice_read() is then modified to make use of this.

This is less efficient if we know in advance that we want to allocate the
full buffer, as we can't so easily use bulk allocation, but it does mean that
in cases where we might not want the full buffer (say we hit a hole in DIO),
we don't have to allocate it.
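For readers outside the kernel, the on-demand scheme can be sketched in userspace C. This is a hypothetical analogue, not kernel code: struct seg stands in for struct bio_vec, malloc() for alloc_page(), SKETCH_PAGE_SIZE for PAGE_SIZE, and seg_iter_init_autoalloc()/seg_iter_auto_alloc() loosely mirror iov_iter_bvec_autoalloc() and bvec_auto_alloc(). NULL slots are given a full page length up front, and memory is attached to a slot only when a write needs to reach it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for PAGE_SIZE. */
#define SKETCH_PAGE_SIZE 4096u

/* Hypothetical stand-in for struct bio_vec: page == NULL means "not yet allocated". */
struct seg {
	void *page;
	size_t offset;
	size_t len;
};

/* Hypothetical stand-in for a BVEC-type struct iov_iter. */
struct seg_iter {
	struct seg *segs;
	size_t nr_segs;
	size_t seg_offset;	/* bytes already consumed in segs[0], like iov_offset */
	bool auto_alloc;
};

/* Setup step: empty slots get offset 0 and a full page of length. */
static void seg_iter_init_autoalloc(struct seg_iter *it, struct seg *segs,
				    size_t nr_segs)
{
	assert(segs != NULL || nr_segs == 0);
	it->segs = segs;
	it->nr_segs = nr_segs;
	it->seg_offset = 0;
	it->auto_alloc = true;
	for (size_t j = 0; j < nr_segs; j++) {
		if (!segs[j].page) {
			segs[j].offset = 0;
			segs[j].len = SKETCH_PAGE_SIZE;
		}
	}
}

/*
 * Ensure every slot that a write of count bytes would touch has memory
 * behind it.  Returns false only if an allocation fails.
 */
static bool seg_iter_auto_alloc(struct seg_iter *it, size_t count)
{
	if (!it->auto_alloc || count == 0)
		return true;

	count += it->seg_offset;
	for (size_t j = 0; j < it->nr_segs; j++) {
		if (!it->segs[j].page) {
			it->segs[j].page = malloc(SKETCH_PAGE_SIZE);
			if (!it->segs[j].page)
				return false;
		}
		if (it->segs[j].len >= count)
			break;
		count -= it->segs[j].len;
	}
	return true;
}

/* Helper for the example: how many slots currently have memory attached. */
static size_t seg_count_populated(const struct seg *segs, size_t nr_segs)
{
	size_t n = 0;

	for (size_t j = 0; j < nr_segs; j++)
		if (segs[j].page)
			n++;
	return n;
}
```

Asking for 5000 bytes against an 8-slot array populates only the first two slots and leaves the other six NULL, which is the saving sought when a DIO read hits a hole or a socket returns short. As with bvec_auto_alloc(), asking for more than the array can hold just populates every slot and still reports success.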
Signed-off-by: David Howells
cc: Christoph Hellwig
cc: Jens Axboe
cc: Al Viro
cc: David Hildenbrand
cc: John Hubbard
cc: linux-mm@kvack.org
cc: linux-block@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
---
 block/bio.c         |   2
 fs/splice.c         |  36 ++++++-----------
 include/linux/uio.h |  13 ++++--
 lib/iov_iter.c      | 110 +++++++++++++++++++++++++++++++++++++++++++++++++++-
 4 files changed, 133 insertions(+), 28 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index 798cc4cf3bd2..72d5c1125df2 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -1330,6 +1330,8 @@ int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
 	int ret = 0;
 
 	if (iov_iter_is_bvec(iter)) {
+		if (!iov_iter_auto_alloc(iter, iov_iter_count(iter)))
+			return -ENOMEM;
 		bio_iov_bvec_set(bio, iter);
 		iov_iter_advance(iter, bio->bi_iter.bi_size);
 		return 0;
diff --git a/fs/splice.c b/fs/splice.c
index 56d9802729d0..30e7a31c5ada 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -310,10 +310,8 @@ ssize_t direct_splice_read(struct file *in, loff_t *ppos,
 	struct iov_iter to;
 	struct bio_vec *bv;
 	struct kiocb kiocb;
-	struct page **pages;
 	ssize_t ret;
-	size_t used, npages, chunk, remain, keep = 0;
-	int i;
+	size_t used, npages, chunk, remain, keep = 0, i;
 
 	if (!len)
 		return 0;
@@ -334,30 +332,14 @@ ssize_t direct_splice_read(struct file *in, loff_t *ppos,
 	len = min_t(size_t, len, npages * PAGE_SIZE);
 	npages = DIV_ROUND_UP(len, PAGE_SIZE);
 
-	bv = kzalloc(array_size(npages, sizeof(bv[0])) +
-		     array_size(npages, sizeof(struct page *)), GFP_KERNEL);
+	bv = kzalloc(array_size(npages, sizeof(bv[0])), GFP_KERNEL);
 	if (!bv)
 		return -ENOMEM;
 
-	pages = (struct page **)(bv + npages);
-	npages = alloc_pages_bulk_array(GFP_USER, npages, pages);
-	if (!npages) {
-		kfree(bv);
-		return -ENOMEM;
-	}
 
-	remain = len = min_t(size_t, len, npages * PAGE_SIZE);
 
-	for (i = 0; i < npages; i++) {
-		chunk = min_t(size_t, PAGE_SIZE, remain);
-		bv[i].bv_page = pages[i];
-		bv[i].bv_offset = 0;
-		bv[i].bv_len = chunk;
-		remain -= chunk;
-	}
-
 	/* Do the I/O */
-	iov_iter_bvec(&to, ITER_DEST, bv, npages, len);
+	iov_iter_bvec_autoalloc(&to, ITER_DEST, bv, npages, len);
 	init_sync_kiocb(&kiocb, in);
 	kiocb.ki_pos = *ppos;
 	ret = call_read_iter(in, &kiocb, &to);
@@ -376,8 +358,16 @@ ssize_t direct_splice_read(struct file *in, loff_t *ppos,
 	}
 
 	/* Free any pages that didn't get touched at all. */
-	if (keep < npages)
-		release_pages(pages + keep, npages - keep);
+	if (keep < npages) {
+		struct page **pages = (struct page **)&bv[keep];
+		size_t j = 0;
+
+		for (i = keep; i < npages; i++)
+			if (bv[i].bv_page)
+				pages[j++] = bv[i].bv_page;
+		if (j)
+			release_pages(pages, j);
+	}
 
 	/* Push the remaining pages into the pipe. */
 	remain = ret;
diff --git a/include/linux/uio.h b/include/linux/uio.h
index 60c342bb7ab8..6bc2287021d9 100644
--- a/include/linux/uio.h
+++ b/include/linux/uio.h
@@ -40,10 +40,11 @@ struct iov_iter_state {
 
 struct iov_iter {
 	u8 iter_type;
-	bool copy_mc;
-	bool nofault;
-	bool data_source;
-	bool user_backed;
+	bool copy_mc:1;
+	bool nofault:1;
+	bool data_source:1;
+	bool user_backed:1;
+	bool auto_alloc:1;	/* Automatically alloc pages into a bvec */
 	union {
 		size_t iov_offset;
 		int last_offset;
@@ -263,6 +264,7 @@ static inline bool iov_iter_is_copy_mc(const struct iov_iter *i)
 }
 #endif
 
+bool iov_iter_auto_alloc(struct iov_iter *iter, size_t count);
 size_t iov_iter_zero(size_t bytes, struct iov_iter *);
 bool iov_iter_is_aligned(const struct iov_iter *i, unsigned addr_mask,
 			unsigned len_mask);
@@ -274,6 +276,9 @@ void iov_iter_kvec(struct iov_iter *i, unsigned int direction, const struct kvec
 			unsigned long nr_segs, size_t count);
 void iov_iter_bvec(struct iov_iter *i, unsigned int direction, const struct bio_vec *bvec,
 			unsigned long nr_segs, size_t count);
+void iov_iter_bvec_autoalloc(struct iov_iter *i, unsigned int direction,
+			     const struct bio_vec *bvec, unsigned long nr_segs,
+			     size_t count);
 void iov_iter_discard(struct iov_iter *i, unsigned int direction, size_t count);
 void iov_iter_xarray(struct iov_iter *i, unsigned int direction, struct xarray *xarray,
 		     loff_t start, size_t count);
diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index f18138e0292a..3643f9d80ecc 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -52,7 +52,11 @@
 	while (n) {						\
 		unsigned offset = p->bv_offset + skip;		\
 		unsigned left;					\
-		void *kaddr = kmap_local_page(p->bv_page +	\
+		void *kaddr;					\
+								\
+		if (!p->bv_page)				\
+			break;					\
+		kaddr = kmap_local_page(p->bv_page +		\
 					offset / PAGE_SIZE);	\
 		base = kaddr + offset % PAGE_SIZE;		\
 		len = min(min(n, (size_t)(p->bv_len - skip)),	\
@@ -159,6 +163,49 @@ __out:								\
 #define iterate_and_advance(i, n, base, len, off, I, K) \
 	__iterate_and_advance(i, n, base, len, off, I, ((void)(K),0))
 
+/*
+ * Preallocate pages into the bvec sufficient to store count bytes.
+ */
+static bool bvec_auto_alloc(struct iov_iter *iter, size_t count)
+{
+	struct bio_vec *bvec = (struct bio_vec *)iter->bvec;
+	int j;
+
+	if (!count)
+		return true;
+
+	count += iter->iov_offset;
+	for (j = 0; j < iter->nr_segs; j++) {
+		if (!bvec[j].bv_page) {
+			bvec[j].bv_page = alloc_page(GFP_KERNEL);
+			if (!bvec[j].bv_page)
+				return false;
+		}
+		if (bvec[j].bv_len >= count)
+			break;
+		count -= bvec[j].bv_len;
+	}
+
+	return true;
+}
+
+/**
+ * iov_iter_auto_alloc - Perform auto-alloc for an iterator
+ * @iter: The iterator to do the allocation for
+ * @count: The number of bytes we need to store
+ *
+ * Perform auto-allocation on a iterator. This only works with ITER_BVEC-type
+ * iterators. It will make sure that pages are allocated sufficient to store
+ * the specified number of bytes. Returns true if sufficient pages are present
+ * in the bvec and false if an allocation failure occurs.
+ */
+bool iov_iter_auto_alloc(struct iov_iter *iter, size_t count)
+{
+	return !iov_iter_is_bvec(iter) || !iter->auto_alloc ||
+		bvec_auto_alloc(iter, count);
+}
+EXPORT_SYMBOL_GPL(iov_iter_auto_alloc);
+
 static int copyout(void __user *to, const void *from, size_t n)
 {
 	if (should_fail_usercopy())
@@ -313,6 +360,8 @@ size_t _copy_to_iter(const void *addr, size_t bytes, struct iov_iter *i)
 		return 0;
 	if (user_backed_iter(i))
 		might_fault();
+	if (!iov_iter_auto_alloc(i, bytes))
+		return 0;
 	iterate_and_advance(i, bytes, base, len, off,
 		copyout(base, addr + off, len),
 		memcpy(base, addr + off, len)
@@ -362,6 +411,8 @@ size_t _copy_mc_to_iter(const void *addr, size_t bytes, struct iov_iter *i)
 		return 0;
 	if (user_backed_iter(i))
 		might_fault();
+	if (!iov_iter_auto_alloc(i, bytes))
+		return 0;
 	__iterate_and_advance(i, bytes, base, len, off,
 		copyout_mc(base, addr + off, len),
 		copy_mc_to_kernel(base, addr + off, len)
@@ -503,6 +554,8 @@ size_t copy_page_to_iter_nofault(struct page *page, unsigned offset, size_t byte
 		return 0;
 	if (WARN_ON_ONCE(i->data_source))
 		return 0;
+	if (!iov_iter_auto_alloc(i, bytes))
+		return 0;
 	page += offset / PAGE_SIZE; // first subpage
 	offset %= PAGE_SIZE;
 	while (1) {
@@ -557,6 +610,8 @@ EXPORT_SYMBOL(copy_page_from_iter);
 
 size_t iov_iter_zero(size_t bytes, struct iov_iter *i)
 {
+	if (!iov_iter_auto_alloc(i, bytes))
+		return -ENOMEM;
 	iterate_and_advance(i, bytes, base, len, count,
 		clear_user(base, len),
 		memset(base, 0, len)
@@ -598,6 +653,7 @@ static void iov_iter_bvec_advance(struct iov_iter *i, size_t size)
 	size += i->iov_offset;
 
 	for (bvec = i->bvec, end = bvec + i->nr_segs; bvec < end; bvec++) {
+		BUG_ON(!bvec->bv_page);
 		if (likely(size < bvec->bv_len))
 			break;
 		size -= bvec->bv_len;
@@ -740,6 +796,51 @@ void iov_iter_bvec(struct iov_iter *i, unsigned int direction,
 }
 EXPORT_SYMBOL(iov_iter_bvec);
 
+/**
+ * iov_iter_bvec_autoalloc - Initialise a BVEC-type I/O iterator with automatic alloc
+ * @i: The iterator to initialise.
+ * @direction: The direction of the transfer.
+ * @bvec: The array of bio_vecs listing the buffer segments
+ * @nr_segs: The number of segments in @bvec[].
+ * @count: The size of the I/O buffer in bytes.
+ *
+ * Set up an I/O iterator to insert pages into a bvec as data is written into
+ * it where NULL pointers exist in the bvec array (if a pointer isn't NULL, the
+ * page it points to will just be used). No more than @nr_segs pages will be
+ * filled in. Empty slots will have bv_offset set to 0 and bv_len to
+ * PAGE_SIZE.
+ *
+ * If the iterator is reverted, excess pages will be left for the
+ * caller to clean up.
+ */
+void iov_iter_bvec_autoalloc(struct iov_iter *i, unsigned int direction,
+			     const struct bio_vec *bvec, unsigned long nr_segs,
+			     size_t count)
+{
+	struct bio_vec *bv = (struct bio_vec *)bvec;
+	unsigned long j;
+
+	BUG_ON(direction != READ);
+	*i = (struct iov_iter){
+		.iter_type = ITER_BVEC,
+		.copy_mc = false,
+		.data_source = direction,
+		.auto_alloc = true,
+		.bvec = bvec,
+		.nr_segs = nr_segs,
+		.iov_offset = 0,
+		.count = count
+	};
+
+	for (j = 0; j < nr_segs; j++) {
+		if (!bv[j].bv_page) {
+			bv[j].bv_offset = 0;
+			bv[j].bv_len = PAGE_SIZE;
+		}
+	}
+}
+EXPORT_SYMBOL(iov_iter_bvec_autoalloc);
+
 /**
  * iov_iter_xarray - Initialise an I/O iterator to use the pages in an xarray
  * @i: The iterator to initialise.
@@ -1122,6 +1223,8 @@ static ssize_t __iov_iter_get_pages_alloc(struct iov_iter *i,
 		struct page **p;
 		struct page *page;
 
+		if (!iov_iter_auto_alloc(i, maxsize))
+			return -ENOMEM;
 		page = first_bvec_segment(i, &maxsize, start);
 		n = want_pages_array(pages, maxsize, *start, maxpages);
 		if (!n)
@@ -1226,6 +1329,8 @@ size_t csum_and_copy_to_iter(const void *addr, size_t bytes, void *_csstate,
 		csstate->off += bytes;
 		return bytes;
 	}
+	if (!iov_iter_auto_alloc(i, bytes))
+		return -ENOMEM;
 
 	sum = csum_shift(csstate->csum, csstate->off);
 	iterate_and_advance(i, bytes, base, len, off, ({
@@ -1664,6 +1769,9 @@ static ssize_t iov_iter_extract_bvec_pages(struct iov_iter *i,
 	size_t skip = i->iov_offset, offset;
 	int k;
 
+	if (!iov_iter_auto_alloc(i, maxsize))
+		return -ENOMEM;
+
 	for (;;) {
 		if (i->nr_segs == 0)
 			return 0;
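The cleanup step that direct_splice_read() gains can be sketched in userspace the same way. It releases only those slots at or beyond keep that actually received a page. This is again a hypothetical analogue: free() stands in for release_pages() and struct seg for struct bio_vec. The patch itself reuses the tail of the bio_vec array (&bv[keep]) as the page-pointer array to avoid a second allocation. That appears safe because a struct page pointer is no larger than a struct bio_vec, so the write index j never overtakes the read index i. The sketch uses a separate scratch array instead, to stay clear of C aliasing rules.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct bio_vec; page == NULL means never populated. */
struct seg {
	void *page;
	size_t offset;
	size_t len;
};

/*
 * Copy the non-NULL page pointers from segs[keep..nr_segs) into out[],
 * packed at the front, and return how many there were.
 */
static size_t collect_untouched(const struct seg *segs, size_t keep,
				size_t nr_segs, void **out)
{
	size_t j = 0;

	assert(keep <= nr_segs);
	for (size_t i = keep; i < nr_segs; i++)
		if (segs[i].page)
			out[j++] = segs[i].page;
	return j;
}

/*
 * Free every populated slot at or beyond keep (free() standing in for
 * release_pages()) and clear those slots.  Returns the number freed.
 */
static size_t release_untouched(struct seg *segs, size_t keep, size_t nr_segs)
{
	void **pages;
	size_t n;

	if (keep >= nr_segs)
		return 0;

	pages = malloc((nr_segs - keep) * sizeof(*pages));
	if (!pages)
		return 0;

	n = collect_untouched(segs, keep, nr_segs, pages);
	for (size_t j = 0; j < n; j++)
		free(pages[j]);
	for (size_t i = keep; i < nr_segs; i++)
		segs[i].page = NULL;

	free(pages);
	return n;
}
```

With keep == 1 and slots 1, 2 and 4 populated, the three populated pointers are packed to the front of the scratch array and freed. The NULL slots, which auto-alloc never reached, are skipped rather than being passed to the release function.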