From patchwork Mon Jan 16 23:10:11 2023
X-Patchwork-Submitter: David Howells
X-Patchwork-Id: 44368
Subject: [PATCH v6 18/34] dio: Pin pages rather than ref'ing if appropriate
From: David Howells
To: Al Viro
Cc: Jens Axboe, Jan Kara, Christoph Hellwig, Matthew Wilcox, Logan Gunthorpe,
    Jeff Layton, dhowells@redhat.com, linux-fsdevel@vger.kernel.org,
    linux-block@vger.kernel.org, linux-kernel@vger.kernel.org
Date: Mon, 16 Jan 2023 23:10:11 +0000
Message-ID: <167391061117.2311931.16807283804788007499.stgit@warthog.procyon.org.uk>
In-Reply-To: <167391047703.2311931.8115712773222260073.stgit@warthog.procyon.org.uk>
References: <167391047703.2311931.8115712773222260073.stgit@warthog.procyon.org.uk>
User-Agent: StGit/1.5
MIME-Version: 1.0
Convert the generic direct-I/O code to use iov_iter_extract_pages() instead
of iov_iter_get_pages().  This will pin pages or leave them unaltered rather
than getting a ref on them as appropriate to the iterator.

The pages need to be pinned for DIO-read rather than having refs taken on
them to prevent VM copy-on-write from malfunctioning during a concurrent
fork() (the result of the I/O would otherwise end up only visible to the
child process and not the parent).

Signed-off-by: David Howells
cc: Al Viro
cc: Jens Axboe
cc: Jan Kara
cc: Christoph Hellwig
cc: Matthew Wilcox
cc: Logan Gunthorpe
cc: linux-fsdevel@vger.kernel.org
cc: linux-block@vger.kernel.org
---
 fs/direct-io.c | 57 ++++++++++++++++++++++++++++++++++++--------------------
 1 file changed, 37 insertions(+), 20 deletions(-)

diff --git a/fs/direct-io.c b/fs/direct-io.c
index b1e26a706e31..b4d2c9f85a5b 100644
--- a/fs/direct-io.c
+++ b/fs/direct-io.c
@@ -142,9 +142,11 @@ struct dio {
 	/*
 	 * pages[] (and any fields placed after it) are not zeroed out at
-	 * allocation time.  Don't add new fields after pages[] unless you
-	 * wish that they not be zeroed.
+	 * allocation time.  Don't add new fields after pages[] unless you wish
+	 * that they not be zeroed.  Pages may have a ref taken, a pin emplaced
+	 * or no retention measures.
 	 */
+	unsigned int cleanup_mode;	/* How pages should be cleaned up (0/FOLL_GET/PIN) */
 	union {
 		struct page *pages[DIO_PAGES];	/* page buffer */
 		struct work_struct complete_work;/* deferred AIO completion */
@@ -167,12 +169,13 @@ static inline unsigned dio_pages_present(struct dio_submit *sdio)
 static inline int dio_refill_pages(struct dio *dio, struct dio_submit *sdio)
 {
 	const enum req_op dio_op = dio->opf & REQ_OP_MASK;
+	unsigned int gup_flags =
+		op_is_write(dio_op) ? FOLL_SOURCE_BUF : FOLL_DEST_BUF;
+	struct page **pages = dio->pages;
 	ssize_t ret;
 
-	ret = iov_iter_get_pages(sdio->iter, dio->pages, LONG_MAX, DIO_PAGES,
-				 &sdio->from,
-				 op_is_write(dio_op) ?
-				 FOLL_SOURCE_BUF : FOLL_DEST_BUF);
+	ret = iov_iter_extract_pages(sdio->iter, &pages, LONG_MAX, DIO_PAGES,
+				     gup_flags, &sdio->from);
 
 	if (ret < 0 && sdio->blocks_available && dio_op == REQ_OP_WRITE) {
 		struct page *page = ZERO_PAGE(0);
@@ -183,7 +186,7 @@ static inline int dio_refill_pages(struct dio *dio, struct dio_submit *sdio)
 		 */
 		if (dio->page_errors == 0)
 			dio->page_errors = ret;
-		get_page(page);
+		dio->cleanup_mode = 0;
 		dio->pages[0] = page;
 		sdio->head = 0;
 		sdio->tail = 1;
@@ -197,6 +200,8 @@ static inline int dio_refill_pages(struct dio *dio, struct dio_submit *sdio)
 		sdio->head = 0;
 		sdio->tail = (ret + PAGE_SIZE - 1) / PAGE_SIZE;
 		sdio->to = ((ret - 1) & (PAGE_SIZE - 1)) + 1;
+		dio->cleanup_mode =
+			iov_iter_extract_mode(sdio->iter, gup_flags);
 		return 0;
 	}
 	return ret;
@@ -400,6 +405,10 @@ dio_bio_alloc(struct dio *dio, struct dio_submit *sdio,
 	 * we request a valid number of vectors.
 	 */
 	bio = bio_alloc(bdev, nr_vecs, dio->opf, GFP_KERNEL);
+	if (!(dio->cleanup_mode & FOLL_GET))
+		bio_clear_flag(bio, BIO_PAGE_REFFED);
+	if (dio->cleanup_mode & FOLL_PIN)
+		bio_set_flag(bio, BIO_PAGE_PINNED);
 	bio->bi_iter.bi_sector = first_sector;
 	if (dio->is_async)
 		bio->bi_end_io = dio_bio_end_aio;
@@ -443,13 +452,18 @@ static inline void dio_bio_submit(struct dio *dio, struct dio_submit *sdio)
 	sdio->logical_offset_in_bio = 0;
 }
 
+static void dio_cleanup_page(struct dio *dio, struct page *page)
+{
+	page_put_unpin(page, dio->cleanup_mode);
+}
+
 /*
  * Release any resources in case of a failure
  */
 static inline void dio_cleanup(struct dio *dio, struct dio_submit *sdio)
 {
 	while (sdio->head < sdio->tail)
-		put_page(dio->pages[sdio->head++]);
+		dio_cleanup_page(dio, dio->pages[sdio->head++]);
 }
 
@@ -704,7 +718,7 @@ static inline int dio_new_bio(struct dio *dio, struct dio_submit *sdio,
  *
  * Return zero on success.  Non-zero means the caller needs to start a new BIO.
  */
-static inline int dio_bio_add_page(struct dio_submit *sdio)
+static inline int dio_bio_add_page(struct dio *dio, struct dio_submit *sdio)
 {
 	int ret;
 
@@ -771,11 +785,11 @@ static inline int dio_send_cur_page(struct dio *dio, struct dio_submit *sdio,
 		goto out;
 	}
 
-	if (dio_bio_add_page(sdio) != 0) {
+	if (dio_bio_add_page(dio, sdio) != 0) {
 		dio_bio_submit(dio, sdio);
 		ret = dio_new_bio(dio, sdio, sdio->cur_page_block, map_bh);
 		if (ret == 0) {
-			ret = dio_bio_add_page(sdio);
+			ret = dio_bio_add_page(dio, sdio);
 			BUG_ON(ret != 0);
 		}
 	}
@@ -832,13 +846,16 @@ submit_page_section(struct dio *dio, struct dio_submit *sdio, struct page *page,
 	 */
 	if (sdio->cur_page) {
 		ret = dio_send_cur_page(dio, sdio, map_bh);
-		put_page(sdio->cur_page);
+		dio_cleanup_page(dio, sdio->cur_page);
 		sdio->cur_page = NULL;
 		if (ret)
 			return ret;
 	}
 
-	get_page(page);		/* It is in dio */
+	ret = try_grab_page(page, dio->cleanup_mode);	/* It is in dio */
+	if (ret < 0)
+		return ret;
+
 	sdio->cur_page = page;
 	sdio->cur_page_offset = offset;
 	sdio->cur_page_len = len;
@@ -853,7 +870,7 @@ submit_page_section(struct dio *dio, struct dio_submit *sdio, struct page *page,
 		ret = dio_send_cur_page(dio, sdio, map_bh);
 		if (sdio->bio)
 			dio_bio_submit(dio, sdio);
-		put_page(sdio->cur_page);
+		dio_cleanup_page(dio, sdio->cur_page);
 		sdio->cur_page = NULL;
 	}
 	return ret;
@@ -954,7 +971,7 @@ static int do_direct_IO(struct dio *dio, struct dio_submit *sdio,
 
 			ret = get_more_blocks(dio, sdio, map_bh);
 			if (ret) {
-				put_page(page);
+				dio_cleanup_page(dio, page);
 				goto out;
 			}
 			if (!buffer_mapped(map_bh))
@@ -999,7 +1016,7 @@ static int do_direct_IO(struct dio *dio, struct dio_submit *sdio,
 
 				/* AKPM: eargh, -ENOTBLK is a hack */
 				if (dio_op == REQ_OP_WRITE) {
-					put_page(page);
+					dio_cleanup_page(dio, page);
 					return -ENOTBLK;
 				}
@@ -1012,7 +1029,7 @@ static int do_direct_IO(struct dio *dio, struct dio_submit *sdio,
 				if (sdio->block_in_file >= i_size_aligned >> blkbits) {
 					/* We hit eof */
-					put_page(page);
+					dio_cleanup_page(dio, page);
 					goto out;
 				}
 				zero_user(page, from, 1 << blkbits);
@@ -1052,7 +1069,7 @@ static int do_direct_IO(struct dio *dio, struct dio_submit *sdio,
 					sdio->next_block_for_io,
 					map_bh);
 			if (ret) {
-				put_page(page);
+				dio_cleanup_page(dio, page);
 				goto out;
 			}
 			sdio->next_block_for_io += this_chunk_blocks;
@@ -1068,7 +1085,7 @@ static int do_direct_IO(struct dio *dio, struct dio_submit *sdio,
 		}
 
 		/* Drop the ref which was taken in get_user_pages() */
-		put_page(page);
+		dio_cleanup_page(dio, page);
 	}
 out:
 	return ret;
@@ -1288,7 +1305,7 @@ ssize_t __blockdev_direct_IO(struct kiocb *iocb, struct inode *inode,
 		ret2 = dio_send_cur_page(dio, &sdio, &map_bh);
 		if (retval == 0)
 			retval = ret2;
-		put_page(sdio.cur_page);
+		dio_cleanup_page(dio, sdio.cur_page);
 		sdio.cur_page = NULL;
 	}
 	if (sdio.bio)