From patchwork Fri Jan 20 17:55:49 2023
X-Patchwork-Submitter: David Howells
X-Patchwork-Id: 46551
From: David Howells <dhowells@redhat.com>
To: Al Viro, Christoph Hellwig
Cc: David Howells, Matthew Wilcox, Jens Axboe, Jan Kara, Jeff Layton,
    Logan Gunthorpe, linux-fsdevel@vger.kernel.org,
    linux-block@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [PATCH v7 1/8] iov_iter: Define flags to qualify page extraction.
Date: Fri, 20 Jan 2023 17:55:49 +0000
Message-Id: <20230120175556.3556978-2-dhowells@redhat.com>
In-Reply-To: <20230120175556.3556978-1-dhowells@redhat.com>
References: <20230120175556.3556978-1-dhowells@redhat.com>

Define flags to qualify page extraction to pass into iov_iter_*_pages*()
rather than passing in FOLL_* flags.  For now, only a flag to allow
peer-to-peer DMA is provided; in the future, additional flags will be
provided to indicate the I/O direction.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Al Viro
cc: Christoph Hellwig
cc: Jens Axboe
cc: Logan Gunthorpe
cc: linux-fsdevel@vger.kernel.org
cc: linux-block@vger.kernel.org
---
Notes:
    ver #7)
     - Don't use FOLL_* as a parameter, but rather define constants
       specifically to use with iov_iter_*_pages*().
     - Drop the I/O direction constants for now.

 block/bio.c         |  6 +++---
 block/blk-map.c     |  8 ++++----
 include/linux/uio.h |  7 +++++--
 lib/iov_iter.c      | 14 ++++++------
 4 files changed, 20 insertions(+), 15 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index 5f96fcae3f75..6a6c73d14bfd 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -1249,7 +1249,7 @@ static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
 	unsigned short entries_left = bio->bi_max_vecs - bio->bi_vcnt;
 	struct bio_vec *bv = bio->bi_io_vec + bio->bi_vcnt;
 	struct page **pages = (struct page **)bv;
-	unsigned int gup_flags = 0;
+	unsigned int extract_flags = 0;
 	ssize_t size, left;
 	unsigned len, i = 0;
 	size_t offset, trim;
@@ -1264,7 +1264,7 @@ static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
 	pages += entries_left * (PAGE_PTRS_PER_BVEC - 1);
 
 	if (bio->bi_bdev && blk_queue_pci_p2pdma(bio->bi_bdev->bd_disk->queue))
-		gup_flags |= FOLL_PCI_P2PDMA;
+		extract_flags |= ITER_ALLOW_P2PDMA;
 
 	/*
 	 * Each segment in the iov is required to be a block size multiple.
@@ -1275,7 +1275,7 @@ static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
 	 */
 	size = iov_iter_get_pages(iter, pages,
 				  UINT_MAX - bio->bi_iter.bi_size,
-				  nr_pages, &offset, gup_flags);
+				  nr_pages, &offset, extract_flags);
 	if (unlikely(size <= 0))
 		return size ? size : -EFAULT;
diff --git a/block/blk-map.c b/block/blk-map.c
index 19940c978c73..bc111261fc82 100644
--- a/block/blk-map.c
+++ b/block/blk-map.c
@@ -267,7 +267,7 @@ static int bio_map_user_iov(struct request *rq, struct iov_iter *iter,
 {
 	unsigned int max_sectors = queue_max_hw_sectors(rq->q);
 	unsigned int nr_vecs = iov_iter_npages(iter, BIO_MAX_VECS);
-	unsigned int gup_flags = 0;
+	unsigned int extract_flags = 0;
 	struct bio *bio;
 	int ret;
 	int j;
@@ -280,7 +280,7 @@ static int bio_map_user_iov(struct request *rq, struct iov_iter *iter,
 		return -ENOMEM;
 
 	if (blk_queue_pci_p2pdma(rq->q))
-		gup_flags |= FOLL_PCI_P2PDMA;
+		extract_flags |= ITER_ALLOW_P2PDMA;
 
 	while (iov_iter_count(iter)) {
 		struct page **pages, *stack_pages[UIO_FASTIOV];
@@ -291,10 +291,10 @@ static int bio_map_user_iov(struct request *rq, struct iov_iter *iter,
 		if (nr_vecs <= ARRAY_SIZE(stack_pages)) {
 			pages = stack_pages;
 			bytes = iov_iter_get_pages(iter, pages, LONG_MAX,
-						   nr_vecs, &offs, gup_flags);
+						   nr_vecs, &offs, extract_flags);
 		} else {
 			bytes = iov_iter_get_pages_alloc(iter, &pages,
-						LONG_MAX, &offs, gup_flags);
+						LONG_MAX, &offs, extract_flags);
 		}
 		if (unlikely(bytes <= 0)) {
 			ret = bytes ? bytes : -EFAULT;
diff --git a/include/linux/uio.h b/include/linux/uio.h
index 9f158238edba..46d5080314c6 100644
--- a/include/linux/uio.h
+++ b/include/linux/uio.h
@@ -252,12 +252,12 @@ void iov_iter_xarray(struct iov_iter *i, unsigned int direction, struct xarray *
 		loff_t start, size_t count);
 ssize_t iov_iter_get_pages(struct iov_iter *i, struct page **pages,
 		size_t maxsize, unsigned maxpages, size_t *start,
-		unsigned gup_flags);
+		unsigned extract_flags);
 ssize_t iov_iter_get_pages2(struct iov_iter *i, struct page **pages,
 		size_t maxsize, unsigned maxpages, size_t *start);
 ssize_t iov_iter_get_pages_alloc(struct iov_iter *i,
 		struct page ***pages, size_t maxsize, size_t *start,
-		unsigned gup_flags);
+		unsigned extract_flags);
 ssize_t iov_iter_get_pages_alloc2(struct iov_iter *i, struct page ***pages,
 		size_t maxsize, size_t *start);
 int iov_iter_npages(const struct iov_iter *i, int maxpages);
@@ -360,4 +360,7 @@ static inline void iov_iter_ubuf(struct iov_iter *i, unsigned int direction,
 	};
 }
 
+/* Flags for iov_iter_get/extract_pages*() */
+#define ITER_ALLOW_P2PDMA	0x01 /* Allow P2PDMA on the extracted pages */
+
 #endif
diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index f9a3ff37ecd1..fb04abe7d746 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -1432,9 +1432,9 @@ static struct page *first_bvec_segment(const struct iov_iter *i,
 static ssize_t __iov_iter_get_pages_alloc(struct iov_iter *i,
 		   struct page ***pages, size_t maxsize,
 		   unsigned int maxpages, size_t *start,
-		   unsigned int gup_flags)
+		   unsigned int extract_flags)
 {
-	unsigned int n;
+	unsigned int n, gup_flags = 0;
 
 	if (maxsize > i->count)
 		maxsize = i->count;
@@ -1442,6 +1442,8 @@ static ssize_t __iov_iter_get_pages_alloc(struct iov_iter *i,
 		return 0;
 	if (maxsize > MAX_RW_COUNT)
 		maxsize = MAX_RW_COUNT;
+	if (extract_flags & ITER_ALLOW_P2PDMA)
+		gup_flags |= FOLL_PCI_P2PDMA;
 
 	if (likely(user_backed_iter(i))) {
 		unsigned long addr;
@@ -1495,14 +1497,14 @@ static ssize_t __iov_iter_get_pages_alloc(struct iov_iter *i,
 
 ssize_t iov_iter_get_pages(struct iov_iter *i,
 		   struct page **pages, size_t maxsize, unsigned maxpages,
-		   size_t *start, unsigned gup_flags)
+		   size_t *start, unsigned extract_flags)
 {
 	if (!maxpages)
 		return 0;
 	BUG_ON(!pages);
 
 	return __iov_iter_get_pages_alloc(i, &pages, maxsize, maxpages,
-					  start, gup_flags);
+					  start, extract_flags);
 }
 EXPORT_SYMBOL_GPL(iov_iter_get_pages);
 
@@ -1515,14 +1517,14 @@ EXPORT_SYMBOL(iov_iter_get_pages2);
 
 ssize_t iov_iter_get_pages_alloc(struct iov_iter *i,
 		   struct page ***pages, size_t maxsize,
-		   size_t *start, unsigned gup_flags)
+		   size_t *start, unsigned extract_flags)
 {
 	ssize_t len;
 
 	*pages = NULL;
 
 	len = __iov_iter_get_pages_alloc(i, pages, maxsize, ~0U, start,
-					 gup_flags);
+					 extract_flags);
 	if (len <= 0) {
 		kvfree(*pages);
 		*pages = NULL;

From patchwork Fri Jan 20 17:55:50 2023
X-Patchwork-Submitter: David Howells
X-Patchwork-Id: 46552
From: David Howells <dhowells@redhat.com>
To: Al Viro, Christoph Hellwig
Cc: David Howells, Matthew Wilcox, Jens Axboe, Jan Kara, Jeff Layton,
    Logan Gunthorpe, linux-fsdevel@vger.kernel.org,
    linux-block@vger.kernel.org, linux-kernel@vger.kernel.org,
    Christoph Hellwig, John Hubbard, linux-mm@kvack.org
Subject: [PATCH v7 2/8] iov_iter: Add a function to extract a page list from an iterator
Date: Fri, 20 Jan 2023 17:55:50 +0000
Message-Id: <20230120175556.3556978-3-dhowells@redhat.com>
In-Reply-To: <20230120175556.3556978-1-dhowells@redhat.com>
References:
<20230120175556.3556978-1-dhowells@redhat.com>

Add a function, iov_iter_extract_pages(), to extract a list of pages from
an iterator.  The pages may be returned with a reference added or a pin
added or neither, depending on the type of iterator and the direction of
transfer.  The caller must pass FOLL_READ_FROM_MEM or FOLL_WRITE_TO_MEM
as part of gup_flags to indicate how the iterator contents are to be used.

Add a second function, iov_iter_extract_mode(), to determine how the
cleanup should be done.

There are three cases:

 (1) Transfer *into* an ITER_IOVEC or ITER_UBUF iterator.

     Extracted pages will have pins obtained on them (but not references)
     so that fork() doesn't CoW the pages incorrectly whilst the I/O is in
     progress.

     iov_iter_extract_mode() will return FOLL_PIN for this case.  The
     caller should use something like unpin_user_page() to dispose of the
     page.

 (2) Transfer is *out of* an ITER_IOVEC or ITER_UBUF iterator.

     Extracted pages will have references obtained on them, but not pins.

     iov_iter_extract_mode() will return FOLL_GET.  The caller should use
     something like put_page() for page disposal.

 (3) Any other sort of iterator.

     No refs or pins are obtained on the pages; the assumption is made
     that the caller will manage page retention.  ITER_ALLOW_P2PDMA is not
     permitted.

     iov_iter_extract_mode() will return 0.  The pages don't need
     additional disposal.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Al Viro
cc: Christoph Hellwig
cc: John Hubbard
cc: Matthew Wilcox
cc: linux-fsdevel@vger.kernel.org
cc: linux-mm@kvack.org
Link: https://lore.kernel.org/r/166920903885.1461876.692029808682876184.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/166997421646.9475.14837976344157464997.stgit@warthog.procyon.org.uk/ # v3
Link: https://lore.kernel.org/r/167305163883.1521586.10777155475378874823.stgit@warthog.procyon.org.uk/ # v4
Link: https://lore.kernel.org/r/167344728530.2425628.9613910866466387722.stgit@warthog.procyon.org.uk/ # v5
Link: https://lore.kernel.org/r/167391053207.2311931.16398133457201442907.stgit@warthog.procyon.org.uk/ # v6
---
Notes:
    ver #7)
     - Switch to passing in iter-specific flags rather than FOLL_* flags.
     - Drop the direction flags for now.
     - Use ITER_ALLOW_P2PDMA to request FOLL_PCI_P2PDMA.
     - Disallow use of ITER_ALLOW_P2PDMA with non-user-backed iter.
     - Add support for extraction from KVEC-type iters.
     - Use iov_iter_advance() rather than open-coding it.
     - Make BVEC- and KVEC-type skip over initial empty vectors.

    ver #6)
     - Add back the function to indicate the cleanup mode.
     - Drop the cleanup_mode return arg to iov_iter_extract_pages().
     - Pass FOLL_SOURCE/DEST_BUF in gup_flags.  Check this against the
       iter data_source.

    ver #4)
     - Use ITER_SOURCE/DEST instead of WRITE/READ.
     - Allow additional FOLL_* flags, such as FOLL_PCI_P2PDMA, to be
       passed in.

    ver #3)
     - Switch to using EXPORT_SYMBOL_GPL to prevent indirect 3rd-party
       access to get/pin_user_pages_fast()[1].
include/linux/uio.h | 28 +++ lib/iov_iter.c | 424 ++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 452 insertions(+) diff --git a/include/linux/uio.h b/include/linux/uio.h index 46d5080314c6..a4233049ab7a 100644 --- a/include/linux/uio.h +++ b/include/linux/uio.h @@ -363,4 +363,32 @@ static inline void iov_iter_ubuf(struct iov_iter *i, unsigned int direction, /* Flags for iov_iter_get/extract_pages*() */ #define ITER_ALLOW_P2PDMA 0x01 /* Allow P2PDMA on the extracted pages */ +ssize_t iov_iter_extract_pages(struct iov_iter *i, struct page ***pages, + size_t maxsize, unsigned int maxpages, + unsigned int extract_flags, size_t *offset0); + +/** + * iov_iter_extract_mode - Indicate how pages from the iterator will be retained + * @iter: The iterator + * @extract_flags: How the iterator is to be used + * + * Examine the iterator and @extract_flags and indicate by returning FOLL_PIN, + * FOLL_GET or 0 as to how, if at all, pages extracted from the iterator will + * be retained by the extraction function. + * + * FOLL_GET indicates that the pages will have a reference taken on them that + * the caller must put. This can be done for DMA/async DIO write from a page. + * + * FOLL_PIN indicates that the pages will have a pin placed in them that the + * caller must unpin. This is must be done for DMA/async DIO read to a page to + * avoid CoW problems in fork. + * + * 0 indicates that no measures are taken and that it's up to the caller to + * retain the pages. + */ +#define iov_iter_extract_mode(iter, extract_flags) \ + (user_backed_iter(iter) ? \ + (iter->data_source == ITER_SOURCE) ? 
\ + FOLL_GET : FOLL_PIN : 0) + #endif diff --git a/lib/iov_iter.c b/lib/iov_iter.c index fb04abe7d746..843abe566efb 100644 --- a/lib/iov_iter.c +++ b/lib/iov_iter.c @@ -1916,3 +1916,427 @@ void iov_iter_restore(struct iov_iter *i, struct iov_iter_state *state) i->iov -= state->nr_segs - i->nr_segs; i->nr_segs = state->nr_segs; } + +/* + * Extract a list of contiguous pages from an ITER_PIPE iterator. This does + * not get references of its own on the pages, nor does it get a pin on them. + * If there's a partial page, it adds that first and will then allocate and add + * pages into the pipe to make up the buffer space to the amount required. + * + * The caller must hold the pipe locked and only transferring into a pipe is + * supported. + */ +static ssize_t iov_iter_extract_pipe_pages(struct iov_iter *i, + struct page ***pages, size_t maxsize, + unsigned int maxpages, + unsigned int extract_flags, + size_t *offset0) +{ + unsigned int nr, offset, chunk, j; + struct page **p; + size_t left; + + if (!sanity(i)) + return -EFAULT; + + offset = pipe_npages(i, &nr); + if (!nr) + return -EFAULT; + *offset0 = offset; + + maxpages = min_t(size_t, nr, maxpages); + maxpages = want_pages_array(pages, maxsize, offset, maxpages); + if (!maxpages) + return -ENOMEM; + p = *pages; + + left = maxsize; + for (j = 0; j < maxpages; j++) { + struct page *page = append_pipe(i, left, &offset); + if (!page) + break; + chunk = min_t(size_t, left, PAGE_SIZE - offset); + left -= chunk; + *p++ = page; + } + if (!j) + return -EFAULT; + return maxsize - left; +} + +/* + * Extract a list of contiguous pages from an ITER_XARRAY iterator. This does not + * get references on the pages, nor does it get a pin on them. 
+ */ +static ssize_t iov_iter_extract_xarray_pages(struct iov_iter *i, + struct page ***pages, size_t maxsize, + unsigned int maxpages, + unsigned int extract_flags, + size_t *offset0) +{ + struct page *page, **p; + unsigned int nr = 0, offset; + loff_t pos = i->xarray_start + i->iov_offset; + pgoff_t index = pos >> PAGE_SHIFT; + XA_STATE(xas, i->xarray, index); + + offset = pos & ~PAGE_MASK; + *offset0 = offset; + + maxpages = want_pages_array(pages, maxsize, offset, maxpages); + if (!maxpages) + return -ENOMEM; + p = *pages; + + rcu_read_lock(); + for (page = xas_load(&xas); page; page = xas_next(&xas)) { + if (xas_retry(&xas, page)) + continue; + + /* Has the page moved or been split? */ + if (unlikely(page != xas_reload(&xas))) { + xas_reset(&xas); + continue; + } + + p[nr++] = find_subpage(page, xas.xa_index); + if (nr == maxpages) + break; + } + rcu_read_unlock(); + + maxsize = min_t(size_t, nr * PAGE_SIZE - offset, maxsize); + iov_iter_advance(i, maxsize); + return maxsize; +} + +/* + * Extract a list of contiguous pages from an ITER_BVEC iterator. This does + * not get references on the pages, nor does it get a pin on them. 
+ */ +static ssize_t iov_iter_extract_bvec_pages(struct iov_iter *i, + struct page ***pages, size_t maxsize, + unsigned int maxpages, + unsigned int extract_flags, + size_t *offset0) +{ + struct page **p, *page; + size_t skip = i->iov_offset, offset; + int k; + + for (;;) { + if (i->nr_segs == 0) + return 0; + maxsize = min(maxsize, i->bvec->bv_len - skip); + if (maxsize) + break; + i->iov_offset = 0; + i->nr_segs--; + i->kvec++; + skip = 0; + } + + skip += i->bvec->bv_offset; + page = i->bvec->bv_page + skip / PAGE_SIZE; + offset = skip % PAGE_SIZE; + *offset0 = offset; + + maxpages = want_pages_array(pages, maxsize, offset, maxpages); + if (!maxpages) + return -ENOMEM; + p = *pages; + for (k = 0; k < maxpages; k++) + p[k] = page + k; + + maxsize = min_t(size_t, maxsize, maxpages * PAGE_SIZE - offset); + iov_iter_advance(i, maxsize); + return maxsize; +} + +/* + * Extract a list of virtually contiguous pages from an ITER_KVEC iterator. + * This does not get references on the pages, nor does it get a pin on them. 
+ */ +static ssize_t iov_iter_extract_kvec_pages(struct iov_iter *i, + struct page ***pages, size_t maxsize, + unsigned int maxpages, + unsigned int extract_flags, + size_t *offset0) +{ + struct page **p, *page; + const void *kaddr; + size_t skip = i->iov_offset, offset, len; + int k; + + for (;;) { + if (i->nr_segs == 0) + return 0; + maxsize = min(maxsize, i->kvec->iov_len - skip); + if (maxsize) + break; + i->iov_offset = 0; + i->nr_segs--; + i->kvec++; + skip = 0; + } + + offset = skip % PAGE_SIZE; + *offset0 = offset; + kaddr = i->kvec->iov_base; + + maxpages = want_pages_array(pages, maxsize, offset, maxpages); + if (!maxpages) + return -ENOMEM; + p = *pages; + + kaddr -= offset; + len = offset + maxsize; + for (k = 0; k < maxpages; k++) { + size_t seg = min_t(size_t, len, PAGE_SIZE); + + if (is_vmalloc_or_module_addr(kaddr)) + page = vmalloc_to_page(kaddr); + else + page = virt_to_page(kaddr); + + p[k] = page; + len -= seg; + kaddr += PAGE_SIZE; + } + + maxsize = min_t(size_t, maxsize, maxpages * PAGE_SIZE - offset); + iov_iter_advance(i, maxsize); + return maxsize; +} + +/* + * Get the first segment from an ITER_UBUF or ITER_IOVEC iterator. The + * iterator must not be empty. + */ +static unsigned long iov_iter_extract_first_user_segment(const struct iov_iter *i, + size_t *size) +{ + size_t skip; + long k; + + if (iter_is_ubuf(i)) + return (unsigned long)i->ubuf + i->iov_offset; + + for (k = 0, skip = i->iov_offset; k < i->nr_segs; k++, skip = 0) { + size_t len = i->iov[k].iov_len - skip; + + if (unlikely(!len)) + continue; + if (*size > len) + *size = len; + return (unsigned long)i->iov[k].iov_base + skip; + } + BUG(); // if it had been empty, we wouldn't get called +} + +/* + * Extract a list of contiguous pages from a user iterator and get references + * on them. This should only be used iff the iterator is user-backed + * (IOBUF/UBUF) and data is being transferred out of the buffer described by + * the iterator (ie. this is the source). 
+ * + * The pages are returned with incremented refcounts that the caller must undo + * once the transfer is complete, but no additional pins are obtained. + * + * This is only safe to be used where background IO/DMA is not going to be + * modifying the buffer, and so won't cause a problem with CoW on fork. + */ +static ssize_t iov_iter_extract_user_pages_and_get(struct iov_iter *i, + struct page ***pages, + size_t maxsize, + unsigned int maxpages, + unsigned int extract_flags, + size_t *offset0) +{ + unsigned long addr; + unsigned int gup_flags = FOLL_GET; + size_t offset; + int res; + + if (WARN_ON_ONCE(i->data_source != ITER_SOURCE)) + return -EFAULT; + + if (extract_flags & ITER_ALLOW_P2PDMA) + gup_flags |= FOLL_PCI_P2PDMA; + if (i->nofault) + gup_flags |= FOLL_NOFAULT; + + addr = iov_iter_extract_first_user_segment(i, &maxsize); + *offset0 = offset = addr % PAGE_SIZE; + addr &= PAGE_MASK; + maxpages = want_pages_array(pages, maxsize, offset, maxpages); + if (!maxpages) + return -ENOMEM; + res = get_user_pages_fast(addr, maxpages, gup_flags, *pages); + if (unlikely(res <= 0)) + return res; + maxsize = min_t(size_t, maxsize, res * PAGE_SIZE - offset); + iov_iter_advance(i, maxsize); + return maxsize; +} + +/* + * Extract a list of contiguous pages from a user iterator and get a pin on + * each of them. This should only be used iff the iterator is user-backed + * (IOBUF/UBUF) and data is being transferred into the buffer described by the + * iterator (ie. this is the destination). + * + * It does not get refs on the pages, but the pages must be unpinned by the + * caller once the transfer is complete. + * + * This is safe to be used where background IO/DMA *is* going to be modifying + * the buffer; using a pin rather than a ref makes sure that CoW happens + * correctly in the parent during fork. 
+ */ +static ssize_t iov_iter_extract_user_pages_and_pin(struct iov_iter *i, + struct page ***pages, + size_t maxsize, + unsigned int maxpages, + unsigned int extract_flags, + size_t *offset0) +{ + unsigned long addr; + unsigned int gup_flags = FOLL_PIN | FOLL_WRITE; + size_t offset; + int res; + + if (WARN_ON_ONCE(i->data_source != ITER_DEST)) + return -EFAULT; + + if (extract_flags & ITER_ALLOW_P2PDMA) + gup_flags |= FOLL_PCI_P2PDMA; + if (i->nofault) + gup_flags |= FOLL_NOFAULT; + + addr = first_iovec_segment(i, &maxsize); + *offset0 = offset = addr % PAGE_SIZE; + addr &= PAGE_MASK; + maxpages = want_pages_array(pages, maxsize, offset, maxpages); + if (!maxpages) + return -ENOMEM; + res = pin_user_pages_fast(addr, maxpages, gup_flags, *pages); + if (unlikely(res <= 0)) + return res; + maxsize = min_t(size_t, maxsize, res * PAGE_SIZE - offset); + iov_iter_advance(i, maxsize); + return maxsize; +} + +static ssize_t iov_iter_extract_user_pages(struct iov_iter *i, + struct page ***pages, size_t maxsize, + unsigned int maxpages, + unsigned int extract_flags, + size_t *offset0) +{ + if (iov_iter_extract_mode(i, extract_flags) == FOLL_GET) + return iov_iter_extract_user_pages_and_get(i, pages, maxsize, + maxpages, extract_flags, + offset0); + else + return iov_iter_extract_user_pages_and_pin(i, pages, maxsize, + maxpages, extract_flags, + offset0); +} + +/** + * iov_iter_extract_pages - Extract a list of contiguous pages from an iterator + * @i: The iterator to extract from + * @pages: Where to return the list of pages + * @maxsize: The maximum amount of iterator to extract + * @maxpages: The maximum size of the list of pages + * @extract_flags: Flags to qualify request + * @offset0: Where to return the starting offset into (*@pages)[0] + * + * Extract a list of contiguous pages from the current point of the iterator, + * advancing the iterator. The maximum number of pages and the maximum amount + * of page contents can be set. 
+ * + * If *@pages is NULL, a page list will be allocated to the required size and + * *@pages will be set to its base. If *@pages is not NULL, it will be assumed + * that the caller allocated a page list at least @maxpages in size and this + * will be filled in. + * + * @extract_flags can have ITER_ALLOW_P2PDMA set to request peer-to-peer DMA be + * allowed on the pages extracted. + * + * The iov_iter_extract_mode() function can be used to query how cleanup should + * be performed. + * + * Extra refs or pins on the pages may be obtained as follows: + * + * (*) If the iterator is user-backed (ITER_IOVEC/ITER_UBUF) and data is to be + * transferred /OUT OF/ the buffer (@i->data_source == ITER_SOURCE), refs + * will be taken on the pages, but pins will not be added. This can be + * used for DMA from a page; it cannot be used for DMA to a page, as it + * may cause page-COW problems in fork. iov_iter_extract_mode() will + * return FOLL_GET. + * + * (*) If the iterator is user-backed (ITER_IOVEC/ITER_UBUF) and data is to be + * transferred /INTO/ the described buffer (@i->data_source |= ITER_DEST), + * pins will be added to the pages, but refs will not be taken. This must + * be used for DMA to a page. iov_iter_extract_mode() will return + * FOLL_PIN. + * + * (*) If the iterator is ITER_PIPE, this must describe a destination for the + * data. Additional pages may be allocated and added to the pipe (which + * will hold the refs), but neither refs nor pins will be obtained for the + * caller. The caller must hold the pipe lock. iov_iter_extract_mode() + * will return 0. + * + * (*) If the iterator is ITER_KVEC, ITER_BVEC or ITER_XARRAY, the pages are + * merely listed; no extra refs or pins are obtained. + * iov_iter_extract_mode() will return 0. + * + * Note also: + * + * (*) Peer-to-peer DMA (ITER_ALLOW_P2PDMA) is only permitted with user-backed + * iterators. + * + * (*) Use with ITER_DISCARD is not supported as that has no content. 
+ * + * On success, the function sets *@pages to the new pagelist, if allocated, and + * sets *@offset0 to the offset into the first page. + * + * It may also return -ENOMEM and -EFAULT. + */ +ssize_t iov_iter_extract_pages(struct iov_iter *i, + struct page ***pages, + size_t maxsize, + unsigned int maxpages, + unsigned int extract_flags, + size_t *offset0) +{ + maxsize = min_t(size_t, min_t(size_t, maxsize, i->count), MAX_RW_COUNT); + if (!maxsize) + return 0; + + if (likely(user_backed_iter(i))) + return iov_iter_extract_user_pages(i, pages, maxsize, + maxpages, extract_flags, + offset0); + if (WARN_ON_ONCE(extract_flags & ITER_ALLOW_P2PDMA)) + return -EIO; + if (iov_iter_is_kvec(i)) + return iov_iter_extract_kvec_pages(i, pages, maxsize, + maxpages, extract_flags, + offset0); + if (iov_iter_is_bvec(i)) + return iov_iter_extract_bvec_pages(i, pages, maxsize, + maxpages, extract_flags, + offset0); + if (iov_iter_is_pipe(i)) + return iov_iter_extract_pipe_pages(i, pages, maxsize, + maxpages, extract_flags, + offset0); + if (iov_iter_is_xarray(i)) + return iov_iter_extract_xarray_pages(i, pages, maxsize, + maxpages, extract_flags, + offset0); + return -EFAULT; +} +EXPORT_SYMBOL_GPL(iov_iter_extract_pages); From patchwork Fri Jan 20 17:55:51 2023 X-Patchwork-Submitter: David Howells X-Patchwork-Id: 46553
From: David Howells To: Al Viro , Christoph Hellwig Cc: David Howells , Matthew Wilcox , Jens Axboe , Jan Kara , Jeff Layton , Logan Gunthorpe , linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, Christoph Hellwig , linux-mm@kvack.org Subject: [PATCH v7 3/8] mm: Provide a helper to drop a pin/ref on a page Date: Fri, 20 Jan 2023 17:55:51 +0000 Message-Id: <20230120175556.3556978-4-dhowells@redhat.com> In-Reply-To: <20230120175556.3556978-1-dhowells@redhat.com> References: <20230120175556.3556978-1-dhowells@redhat.com> Provide a helper in the get_user_pages code to drop a pin or a ref on a page based on being given FOLL_GET or FOLL_PIN in its flags argument, or to do nothing if neither is set. 
Signed-off-by: David Howells cc: Al Viro cc: Christoph Hellwig cc: Matthew Wilcox cc: linux-fsdevel@vger.kernel.org cc: linux-block@vger.kernel.org cc: linux-mm@kvack.org --- include/linux/mm.h | 3 +++ mm/gup.c | 22 ++++++++++++++++++++++ 2 files changed, 25 insertions(+) diff --git a/include/linux/mm.h b/include/linux/mm.h index f3f196e4d66d..f1cf8f4eb946 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1367,6 +1367,9 @@ static inline bool is_cow_mapping(vm_flags_t flags) #define SECTION_IN_PAGE_FLAGS #endif +void folio_put_unpin(struct folio *folio, unsigned int flags); +void page_put_unpin(struct page *page, unsigned int flags); + /* * The identification function is mainly used by the buddy allocator for * determining if two pages could be buddies. We are not really identifying diff --git a/mm/gup.c b/mm/gup.c index f45a3a5be53a..3ee4b4c7e0cb 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -191,6 +191,28 @@ static void gup_put_folio(struct folio *folio, int refs, unsigned int flags) folio_put_refs(folio, refs); } +/** + * folio_put_unpin - Unpin/put a folio as appropriate + * @folio: The folio to release + * @flags: gup flags indicating the mode of release (FOLL_*) + * + * Release a folio according to the flags. If FOLL_GET is set, the folio has a + * ref dropped; if FOLL_PIN is set, it is unpinned; otherwise it is left + * unaltered. 
+ */ +void folio_put_unpin(struct folio *folio, unsigned int flags) +{ + if (flags & (FOLL_GET | FOLL_PIN)) + gup_put_folio(folio, 1, flags); +} +EXPORT_SYMBOL_GPL(folio_put_unpin); + +void page_put_unpin(struct page *page, unsigned int flags) +{ + folio_put_unpin(page_folio(page), flags); +} +EXPORT_SYMBOL_GPL(page_put_unpin); + /** * try_grab_page() - elevate a page's refcount by a flag-dependent amount * @page: pointer to page to be grabbed From patchwork Fri Jan 20 17:55:52 2023 X-Patchwork-Submitter: David Howells X-Patchwork-Id: 46554
From: David Howells To: Al Viro , Christoph Hellwig Cc: David Howells , Matthew Wilcox , Jens Axboe , Jan Kara , Jeff Layton , Logan Gunthorpe , linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, Christoph Hellwig Subject: [PATCH v7 4/8] block: Rename BIO_NO_PAGE_REF to BIO_PAGE_REFFED and invert the meaning Date: Fri, 20 Jan 2023 17:55:52 +0000 Message-Id: <20230120175556.3556978-5-dhowells@redhat.com> In-Reply-To: <20230120175556.3556978-1-dhowells@redhat.com> References: <20230120175556.3556978-1-dhowells@redhat.com>
Rename BIO_NO_PAGE_REF to BIO_PAGE_REFFED and invert the meaning. In a following patch I intend to add a BIO_PAGE_PINNED flag to indicate that the page needs unpinning and this way both flags have the same logic. Signed-off-by: David Howells Reviewed-by: Christoph Hellwig cc: Al Viro cc: Jens Axboe cc: Jan Kara cc: Matthew Wilcox cc: Logan Gunthorpe cc: linux-block@vger.kernel.org Link: https://lore.kernel.org/r/167305166150.1521586.10220949115402059720.stgit@warthog.procyon.org.uk/ # v4 Link: https://lore.kernel.org/r/167344730802.2425628.14034153595667416149.stgit@warthog.procyon.org.uk/ # v5 Link: https://lore.kernel.org/r/167391054631.2311931.7588488803802952158.stgit@warthog.procyon.org.uk/ # v6 --- Notes: ver #5) - Split from patch that uses iov_iter_extract_pages(). block/bio.c | 9 ++++++++- fs/iomap/direct-io.c | 1 - include/linux/bio.h | 2 +- include/linux/blk_types.h | 2 +- 4 files changed, 10 insertions(+), 4 deletions(-) diff --git a/block/bio.c b/block/bio.c index 6a6c73d14bfd..cfe11f4799d1 100644 --- a/block/bio.c +++ b/block/bio.c @@ -243,6 +243,10 @@ static void bio_free(struct bio *bio) * Users of this function have their own bio allocation. Subsequently, * they must remember to pair any call to bio_init() with bio_uninit() * when IO has completed, or when the bio is released. + * + * We set the initial assumption that pages attached to the bio will be + * released with put_page() by setting BIO_PAGE_REFFED; if the pages + * should not be put, this flag should be cleared. 
*/ void bio_init(struct bio *bio, struct block_device *bdev, struct bio_vec *table, unsigned short max_vecs, blk_opf_t opf) @@ -274,6 +278,7 @@ void bio_init(struct bio *bio, struct block_device *bdev, struct bio_vec *table, #ifdef CONFIG_BLK_DEV_INTEGRITY bio->bi_integrity = NULL; #endif + bio_set_flag(bio, BIO_PAGE_REFFED); bio->bi_vcnt = 0; atomic_set(&bio->__bi_remaining, 1); @@ -302,6 +307,7 @@ void bio_reset(struct bio *bio, struct block_device *bdev, blk_opf_t opf) { bio_uninit(bio); memset(bio, 0, BIO_RESET_BYTES); + bio_set_flag(bio, BIO_PAGE_REFFED); atomic_set(&bio->__bi_remaining, 1); bio->bi_bdev = bdev; if (bio->bi_bdev) @@ -812,6 +818,7 @@ EXPORT_SYMBOL(bio_put); static int __bio_clone(struct bio *bio, struct bio *bio_src, gfp_t gfp) { bio_set_flag(bio, BIO_CLONED); + bio_clear_flag(bio, BIO_PAGE_REFFED); bio->bi_ioprio = bio_src->bi_ioprio; bio->bi_iter = bio_src->bi_iter; @@ -1198,7 +1205,7 @@ void bio_iov_bvec_set(struct bio *bio, struct iov_iter *iter) bio->bi_io_vec = (struct bio_vec *)iter->bvec; bio->bi_iter.bi_bvec_done = iter->iov_offset; bio->bi_iter.bi_size = size; - bio_set_flag(bio, BIO_NO_PAGE_REF); + bio_clear_flag(bio, BIO_PAGE_REFFED); bio_set_flag(bio, BIO_CLONED); } diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c index 9804714b1751..c0e75900e754 100644 --- a/fs/iomap/direct-io.c +++ b/fs/iomap/direct-io.c @@ -202,7 +202,6 @@ static void iomap_dio_zero(const struct iomap_iter *iter, struct iomap_dio *dio, bio->bi_private = dio; bio->bi_end_io = iomap_dio_bio_end_io; - get_page(page); __bio_add_page(bio, page, len, 0); iomap_dio_submit_bio(iter, dio, bio, pos); } diff --git a/include/linux/bio.h b/include/linux/bio.h index 22078a28d7cb..63bfd91793f9 100644 --- a/include/linux/bio.h +++ b/include/linux/bio.h @@ -482,7 +482,7 @@ void zero_fill_bio(struct bio *bio); static inline void bio_release_pages(struct bio *bio, bool mark_dirty) { - if (!bio_flagged(bio, BIO_NO_PAGE_REF)) + if (bio_flagged(bio, BIO_PAGE_REFFED)) 
__bio_release_pages(bio, mark_dirty); } diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h index 99be590f952f..86711fb0534a 100644 --- a/include/linux/blk_types.h +++ b/include/linux/blk_types.h @@ -318,7 +318,7 @@ struct bio { * bio flags */ enum { - BIO_NO_PAGE_REF, /* don't put release vec pages */ + BIO_PAGE_REFFED, /* Pages need refs putting (equivalent to FOLL_GET) */ BIO_CLONED, /* doesn't own data */ BIO_BOUNCED, /* bio is a bounce bio */ BIO_QUIET, /* Make BIO Quiet */ From patchwork Fri Jan 20 17:55:53 2023 X-Patchwork-Submitter: David Howells X-Patchwork-Id: 46555
From: David Howells To: Al Viro , Christoph Hellwig Cc: David Howells , Matthew Wilcox , Jens Axboe , Jan Kara , Jeff Layton , Logan Gunthorpe , linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, Christoph Hellwig Subject: [PATCH v7 5/8] block: Add BIO_PAGE_PINNED Date: Fri, 20 Jan 2023 17:55:53 +0000 Message-Id: <20230120175556.3556978-6-dhowells@redhat.com> In-Reply-To: <20230120175556.3556978-1-dhowells@redhat.com> References: <20230120175556.3556978-1-dhowells@redhat.com>
Add BIO_PAGE_PINNED to indicate that the pages in a bio were pinned (FOLL_PIN) and that the pin will need removing. Signed-off-by: David Howells cc: Al Viro cc: Jens Axboe cc: Jan Kara cc: Christoph Hellwig cc: Matthew Wilcox cc: Logan Gunthorpe cc: linux-block@vger.kernel.org --- include/linux/blk_types.h | 1 + 1 file changed, 1 insertion(+) diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h index 86711fb0534a..42b40156c517 100644 --- a/include/linux/blk_types.h +++ b/include/linux/blk_types.h @@ -319,6 +319,7 @@ struct bio { */ enum { BIO_PAGE_REFFED, /* Pages need refs putting (equivalent to FOLL_GET) */ + BIO_PAGE_PINNED, /* Pages need unpinning (equivalent to FOLL_PIN) */ BIO_CLONED, /* doesn't own data */ BIO_BOUNCED, /* bio is a bounce bio */ BIO_QUIET, /* Make BIO Quiet */ From patchwork Fri Jan 20 17:55:54 2023 X-Patchwork-Submitter: David Howells X-Patchwork-Id: 46557
From: David Howells To: Al Viro , Christoph Hellwig Cc: David Howells , Matthew Wilcox , Jens Axboe , Jan Kara , Jeff Layton , Logan Gunthorpe , linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, Christoph Hellwig Subject: [PATCH v7 6/8] block: Make bio structs pin pages rather than ref'ing if appropriate Date: Fri, 20 Jan 2023 17:55:54 +0000 Message-Id: <20230120175556.3556978-7-dhowells@redhat.com> In-Reply-To: <20230120175556.3556978-1-dhowells@redhat.com> References: <20230120175556.3556978-1-dhowells@redhat.com> Convert the block layer's bio code to use iov_iter_extract_pages() instead of iov_iter_get_pages(). This will pin pages or leave them unaltered rather than getting a ref on them as appropriate to the source iterator. 
The pages need to be pinned for DIO-read rather than having refs taken on them to prevent VM copy-on-write from malfunctioning during a concurrent fork() (the result of the I/O would otherwise end up only visible to the child process and not the parent). To implement this: (1) If the BIO_PAGE_REFFED flag is set, this causes attached pages to be passed to put_page() during cleanup. (2) A BIO_PAGE_PINNED flag is provided. If set, this causes attached pages to be passed to unpin_user_page() during cleanup. (3) BIO_PAGE_REFFED is set by default and BIO_PAGE_PINNED is cleared by default when the bio is (re-)initialised. (4) If iov_iter_extract_pages() indicates FOLL_GET, this causes BIO_PAGE_REFFED to be set and if FOLL_PIN is indicated, this causes BIO_PAGE_PINNED to be set. If it returns neither FOLL_* flag, then both BIO_PAGE_* flags will be cleared. Mixing sets of pages with different clean up modes is not supported. (5) Cloned bio structs have both flags cleared. (6) bio_release_pages() will do the release if either BIO_PAGE_* flag is set. [!] Note that this is tested a bit with ext4, but nothing else. Signed-off-by: David Howells cc: Al Viro cc: Jens Axboe cc: Jan Kara cc: Christoph Hellwig cc: Matthew Wilcox cc: Logan Gunthorpe cc: linux-block@vger.kernel.org Link: https://lore.kernel.org/r/167305166150.1521586.10220949115402059720.stgit@warthog.procyon.org.uk/ # v4 Link: https://lore.kernel.org/r/167344731521.2425628.5403113335062567245.stgit@warthog.procyon.org.uk/ # v5 Link: https://lore.kernel.org/r/167391056047.2311931.6772604381276147664.stgit@warthog.procyon.org.uk/ # v6 --- Notes: ver #7) - Don't treat BIO_PAGE_REFFED/PINNED as being the same as FOLL_GET/PIN. ver #5) - Transcribe the FOLL_* flags returned by iov_iter_extract_pages() to BIO_* flags and got rid of bi_cleanup_mode. - Replaced BIO_NO_PAGE_REF to BIO_PAGE_REFFED in the preceding patch. 
 block/bio.c         | 34 +++++++++++++++++++---------------
 block/blk-map.c     | 22 +++++++++++-----------
 block/blk.h         | 29 +++++++++++++++++++++++++++++
 include/linux/bio.h |  3 ++-
 4 files changed, 61 insertions(+), 27 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index cfe11f4799d1..2a6568b58501 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -245,8 +245,9 @@ static void bio_free(struct bio *bio)
  * when IO has completed, or when the bio is released.
  *
  * We set the initial assumption that pages attached to the bio will be
- * released with put_page() by setting BIO_PAGE_REFFED; if the pages
- * should not be put, this flag should be cleared.
+ * released with put_page() by setting BIO_PAGE_REFFED, but this should be set
+ * to BIO_PAGE_PINNED if the page should be unpinned instead; if the pages
+ * should not be put or unpinned, these flags should be cleared.
  */
 void bio_init(struct bio *bio, struct block_device *bdev,
 	      struct bio_vec *table, unsigned short max_vecs, blk_opf_t opf)
@@ -819,6 +820,7 @@ static int __bio_clone(struct bio *bio, struct bio *bio_src, gfp_t gfp)
 {
 	bio_set_flag(bio, BIO_CLONED);
 	bio_clear_flag(bio, BIO_PAGE_REFFED);
+	bio_clear_flag(bio, BIO_PAGE_PINNED);
 	bio->bi_ioprio = bio_src->bi_ioprio;
 	bio->bi_iter = bio_src->bi_iter;
 
@@ -1183,7 +1185,7 @@ void __bio_release_pages(struct bio *bio, bool mark_dirty)
 	bio_for_each_segment_all(bvec, bio, iter_all) {
 		if (mark_dirty && !PageCompound(bvec->bv_page))
 			set_page_dirty_lock(bvec->bv_page);
-		put_page(bvec->bv_page);
+		bio_release_page(bio, bvec->bv_page);
 	}
 }
 EXPORT_SYMBOL_GPL(__bio_release_pages);
@@ -1220,7 +1222,7 @@ static int bio_iov_add_page(struct bio *bio, struct page *page,
 	}
 
 	if (same_page)
-		put_page(page);
+		bio_release_page(bio, page);
 	return 0;
 }
 
@@ -1234,7 +1236,7 @@ static int bio_iov_add_zone_append_page(struct bio *bio, struct page *page,
 			queue_max_zone_append_sectors(q), &same_page) != len)
 		return -EINVAL;
 	if (same_page)
-		put_page(page);
+		bio_release_page(bio, page);
 	return 0;
 }
 
@@ -1245,10 +1247,10 @@ static int bio_iov_add_zone_append_page(struct bio *bio, struct page *page,
  * @bio: bio to add pages to
  * @iter: iov iterator describing the region to be mapped
  *
- * Pins pages from *iter and appends them to @bio's bvec array. The
- * pages will have to be released using put_page() when done.
- * For multi-segment *iter, this function only adds pages from the
- * next non-empty segment of the iov iterator.
+ * Extracts pages from *iter and appends them to @bio's bvec array. The pages
+ * will have to be cleaned up in the way indicated by the BIO_PAGE_REFFED and
+ * BIO_PAGE_PINNED flags. For a multi-segment *iter, this function only adds
+ * pages from the next non-empty segment of the iov iterator.
  */
 static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
 {
@@ -1280,9 +1282,9 @@ static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
 	 * result to ensure the bio's total size is correct. The remainder of
 	 * the iov data will be picked up in the next bio iteration.
 	 */
-	size = iov_iter_get_pages(iter, pages,
-				  UINT_MAX - bio->bi_iter.bi_size,
-				  nr_pages, &offset, extract_flags);
+	size = iov_iter_extract_pages(iter, &pages,
+				      UINT_MAX - bio->bi_iter.bi_size,
+				      nr_pages, extract_flags, &offset);
 	if (unlikely(size <= 0))
 		return size ? size : -EFAULT;
 
@@ -1315,7 +1317,7 @@ static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
 		iov_iter_revert(iter, left);
 out:
 	while (i < nr_pages)
-		put_page(pages[i++]);
+		bio_release_page(bio, pages[i++]);
 	return ret;
 }
 
@@ -1344,6 +1346,8 @@ int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
 {
 	int ret = 0;
 
+	bio_set_cleanup_mode(bio, iter, 0);
+
 	if (iov_iter_is_bvec(iter)) {
 		bio_iov_bvec_set(bio, iter);
 		iov_iter_advance(iter, bio->bi_iter.bi_size);
@@ -1496,8 +1500,8 @@ void bio_set_pages_dirty(struct bio *bio)
  * the BIO and re-dirty the pages in process context.
  *
  * It is expected that bio_check_pages_dirty() will wholly own the BIO from
- * here on. It will run one put_page() against each page and will run one
- * bio_put() against the BIO.
+ * here on. It will run one put_page() or unpin_user_page() against each page
+ * and will run one bio_put() against the BIO.
  */
 
 static void bio_dirty_fn(struct work_struct *work);
diff --git a/block/blk-map.c b/block/blk-map.c
index bc111261fc82..7d1bc75b9cf2 100644
--- a/block/blk-map.c
+++ b/block/blk-map.c
@@ -282,20 +282,20 @@ static int bio_map_user_iov(struct request *rq, struct iov_iter *iter,
 	if (blk_queue_pci_p2pdma(rq->q))
 		extract_flags |= ITER_ALLOW_P2PDMA;
 
+	bio_set_cleanup_mode(bio, iter, extract_flags);
+
 	while (iov_iter_count(iter)) {
-		struct page **pages, *stack_pages[UIO_FASTIOV];
+		struct page *stack_pages[UIO_FASTIOV];
+		struct page **pages = stack_pages;
 		ssize_t bytes;
 		size_t offs;
 		int npages;
 
-		if (nr_vecs <= ARRAY_SIZE(stack_pages)) {
-			pages = stack_pages;
-			bytes = iov_iter_get_pages(iter, pages, LONG_MAX,
-						   nr_vecs, &offs, extract_flags);
-		} else {
-			bytes = iov_iter_get_pages_alloc(iter, &pages,
-						LONG_MAX, &offs, extract_flags);
-		}
+		if (nr_vecs > ARRAY_SIZE(stack_pages))
+			pages = NULL;
+
+		bytes = iov_iter_extract_pages(iter, &pages, LONG_MAX,
+					       nr_vecs, extract_flags, &offs);
 		if (unlikely(bytes <= 0)) {
 			ret = bytes ? bytes : -EFAULT;
 			goto out_unmap;
@@ -317,7 +317,7 @@ static int bio_map_user_iov(struct request *rq, struct iov_iter *iter,
 		if (!bio_add_hw_page(rq->q, bio, page, n, offs,
 				     max_sectors, &same_page)) {
 			if (same_page)
-				put_page(page);
+				bio_release_page(bio, page);
 			break;
 		}
 
@@ -329,7 +329,7 @@ static int bio_map_user_iov(struct request *rq, struct iov_iter *iter,
 		 * release the pages we didn't map into the bio, if any
 		 */
 		while (j < npages)
-			put_page(pages[j++]);
+			bio_release_page(bio, pages[j++]);
 		if (pages != stack_pages)
 			kvfree(pages);
 		/* couldn't stuff something into bio? */
diff --git a/block/blk.h b/block/blk.h
index 4c3b3325219a..16c8a7a84a16 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -425,6 +425,35 @@ int bio_add_hw_page(struct request_queue *q, struct bio *bio,
 		struct page *page, unsigned int len, unsigned int offset,
 		unsigned int max_sectors, bool *same_page);
 
+/*
+ * Set the cleanup mode for a bio from an iterator and the extraction flags.
+ */
+static inline void bio_set_cleanup_mode(struct bio *bio, struct iov_iter *iter,
+					unsigned int extract_flags)
+{
+	unsigned int cleanup_mode;
+
+	bio_clear_flag(bio, BIO_PAGE_REFFED);
+	cleanup_mode = iov_iter_extract_mode(iter, extract_flags);
+	if (cleanup_mode & FOLL_GET)
+		bio_set_flag(bio, BIO_PAGE_REFFED);
+	if (cleanup_mode & FOLL_PIN)
+		bio_set_flag(bio, BIO_PAGE_PINNED);
+}
+
+/*
+ * Clean up a page appropriately, where the page may be pinned, may have a
+ * ref taken on it or neither.
+ */
+static inline void bio_release_page(struct bio *bio, struct page *page)
+{
+	unsigned int gup_flags = 0;
+
+	gup_flags |= bio_flagged(bio, BIO_PAGE_REFFED) ? FOLL_GET : 0;
+	gup_flags |= bio_flagged(bio, BIO_PAGE_PINNED) ? FOLL_PIN : 0;
+	page_put_unpin(page, gup_flags);
+}
+
 struct request_queue *blk_alloc_queue(int node_id);
 int disk_scan_partitions(struct gendisk *disk, fmode_t mode, void *owner);
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 63bfd91793f9..1c6f051f6ff2 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -482,7 +482,8 @@ void zero_fill_bio(struct bio *bio);
 
 static inline void bio_release_pages(struct bio *bio, bool mark_dirty)
 {
-	if (bio_flagged(bio, BIO_PAGE_REFFED))
+	if (bio_flagged(bio, BIO_PAGE_REFFED) ||
+	    bio_flagged(bio, BIO_PAGE_PINNED))
 		__bio_release_pages(bio, mark_dirty);
 }

From patchwork Fri Jan 20 17:55:55 2023
X-Patchwork-Submitter: David Howells
From: David Howells
To: Al Viro, Christoph Hellwig
Cc: David Howells, Matthew Wilcox, Jens Axboe, Jan Kara, Jeff Layton,
    Logan Gunthorpe, linux-fsdevel@vger.kernel.org,
    linux-block@vger.kernel.org, linux-kernel@vger.kernel.org,
    Christoph Hellwig
Subject: [PATCH v7 7/8] block: Fix bio_flagged() so that gcc can better
 optimise it
Date: Fri, 20 Jan 2023 17:55:55 +0000
Message-Id: <20230120175556.3556978-8-dhowells@redhat.com>
In-Reply-To: <20230120175556.3556978-1-dhowells@redhat.com>
References: <20230120175556.3556978-1-dhowells@redhat.com>
Fix bio_flagged() so that multiple instances of it, such as:

	if (bio_flagged(bio, BIO_PAGE_REFFED) ||
	    bio_flagged(bio, BIO_PAGE_PINNED))

can be combined by the gcc optimiser into a single test in assembly
(arguably, this is a compiler optimisation issue[1]).

The missed optimisation stems from bio_flagged() comparing the result of
the bitwise-AND to zero.  This results in an out-of-line
bio_release_page() being compiled to something like:

   <+0>:  mov    0x14(%rdi),%eax
   <+3>:  test   $0x1,%al
   <+5>:  jne    0xffffffff816dac53
   <+7>:  test   $0x2,%al
   <+9>:  je     0xffffffff816dac5c
   <+11>: movzbl %sil,%esi
   <+15>: jmp    0xffffffff816daba1 <__bio_release_pages>
   <+20>: jmp    0xffffffff81d0b800 <__x86_return_thunk>

However, the test is superfluous as the return type is bool.  Removing it
results in:

   <+0>:  testb  $0x3,0x14(%rdi)
   <+4>:  je     0xffffffff816e4af4
   <+6>:  movzbl %sil,%esi
   <+10>: jmp    0xffffffff816dab7c <__bio_release_pages>
   <+15>: jmp    0xffffffff81d0b7c0 <__x86_return_thunk>

instead.

Also, the MOVZBL instruction looks unnecessary[2] - I think it's just
're-booling' the mark_dirty parameter.
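[The two forms of the test discussed above are semantically identical,
which is what makes the transformation safe.  A standalone sketch of the
equivalence follows; the helper names are invented for this illustration
and are not kernel functions.]

```c
#include <assert.h>
#include <stdbool.h>

/* Old form: mask, then explicitly compare against zero. */
static bool flag_test_explicit(unsigned int flags, unsigned int bit)
{
	return (flags & (1U << bit)) != 0;
}

/* New form: rely on the implicit conversion to bool, which itself means
 * "!= 0".  With this form the compiler is free to merge adjacent tests
 * of bits 0 and 1 into a single "testb $0x3" as in the second listing.
 */
static bool flag_test_bool(unsigned int flags, unsigned int bit)
{
	return flags & (1U << bit);
}

/* The merged test the optimiser can emit for the REFFED || PINNED case. */
static bool either_of_bits_0_and_1(unsigned int flags)
{
	return flags & 0x3;
}
```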
Fixes: b7c44ed9d2fc ("block: manipulate bio->bi_flags through helpers")
Signed-off-by: David Howells
Reviewed-by: Christoph Hellwig
cc: Jens Axboe
cc: linux-block@vger.kernel.org
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108370 [1]
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108371 [2]
Link: https://lore.kernel.org/r/167391056756.2311931.356007731815807265.stgit@warthog.procyon.org.uk/ # v6
---
 include/linux/bio.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/bio.h b/include/linux/bio.h
index 1c6f051f6ff2..2e6109b0fca8 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -227,7 +227,7 @@ static inline void bio_cnt_set(struct bio *bio, unsigned int count)
 
 static inline bool bio_flagged(struct bio *bio, unsigned int bit)
 {
-	return (bio->bi_flags & (1U << bit)) != 0;
+	return bio->bi_flags & (1U << bit);
 }
 
 static inline void bio_set_flag(struct bio *bio, unsigned int bit)

From patchwork Fri Jan 20 17:55:56 2023
X-Patchwork-Submitter: David Howells
From: David Howells
To: Al Viro, Christoph Hellwig
Cc: David Howells, Matthew Wilcox, Jens Axboe, Jan Kara, Jeff Layton,
    Logan Gunthorpe, linux-fsdevel@vger.kernel.org,
    linux-block@vger.kernel.org, linux-kernel@vger.kernel.org,
    Christoph Hellwig, linux-mm@kvack.org
Subject: [PATCH v7 8/8] mm: Renumber FOLL_GET and FOLL_PIN down
Date: Fri, 20 Jan 2023 17:55:56 +0000
Message-Id: <20230120175556.3556978-9-dhowells@redhat.com>
In-Reply-To: <20230120175556.3556978-1-dhowells@redhat.com>
References: <20230120175556.3556978-1-dhowells@redhat.com>

Renumber FOLL_GET and FOLL_PIN down to bit 0 and 1 respectively so that
they are coincidentally the same as BIO_PAGE_REFFED and BIO_PAGE_PINNED
and also so that they can be stored in the bottom two bits of a page
pointer (something I'm looking at for zerocopy socket
fragments).

Signed-off-by: David Howells
cc: Al Viro
cc: Christoph Hellwig
cc: Matthew Wilcox
cc: linux-fsdevel@vger.kernel.org
cc: linux-mm@kvack.org
Reviewed-by: Matthew Wilcox (Oracle)
---
 include/linux/mm.h | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index f1cf8f4eb946..33c9eacd9548 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3074,12 +3074,13 @@ static inline vm_fault_t vmf_error(int err)
 struct page *follow_page(struct vm_area_struct *vma, unsigned long address,
 			 unsigned int foll_flags);
 
-#define FOLL_WRITE	0x01	/* check pte is writable */
-#define FOLL_TOUCH	0x02	/* mark page accessed */
-#define FOLL_GET	0x04	/* do get_page on page */
-#define FOLL_DUMP	0x08	/* give error on hole if it would be zero */
-#define FOLL_FORCE	0x10	/* get_user_pages read/write w/o permission */
-#define FOLL_NOWAIT	0x20	/* if a disk transfer is needed, start the IO
+#define FOLL_GET	0x01	/* do get_page on page (equivalent to BIO_FOLL_GET) */
+#define FOLL_PIN	0x02	/* pages must be released via unpin_user_page */
+#define FOLL_WRITE	0x04	/* check pte is writable */
+#define FOLL_TOUCH	0x08	/* mark page accessed */
+#define FOLL_DUMP	0x10	/* give error on hole if it would be zero */
+#define FOLL_FORCE	0x20	/* get_user_pages read/write w/o permission */
+#define FOLL_NOWAIT	0x40	/* if a disk transfer is needed, start the IO
 				 * and return without waiting upon it */
 #define FOLL_NOFAULT	0x80	/* do not fault in pages */
 #define FOLL_HWPOISON	0x100	/* check page is hwpoisoned */
@@ -3088,7 +3089,6 @@ struct page *follow_page(struct vm_area_struct *vma, unsigned long address,
 #define FOLL_ANON	0x8000	/* don't do file mappings */
 #define FOLL_LONGTERM	0x10000	/* mapping lifetime is indefinite: see below */
 #define FOLL_SPLIT_PMD	0x20000	/* split huge pmd before returning */
-#define FOLL_PIN	0x40000	/* pages must be released via unpin_user_page */
 #define FOLL_FAST_ONLY	0x80000	/* gup_fast: prevent fall-back to slow gup */
 #define FOLL_PCI_P2PDMA	0x100000 /* allow returning PCI P2PDMA pages */
 #define FOLL_INTERRUPTIBLE  0x200000 /* allow interrupts from generic signals */
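[A sketch of the pointer-packing idea mentioned in the commit message:
with FOLL_GET and FOLL_PIN at bits 0 and 1, a cleanup mode fits in the low
two bits of any pointer with at least 4-byte alignment.  The pack/unpack
helper names below are hypothetical, invented for this illustration; only
the FOLL_* values match the patch.]

```c
#include <assert.h>
#include <stdint.h>

#define FOLL_GET	0x01
#define FOLL_PIN	0x02
#define MODE_MASK	((uintptr_t)(FOLL_GET | FOLL_PIN))

/* Stash a two-bit cleanup mode in the low bits of an aligned pointer. */
static void *pack_page_mode(void *page, unsigned int mode)
{
	/* Only works if the pointer's low two bits are free. */
	assert(((uintptr_t)page & MODE_MASK) == 0);
	return (void *)((uintptr_t)page | ((uintptr_t)mode & MODE_MASK));
}

/* Recover the original pointer by masking the mode bits back off. */
static void *unpack_page(void *packed)
{
	return (void *)((uintptr_t)packed & ~MODE_MASK);
}

/* Recover the stashed cleanup mode. */
static unsigned int unpack_mode(void *packed)
{
	return (unsigned int)((uintptr_t)packed & MODE_MASK);
}
```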