From patchwork Thu Feb 9 10:29:43 2023
X-Patchwork-Submitter: David Howells
X-Patchwork-Id: 54844
From: David Howells
To: Jens Axboe, Al Viro, Christoph Hellwig
Cc: David Howells, Matthew Wilcox, Jan Kara, Jeff Layton, David Hildenbrand, Jason Gunthorpe, Logan Gunthorpe, Hillf Danton, linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, syzbot+a440341a59e3b7142895@syzkaller.appspotmail.com, Christoph Hellwig, John Hubbard
Subject: [PATCH v13 01/12] splice: Fix O_DIRECT file read splice to avoid reversion of ITER_PIPE
Date: Thu, 9 Feb 2023 10:29:43 +0000
Message-Id: <20230209102954.528942-2-dhowells@redhat.com>
In-Reply-To: <20230209102954.528942-1-dhowells@redhat.com>
References: <20230209102954.528942-1-dhowells@redhat.com>

With the upcoming iov_iter_extract_pages() function, pages extracted from a
non-user-backed iterator such as ITER_PIPE aren't pinned.  __iomap_dio_rw(),
however, calls iov_iter_revert() to shorten the iterator to just the
bufferage it is going to use - which has the side-effect of freeing the
excess pipe buffers, even though they're attached to a bio and may get
written to by DMA (thanks to Hillf Danton for spotting this[1]).

This then causes memory corruption that is particularly noticeable when the
syzbot test[2] is run.  The test boils down to:

	out = creat(argv[1], 0666);
	ftruncate(out, 0x800);
	lseek(out, 0x200, SEEK_SET);
	in = open(argv[1], O_RDONLY | O_DIRECT | O_NOFOLLOW);
	sendfile(out, in, NULL, 0x1dd00);

run repeatedly in parallel.  What I think is happening is that ftruncate()
occasionally shortens the DIO read that's about to be made by sendfile's
splice core by reducing i_size.

Fix this by splitting the handling of a splice from an O_DIRECT file fd off
from that of non-DIO and in this case, replacing the use of an ITER_PIPE
iterator with an ITER_BVEC iterator for which reversion won't free the
buffers.  The DIO-specific code bulk allocates all the buffers it thinks it
is going to use in advance, does the read synchronously and only then trims
the buffer down.  The pages we did use get pushed into the pipe.

This should be more efficient for DIO read by virtue of doing a bulk page
allocation, but slightly less efficient by ignoring any partial page in the
pipe.
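
The bulk-allocate, read, then trim sequence described above can be modelled in userspace. What follows is a minimal sketch, not kernel code: PAGE_SZ, struct dio_plan, plan_dio_splice() and trim_dio_splice() are hypothetical names introduced for illustration, a 4096-byte page is assumed, and the case where the bulk allocator returns fewer pages than requested is left out. The arithmetic is meant to mirror what the patch does with npages, reclaim and remain.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SZ 4096u /* assumed page size; the kernel uses PAGE_SIZE */

/* Hypothetical model of the buffers reserved for one DIO splice read. */
struct dio_plan {
	size_t len;    /* bytes the read is asked to fill */
	size_t npages; /* pages allocated up front */
};

/* Clamp the request to the free pipe slots, then round up to whole pages. */
static struct dio_plan plan_dio_splice(size_t len, size_t max_usage, size_t used)
{
	struct dio_plan plan;
	size_t slots = max_usage > used ? max_usage - used : 0;

	if (len > slots * PAGE_SZ)
		len = slots * PAGE_SZ;
	plan.len = len;
	plan.npages = (len + PAGE_SZ - 1) / PAGE_SZ;
	return plan;
}

/*
 * After a read that returned nread bytes (0 if nothing was read), return the
 * number of pages that were touched and so would be pushed into the pipe;
 * the untouched remainder would be freed. *last_len receives the length of
 * the final kept page, which may be only partially filled.
 */
static size_t trim_dio_splice(const struct dio_plan *plan, size_t nread,
			      size_t *last_len)
{
	size_t reclaim, keep = plan->npages;

	assert(nread <= plan->len);
	reclaim = plan->npages * PAGE_SZ - nread;
	for (; reclaim >= PAGE_SZ; reclaim -= PAGE_SZ)
		keep--;
	*last_len = keep ? nread - (keep - 1) * PAGE_SZ : 0;
	return keep;
}
```

Assuming an empty 16-slot pipe, the 0x1dd00-byte request from the test above plans 16 pages; if the read then comes back short, say with 0x600 bytes, one page is kept and the other fifteen are released.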

Fixes: 920756a3306a ("block: Convert bio_iov_iter_get_pages to use iov_iter_extract_pages")
Reported-by: syzbot+a440341a59e3b7142895@syzkaller.appspotmail.com
Signed-off-by: David Howells
cc: Jens Axboe
cc: Christoph Hellwig
cc: Al Viro
cc: David Hildenbrand
cc: John Hubbard
cc: linux-mm@kvack.org
cc: linux-block@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/20230207094731.1390-1-hdanton@sina.com/ [1]
Link: https://lore.kernel.org/r/000000000000b0b3c005f3a09383@google.com/ [2]
---

Notes:
    ver #13)
    - Don't completely replace generic_file_splice_read(), but rather only use
      this if we're splicing from an O_DIRECT file fd.

 fs/splice.c | 96 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 96 insertions(+)

diff --git a/fs/splice.c b/fs/splice.c
index 5969b7a1d353..b4be6fc314a1 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -282,6 +282,99 @@ void splice_shrink_spd(struct splice_pipe_desc *spd)
 	kfree(spd->partial);
 }
 
+/*
+ * Splice data from an O_DIRECT file into pages and then add them to the output
+ * pipe.
+ */
+static ssize_t generic_file_direct_splice_read(struct file *in, loff_t *ppos,
+					       struct pipe_inode_info *pipe,
+					       size_t len, unsigned int flags)
+{
+	LIST_HEAD(pages);
+	struct iov_iter to;
+	struct bio_vec *bv;
+	struct kiocb kiocb;
+	struct page *page;
+	unsigned int head;
+	ssize_t ret;
+	size_t used, npages, chunk, remain, reclaim;
+	int i;
+
+	/* Work out how much data we can actually add into the pipe */
+	used = pipe_occupancy(pipe->head, pipe->tail);
+	npages = max_t(ssize_t, pipe->max_usage - used, 0);
+	len = min_t(size_t, len, npages * PAGE_SIZE);
+	npages = DIV_ROUND_UP(len, PAGE_SIZE);
+
+	bv = kmalloc(array_size(npages, sizeof(bv[0])), GFP_KERNEL);
+	if (!bv)
+		return -ENOMEM;
+
+	npages = alloc_pages_bulk_list(GFP_USER, npages, &pages);
+	if (!npages) {
+		kfree(bv);
+		return -ENOMEM;
+	}
+
+	remain = len = min_t(size_t, len, npages * PAGE_SIZE);
+
+	for (i = 0; i < npages; i++) {
+		chunk = min_t(size_t, PAGE_SIZE, remain);
+		page = list_first_entry(&pages, struct page, lru);
+		list_del_init(&page->lru);
+		bv[i].bv_page = page;
+		bv[i].bv_offset = 0;
+		bv[i].bv_len = chunk;
+		remain -= chunk;
+	}
+
+	/* Do the I/O */
+	iov_iter_bvec(&to, ITER_DEST, bv, npages, len);
+	init_sync_kiocb(&kiocb, in);
+	kiocb.ki_pos = *ppos;
+	ret = call_read_iter(in, &kiocb, &to);
+
+	reclaim = npages * PAGE_SIZE;
+	remain = 0;
+	if (ret > 0) {
+		reclaim -= ret;
+		remain = ret;
+		*ppos = kiocb.ki_pos;
+		file_accessed(in);
+	} else if (ret < 0) {
+		/*
+		 * callers of ->splice_read() expect -EAGAIN on
+		 * "can't put anything in there", rather than -EFAULT.
+		 */
+		if (ret == -EFAULT)
+			ret = -EAGAIN;
+	}
+
+	/* Free any pages that didn't get touched at all. */
+	for (; reclaim >= PAGE_SIZE; reclaim -= PAGE_SIZE)
+		__free_page(bv[--npages].bv_page);
+
+	/* Push the remaining pages into the pipe. */
+	head = pipe->head;
+	for (i = 0; i < npages; i++) {
+		struct pipe_buffer *buf = &pipe->bufs[head & (pipe->ring_size - 1)];
+
+		chunk = min_t(size_t, remain, PAGE_SIZE);
+		*buf = (struct pipe_buffer) {
+			.ops	= &default_pipe_buf_ops,
+			.page	= bv[i].bv_page,
+			.offset	= 0,
+			.len	= chunk,
+		};
+		head++;
+		remain -= chunk;
+	}
+	pipe->head = head;
+
+	kfree(bv);
+	return ret;
+}
+
 /**
  * generic_file_splice_read - splice data from file to a pipe
  * @in: file to splice from
@@ -303,6 +396,9 @@ ssize_t generic_file_splice_read(struct file *in, loff_t *ppos,
 	struct kiocb kiocb;
 	int ret;
 
+	if (in->f_flags & O_DIRECT)
+		return generic_file_direct_splice_read(in, ppos, pipe, len, flags);
+
 	iov_iter_pipe(&to, ITER_DEST, pipe, len);
 	init_sync_kiocb(&kiocb, in);
 	kiocb.ki_pos = *ppos;

From patchwork Thu Feb 9 10:29:44 2023
X-Patchwork-Submitter: David Howells
X-Patchwork-Id: 54847
From: David Howells
To: Jens Axboe, Al Viro, Christoph Hellwig
Cc: David Howells, Matthew Wilcox, Jan Kara, Jeff Layton, David Hildenbrand, Jason Gunthorpe, Logan Gunthorpe, Hillf Danton, linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Christoph Hellwig, John Hubbard
Subject: [PATCH v13 02/12] mm: Pass info, not iter, into filemap_get_pages() and unstatic it
Date: Thu, 9 Feb 2023 10:29:44 +0000
Message-Id: <20230209102954.528942-3-dhowells@redhat.com>
In-Reply-To: <20230209102954.528942-1-dhowells@redhat.com>
References: <20230209102954.528942-1-dhowells@redhat.com>

filemap_get_pages() and a number of functions that it calls take an
iterator to provide two things: the number of bytes to be read from the
specified file and whether partially uptodate pages are allowed.  Change
these functions so that this information is passed in directly.  This
allows filemap_get_pages() to be called without having an iterator to hand.

Also make filemap_get_pages() available so that it can be used by a later
patch to fix splicing from a buffered file.
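
The change of calling convention can be illustrated in userspace. This is a simplified sketch under stated assumptions: struct model_iter, model_last_index() and model_read() are hypothetical stand-ins rather than kernel APIs, and PAGE_SZ is assumed to be 4096. It shows an iterator-based caller unpacking the two facts at the call site so that the callee never needs to see an iterator.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PAGE_SZ 4096u /* assumed page size */

/* Hypothetical stand-in for the two iterator properties that matter here. */
struct model_iter {
	size_t count; /* bytes still wanted */
	bool is_pipe; /* destination cannot take partially uptodate pages */
};

/*
 * New-style callee: takes the byte count and the uptodate requirement
 * directly. Returns the index one past the last page needed, in the manner
 * of DIV_ROUND_UP(pos + count, PAGE_SIZE), and reports whether partially
 * uptodate pages would be acceptable.
 */
static size_t model_last_index(size_t pos, size_t count, bool need_uptodate,
			       bool *allow_partial)
{
	*allow_partial = !need_uptodate;
	return (pos + count + PAGE_SZ - 1) / PAGE_SZ;
}

/* An iterator-based caller unpacks the iterator before calling. */
static size_t model_read(size_t pos, const struct model_iter *iter,
			 bool *allow_partial)
{
	return model_last_index(pos, iter->count, iter->is_pipe, allow_partial);
}
```

A caller that has no iterator at all can call the direct form and pass the uptodate requirement as a plain boolean.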

Signed-off-by: David Howells
cc: Jens Axboe
cc: Christoph Hellwig
cc: Matthew Wilcox
cc: Al Viro
cc: David Hildenbrand
cc: John Hubbard
cc: linux-mm@kvack.org
cc: linux-block@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
Reviewed-by: Christoph Hellwig
---
 include/linux/pagemap.h |  2 ++
 mm/filemap.c            | 31 ++++++++++++++++++-------------
 2 files changed, 20 insertions(+), 13 deletions(-)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 29e1f9e76eb6..3a7bdb35acff 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -748,6 +748,8 @@ struct page *read_cache_page(struct address_space *, pgoff_t index,
 		filler_t *filler, struct file *file);
 extern struct page * read_cache_page_gfp(struct address_space *mapping,
 				pgoff_t index, gfp_t gfp_mask);
+int filemap_get_pages(struct kiocb *iocb, size_t count,
+		      struct folio_batch *fbatch, bool need_uptodate);
 
 static inline struct page *read_mapping_page(struct address_space *mapping,
 				pgoff_t index, struct file *file)
diff --git a/mm/filemap.c b/mm/filemap.c
index c4d4ace9cc70..b31168a9bafd 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2440,21 +2440,19 @@ static int filemap_read_folio(struct file *file, filler_t filler,
 }
 
 static bool filemap_range_uptodate(struct address_space *mapping,
-		loff_t pos, struct iov_iter *iter, struct folio *folio)
+		loff_t pos, size_t count, struct folio *folio,
+		bool need_uptodate)
 {
-	int count;
-
 	if (folio_test_uptodate(folio))
 		return true;
 	/* pipes can't handle partially uptodate pages */
-	if (iov_iter_is_pipe(iter))
+	if (need_uptodate)
 		return false;
 	if (!mapping->a_ops->is_partially_uptodate)
 		return false;
 	if (mapping->host->i_blkbits >= folio_shift(folio))
 		return false;
 
-	count = iter->count;
 	if (folio_pos(folio) > pos) {
 		count -= folio_pos(folio) - pos;
 		pos = 0;
@@ -2466,8 +2464,8 @@ static bool filemap_range_uptodate(struct address_space *mapping,
 }
 
 static int filemap_update_page(struct kiocb *iocb,
-		struct address_space *mapping, struct iov_iter *iter,
-		struct folio *folio)
+		struct address_space *mapping, size_t count,
+		struct folio *folio, bool need_uptodate)
 {
 	int error;
 
@@ -2501,7 +2499,8 @@ static int filemap_update_page(struct kiocb *iocb,
 		goto unlock;
 
 	error = 0;
-	if (filemap_range_uptodate(mapping, iocb->ki_pos, iter, folio))
+	if (filemap_range_uptodate(mapping, iocb->ki_pos, count, folio,
+				   need_uptodate))
 		goto unlock;
 
 	error = -EAGAIN;
@@ -2577,8 +2576,12 @@ static int filemap_readahead(struct kiocb *iocb, struct file *file,
 	return 0;
 }
 
-static int filemap_get_pages(struct kiocb *iocb, struct iov_iter *iter,
-		struct folio_batch *fbatch)
+/*
+ * Extract some folios from the pagecache of a file, reading those pages from
+ * the backing store if necessary and waiting for them.
+ */
+int filemap_get_pages(struct kiocb *iocb, size_t count,
+		      struct folio_batch *fbatch, bool need_uptodate)
 {
 	struct file *filp = iocb->ki_filp;
 	struct address_space *mapping = filp->f_mapping;
@@ -2588,7 +2591,7 @@ static int filemap_get_pages(struct kiocb *iocb, struct iov_iter *iter,
 	struct folio *folio;
 	int err = 0;
 
-	last_index = DIV_ROUND_UP(iocb->ki_pos + iter->count, PAGE_SIZE);
+	last_index = DIV_ROUND_UP(iocb->ki_pos + count, PAGE_SIZE);
 retry:
 	if (fatal_signal_pending(current))
 		return -EINTR;
@@ -2621,7 +2624,8 @@ static int filemap_get_pages(struct kiocb *iocb, struct iov_iter *iter,
 		if ((iocb->ki_flags & IOCB_WAITQ) &&
 		    folio_batch_count(fbatch) > 1)
 			iocb->ki_flags |= IOCB_NOWAIT;
-		err = filemap_update_page(iocb, mapping, iter, folio);
+		err = filemap_update_page(iocb, mapping, count, folio,
+					  need_uptodate);
 		if (err)
 			goto err;
 	}
@@ -2691,7 +2695,8 @@ ssize_t filemap_read(struct kiocb *iocb, struct iov_iter *iter,
 		if (unlikely(iocb->ki_pos >= i_size_read(inode)))
 			break;
 
-		error = filemap_get_pages(iocb, iter, &fbatch);
+		error = filemap_get_pages(iocb, iter->count, &fbatch,
+					  iov_iter_is_pipe(iter));
 		if (error < 0)
 			break;
 

From patchwork Thu Feb 9 10:29:45 2023
X-Patchwork-Submitter: David Howells
X-Patchwork-Id: 54849
From: David Howells
To: Jens Axboe, Al Viro, Christoph Hellwig
Cc: David Howells, Matthew Wilcox, Jan Kara, Jeff Layton, David Hildenbrand, Jason Gunthorpe, Logan Gunthorpe, Hillf Danton, linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Christoph Hellwig, John Hubbard
Subject: [PATCH v13 03/12] splice: Do splice read from a buffered file without using ITER_PIPE
Date: Thu, 9 Feb 2023 10:29:45 +0000
Message-Id: <20230209102954.528942-4-dhowells@redhat.com>
In-Reply-To: <20230209102954.528942-1-dhowells@redhat.com>
References: <20230209102954.528942-1-dhowells@redhat.com>

Provide a function to do splice read from a buffered file, pulling the
folios out of the pagecache directly by calling filemap_get_pages() to do
any required reading and then pasting the returned folios into the pipe.

A helper function is provided to do the actual folio pasting and will
handle multipage folios by splicing as many of the relevant subpages as
will fit into the pipe.

The ITER_BVEC-based splicing previously added is then only used for
splicing from O_DIRECT files.

The code is loosely based on filemap_read() and might belong in
mm/filemap.c with that as it needs to use filemap_get_pages().

With this, ITER_PIPE is no longer used.

Signed-off-by: David Howells
cc: Jens Axboe
cc: Christoph Hellwig
cc: Al Viro
cc: David Hildenbrand
cc: John Hubbard
cc: linux-mm@kvack.org
cc: linux-block@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
---
 fs/splice.c | 159 ++++++++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 135 insertions(+), 24 deletions(-)

diff --git a/fs/splice.c b/fs/splice.c
index b4be6fc314a1..963cbf20abc8 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -22,6 +22,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -375,6 +376,135 @@ static ssize_t generic_file_direct_splice_read(struct file *in, loff_t *ppos,
 	return ret;
 }
 
+/*
+ * Splice subpages from a folio into a pipe.
+ */
+static size_t splice_folio_into_pipe(struct pipe_inode_info *pipe,
+				     struct folio *folio,
+				     loff_t fpos, size_t size)
+{
+	struct page *page;
+	size_t spliced = 0, offset = offset_in_folio(folio, fpos);
+
+	page = folio_page(folio, offset / PAGE_SIZE);
+	size = min(size, folio_size(folio) - offset);
+	offset %= PAGE_SIZE;
+
+	while (spliced < size &&
+	       !pipe_full(pipe->head, pipe->tail, pipe->max_usage)) {
+		struct pipe_buffer *buf = &pipe->bufs[pipe->head & (pipe->ring_size - 1)];
+		size_t part = min_t(size_t, PAGE_SIZE - offset, size - spliced);
+
+		*buf = (struct pipe_buffer) {
+			.ops	= &page_cache_pipe_buf_ops,
+			.page	= page,
+			.offset	= offset,
+			.len	= part,
+		};
+		folio_get(folio);
+		pipe->head++;
+		page++;
+		spliced += part;
+		offset = 0;
+	}
+
+	return spliced;
+}
+
+/*
+ * Splice folios from the pagecache of a buffered (ie. non-O_DIRECT) file into
+ * a pipe.
+ */
+static ssize_t generic_file_buffered_splice_read(struct file *in, loff_t *ppos,
+						 struct pipe_inode_info *pipe,
+						 size_t len,
+						 unsigned int flags)
+{
+	struct folio_batch fbatch;
+	size_t total_spliced = 0, used, npages;
+	loff_t isize, end_offset;
+	bool writably_mapped;
+	int i, error = 0;
+
+	struct kiocb iocb = {
+		.ki_filp	= in,
+		.ki_pos		= *ppos,
+	};
+
+	/* Work out how much data we can actually add into the pipe */
+	used = pipe_occupancy(pipe->head, pipe->tail);
+	npages = max_t(ssize_t, pipe->max_usage - used, 0);
+	len = min_t(size_t, len, npages * PAGE_SIZE);
+
+	folio_batch_init(&fbatch);
+
+	do {
+		cond_resched();
+
+		if (*ppos >= i_size_read(file_inode(in)))
+			break;
+
+		iocb.ki_pos = *ppos;
+		error = filemap_get_pages(&iocb, len, &fbatch, true);
+		if (error < 0)
+			break;
+
+		/*
+		 * i_size must be checked after we know the pages are Uptodate.
+ * + * Checking i_size after the check allows us to calculate + * the correct value for "nr", which means the zero-filled + * part of the page is not copied back to userspace (unless + * another truncate extends the file - this is desired though). + */ + isize = i_size_read(file_inode(in)); + if (unlikely(*ppos >= isize)) + break; + end_offset = min_t(loff_t, isize, *ppos + len); + + /* + * Once we start copying data, we don't want to be touching any + * cachelines that might be contended: + */ + writably_mapped = mapping_writably_mapped(in->f_mapping); + + for (i = 0; i < folio_batch_count(&fbatch); i++) { + struct folio *folio = fbatch.folios[i]; + size_t n; + + if (folio_pos(folio) >= end_offset) + goto out; + folio_mark_accessed(folio); + + /* + * If users can be writing to this folio using arbitrary + * virtual addresses, take care of potential aliasing + * before reading the folio on the kernel side. + */ + if (writably_mapped) + flush_dcache_folio(folio); + + n = splice_folio_into_pipe(pipe, folio, *ppos, len); + if (!n) + goto out; + len -= n; + total_spliced += n; + *ppos += n; + in->f_ra.prev_pos = *ppos; + if (pipe_full(pipe->head, pipe->tail, pipe->max_usage)) + goto out; + } + + folio_batch_release(&fbatch); + } while (len); + +out: + folio_batch_release(&fbatch); + file_accessed(in); + + return total_spliced ? 
total_spliced : error;
+}
+
 /**
  * generic_file_splice_read - splice data from file to a pipe
  * @in:		file to splice from
@@ -392,32 +522,13 @@ ssize_t generic_file_splice_read(struct file *in, loff_t *ppos,
 				 struct pipe_inode_info *pipe, size_t len,
 				 unsigned int flags)
 {
-	struct iov_iter to;
-	struct kiocb kiocb;
-	int ret;
-
+	if (unlikely(*ppos >= file_inode(in)->i_sb->s_maxbytes))
+		return 0;
+	if (unlikely(!len))
+		return 0;
 	if (in->f_flags & O_DIRECT)
 		return generic_file_direct_splice_read(in, ppos, pipe, len, flags);
-
-	iov_iter_pipe(&to, ITER_DEST, pipe, len);
-	init_sync_kiocb(&kiocb, in);
-	kiocb.ki_pos = *ppos;
-	ret = call_read_iter(in, &kiocb, &to);
-	if (ret > 0) {
-		*ppos = kiocb.ki_pos;
-		file_accessed(in);
-	} else if (ret < 0) {
-		/* free what was emitted */
-		pipe_discard_from(pipe, to.start_head);
-		/*
-		 * callers of ->splice_read() expect -EAGAIN on
-		 * "can't put anything in there", rather than -EFAULT.
-		 */
-		if (ret == -EFAULT)
-			ret = -EAGAIN;
-	}
-
-	return ret;
+	return generic_file_buffered_splice_read(in, ppos, pipe, len, flags);
 }
 EXPORT_SYMBOL(generic_file_splice_read);
 
From patchwork Thu Feb 9 10:29:46 2023
X-Patchwork-Submitter: David Howells
X-Patchwork-Id: 54846
From: David Howells
To: Jens Axboe, Al Viro, Christoph Hellwig
Cc: David Howells, Matthew Wilcox, Jan Kara, Jeff Layton,
	David Hildenbrand, Jason Gunthorpe, Logan Gunthorpe, Hillf Danton,
	linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	Christoph Hellwig, John Hubbard
Subject: [PATCH v13 04/12] iov_iter: Kill ITER_PIPE
Date: Thu, 9 Feb 2023 10:29:46 +0000
Message-Id: <20230209102954.528942-5-dhowells@redhat.com>
In-Reply-To: <20230209102954.528942-1-dhowells@redhat.com>
References: <20230209102954.528942-1-dhowells@redhat.com>

The ITER_PIPE-type iterator was only used for generic_file_splice_read(),
but that has now been switched to either pull pages directly from the
pagecache for buffered file splice-reads or to use ITER_BVEC instead for
O_DIRECT file
splice-reads.  This leaves ITER_PIPE unused - so remove it.

Signed-off-by: David Howells
cc: Jens Axboe
cc: Christoph Hellwig
cc: Al Viro
cc: David Hildenbrand
cc: John Hubbard
cc: linux-mm@kvack.org
cc: linux-block@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
Reviewed-by: Christoph Hellwig
---
 fs/cifs/file.c      |   8 +-
 include/linux/uio.h |  14 --
 lib/iov_iter.c      | 435 +-------------------------------------------
 mm/filemap.c        |   3 +-
 4 files changed, 5 insertions(+), 455 deletions(-)

diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index 22dfc1f8b4f1..57ca4eea69dd 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -3806,13 +3806,7 @@ cifs_readdata_to_iov(struct cifs_readdata *rdata, struct iov_iter *iter)
 		size_t copy = min_t(size_t, remaining, PAGE_SIZE);
 		size_t written;
 
-		if (unlikely(iov_iter_is_pipe(iter))) {
-			void *addr = kmap_atomic(page);
-
-			written = copy_to_iter(addr, copy, iter);
-			kunmap_atomic(addr);
-		} else
-			written = copy_page_to_iter(page, 0, copy, iter);
+		written = copy_page_to_iter(page, 0, copy, iter);
 		remaining -= written;
 		if (written < copy && iov_iter_count(iter) > 0)
 			break;
diff --git a/include/linux/uio.h b/include/linux/uio.h
index 9f158238edba..dcc0ca5ef491 100644
--- a/include/linux/uio.h
+++ b/include/linux/uio.h
@@ -11,7 +11,6 @@
 #include
 
 struct page;
-struct pipe_inode_info;
 
 struct kvec {
 	void *iov_base; /* and that should *never* hold a userland pointer */
@@ -23,7 +22,6 @@ enum iter_type {
 	ITER_IOVEC,
 	ITER_KVEC,
 	ITER_BVEC,
-	ITER_PIPE,
 	ITER_XARRAY,
 	ITER_DISCARD,
 	ITER_UBUF,
@@ -53,15 +51,10 @@ struct iov_iter {
 		const struct kvec *kvec;
 		const struct bio_vec *bvec;
 		struct xarray *xarray;
-		struct pipe_inode_info *pipe;
 		void __user *ubuf;
 	};
 	union {
 		unsigned long nr_segs;
-		struct {
-			unsigned int head;
-			unsigned int start_head;
-		};
 		loff_t xarray_start;
 	};
 };
@@ -99,11 +92,6 @@ static inline bool iov_iter_is_bvec(const struct iov_iter *i)
 	return iov_iter_type(i) == ITER_BVEC;
 }
 
-static inline bool iov_iter_is_pipe(const struct iov_iter
*i) -{ - return iov_iter_type(i) == ITER_PIPE; -} - static inline bool iov_iter_is_discard(const struct iov_iter *i) { return iov_iter_type(i) == ITER_DISCARD; @@ -245,8 +233,6 @@ void iov_iter_kvec(struct iov_iter *i, unsigned int direction, const struct kvec unsigned long nr_segs, size_t count); void iov_iter_bvec(struct iov_iter *i, unsigned int direction, const struct bio_vec *bvec, unsigned long nr_segs, size_t count); -void iov_iter_pipe(struct iov_iter *i, unsigned int direction, struct pipe_inode_info *pipe, - size_t count); void iov_iter_discard(struct iov_iter *i, unsigned int direction, size_t count); void iov_iter_xarray(struct iov_iter *i, unsigned int direction, struct xarray *xarray, loff_t start, size_t count); diff --git a/lib/iov_iter.c b/lib/iov_iter.c index f9a3ff37ecd1..adc5e8aa8ae8 100644 --- a/lib/iov_iter.c +++ b/lib/iov_iter.c @@ -14,8 +14,6 @@ #include #include -#define PIPE_PARANOIA /* for now */ - /* covers ubuf and kbuf alike */ #define iterate_buf(i, n, base, len, off, __p, STEP) { \ size_t __maybe_unused off = 0; \ @@ -186,156 +184,6 @@ static int copyin(void *to, const void __user *from, size_t n) return res; } -static inline struct pipe_buffer *pipe_buf(const struct pipe_inode_info *pipe, - unsigned int slot) -{ - return &pipe->bufs[slot & (pipe->ring_size - 1)]; -} - -#ifdef PIPE_PARANOIA -static bool sanity(const struct iov_iter *i) -{ - struct pipe_inode_info *pipe = i->pipe; - unsigned int p_head = pipe->head; - unsigned int p_tail = pipe->tail; - unsigned int p_occupancy = pipe_occupancy(p_head, p_tail); - unsigned int i_head = i->head; - unsigned int idx; - - if (i->last_offset) { - struct pipe_buffer *p; - if (unlikely(p_occupancy == 0)) - goto Bad; // pipe must be non-empty - if (unlikely(i_head != p_head - 1)) - goto Bad; // must be at the last buffer... - - p = pipe_buf(pipe, i_head); - if (unlikely(p->offset + p->len != abs(i->last_offset))) - goto Bad; // ... 
at the end of segment - } else { - if (i_head != p_head) - goto Bad; // must be right after the last buffer - } - return true; -Bad: - printk(KERN_ERR "idx = %d, offset = %d\n", i_head, i->last_offset); - printk(KERN_ERR "head = %d, tail = %d, buffers = %d\n", - p_head, p_tail, pipe->ring_size); - for (idx = 0; idx < pipe->ring_size; idx++) - printk(KERN_ERR "[%p %p %d %d]\n", - pipe->bufs[idx].ops, - pipe->bufs[idx].page, - pipe->bufs[idx].offset, - pipe->bufs[idx].len); - WARN_ON(1); - return false; -} -#else -#define sanity(i) true -#endif - -static struct page *push_anon(struct pipe_inode_info *pipe, unsigned size) -{ - struct page *page = alloc_page(GFP_USER); - if (page) { - struct pipe_buffer *buf = pipe_buf(pipe, pipe->head++); - *buf = (struct pipe_buffer) { - .ops = &default_pipe_buf_ops, - .page = page, - .offset = 0, - .len = size - }; - } - return page; -} - -static void push_page(struct pipe_inode_info *pipe, struct page *page, - unsigned int offset, unsigned int size) -{ - struct pipe_buffer *buf = pipe_buf(pipe, pipe->head++); - *buf = (struct pipe_buffer) { - .ops = &page_cache_pipe_buf_ops, - .page = page, - .offset = offset, - .len = size - }; - get_page(page); -} - -static inline int last_offset(const struct pipe_buffer *buf) -{ - if (buf->ops == &default_pipe_buf_ops) - return buf->len; // buf->offset is 0 for those - else - return -(buf->offset + buf->len); -} - -static struct page *append_pipe(struct iov_iter *i, size_t size, - unsigned int *off) -{ - struct pipe_inode_info *pipe = i->pipe; - int offset = i->last_offset; - struct pipe_buffer *buf; - struct page *page; - - if (offset > 0 && offset < PAGE_SIZE) { - // some space in the last buffer; add to it - buf = pipe_buf(pipe, pipe->head - 1); - size = min_t(size_t, size, PAGE_SIZE - offset); - buf->len += size; - i->last_offset += size; - i->count -= size; - *off = offset; - return buf->page; - } - // OK, we need a new buffer - *off = 0; - size = min_t(size_t, size, PAGE_SIZE); - if 
(pipe_full(pipe->head, pipe->tail, pipe->max_usage)) - return NULL; - page = push_anon(pipe, size); - if (!page) - return NULL; - i->head = pipe->head - 1; - i->last_offset = size; - i->count -= size; - return page; -} - -static size_t copy_page_to_iter_pipe(struct page *page, size_t offset, size_t bytes, - struct iov_iter *i) -{ - struct pipe_inode_info *pipe = i->pipe; - unsigned int head = pipe->head; - - if (unlikely(bytes > i->count)) - bytes = i->count; - - if (unlikely(!bytes)) - return 0; - - if (!sanity(i)) - return 0; - - if (offset && i->last_offset == -offset) { // could we merge it? - struct pipe_buffer *buf = pipe_buf(pipe, head - 1); - if (buf->page == page) { - buf->len += bytes; - i->last_offset -= bytes; - i->count -= bytes; - return bytes; - } - } - if (pipe_full(pipe->head, pipe->tail, pipe->max_usage)) - return 0; - - push_page(pipe, page, offset, bytes); - i->last_offset = -(offset + bytes); - i->head = head; - i->count -= bytes; - return bytes; -} - /* * fault_in_iov_iter_readable - fault in iov iterator for reading * @i: iterator @@ -439,46 +287,6 @@ void iov_iter_init(struct iov_iter *i, unsigned int direction, } EXPORT_SYMBOL(iov_iter_init); -// returns the offset in partial buffer (if any) -static inline unsigned int pipe_npages(const struct iov_iter *i, int *npages) -{ - struct pipe_inode_info *pipe = i->pipe; - int used = pipe->head - pipe->tail; - int off = i->last_offset; - - *npages = max((int)pipe->max_usage - used, 0); - - if (off > 0 && off < PAGE_SIZE) { // anon and not full - (*npages)++; - return off; - } - return 0; -} - -static size_t copy_pipe_to_iter(const void *addr, size_t bytes, - struct iov_iter *i) -{ - unsigned int off, chunk; - - if (unlikely(bytes > i->count)) - bytes = i->count; - if (unlikely(!bytes)) - return 0; - - if (!sanity(i)) - return 0; - - for (size_t n = bytes; n; n -= chunk) { - struct page *page = append_pipe(i, n, &off); - chunk = min_t(size_t, n, PAGE_SIZE - off); - if (!page) - return bytes - n; - 
memcpy_to_page(page, off, addr, chunk); - addr += chunk; - } - return bytes; -} - static __wsum csum_and_memcpy(void *to, const void *from, size_t len, __wsum sum, size_t off) { @@ -486,44 +294,10 @@ static __wsum csum_and_memcpy(void *to, const void *from, size_t len, return csum_block_add(sum, next, off); } -static size_t csum_and_copy_to_pipe_iter(const void *addr, size_t bytes, - struct iov_iter *i, __wsum *sump) -{ - __wsum sum = *sump; - size_t off = 0; - unsigned int chunk, r; - - if (unlikely(bytes > i->count)) - bytes = i->count; - if (unlikely(!bytes)) - return 0; - - if (!sanity(i)) - return 0; - - while (bytes) { - struct page *page = append_pipe(i, bytes, &r); - char *p; - - if (!page) - break; - chunk = min_t(size_t, bytes, PAGE_SIZE - r); - p = kmap_local_page(page); - sum = csum_and_memcpy(p + r, addr + off, chunk, sum, off); - kunmap_local(p); - off += chunk; - bytes -= chunk; - } - *sump = sum; - return off; -} - size_t _copy_to_iter(const void *addr, size_t bytes, struct iov_iter *i) { if (WARN_ON_ONCE(i->data_source)) return 0; - if (unlikely(iov_iter_is_pipe(i))) - return copy_pipe_to_iter(addr, bytes, i); if (user_backed_iter(i)) might_fault(); iterate_and_advance(i, bytes, base, len, off, @@ -545,42 +319,6 @@ static int copyout_mc(void __user *to, const void *from, size_t n) return n; } -static size_t copy_mc_pipe_to_iter(const void *addr, size_t bytes, - struct iov_iter *i) -{ - size_t xfer = 0; - unsigned int off, chunk; - - if (unlikely(bytes > i->count)) - bytes = i->count; - if (unlikely(!bytes)) - return 0; - - if (!sanity(i)) - return 0; - - while (bytes) { - struct page *page = append_pipe(i, bytes, &off); - unsigned long rem; - char *p; - - if (!page) - break; - chunk = min_t(size_t, bytes, PAGE_SIZE - off); - p = kmap_local_page(page); - rem = copy_mc_to_kernel(p + off, addr + xfer, chunk); - chunk -= rem; - kunmap_local(p); - xfer += chunk; - bytes -= chunk; - if (rem) { - iov_iter_revert(i, rem); - break; - } - } - return xfer; -} 
- /** * _copy_mc_to_iter - copy to iter with source memory error exception handling * @addr: source kernel address @@ -600,9 +338,8 @@ static size_t copy_mc_pipe_to_iter(const void *addr, size_t bytes, * alignment and poison alignment assumptions to avoid re-triggering * hardware exceptions. * - * * ITER_KVEC, ITER_PIPE, and ITER_BVEC can return short copies. - * Compare to copy_to_iter() where only ITER_IOVEC attempts might return - * a short copy. + * * ITER_KVEC and ITER_BVEC can return short copies. Compare to + * copy_to_iter() where only ITER_IOVEC attempts might return a short copy. * * Return: number of bytes copied (may be %0) */ @@ -610,8 +347,6 @@ size_t _copy_mc_to_iter(const void *addr, size_t bytes, struct iov_iter *i) { if (WARN_ON_ONCE(i->data_source)) return 0; - if (unlikely(iov_iter_is_pipe(i))) - return copy_mc_pipe_to_iter(addr, bytes, i); if (user_backed_iter(i)) might_fault(); __iterate_and_advance(i, bytes, base, len, off, @@ -717,8 +452,6 @@ size_t copy_page_to_iter(struct page *page, size_t offset, size_t bytes, return 0; if (WARN_ON_ONCE(i->data_source)) return 0; - if (unlikely(iov_iter_is_pipe(i))) - return copy_page_to_iter_pipe(page, offset, bytes, i); page += offset / PAGE_SIZE; // first subpage offset %= PAGE_SIZE; while (1) { @@ -767,36 +500,8 @@ size_t copy_page_from_iter(struct page *page, size_t offset, size_t bytes, } EXPORT_SYMBOL(copy_page_from_iter); -static size_t pipe_zero(size_t bytes, struct iov_iter *i) -{ - unsigned int chunk, off; - - if (unlikely(bytes > i->count)) - bytes = i->count; - if (unlikely(!bytes)) - return 0; - - if (!sanity(i)) - return 0; - - for (size_t n = bytes; n; n -= chunk) { - struct page *page = append_pipe(i, n, &off); - char *p; - - if (!page) - return bytes - n; - chunk = min_t(size_t, n, PAGE_SIZE - off); - p = kmap_local_page(page); - memset(p + off, 0, chunk); - kunmap_local(p); - } - return bytes; -} - size_t iov_iter_zero(size_t bytes, struct iov_iter *i) { - if 
(unlikely(iov_iter_is_pipe(i))) - return pipe_zero(bytes, i); iterate_and_advance(i, bytes, base, len, count, clear_user(base, len), memset(base, 0, len) @@ -827,32 +532,6 @@ size_t copy_page_from_iter_atomic(struct page *page, unsigned offset, size_t byt } EXPORT_SYMBOL(copy_page_from_iter_atomic); -static void pipe_advance(struct iov_iter *i, size_t size) -{ - struct pipe_inode_info *pipe = i->pipe; - int off = i->last_offset; - - if (!off && !size) { - pipe_discard_from(pipe, i->start_head); // discard everything - return; - } - i->count -= size; - while (1) { - struct pipe_buffer *buf = pipe_buf(pipe, i->head); - if (off) /* make it relative to the beginning of buffer */ - size += abs(off) - buf->offset; - if (size <= buf->len) { - buf->len = size; - i->last_offset = last_offset(buf); - break; - } - size -= buf->len; - i->head++; - off = 0; - } - pipe_discard_from(pipe, i->head + 1); // discard everything past this one -} - static void iov_iter_bvec_advance(struct iov_iter *i, size_t size) { const struct bio_vec *bvec, *end; @@ -904,8 +583,6 @@ void iov_iter_advance(struct iov_iter *i, size_t size) iov_iter_iovec_advance(i, size); } else if (iov_iter_is_bvec(i)) { iov_iter_bvec_advance(i, size); - } else if (iov_iter_is_pipe(i)) { - pipe_advance(i, size); } else if (iov_iter_is_discard(i)) { i->count -= size; } @@ -919,26 +596,6 @@ void iov_iter_revert(struct iov_iter *i, size_t unroll) if (WARN_ON(unroll > MAX_RW_COUNT)) return; i->count += unroll; - if (unlikely(iov_iter_is_pipe(i))) { - struct pipe_inode_info *pipe = i->pipe; - unsigned int head = pipe->head; - - while (head > i->start_head) { - struct pipe_buffer *b = pipe_buf(pipe, --head); - if (unroll < b->len) { - b->len -= unroll; - i->last_offset = last_offset(b); - i->head = head; - return; - } - unroll -= b->len; - pipe_buf_release(pipe, b); - pipe->head--; - } - i->last_offset = 0; - i->head = head; - return; - } if (unlikely(iov_iter_is_discard(i))) return; if (unroll <= i->iov_offset) { @@ 
-1026,24 +683,6 @@ void iov_iter_bvec(struct iov_iter *i, unsigned int direction, } EXPORT_SYMBOL(iov_iter_bvec); -void iov_iter_pipe(struct iov_iter *i, unsigned int direction, - struct pipe_inode_info *pipe, - size_t count) -{ - BUG_ON(direction != READ); - WARN_ON(pipe_full(pipe->head, pipe->tail, pipe->ring_size)); - *i = (struct iov_iter){ - .iter_type = ITER_PIPE, - .data_source = false, - .pipe = pipe, - .head = pipe->head, - .start_head = pipe->head, - .last_offset = 0, - .count = count - }; -} -EXPORT_SYMBOL(iov_iter_pipe); - /** * iov_iter_xarray - Initialise an I/O iterator to use the pages in an xarray * @i: The iterator to initialise. @@ -1168,19 +807,6 @@ bool iov_iter_is_aligned(const struct iov_iter *i, unsigned addr_mask, if (iov_iter_is_bvec(i)) return iov_iter_aligned_bvec(i, addr_mask, len_mask); - if (iov_iter_is_pipe(i)) { - size_t size = i->count; - - if (size & len_mask) - return false; - if (size && i->last_offset > 0) { - if (i->last_offset & addr_mask) - return false; - } - - return true; - } - if (iov_iter_is_xarray(i)) { if (i->count & len_mask) return false; @@ -1250,14 +876,6 @@ unsigned long iov_iter_alignment(const struct iov_iter *i) if (iov_iter_is_bvec(i)) return iov_iter_alignment_bvec(i); - if (iov_iter_is_pipe(i)) { - size_t size = i->count; - - if (size && i->last_offset > 0) - return size | i->last_offset; - return size; - } - if (iov_iter_is_xarray(i)) return (i->xarray_start + i->iov_offset) | i->count; @@ -1309,36 +927,6 @@ static int want_pages_array(struct page ***res, size_t size, return count; } -static ssize_t pipe_get_pages(struct iov_iter *i, - struct page ***pages, size_t maxsize, unsigned maxpages, - size_t *start) -{ - unsigned int npages, count, off, chunk; - struct page **p; - size_t left; - - if (!sanity(i)) - return -EFAULT; - - *start = off = pipe_npages(i, &npages); - if (!npages) - return -EFAULT; - count = want_pages_array(pages, maxsize, off, min(npages, maxpages)); - if (!count) - return -ENOMEM; - p = 
*pages; - for (npages = 0, left = maxsize ; npages < count; npages++, left -= chunk) { - struct page *page = append_pipe(i, left, &off); - if (!page) - break; - chunk = min_t(size_t, left, PAGE_SIZE - off); - get_page(*p++ = page); - } - if (!npages) - return -EFAULT; - return maxsize - left; -} - static ssize_t iter_xarray_populate_pages(struct page **pages, struct xarray *xa, pgoff_t index, unsigned int nr_pages) { @@ -1486,8 +1074,6 @@ static ssize_t __iov_iter_get_pages_alloc(struct iov_iter *i, } return maxsize; } - if (iov_iter_is_pipe(i)) - return pipe_get_pages(i, pages, maxsize, maxpages, start); if (iov_iter_is_xarray(i)) return iter_xarray_get_pages(i, pages, maxsize, maxpages, start); return -EFAULT; @@ -1577,9 +1163,7 @@ size_t csum_and_copy_to_iter(const void *addr, size_t bytes, void *_csstate, } sum = csum_shift(csstate->csum, csstate->off); - if (unlikely(iov_iter_is_pipe(i))) - bytes = csum_and_copy_to_pipe_iter(addr, bytes, i, &sum); - else iterate_and_advance(i, bytes, base, len, off, ({ + iterate_and_advance(i, bytes, base, len, off, ({ next = csum_and_copy_to_user(addr + off, base, len); sum = csum_block_add(sum, next, off); next ? 
0 : len;
@@ -1664,15 +1248,6 @@ int iov_iter_npages(const struct iov_iter *i, int maxpages)
 		return iov_npages(i, maxpages);
 	if (iov_iter_is_bvec(i))
 		return bvec_npages(i, maxpages);
-	if (iov_iter_is_pipe(i)) {
-		int npages;
-
-		if (!sanity(i))
-			return 0;
-
-		pipe_npages(i, &npages);
-		return min(npages, maxpages);
-	}
 	if (iov_iter_is_xarray(i)) {
 		unsigned offset = (i->xarray_start + i->iov_offset) % PAGE_SIZE;
 		int npages = DIV_ROUND_UP(offset + i->count, PAGE_SIZE);
@@ -1685,10 +1260,6 @@ EXPORT_SYMBOL(iov_iter_npages);
 const void *dup_iter(struct iov_iter *new, struct iov_iter *old, gfp_t flags)
 {
 	*new = *old;
-	if (unlikely(iov_iter_is_pipe(new))) {
-		WARN_ON(1);
-		return NULL;
-	}
 	if (iov_iter_is_bvec(new))
 		return new->bvec = kmemdup(new->bvec,
 				    new->nr_segs * sizeof(struct bio_vec),
diff --git a/mm/filemap.c b/mm/filemap.c
index b31168a9bafd..6970be64a3e0 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2695,8 +2695,7 @@ ssize_t filemap_read(struct kiocb *iocb, struct iov_iter *iter,
 		if (unlikely(iocb->ki_pos >= i_size_read(inode)))
 			break;
 
-		error = filemap_get_pages(iocb, iter->count, &fbatch,
-					  iov_iter_is_pipe(iter));
+		error = filemap_get_pages(iocb, iter->count, &fbatch, false);
 		if (error < 0)
 			break;
 
From patchwork Thu Feb 9 10:29:47 2023
X-Patchwork-Submitter: David Howells
X-Patchwork-Id: 54854
From: David Howells
To: Jens Axboe, Al Viro, Christoph Hellwig
Cc: David Howells, Matthew Wilcox, Jan Kara, Jeff Layton,
	David Hildenbrand, Jason Gunthorpe, Logan Gunthorpe, Hillf Danton,
	linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	Christoph Hellwig, John Hubbard
Subject: [PATCH v13 05/12] iov_iter: Define flags to qualify page extraction.
Date: Thu, 9 Feb 2023 10:29:47 +0000
Message-Id: <20230209102954.528942-6-dhowells@redhat.com>
In-Reply-To: <20230209102954.528942-1-dhowells@redhat.com>
References: <20230209102954.528942-1-dhowells@redhat.com>

Define flags to qualify page extraction to pass into iov_iter_*_pages*()
rather than passing in FOLL_* flags.

For now only a flag to allow peer-to-peer DMA is supported.
Signed-off-by: David Howells Reviewed-by: Christoph Hellwig Reviewed-by: John Hubbard cc: Al Viro cc: Jens Axboe cc: Logan Gunthorpe cc: linux-fsdevel@vger.kernel.org cc: linux-block@vger.kernel.org --- Notes: ver #12) - Use __bitwise for the extraction flags typedef. ver #11) - Use __bitwise for the extraction flags. ver #9) - Change extract_flags to extraction_flags. ver #7) - Don't use FOLL_* as a parameter, but rather define constants specifically to use with iov_iter_*_pages*(). - Drop the I/O direction constants for now. block/bio.c | 6 +++--- block/blk-map.c | 8 ++++---- include/linux/uio.h | 10 ++++++++-- lib/iov_iter.c | 14 ++++++++------ 4 files changed, 23 insertions(+), 15 deletions(-) diff --git a/block/bio.c b/block/bio.c index ab59a491a883..b97f3991c904 100644 --- a/block/bio.c +++ b/block/bio.c @@ -1245,11 +1245,11 @@ static int bio_iov_add_zone_append_page(struct bio *bio, struct page *page, */ static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter) { + iov_iter_extraction_t extraction_flags = 0; unsigned short nr_pages = bio->bi_max_vecs - bio->bi_vcnt; unsigned short entries_left = bio->bi_max_vecs - bio->bi_vcnt; struct bio_vec *bv = bio->bi_io_vec + bio->bi_vcnt; struct page **pages = (struct page **)bv; - unsigned int gup_flags = 0; ssize_t size, left; unsigned len, i = 0; size_t offset, trim; @@ -1264,7 +1264,7 @@ static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter) pages += entries_left * (PAGE_PTRS_PER_BVEC - 1); if (bio->bi_bdev && blk_queue_pci_p2pdma(bio->bi_bdev->bd_disk->queue)) - gup_flags |= FOLL_PCI_P2PDMA; + extraction_flags |= ITER_ALLOW_P2PDMA; /* * Each segment in the iov is required to be a block size multiple. 
@@ -1275,7 +1275,7 @@ static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
 	 */
 	size = iov_iter_get_pages(iter, pages,
 				  UINT_MAX - bio->bi_iter.bi_size,
-				  nr_pages, &offset, gup_flags);
+				  nr_pages, &offset, extraction_flags);
 	if (unlikely(size <= 0))
 		return size ? size : -EFAULT;

diff --git a/block/blk-map.c b/block/blk-map.c
index 19940c978c73..080dd60485be 100644
--- a/block/blk-map.c
+++ b/block/blk-map.c
@@ -265,9 +265,9 @@ static struct bio *blk_rq_map_bio_alloc(struct request *rq,
 static int bio_map_user_iov(struct request *rq, struct iov_iter *iter,
 		gfp_t gfp_mask)
 {
+	iov_iter_extraction_t extraction_flags = 0;
 	unsigned int max_sectors = queue_max_hw_sectors(rq->q);
 	unsigned int nr_vecs = iov_iter_npages(iter, BIO_MAX_VECS);
-	unsigned int gup_flags = 0;
 	struct bio *bio;
 	int ret;
 	int j;
@@ -280,7 +280,7 @@ static int bio_map_user_iov(struct request *rq, struct iov_iter *iter,
 		return -ENOMEM;

 	if (blk_queue_pci_p2pdma(rq->q))
-		gup_flags |= FOLL_PCI_P2PDMA;
+		extraction_flags |= ITER_ALLOW_P2PDMA;

 	while (iov_iter_count(iter)) {
 		struct page **pages, *stack_pages[UIO_FASTIOV];
@@ -291,10 +291,10 @@ static int bio_map_user_iov(struct request *rq, struct iov_iter *iter,
 		if (nr_vecs <= ARRAY_SIZE(stack_pages)) {
 			pages = stack_pages;
 			bytes = iov_iter_get_pages(iter, pages, LONG_MAX,
-						   nr_vecs, &offs, gup_flags);
+						   nr_vecs, &offs, extraction_flags);
 		} else {
 			bytes = iov_iter_get_pages_alloc(iter, &pages,
-						LONG_MAX, &offs, gup_flags);
+						LONG_MAX, &offs, extraction_flags);
 		}
 		if (unlikely(bytes <= 0)) {
 			ret = bytes ? bytes : -EFAULT;
diff --git a/include/linux/uio.h b/include/linux/uio.h
index dcc0ca5ef491..af70e4c9ea27 100644
--- a/include/linux/uio.h
+++ b/include/linux/uio.h
@@ -12,6 +12,8 @@

 struct page;

+typedef unsigned int __bitwise iov_iter_extraction_t;
+
 struct kvec {
 	void *iov_base; /* and that should *never* hold a userland pointer */
 	size_t iov_len;
@@ -238,12 +240,12 @@ void iov_iter_xarray(struct iov_iter *i, unsigned int direction, struct xarray *
 		     loff_t start, size_t count);
 ssize_t iov_iter_get_pages(struct iov_iter *i, struct page **pages,
 		size_t maxsize, unsigned maxpages, size_t *start,
-		unsigned gup_flags);
+		iov_iter_extraction_t extraction_flags);
 ssize_t iov_iter_get_pages2(struct iov_iter *i, struct page **pages,
 			size_t maxsize, unsigned maxpages, size_t *start);
 ssize_t iov_iter_get_pages_alloc(struct iov_iter *i,
 		struct page ***pages, size_t maxsize, size_t *start,
-		unsigned gup_flags);
+		iov_iter_extraction_t extraction_flags);
 ssize_t iov_iter_get_pages_alloc2(struct iov_iter *i, struct page ***pages,
 			size_t maxsize, size_t *start);
 int iov_iter_npages(const struct iov_iter *i, int maxpages);
@@ -346,4 +348,8 @@ static inline void iov_iter_ubuf(struct iov_iter *i, unsigned int direction,
 	};
 }

+/* Flags for iov_iter_get/extract_pages*() */
+/* Allow P2PDMA on the extracted pages */
+#define ITER_ALLOW_P2PDMA	((__force iov_iter_extraction_t)0x01)
+
 #endif
diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index adc5e8aa8ae8..34ee3764d0fa 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -1020,9 +1020,9 @@ static struct page *first_bvec_segment(const struct iov_iter *i,
 static ssize_t __iov_iter_get_pages_alloc(struct iov_iter *i,
 		   struct page ***pages, size_t maxsize,
 		   unsigned int maxpages, size_t *start,
-		   unsigned int gup_flags)
+		   iov_iter_extraction_t extraction_flags)
 {
-	unsigned int n;
+	unsigned int n, gup_flags = 0;

 	if (maxsize > i->count)
 		maxsize = i->count;
@@ -1030,6 +1030,8 @@ static ssize_t __iov_iter_get_pages_alloc(struct iov_iter *i,
 		return 0;
 	if (maxsize > MAX_RW_COUNT)
 		maxsize = MAX_RW_COUNT;
+	if (extraction_flags & ITER_ALLOW_P2PDMA)
+		gup_flags |= FOLL_PCI_P2PDMA;

 	if (likely(user_backed_iter(i))) {
 		unsigned long addr;
@@ -1081,14 +1083,14 @@ static ssize_t __iov_iter_get_pages_alloc(struct iov_iter *i,

 ssize_t iov_iter_get_pages(struct iov_iter *i,
 		   struct page **pages, size_t maxsize, unsigned maxpages,
-		   size_t *start, unsigned gup_flags)
+		   size_t *start, iov_iter_extraction_t extraction_flags)
 {
 	if (!maxpages)
 		return 0;
 	BUG_ON(!pages);

 	return __iov_iter_get_pages_alloc(i, &pages, maxsize, maxpages,
-					  start, gup_flags);
+					  start, extraction_flags);
 }
 EXPORT_SYMBOL_GPL(iov_iter_get_pages);

@@ -1101,14 +1103,14 @@ EXPORT_SYMBOL(iov_iter_get_pages2);

 ssize_t iov_iter_get_pages_alloc(struct iov_iter *i,
 		   struct page ***pages, size_t maxsize,
-		   size_t *start, unsigned gup_flags)
+		   size_t *start, iov_iter_extraction_t extraction_flags)
 {
 	ssize_t len;

 	*pages = NULL;

 	len = __iov_iter_get_pages_alloc(i, pages, maxsize, ~0U, start,
-					 gup_flags);
+					 extraction_flags);
 	if (len <= 0) {
 		kvfree(*pages);
 		*pages = NULL;

From patchwork Thu Feb 9 10:29:48 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: David Howells
X-Patchwork-Id: 54845
From: David Howells
To: Jens Axboe, Al Viro, Christoph Hellwig
Cc: David Howells, Matthew Wilcox, Jan Kara, Jeff Layton,
	David Hildenbrand, Jason Gunthorpe, Logan Gunthorpe,
	Hillf Danton, linux-fsdevel@vger.kernel.org,
	linux-block@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, Christoph Hellwig, John Hubbard
Subject: [PATCH v13 06/12] iov_iter: Add a function to extract a page list from an iterator
Date: Thu, 9 Feb 2023 10:29:48 +0000
Message-Id: <20230209102954.528942-7-dhowells@redhat.com>
In-Reply-To: <20230209102954.528942-1-dhowells@redhat.com>
References: <20230209102954.528942-1-dhowells@redhat.com>
MIME-Version: 1.0

Add a function, iov_iter_extract_pages(), to extract a list of pages from
an iterator. The pages may be returned with a pin added or nothing,
depending on the type of iterator.
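
For illustration only (this is not part of the patch): the "pin added or
nothing" behaviour, and what it means for the caller's cleanup, can be
modelled in user space roughly as below. Every name here (model_iter,
model_extract(), model_release() and so on) is invented for the sketch; the
real code operates on struct iov_iter and struct page and pins pages through
the mm layer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Invented stand-ins for the iterator types; not the kernel's definitions. */
enum model_iter_type {
	MODEL_UBUF, MODEL_IOVEC, MODEL_KVEC, MODEL_BVEC, MODEL_XARRAY
};

struct model_page {
	int pincount;			/* pins currently held on this page */
};

struct model_iter {
	enum model_iter_type type;
	struct model_page *pages;	/* pages backing the iterator */
	size_t nr_pages;
};

/* Only user-backed iterators get pins; everything else is merely listed. */
static bool model_will_pin(const struct model_iter *iter)
{
	return iter->type == MODEL_UBUF || iter->type == MODEL_IOVEC;
}

/* List up to @max pages, taking a pin on each one only if the rule says so. */
static size_t model_extract(struct model_iter *iter,
			    struct model_page **list, size_t max)
{
	size_t n = iter->nr_pages < max ? iter->nr_pages : max;
	size_t k;

	for (k = 0; k < n; k++) {
		if (model_will_pin(iter))
			iter->pages[k].pincount++;
		list[k] = &iter->pages[k];
	}
	return n;
}

/* Caller-side cleanup: drop pins only when extraction took them. */
static void model_release(const struct model_iter *iter,
			  struct model_page **list, size_t n)
{
	size_t k;

	if (!model_will_pin(iter))
		return;
	for (k = 0; k < n; k++) {
		assert(list[k]->pincount > 0);	/* catch unbalanced unpins */
		list[k]->pincount--;
	}
}
```

The point of the model is that the caller never has to track which pages were
pinned: asking the iterator once is enough to choose the cleanup path.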

Add a second function, iov_iter_extract_will_pin(), to determine how the
cleanup should be done.

There are two cases:

 (1) ITER_IOVEC or ITER_UBUF iterator.

     Extracted pages will have pins (FOLL_PIN) obtained on them so that a
     concurrent fork() will forcibly copy the page so that DMA is done
     to/from the parent's buffer and is unavailable to/unaffected by the
     child process.

     iov_iter_extract_will_pin() will return true for this case. The
     caller should use something like unpin_user_page() to dispose of the
     page.

 (2) Any other sort of iterator.

     No refs or pins are obtained on the page; the assumption is made that
     the caller will manage page retention.

     iov_iter_extract_will_pin() will return false. The pages don't need
     additional disposal.

Signed-off-by: David Howells
Reviewed-by: Christoph Hellwig
cc: Al Viro
cc: John Hubbard
cc: David Hildenbrand
cc: Matthew Wilcox
cc: linux-fsdevel@vger.kernel.org
cc: linux-mm@kvack.org
---

Notes:
    ver #12)
    - ITER_PIPE is gone, so drop related bits.
    - Don't specify FOLL_PIN as that's implied by pin_user_pages_fast().

    ver #11)
    - Fix iov_iter_extract_kvec_pages() to include the offset into the page
      in the returned starting offset.
    - Use __bitwise for the extraction flags

    ver #10)
    - Fix use of i->kvec in iov_iter_extract_bvec_pages() to be i->bvec.

    ver #9)
    - Rename iov_iter_extract_mode() to iov_iter_extract_will_pin() and make
      it return true/false not FOLL_PIN/0 as FOLL_PIN is going to be made
      private to mm/.
    - Change extract_flags to extraction_flags.

    ver #8)
    - It seems that all DIO is supposed to be done under FOLL_PIN now, and not
      FOLL_GET, so switch to only using pin_user_pages() for user-backed
      iters.
    - Wrap an argument in brackets in the iov_iter_extract_mode() macro.
    - Drop the extract_flags argument to iov_iter_extract_mode() for now
      [hch].

    ver #7)
    - Switch to passing in iter-specific flags rather than FOLL_* flags.
    - Drop the direction flags for now.
    - Use ITER_ALLOW_P2PDMA to request FOLL_PCI_P2PDMA.
    - Disallow use of ITER_ALLOW_P2PDMA with non-user-backed iter.
    - Add support for extraction from KVEC-type iters.
    - Use iov_iter_advance() rather than open-coding it.
    - Make BVEC- and KVEC-type skip over initial empty vectors.

    ver #6)
    - Add back the function to indicate the cleanup mode.
    - Drop the cleanup_mode return arg to iov_iter_extract_pages().
    - Pass FOLL_SOURCE/DEST_BUF in gup_flags. Check this against the iter
      data_source.

    ver #4)
    - Use ITER_SOURCE/DEST instead of WRITE/READ.
    - Allow additional FOLL_* flags, such as FOLL_PCI_P2PDMA to be passed in.

    ver #3)
    - Switch to using EXPORT_SYMBOL_GPL to prevent indirect 3rd-party access
      to get/pin_user_pages_fast()[1].

 include/linux/uio.h |  27 ++++-
 lib/iov_iter.c      | 264 ++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 290 insertions(+), 1 deletion(-)

diff --git a/include/linux/uio.h b/include/linux/uio.h
index af70e4c9ea27..cf6658066736 100644
--- a/include/linux/uio.h
+++ b/include/linux/uio.h
@@ -347,9 +347,34 @@ static inline void iov_iter_ubuf(struct iov_iter *i, unsigned int direction,
 		.count = count
 	};
 }
-
 /* Flags for iov_iter_get/extract_pages*() */
 /* Allow P2PDMA on the extracted pages */
 #define ITER_ALLOW_P2PDMA	((__force iov_iter_extraction_t)0x01)

+ssize_t iov_iter_extract_pages(struct iov_iter *i, struct page ***pages,
+			       size_t maxsize, unsigned int maxpages,
+			       iov_iter_extraction_t extraction_flags,
+			       size_t *offset0);
+
+/**
+ * iov_iter_extract_will_pin - Indicate how pages from the iterator will be retained
+ * @iter: The iterator
+ *
+ * Examine the iterator and indicate by returning true or false as to how, if
+ * at all, pages extracted from the iterator will be retained by the extraction
+ * function.
+ *
+ * %true indicates that the pages will have a pin placed in them that the
+ * caller must unpin. This must be done for DMA/async DIO to force fork()
+ * to forcibly copy a page for the child (the parent must retain the original
+ * page).
+ *
+ * %false indicates that no measures are taken and that it's up to the caller
+ * to retain the pages.
+ */
+static inline bool iov_iter_extract_will_pin(const struct iov_iter *iter)
+{
+	return user_backed_iter(iter);
+}
+
 #endif
diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index 34ee3764d0fa..8d34b6552179 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -1487,3 +1487,267 @@ void iov_iter_restore(struct iov_iter *i, struct iov_iter_state *state)
 		i->iov -= state->nr_segs - i->nr_segs;
 	i->nr_segs = state->nr_segs;
 }
+
+/*
+ * Extract a list of contiguous pages from an ITER_XARRAY iterator. This does not
+ * get references on the pages, nor does it get a pin on them.
+ */
+static ssize_t iov_iter_extract_xarray_pages(struct iov_iter *i,
+					     struct page ***pages, size_t maxsize,
+					     unsigned int maxpages,
+					     iov_iter_extraction_t extraction_flags,
+					     size_t *offset0)
+{
+	struct page *page, **p;
+	unsigned int nr = 0, offset;
+	loff_t pos = i->xarray_start + i->iov_offset;
+	pgoff_t index = pos >> PAGE_SHIFT;
+	XA_STATE(xas, i->xarray, index);
+
+	offset = pos & ~PAGE_MASK;
+	*offset0 = offset;
+
+	maxpages = want_pages_array(pages, maxsize, offset, maxpages);
+	if (!maxpages)
+		return -ENOMEM;
+	p = *pages;
+
+	rcu_read_lock();
+	for (page = xas_load(&xas); page; page = xas_next(&xas)) {
+		if (xas_retry(&xas, page))
+			continue;
+
+		/* Has the page moved or been split? */
+		if (unlikely(page != xas_reload(&xas))) {
+			xas_reset(&xas);
+			continue;
+		}
+
+		p[nr++] = find_subpage(page, xas.xa_index);
+		if (nr == maxpages)
+			break;
+	}
+	rcu_read_unlock();
+
+	maxsize = min_t(size_t, nr * PAGE_SIZE - offset, maxsize);
+	iov_iter_advance(i, maxsize);
+	return maxsize;
+}
+
+/*
+ * Extract a list of contiguous pages from an ITER_BVEC iterator. This does
+ * not get references on the pages, nor does it get a pin on them.
+ */
+static ssize_t iov_iter_extract_bvec_pages(struct iov_iter *i,
+					   struct page ***pages, size_t maxsize,
+					   unsigned int maxpages,
+					   iov_iter_extraction_t extraction_flags,
+					   size_t *offset0)
+{
+	struct page **p, *page;
+	size_t skip = i->iov_offset, offset;
+	int k;
+
+	for (;;) {
+		if (i->nr_segs == 0)
+			return 0;
+		maxsize = min(maxsize, i->bvec->bv_len - skip);
+		if (maxsize)
+			break;
+		i->iov_offset = 0;
+		i->nr_segs--;
+		i->bvec++;
+		skip = 0;
+	}
+
+	skip += i->bvec->bv_offset;
+	page = i->bvec->bv_page + skip / PAGE_SIZE;
+	offset = skip % PAGE_SIZE;
+	*offset0 = offset;
+
+	maxpages = want_pages_array(pages, maxsize, offset, maxpages);
+	if (!maxpages)
+		return -ENOMEM;
+	p = *pages;
+	for (k = 0; k < maxpages; k++)
+		p[k] = page + k;
+
+	maxsize = min_t(size_t, maxsize, maxpages * PAGE_SIZE - offset);
+	iov_iter_advance(i, maxsize);
+	return maxsize;
+}
+
+/*
+ * Extract a list of virtually contiguous pages from an ITER_KVEC iterator.
+ * This does not get references on the pages, nor does it get a pin on them.
+ */
+static ssize_t iov_iter_extract_kvec_pages(struct iov_iter *i,
+					   struct page ***pages, size_t maxsize,
+					   unsigned int maxpages,
+					   iov_iter_extraction_t extraction_flags,
+					   size_t *offset0)
+{
+	struct page **p, *page;
+	const void *kaddr;
+	size_t skip = i->iov_offset, offset, len;
+	int k;
+
+	for (;;) {
+		if (i->nr_segs == 0)
+			return 0;
+		maxsize = min(maxsize, i->kvec->iov_len - skip);
+		if (maxsize)
+			break;
+		i->iov_offset = 0;
+		i->nr_segs--;
+		i->kvec++;
+		skip = 0;
+	}
+
+	kaddr = i->kvec->iov_base + skip;
+	offset = (unsigned long)kaddr & ~PAGE_MASK;
+	*offset0 = offset;
+
+	maxpages = want_pages_array(pages, maxsize, offset, maxpages);
+	if (!maxpages)
+		return -ENOMEM;
+	p = *pages;
+
+	kaddr -= offset;
+	len = offset + maxsize;
+	for (k = 0; k < maxpages; k++) {
+		size_t seg = min_t(size_t, len, PAGE_SIZE);
+
+		if (is_vmalloc_or_module_addr(kaddr))
+			page = vmalloc_to_page(kaddr);
+		else
+			page = virt_to_page(kaddr);
+
+		p[k] = page;
+		len -= seg;
+		kaddr += PAGE_SIZE;
+	}
+
+	maxsize = min_t(size_t, maxsize, maxpages * PAGE_SIZE - offset);
+	iov_iter_advance(i, maxsize);
+	return maxsize;
+}
+
+/*
+ * Extract a list of contiguous pages from a user iterator and get a pin on
+ * each of them. This should only be used if the iterator is user-backed
+ * (IOVEC/UBUF).
+ *
+ * It does not get refs on the pages, but the pages must be unpinned by the
+ * caller once the transfer is complete.
+ *
+ * This is safe to be used where background IO/DMA *is* going to be modifying
+ * the buffer; using a pin rather than a ref forces fork() to give the
+ * child a copy of the page.
+ */
+static ssize_t iov_iter_extract_user_pages(struct iov_iter *i,
+					   struct page ***pages,
+					   size_t maxsize,
+					   unsigned int maxpages,
+					   iov_iter_extraction_t extraction_flags,
+					   size_t *offset0)
+{
+	unsigned long addr;
+	unsigned int gup_flags = 0;
+	size_t offset;
+	int res;
+
+	if (i->data_source == ITER_DEST)
+		gup_flags |= FOLL_WRITE;
+	if (extraction_flags & ITER_ALLOW_P2PDMA)
+		gup_flags |= FOLL_PCI_P2PDMA;
+	if (i->nofault)
+		gup_flags |= FOLL_NOFAULT;
+
+	addr = first_iovec_segment(i, &maxsize);
+	*offset0 = offset = addr % PAGE_SIZE;
+	addr &= PAGE_MASK;
+	maxpages = want_pages_array(pages, maxsize, offset, maxpages);
+	if (!maxpages)
+		return -ENOMEM;
+	res = pin_user_pages_fast(addr, maxpages, gup_flags, *pages);
+	if (unlikely(res <= 0))
+		return res;
+	maxsize = min_t(size_t, maxsize, res * PAGE_SIZE - offset);
+	iov_iter_advance(i, maxsize);
+	return maxsize;
+}
+
+/**
+ * iov_iter_extract_pages - Extract a list of contiguous pages from an iterator
+ * @i: The iterator to extract from
+ * @pages: Where to return the list of pages
+ * @maxsize: The maximum amount of iterator to extract
+ * @maxpages: The maximum size of the list of pages
+ * @extraction_flags: Flags to qualify request
+ * @offset0: Where to return the starting offset into (*@pages)[0]
+ *
+ * Extract a list of contiguous pages from the current point of the iterator,
+ * advancing the iterator. The maximum number of pages and the maximum amount
+ * of page contents can be set.
+ *
+ * If *@pages is NULL, a page list will be allocated to the required size and
+ * *@pages will be set to its base. If *@pages is not NULL, it will be assumed
+ * that the caller allocated a page list at least @maxpages in size and this
+ * will be filled in.
+ *
+ * @extraction_flags can have ITER_ALLOW_P2PDMA set to request peer-to-peer DMA
+ * be allowed on the pages extracted.
+ *
+ * The iov_iter_extract_will_pin() function can be used to query how cleanup
+ * should be performed.
+ *
+ * Extra refs or pins on the pages may be obtained as follows:
+ *
+ *  (*) If the iterator is user-backed (ITER_IOVEC/ITER_UBUF), pins will be
+ *      added to the pages, but refs will not be taken.
+ *      iov_iter_extract_will_pin() will return true.
+ *
+ *  (*) If the iterator is ITER_KVEC, ITER_BVEC or ITER_XARRAY, the pages are
+ *      merely listed; no extra refs or pins are obtained.
+ *      iov_iter_extract_will_pin() will return false.
+ *
+ * Note also:
+ *
+ *  (*) Use with ITER_DISCARD is not supported as that has no content.
+ *
+ * On success, the function sets *@pages to the new pagelist, if allocated, and
+ * sets *@offset0 to the offset into the first page.
+ *
+ * It may also return -ENOMEM and -EFAULT.
+ */
+ssize_t iov_iter_extract_pages(struct iov_iter *i,
+			       struct page ***pages,
+			       size_t maxsize,
+			       unsigned int maxpages,
+			       iov_iter_extraction_t extraction_flags,
+			       size_t *offset0)
+{
+	maxsize = min_t(size_t, min_t(size_t, maxsize, i->count), MAX_RW_COUNT);
+	if (!maxsize)
+		return 0;
+
+	if (likely(user_backed_iter(i)))
+		return iov_iter_extract_user_pages(i, pages, maxsize,
+						   maxpages, extraction_flags,
+						   offset0);
+	if (iov_iter_is_kvec(i))
+		return iov_iter_extract_kvec_pages(i, pages, maxsize,
+						   maxpages, extraction_flags,
+						   offset0);
+	if (iov_iter_is_bvec(i))
+		return iov_iter_extract_bvec_pages(i, pages, maxsize,
+						   maxpages, extraction_flags,
+						   offset0);
+	if (iov_iter_is_xarray(i))
+		return iov_iter_extract_xarray_pages(i, pages, maxsize,
+						     maxpages, extraction_flags,
+						     offset0);
+	return -EFAULT;
+}
+EXPORT_SYMBOL_GPL(iov_iter_extract_pages);

From patchwork Thu Feb 9 10:29:49 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: David Howells
X-Patchwork-Id: 54851
From: David Howells
To: Jens Axboe, Al Viro, Christoph Hellwig
Cc: David Howells, Matthew Wilcox, Jan Kara, Jeff Layton,
	David Hildenbrand, Jason Gunthorpe, Logan Gunthorpe,
	Hillf Danton, linux-fsdevel@vger.kernel.org,
	linux-block@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, John Hubbard
Subject: [PATCH v13 07/12] iomap: Don't get an reference on ZERO_PAGE for direct I/O block zeroing
Date: Thu, 9 Feb 2023 10:29:49 +0000
Message-Id: <20230209102954.528942-8-dhowells@redhat.com>
In-Reply-To: <20230209102954.528942-1-dhowells@redhat.com>
References: <20230209102954.528942-1-dhowells@redhat.com>
MIME-Version: 1.0

ZERO_PAGE can't go away, so there is no need to hold an extra reference on
it.
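
For illustration only (this is not part of the patch): a toy user-space model
of why the reference can be skipped. It assumes, as the diff suggests, that a
bio flagged BIO_NO_PAGE_REF is left alone by the completion path rather than
having page references dropped. All the names below (toy_page, toy_bio and
the toy_*() helpers) are invented for the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_page {
	int refcount;
	bool permanent;		/* never freed, like the zero page */
};

struct toy_bio {
	struct toy_page *page;
	bool no_page_ref;	/* models BIO_NO_PAGE_REF */
};

/* Old scheme: take a ref at submission, to be dropped at completion. */
static void toy_submit_with_ref(struct toy_bio *bio, struct toy_page *page)
{
	page->refcount++;
	bio->page = page;
	bio->no_page_ref = false;
}

/* New scheme: flag the bio instead; only sensible for permanent pages. */
static void toy_submit_no_ref(struct toy_bio *bio, struct toy_page *page)
{
	assert(page->permanent);
	bio->page = page;
	bio->no_page_ref = true;
}

/* Completion drops a ref only if submission took one. */
static void toy_complete(struct toy_bio *bio)
{
	if (!bio->no_page_ref)
		bio->page->refcount--;
	bio->page = NULL;
}
```

Both schemes leave the count where it started; the second simply avoids
touching it at all, which is the saving the patch is after.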

Signed-off-by: David Howells
Reviewed-by: David Hildenbrand
Reviewed-by: John Hubbard
cc: Al Viro
cc: David Hildenbrand
cc: linux-fsdevel@vger.kernel.org
---
 fs/iomap/direct-io.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index 9804714b1751..47db4ead1e74 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -202,7 +202,7 @@ static void iomap_dio_zero(const struct iomap_iter *iter, struct iomap_dio *dio,
 	bio->bi_private = dio;
 	bio->bi_end_io = iomap_dio_bio_end_io;

-	get_page(page);
+	bio_set_flag(bio, BIO_NO_PAGE_REF);
 	__bio_add_page(bio, page, len, 0);
 	iomap_dio_submit_bio(iter, dio, bio, pos);
 }

From patchwork Thu Feb 9 10:29:50 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: David Howells
X-Patchwork-Id: 54843
From: David Howells
To: Jens Axboe, Al Viro, Christoph Hellwig
Cc: David Howells, Matthew Wilcox, Jan Kara, Jeff Layton,
	David Hildenbrand, Jason Gunthorpe, Logan Gunthorpe,
	Hillf Danton, linux-fsdevel@vger.kernel.org,
	linux-block@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, Christoph Hellwig, John Hubbard
Subject: [PATCH v13 08/12] block: Fix bio_flagged() so that gcc can better optimise it
Date: Thu, 9 Feb 2023 10:29:50 +0000
Message-Id: <20230209102954.528942-9-dhowells@redhat.com>
In-Reply-To: <20230209102954.528942-1-dhowells@redhat.com>
References: <20230209102954.528942-1-dhowells@redhat.com>
MIME-Version: 1.0
X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_NONE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1757349224470818857?= X-GMAIL-MSGID: =?utf-8?q?1757349224470818857?= Fix bio_flagged() so that multiple instances of it, such as: if (bio_flagged(bio, BIO_PAGE_REFFED) || bio_flagged(bio, BIO_PAGE_PINNED)) can be combined by the gcc optimiser into a single test in assembly (arguably, this is a compiler optimisation issue[1]). The missed optimisation stems from bio_flagged() comparing the result of the bitwise-AND to zero. This results in an out-of-line bio_release_page() being compiled to something like: <+0>: mov 0x14(%rdi),%eax <+3>: test $0x1,%al <+5>: jne 0xffffffff816dac53 <+7>: test $0x2,%al <+9>: je 0xffffffff816dac5c <+11>: movzbl %sil,%esi <+15>: jmp 0xffffffff816daba1 <__bio_release_pages> <+20>: jmp 0xffffffff81d0b800 <__x86_return_thunk> However, the test is superfluous as the return type is bool. Removing it results in: <+0>: testb $0x3,0x14(%rdi) <+4>: je 0xffffffff816e4af4 <+6>: movzbl %sil,%esi <+10>: jmp 0xffffffff816dab7c <__bio_release_pages> <+15>: jmp 0xffffffff81d0b7c0 <__x86_return_thunk> instead. Also, the MOVZBL instruction looks unnecessary[2] - I think it's just 're-booling' the mark_dirty parameter. 
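As an illustration of the point above, here is a minimal standalone sketch (not part of the patch). All demo_* names are hypothetical stand-ins rather than the kernel definitions, and the bit numbering is assumed to match the end state of this series (pinned = bit 0, reffed = bit 1). It shows that dropping the comparison against zero does not change the result, since conversion to bool already maps any non-zero value to true:

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's bio flag bits; assumed ordering. */
enum {
	DEMO_PAGE_PINNED,	/* bit 0 */
	DEMO_PAGE_REFFED,	/* bit 1 */
	DEMO_CLONED,		/* bit 2 */
};

/* Hypothetical cut-down structure holding only the flags word. */
struct demo_bio {
	unsigned short flags;
};

/* Old form: explicit comparison of the masked value against zero. */
static inline bool demo_flagged_cmp(const struct demo_bio *bio, unsigned int bit)
{
	return (bio->flags & (1U << bit)) != 0;
}

/* New form: rely on the implicit conversion of the masked value to bool. */
static inline bool demo_flagged(const struct demo_bio *bio, unsigned int bit)
{
	return bio->flags & (1U << bit);
}

/* Combined test of the kind the optimiser may merge into one mask test. */
static inline bool demo_needs_release(const struct demo_bio *bio)
{
	return demo_flagged(bio, DEMO_PAGE_REFFED) ||
	       demo_flagged(bio, DEMO_PAGE_PINNED);
}
```

Both forms return the same value for every input; whether the two tests actually get merged into a single instruction depends on the compiler and its version.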
Signed-off-by: David Howells Reviewed-by: Christoph Hellwig Reviewed-by: John Hubbard cc: Jens Axboe cc: linux-block@vger.kernel.org Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108370 [1] Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108371 [2] Link: https://lore.kernel.org/r/167391056756.2311931.356007731815807265.stgit@warthog.procyon.org.uk/ # v6 --- include/linux/bio.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/include/linux/bio.h b/include/linux/bio.h index c1da63f6c808..10366b8bdb13 100644 --- a/include/linux/bio.h +++ b/include/linux/bio.h @@ -227,7 +227,7 @@ static inline void bio_cnt_set(struct bio *bio, unsigned int count) static inline bool bio_flagged(struct bio *bio, unsigned int bit) { - return (bio->bi_flags & (1U << bit)) != 0; + return bio->bi_flags & (1U << bit); } static inline void bio_set_flag(struct bio *bio, unsigned int bit) From patchwork Thu Feb 9 10:29:51 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 54848 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:adf:eb09:0:0:0:0:0 with SMTP id s9csp255846wrn; Thu, 9 Feb 2023 02:34:12 -0800 (PST) X-Google-Smtp-Source: AK7set9wThVnKQS4sM7k5AQlP/s3gDnAoRU2B5HXRVUmB9RKir5iGvfU2LXnVdzxKnlbeyCnm8+w X-Received: by 2002:a17:903:244f:b0:199:30a6:376c with SMTP id l15-20020a170903244f00b0019930a6376cmr10895222pls.68.1675938852128; Thu, 09 Feb 2023 02:34:12 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1675938852; cv=none; d=google.com; s=arc-20160816; b=A+SqdLBBFgi4LFXJ1R9KC8qj6wHQqGlMy2nTdmPxS4ru9EIL0D01pSCv0Tz953E4CT SAzdDbt++IeH8gvxSgcaK/lUcR16WOD1MYfhOTCoD9Z0kQnIgElkLcaVq1rYpPLUsNfm YgyW0D0MJOC5uAkbQEGehO1MFbFO1pPYqKiObKrbu1KFK+4g8h9gKnfv4TuNtkXoNQPc 9MMSEeDGKzDQCtP6OHyq3H+XOmyEn05cU1U9Dpn5aWX7bKd7n4j3xsUQpNXHTbx1mc8Z rR2vPLkXxWJsAN7lFBDW/y1VSS9KRVNBcxjLlVKZTHt82KUiX9f6dDQNr9xXbMiRtfj/ j2VQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; 
d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=dqniJi1fcm9DuVb2jGUzwy7ATEE/wnT//KjRosSsKEA=; b=DSjxnA6QNcI/GTz3ZaA0aiElDrEzqrPC85OK2L5A8M4Ux9BQONE9eE1JR0zPfd1dxs vofpnL8f6OyGd2UOqFsH/mOi/RHVBcCaZOIbx43GudDo5J5sQrNCth9ioAN+AoeLntKP zViqy0jzm4f71RV+jfsMwcmXwvzu3krU92iQ9sDIPH/CJ6e2sUtVbIQx4/MMe+iTNzTu vSvO4RbQLyHXt4mGa+xBhzeBlEOzDg6uNT2NO6p9H6cNuSbv+pW9W4bdlnFYGQSnCUgJ oRvfZZzvfG8RvX6cTU5mG1fuo2j+mJvIvPD0Kgu54UoAcA7nN1WxQiVM6Ur2Zivo0UEr sMIA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=SCPTGHtU; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id jc19-20020a17090325d300b0019621d4bdf9si1425621plb.496.2023.02.09.02.34.00; Thu, 09 Feb 2023 02:34:12 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=SCPTGHtU; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229782AbjBIKc2 (ORCPT + 99 others); Thu, 9 Feb 2023 05:32:28 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50988 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230122AbjBIKbh (ORCPT ); Thu, 9 Feb 2023 05:31:37 -0500 
Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 3A99032531 for ; Thu, 9 Feb 2023 02:30:30 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1675938629; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=dqniJi1fcm9DuVb2jGUzwy7ATEE/wnT//KjRosSsKEA=; b=SCPTGHtUPzGtQUQHQE7uuVKWm72HSgLnnMf5zJ7Gfd3trKkIJE4+r3ZvJCO5LUuKeOk69q /WJ/l63BZiMK/zDzivedpYK1XRDrZF5W6gEHFG8oIYSowMVNzlld9Uye/8es3kvuQoh8m9 dKUY54nEKOCKkRbtGWorBazmD7Evxfk= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-286-2Kg3XRHMMZGGmZ87YUpyow-1; Thu, 09 Feb 2023 05:30:25 -0500 X-MC-Unique: 2Kg3XRHMMZGGmZ87YUpyow-1 Received: from smtp.corp.redhat.com (int-mx09.intmail.prod.int.rdu2.redhat.com [10.11.54.9]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id B451F800DA6; Thu, 9 Feb 2023 10:30:22 +0000 (UTC) Received: from warthog.procyon.org.uk (unknown [10.33.36.24]) by smtp.corp.redhat.com (Postfix) with ESMTP id C2EA8492C3F; Thu, 9 Feb 2023 10:30:20 +0000 (UTC) From: David Howells To: Jens Axboe , Al Viro , Christoph Hellwig Cc: David Howells , Matthew Wilcox , Jan Kara , Jeff Layton , David Hildenbrand , Jason Gunthorpe , Logan Gunthorpe , Hillf Danton , linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Christoph Hellwig , John Hubbard Subject: [PATCH v13 09/12] block: Replace BIO_NO_PAGE_REF with BIO_PAGE_REFFED with inverted logic Date: Thu, 9 Feb 2023 10:29:51 +0000 Message-Id: 
<20230209102954.528942-10-dhowells@redhat.com> In-Reply-To: <20230209102954.528942-1-dhowells@redhat.com> References: <20230209102954.528942-1-dhowells@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.9 X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_NONE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1757349258017992867?= X-GMAIL-MSGID: =?utf-8?q?1757349258017992867?= From: Christoph Hellwig Replace BIO_NO_PAGE_REF with a BIO_PAGE_REFFED flag that has the inverted meaning and is only set when a page reference has been acquired that needs to be released by bio_release_pages(). Signed-off-by: Christoph Hellwig Signed-off-by: David Howells Reviewed-by: John Hubbard cc: Al Viro cc: Jens Axboe cc: Jan Kara cc: Matthew Wilcox cc: Logan Gunthorpe cc: linux-block@vger.kernel.org --- Notes: ver #8) - Split out from another patch [hch]. - Don't default to BIO_PAGE_REFFED [hch]. ver #5) - Split from patch that uses iov_iter_extract_pages(). 
block/bio.c | 2 +- block/blk-map.c | 1 + fs/direct-io.c | 2 ++ fs/iomap/direct-io.c | 1 - include/linux/bio.h | 2 +- include/linux/blk_types.h | 2 +- 6 files changed, 6 insertions(+), 4 deletions(-) diff --git a/block/bio.c b/block/bio.c index b97f3991c904..bf9bf53232be 100644 --- a/block/bio.c +++ b/block/bio.c @@ -1198,7 +1198,6 @@ void bio_iov_bvec_set(struct bio *bio, struct iov_iter *iter) bio->bi_io_vec = (struct bio_vec *)iter->bvec; bio->bi_iter.bi_bvec_done = iter->iov_offset; bio->bi_iter.bi_size = size; - bio_set_flag(bio, BIO_NO_PAGE_REF); bio_set_flag(bio, BIO_CLONED); } @@ -1343,6 +1342,7 @@ int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter) return 0; } + bio_set_flag(bio, BIO_PAGE_REFFED); do { ret = __bio_iov_iter_get_pages(bio, iter); } while (!ret && iov_iter_count(iter) && !bio_full(bio, 0)); diff --git a/block/blk-map.c b/block/blk-map.c index 080dd60485be..f1f70b50388d 100644 --- a/block/blk-map.c +++ b/block/blk-map.c @@ -282,6 +282,7 @@ static int bio_map_user_iov(struct request *rq, struct iov_iter *iter, if (blk_queue_pci_p2pdma(rq->q)) extraction_flags |= ITER_ALLOW_P2PDMA; + bio_set_flag(bio, BIO_PAGE_REFFED); while (iov_iter_count(iter)) { struct page **pages, *stack_pages[UIO_FASTIOV]; ssize_t bytes; diff --git a/fs/direct-io.c b/fs/direct-io.c index 03d381377ae1..07810465fc9d 100644 --- a/fs/direct-io.c +++ b/fs/direct-io.c @@ -403,6 +403,8 @@ dio_bio_alloc(struct dio *dio, struct dio_submit *sdio, bio->bi_end_io = dio_bio_end_aio; else bio->bi_end_io = dio_bio_end_io; + /* for now require references for all pages */ + bio_set_flag(bio, BIO_PAGE_REFFED); sdio->bio = bio; sdio->logical_offset_in_bio = sdio->cur_page_fs_offset; } diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c index 47db4ead1e74..c0e75900e754 100644 --- a/fs/iomap/direct-io.c +++ b/fs/iomap/direct-io.c @@ -202,7 +202,6 @@ static void iomap_dio_zero(const struct iomap_iter *iter, struct iomap_dio *dio, bio->bi_private = dio; bio->bi_end_io = 
iomap_dio_bio_end_io; - bio_set_flag(bio, BIO_NO_PAGE_REF); __bio_add_page(bio, page, len, 0); iomap_dio_submit_bio(iter, dio, bio, pos); } diff --git a/include/linux/bio.h b/include/linux/bio.h index 10366b8bdb13..805957c99147 100644 --- a/include/linux/bio.h +++ b/include/linux/bio.h @@ -484,7 +484,7 @@ void zero_fill_bio(struct bio *bio); static inline void bio_release_pages(struct bio *bio, bool mark_dirty) { - if (!bio_flagged(bio, BIO_NO_PAGE_REF)) + if (bio_flagged(bio, BIO_PAGE_REFFED)) __bio_release_pages(bio, mark_dirty); } diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h index 99be590f952f..7daa261f4f98 100644 --- a/include/linux/blk_types.h +++ b/include/linux/blk_types.h @@ -318,7 +318,7 @@ struct bio { * bio flags */ enum { - BIO_NO_PAGE_REF, /* don't put release vec pages */ + BIO_PAGE_REFFED, /* put pages in bio_release_pages() */ BIO_CLONED, /* doesn't own data */ BIO_BOUNCED, /* bio is a bounce bio */ BIO_QUIET, /* Make BIO Quiet */ From patchwork Thu Feb 9 10:29:52 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 54850 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:adf:eb09:0:0:0:0:0 with SMTP id s9csp256029wrn; Thu, 9 Feb 2023 02:34:50 -0800 (PST) X-Google-Smtp-Source: AK7set9ureewEk7/XJK+BiCthP6XgPjkES30CTkVtKRdQCyU/xZRoGZpn7C8gWdmTIryborrbjvJ X-Received: by 2002:a05:6a00:148e:b0:594:1f1c:3d3a with SMTP id v14-20020a056a00148e00b005941f1c3d3amr5632879pfu.15.1675938889762; Thu, 09 Feb 2023 02:34:49 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1675938889; cv=none; d=google.com; s=arc-20160816; b=NRZzZi4iE2MAmGnxqhUCnjpoBXK6qkdXloF6tAtBKbYgEYZIE0wPGME4ONWLUOu/u0 lCD786yfbvWfjU5p1jBTlb2mTsOGT2BZjuTB3FGm1bKg9fupSmR5VvFd4HRUfapBto/f 7urPEgVGVmX+XyPZhdWNXJGTK6k7GqDhoNpR0hLMXw/QEimHrucdXqKp2IOUHxXwFDXf T2xzFvyzQ0Ny8efSdRXP62CY6P+JSQTPWTZf+bNG5DnIYMB1+cpondqgr/cbE9J7ZxsH 
bY8uRmwhdzFVkpj4UEJBIA1nNuXzi3t0gdg4FO0DcbZR4VlpdYMS3dpkuLgH0wRZs0Ln ikcw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=GZNnQ4PFOAio/M36ha2lSqPmGocJT5Xn1bbdC3s23DY=; b=j5KdU6sEJs11CO1w8SMZUM+DJeq+ezRY942V/TT/AAvdNW0EQ5TAWuFPqAseuEP/Te +wsQJ7Sfd7pvZdDNh1pGJOJCaUajSOLfxuozLG1yzOmndY6x3ej6mwu8dx7AQtgonz2x 72ppuNTFZoeU/M+oO+A3biOwcPiMoPjB4q6b7a5hEGJVL+ZVsWlLzGtVolEKPRD5ryV2 o/Jwqvk6zeQ58W9D6vyaIMlGVj9FLegqKo8xsjkpj8nyp4PRzVNE/5U1ZO9t1pdV7U5v GwxTEotXhc5bZY9IjtWnTQ5LqyBt4l3p9Wd8bLftw1w20jc76z07rsJKeraYfC/dznt0 B9KQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=EUHfI6FY; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: from out1.vger.email (out1.vger.email. 
[2620:137:e000::1:20]) by mx.google.com with ESMTP id e17-20020aa798d1000000b0058dc3db12fasi1375388pfm.351.2023.02.09.02.34.37; Thu, 09 Feb 2023 02:34:49 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=EUHfI6FY; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229865AbjBIKdZ (ORCPT + 99 others); Thu, 9 Feb 2023 05:33:25 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:51286 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230190AbjBIKbs (ORCPT ); Thu, 9 Feb 2023 05:31:48 -0500 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id B2C8432E58 for ; Thu, 9 Feb 2023 02:30:33 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1675938633; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=GZNnQ4PFOAio/M36ha2lSqPmGocJT5Xn1bbdC3s23DY=; b=EUHfI6FYp7qX8eHOvJ1fLmLXh32umLutNx/t/iqqOckaF/BH0mUG34xLznkbNbpGYO4b7Z bmrPjWQUD2kMbV2NH3rcK3ZcO3XiNSYiZ1kS2ySyj2SzetsqbYPKYAmU/7U6CDbRWthQNf nmrEsr8SgECfcaCQ9PI22bGxJQXAzWk= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 
us-mta-528-NoFlqzZSNlqsj-nOj8RXSw-1; Thu, 09 Feb 2023 05:30:26 -0500 X-MC-Unique: NoFlqzZSNlqsj-nOj8RXSw-1 Received: from smtp.corp.redhat.com (int-mx07.intmail.prod.int.rdu2.redhat.com [10.11.54.7]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 848DD85D06D; Thu, 9 Feb 2023 10:30:25 +0000 (UTC) Received: from warthog.procyon.org.uk (unknown [10.33.36.24]) by smtp.corp.redhat.com (Postfix) with ESMTP id 8B36F140EBF6; Thu, 9 Feb 2023 10:30:23 +0000 (UTC) From: David Howells To: Jens Axboe , Al Viro , Christoph Hellwig Cc: David Howells , Matthew Wilcox , Jan Kara , Jeff Layton , David Hildenbrand , Jason Gunthorpe , Logan Gunthorpe , Hillf Danton , linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Christoph Hellwig , John Hubbard Subject: [PATCH v13 10/12] block: Add BIO_PAGE_PINNED and associated infrastructure Date: Thu, 9 Feb 2023 10:29:52 +0000 Message-Id: <20230209102954.528942-11-dhowells@redhat.com> In-Reply-To: <20230209102954.528942-1-dhowells@redhat.com> References: <20230209102954.528942-1-dhowells@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.7 X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_NONE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1757349297285312934?= X-GMAIL-MSGID: =?utf-8?q?1757349297285312934?= Add BIO_PAGE_PINNED to indicate that the pages in a bio are pinned (FOLL_PIN) and that the pin will need removing. 
Signed-off-by: David Howells Reviewed-by: Christoph Hellwig Reviewed-by: John Hubbard cc: Al Viro cc: Jens Axboe cc: Jan Kara cc: Matthew Wilcox cc: Logan Gunthorpe cc: linux-block@vger.kernel.org --- Notes: ver #10) - Drop bio_set_cleanup_mode(), open coding it instead. ver #9) - Only consider pinning in bio_set_cleanup_mode(). Ref'ing pages in struct bio is going away. - page_put_unpin() is removed; call unpin_user_page() and put_page() directly. - Use bio_release_page() in __bio_release_pages(). - BIO_PAGE_PINNED and BIO_PAGE_REFFED can't both be set, so use if-else when testing both of them. ver #8) - Move the infrastructure to clean up pinned pages to this patch [hch]. - Put BIO_PAGE_PINNED before BIO_PAGE_REFFED as the latter should probably be removed at some point. FOLL_PIN can then be renumbered first. block/bio.c | 6 +++--- block/blk.h | 12 ++++++++++++ include/linux/bio.h | 3 ++- include/linux/blk_types.h | 1 + 4 files changed, 18 insertions(+), 4 deletions(-) diff --git a/block/bio.c b/block/bio.c index bf9bf53232be..547e38883934 100644 --- a/block/bio.c +++ b/block/bio.c @@ -1176,7 +1176,7 @@ void __bio_release_pages(struct bio *bio, bool mark_dirty) bio_for_each_segment_all(bvec, bio, iter_all) { if (mark_dirty && !PageCompound(bvec->bv_page)) set_page_dirty_lock(bvec->bv_page); - put_page(bvec->bv_page); + bio_release_page(bio, bvec->bv_page); } } EXPORT_SYMBOL_GPL(__bio_release_pages); @@ -1496,8 +1496,8 @@ void bio_set_pages_dirty(struct bio *bio) * the BIO and re-dirty the pages in process context. * * It is expected that bio_check_pages_dirty() will wholly own the BIO from - * here on. It will run one put_page() against each page and will run one - * bio_put() against the BIO. + * here on. It will unpin each page and will run one bio_put() against the + * BIO. 
*/ static void bio_dirty_fn(struct work_struct *work); diff --git a/block/blk.h b/block/blk.h index 4c3b3325219a..f02381405311 100644 --- a/block/blk.h +++ b/block/blk.h @@ -425,6 +425,18 @@ int bio_add_hw_page(struct request_queue *q, struct bio *bio, struct page *page, unsigned int len, unsigned int offset, unsigned int max_sectors, bool *same_page); +/* + * Clean up a page appropriately, where the page may be pinned, may have a + * ref taken on it or neither. + */ +static inline void bio_release_page(struct bio *bio, struct page *page) +{ + if (bio_flagged(bio, BIO_PAGE_PINNED)) + unpin_user_page(page); + else if (bio_flagged(bio, BIO_PAGE_REFFED)) + put_page(page); +} + struct request_queue *blk_alloc_queue(int node_id); int disk_scan_partitions(struct gendisk *disk, fmode_t mode, void *owner); diff --git a/include/linux/bio.h b/include/linux/bio.h index 805957c99147..b2c09997d79c 100644 --- a/include/linux/bio.h +++ b/include/linux/bio.h @@ -484,7 +484,8 @@ void zero_fill_bio(struct bio *bio); static inline void bio_release_pages(struct bio *bio, bool mark_dirty) { - if (bio_flagged(bio, BIO_PAGE_REFFED)) + if (bio_flagged(bio, BIO_PAGE_REFFED) || + bio_flagged(bio, BIO_PAGE_PINNED)) __bio_release_pages(bio, mark_dirty); } diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h index 7daa261f4f98..a0e339ff3d09 100644 --- a/include/linux/blk_types.h +++ b/include/linux/blk_types.h @@ -318,6 +318,7 @@ struct bio { * bio flags */ enum { + BIO_PAGE_PINNED, /* Unpin pages in bio_release_pages() */ BIO_PAGE_REFFED, /* put pages in bio_release_pages() */ BIO_CLONED, /* doesn't own data */ BIO_BOUNCED, /* bio is a bounce bio */ From patchwork Thu Feb 9 10:29:53 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 54853 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:adf:eb09:0:0:0:0:0 with SMTP id s9csp256251wrn; Thu, 9 Feb 2023 02:35:22 -0800 
(PST) X-Google-Smtp-Source: AK7set9iL8HtNawsOkIu6ffBUmdQ/kN92P3TKKGYTdXCh/fAne3ioOGcsi4G0gi26V3gIlCo4WFN X-Received: by 2002:a05:6a20:669e:b0:be:ea27:3c24 with SMTP id o30-20020a056a20669e00b000beea273c24mr10361793pzh.41.1675938922455; Thu, 09 Feb 2023 02:35:22 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1675938922; cv=none; d=google.com; s=arc-20160816; b=VaXvKJptZveuE5p4ocOhMiz5dss5OXV7Lvrbob8CIRlQXMIuFKjxq2amz/cahjfl/V ZMKvA9gQZpK9bNBRBxUbmc/tJ054V9jHvLn5j5JMm9VHuFpFGHzyVW5+HihMbf/CCj9s Vly0kQn9BFRUaOigl+SDK/6IfBngLx4QjXWQm/zxkgYoGiz0NMhB4F0bgVd1pi2Q3h4r +GCqUuoB1b0WWQAhX1wk4h95I3pVFvcAldFjWZw/DbsAjpHxmlcgvQIPsNnQi3lWB/Xh JFWp5tKLiW3qcwdCdvufvq+2PFXpu3IVK1/mesDXgdFL9hLSIKleMKW5v/q6qufNZBIr J0Kg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=YX3eOMnEnHmNO2fsUm7QGJ9rpDZ86GbDPShQEUjns7k=; b=FBZftgWvZ6jFwUBG8wZOUvq4J3iRTf1GJXP1Arzl3DqR4/mWTwdpdb3uIFehL/QNLA i0seVjyWabAuH993/Dk0sNRkgkA46sm+v+DugsY4hCsSDXLA4Ot/L+kX/mbRbMa/HNH4 somMMycJeJ8HaTBsD1vmPP2aOQJtX5cH1CepUpM92bCYV7I0elPE4IUEdsjlc17jx1YK Vmo44O1XDAdkrva8/ESdFPR4smeK3mYLJ3yypSbal/ycHyFGIwTm8zxsEEGyE6UKZWLR 370iB8vtNMi0YLCvwDy/7qeJvAops9tb1zLlHdc6cGvdBGXVlcAZu1pT9Fw+k8BSPtM+ uJnQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=UwNpya8K; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: from out1.vger.email (out1.vger.email. 
[2620:137:e000::1:20]) by mx.google.com with ESMTP id n25-20020a637219000000b004e493d57bd3si1473751pgc.543.2023.02.09.02.35.09; Thu, 09 Feb 2023 02:35:22 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=UwNpya8K; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230002AbjBIKdt (ORCPT + 99 others); Thu, 9 Feb 2023 05:33:49 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:52020 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229689AbjBIKc3 (ORCPT ); Thu, 9 Feb 2023 05:32:29 -0500 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 2995934C2E for ; Thu, 9 Feb 2023 02:30:36 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1675938635; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=YX3eOMnEnHmNO2fsUm7QGJ9rpDZ86GbDPShQEUjns7k=; b=UwNpya8KEL5/ZFtVj1/3+5HoUmubVFIlb1nudzrivlPHIV9Oki1Ym9CSOsy74+xP7lMclo d0pWgHa1YH90mc9lj/JGRplCK9xFPAq2ZPuL/hdvHZAUWCOKNw9/2GU8tDMr+gZxcIWMhh jgsKWwjeQw6C9atZWOZQtJxlNmTnZAI= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 
us-mta-331-cAnVFdLZOYyXeog1AF3Bew-1; Thu, 09 Feb 2023 05:30:30 -0500 X-MC-Unique: cAnVFdLZOYyXeog1AF3Bew-1 Received: from smtp.corp.redhat.com (int-mx03.intmail.prod.int.rdu2.redhat.com [10.11.54.3]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 0FFD885D064; Thu, 9 Feb 2023 10:30:28 +0000 (UTC) Received: from warthog.procyon.org.uk (unknown [10.33.36.24]) by smtp.corp.redhat.com (Postfix) with ESMTP id 207EC1121315; Thu, 9 Feb 2023 10:30:26 +0000 (UTC) From: David Howells To: Jens Axboe , Al Viro , Christoph Hellwig Cc: David Howells , Matthew Wilcox , Jan Kara , Jeff Layton , David Hildenbrand , Jason Gunthorpe , Logan Gunthorpe , Hillf Danton , linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Christoph Hellwig , John Hubbard Subject: [PATCH v13 11/12] block: Convert bio_iov_iter_get_pages to use iov_iter_extract_pages Date: Thu, 9 Feb 2023 10:29:53 +0000 Message-Id: <20230209102954.528942-12-dhowells@redhat.com> In-Reply-To: <20230209102954.528942-1-dhowells@redhat.com> References: <20230209102954.528942-1-dhowells@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.3 X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1757349331490429660?= X-GMAIL-MSGID: =?utf-8?q?1757349331490429660?= This will pin pages or leave them unaltered rather than getting a ref on them as appropriate to the iterator. 
The pages need to be pinned for DIO rather than having refs taken on them to prevent VM copy-on-write from malfunctioning during a concurrent fork() (the result of the I/O could otherwise end up being affected by/visible to the child process). Signed-off-by: David Howells Reviewed-by: Christoph Hellwig Reviewed-by: John Hubbard cc: Al Viro cc: Jens Axboe cc: Jan Kara cc: Matthew Wilcox cc: Logan Gunthorpe cc: linux-block@vger.kernel.org --- Notes: ver #10) - Drop bio_set_cleanup_mode(), open coding it instead. ver #8) - Split the patch up a bit [hch]. - We should only be using pinned/non-pinned pages and not ref'd pages, so adjust the comments appropriately. ver #7) - Don't treat BIO_PAGE_REFFED/PINNED as being the same as FOLL_GET/PIN. ver #5) - Transcribed the FOLL_* flags returned by iov_iter_extract_pages() to BIO_* flags and got rid of bi_cleanup_mode. - Replaced BIO_NO_PAGE_REF with BIO_PAGE_REFFED in the preceding patch. block/bio.c | 23 ++++++++++++----------- 1 file changed, 12 insertions(+), 11 deletions(-) diff --git a/block/bio.c b/block/bio.c index 547e38883934..fc57f0aa098e 100644 --- a/block/bio.c +++ b/block/bio.c @@ -1212,7 +1212,7 @@ static int bio_iov_add_page(struct bio *bio, struct page *page, } if (same_page) - put_page(page); + bio_release_page(bio, page); return 0; } @@ -1226,7 +1226,7 @@ static int bio_iov_add_zone_append_page(struct bio *bio, struct page *page, queue_max_zone_append_sectors(q), &same_page) != len) return -EINVAL; if (same_page) - put_page(page); + bio_release_page(bio, page); return 0; } @@ -1237,10 +1237,10 @@ static int bio_iov_add_zone_append_page(struct bio *bio, struct page *page, * @bio: bio to add pages to * @iter: iov iterator describing the region to be mapped * - * Pins pages from *iter and appends them to @bio's bvec array. The - * pages will have to be released using put_page() when done. - * For multi-segment *iter, this function only adds pages from the - * next non-empty segment of the iov iterator. 
+ * Extracts pages from *iter and appends them to @bio's bvec array.  The pages
+ * will have to be cleaned up in the way indicated by the BIO_PAGE_PINNED flag.
+ * For a multi-segment *iter, this function only adds pages from the next
+ * non-empty segment of the iov iterator.
  */
 static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
 {
@@ -1272,9 +1272,9 @@ static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
 	 * result to ensure the bio's total size is correct. The remainder of
 	 * the iov data will be picked up in the next bio iteration.
 	 */
-	size = iov_iter_get_pages(iter, pages,
-				  UINT_MAX - bio->bi_iter.bi_size,
-				  nr_pages, &offset, extraction_flags);
+	size = iov_iter_extract_pages(iter, &pages,
+				      UINT_MAX - bio->bi_iter.bi_size,
+				      nr_pages, extraction_flags, &offset);
 	if (unlikely(size <= 0))
 		return size ? size : -EFAULT;

@@ -1307,7 +1307,7 @@ static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
 		iov_iter_revert(iter, left);
 out:
 	while (i < nr_pages)
-		put_page(pages[i++]);
+		bio_release_page(bio, pages[i++]);

 	return ret;
 }
@@ -1342,7 +1342,8 @@ int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
 		return 0;
 	}

-	bio_set_flag(bio, BIO_PAGE_REFFED);
+	if (iov_iter_extract_will_pin(iter))
+		bio_set_flag(bio, BIO_PAGE_PINNED);
 	do {
 		ret = __bio_iov_iter_get_pages(bio, iter);
 	} while (!ret && iov_iter_count(iter) && !bio_full(bio, 0));

From patchwork Thu Feb 9 10:29:54 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: David Howells
X-Patchwork-Id: 54852
From: David Howells
To: Jens Axboe, Al Viro, Christoph Hellwig
Cc: David Howells, Matthew Wilcox, Jan Kara, Jeff Layton,
	David Hildenbrand, Jason Gunthorpe, Logan Gunthorpe, Hillf Danton,
	linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	Christoph Hellwig, John Hubbard
Subject: [PATCH v13 12/12] block: convert bio_map_user_iov to use iov_iter_extract_pages
Date: Thu, 9 Feb 2023 10:29:54 +0000
Message-Id: <20230209102954.528942-13-dhowells@redhat.com>
In-Reply-To: <20230209102954.528942-1-dhowells@redhat.com>
References: <20230209102954.528942-1-dhowells@redhat.com>
X-Mailing-List: linux-kernel@vger.kernel.org

This will pin pages or leave them unaltered rather than getting a ref on
them as appropriate to the iterator.

The pages need to be pinned for DIO rather than having refs taken on them to
prevent VM copy-on-write from malfunctioning during a concurrent fork() (the
result of the I/O could otherwise end up being visible to/affected by the
child process).

Signed-off-by: David Howells
Reviewed-by: Christoph Hellwig
Reviewed-by: John Hubbard
cc: Al Viro
cc: Jens Axboe
cc: Jan Kara
cc: Matthew Wilcox
cc: Logan Gunthorpe
cc: linux-block@vger.kernel.org
---

Notes:
    ver #10)
     - Drop bio_set_cleanup_mode(), open coding it instead.

    ver #8)
     - Split the patch up a bit [hch].
     - We should only be using pinned/non-pinned pages and not ref'd pages,
       so adjust the comments appropriately.

    ver #7)
     - Don't treat BIO_PAGE_REFFED/PINNED as being the same as FOLL_GET/PIN.

    ver #5)
     - Transcribe the FOLL_* flags returned by iov_iter_extract_pages() to
       BIO_* flags and got rid of bi_cleanup_mode.
     - Replaced BIO_NO_PAGE_REF with BIO_PAGE_REFFED in the preceding patch.

 block/blk-map.c | 23 +++++++++++------------
 1 file changed, 11 insertions(+), 12 deletions(-)

diff --git a/block/blk-map.c b/block/blk-map.c
index f1f70b50388d..0f1593e144da 100644
--- a/block/blk-map.c
+++ b/block/blk-map.c
@@ -281,22 +281,21 @@ static int bio_map_user_iov(struct request *rq, struct iov_iter *iter,

 	if (blk_queue_pci_p2pdma(rq->q))
 		extraction_flags |= ITER_ALLOW_P2PDMA;
+	if (iov_iter_extract_will_pin(iter))
+		bio_set_flag(bio, BIO_PAGE_PINNED);

-	bio_set_flag(bio, BIO_PAGE_REFFED);
 	while (iov_iter_count(iter)) {
-		struct page **pages, *stack_pages[UIO_FASTIOV];
+		struct page *stack_pages[UIO_FASTIOV];
+		struct page **pages = stack_pages;
 		ssize_t bytes;
 		size_t offs;
 		int npages;

-		if (nr_vecs <= ARRAY_SIZE(stack_pages)) {
-			pages = stack_pages;
-			bytes = iov_iter_get_pages(iter, pages, LONG_MAX,
-						   nr_vecs, &offs, extraction_flags);
-		} else {
-			bytes = iov_iter_get_pages_alloc(iter, &pages,
-						LONG_MAX, &offs, extraction_flags);
-		}
+		if (nr_vecs > ARRAY_SIZE(stack_pages))
+			pages = NULL;
+
+		bytes = iov_iter_extract_pages(iter, &pages, LONG_MAX,
+					       nr_vecs, extraction_flags, &offs);
 		if (unlikely(bytes <= 0)) {
 			ret = bytes ? bytes : -EFAULT;
 			goto out_unmap;
@@ -318,7 +317,7 @@ static int bio_map_user_iov(struct request *rq, struct iov_iter *iter,
 				if (!bio_add_hw_page(rq->q, bio, page, n, offs,
 						     max_sectors, &same_page)) {
 					if (same_page)
-						put_page(page);
+						bio_release_page(bio, page);
 					break;
 				}

@@ -330,7 +329,7 @@ static int bio_map_user_iov(struct request *rq, struct iov_iter *iter,
 		 * release the pages we didn't map into the bio, if any
 		 */
 		while (j < npages)
-			put_page(pages[j++]);
+			bio_release_page(bio, pages[j++]);
 		if (pages != stack_pages)
 			kvfree(pages);
 		/* couldn't stuff something into bio? */
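
Illustration only, not part of the patch: a minimal userspace sketch of the
cleanup rule the hunks above rely on, namely that a page is released the same
way it was extracted. It assumes, as the series describes, that only
user-backed iterators cause pages to be pinned and that the bio records this
in a flag. Every name below (model_page, model_bio, model_will_pin,
model_extract, model_release_page) is hypothetical and merely stands in for
the kernel's struct page, struct bio, iov_iter_extract_will_pin(),
iov_iter_extract_pages() and bio_release_page().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-ins; none of these are kernel types. */
struct model_page {
	int pincount;		/* models pins taken during extraction */
};

struct model_bio {
	bool pages_pinned;	/* models the BIO_PAGE_PINNED flag */
};

/* Models iov_iter_extract_will_pin(): assume only user-backed iterators pin. */
bool model_will_pin(bool user_backed)
{
	return user_backed;
}

/* Models extraction: pin the page only when the iterator calls for it. */
void model_extract(struct model_bio *bio, struct model_page *page,
		   bool user_backed)
{
	if (model_will_pin(user_backed)) {
		bio->pages_pinned = true;
		page->pincount++;
	}
	/* Otherwise the page is left unaltered: no pin and no ref. */
}

/* Models bio_release_page(): undo exactly what extraction did. */
void model_release_page(const struct model_bio *bio, struct model_page *page)
{
	if (bio->pages_pinned)
		page->pincount--;
}
```

Extracting through a user-backed iterator and then releasing leaves the pin
count balanced at zero, while a kernel-backed iterator never touches the
count, which is why an unconditional put_page() in the error paths would no
longer be correct.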