From patchwork Thu Feb 16 21:47:29 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 58242 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:adf:eb09:0:0:0:0:0 with SMTP id s9csp546333wrn; Thu, 16 Feb 2023 13:50:00 -0800 (PST) X-Google-Smtp-Source: AK7set+K0x7bkEL1fFB4+4EkJWAG5DE+Arnw85X1t6MY5dvk14Mf/iRUWyVUrGorDVPcThebICx4 X-Received: by 2002:a17:906:2998:b0:8b1:2c37:ae97 with SMTP id x24-20020a170906299800b008b12c37ae97mr7486921eje.43.1676584200163; Thu, 16 Feb 2023 13:50:00 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1676584200; cv=none; d=google.com; s=arc-20160816; b=d10Gq716eRNxAlnOsWqdlT0GLU6sXVBly83jYS1I5fGOKR0VlWO1BQWzYqIldYIovh UzBGJsBxQiYq9eOfKs289jK2ISeOCEB99QTUnLS1nxKj02B17KS4n18UZtmm198hn3Nc iflkPgoV41yooRvOllHC6P6xE3DQ8WWMPMYPkJP2HhzYAo/tGWTOXBdFdyr3nGgGi3Vf x15STa5qlodiKjCke4ENe+4jatcHo2LTzVkN+wLp/CX0fQ/aifyHeBr1avErtSOD4593 MMfGz2Bxbhxx7WTH7COcwsh7H/uNNwBuPSzx76kmoO/Y3yuYYkjC3NTCFtuZFBhRpSmG iOYA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=iluk0YvlYbmqrBsJ+zi3CUpWIoo6vLBgAkY0zGemXLc=; b=YiMBFHcw9qNfrSOxSUiqSVDiGTNmqyYlSJwQBTI4Ef8LTP1efO+pZp2WxRik/lB4bm 7L6eqaMEWcIpzbH+7sQPrW6RA6uhKPubCmCdbziUIbvAEa66O3L9S0hsqFyYNlaVRli1 zHPnOheRYre/vWHSIK4q11pn8v6GpFIE6sjsfvqE4bAKpdmaywEkpEflBngxcBYEBKOl 6HcKl8SkG0PiikWo9NYnrw494qmh2AaN1E3JR2fYalFEFC1m+K1CZyln3B6loQtJvurv huGy7wwQWCpx2MLuLfIYl38RRaTxAW0+aZ3kRKXzjpJBYvGnU2JwMqHFAYz7WbpcztZb lVrg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=OYVIC3QP; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id gj21-20020a170907741500b008b144330158si2973522ejc.996.2023.02.16.13.49.36; Thu, 16 Feb 2023 13:50:00 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=OYVIC3QP; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230123AbjBPVtT (ORCPT + 99 others); Thu, 16 Feb 2023 16:49:19 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36274 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230083AbjBPVtR (ORCPT ); Thu, 16 Feb 2023 16:49:17 -0500 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 876C438012 for ; Thu, 16 Feb 2023 13:48:17 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1676584096; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=iluk0YvlYbmqrBsJ+zi3CUpWIoo6vLBgAkY0zGemXLc=; b=OYVIC3QPnpIrBFnTVRH7k6xzc3udEczB/F9eohImuzBVepMHrjOqQFV0UNEcKrLSwXgPbn 70bSLXwlS+s/VMzPiEkhR9hnDMalepYxmJ44aAtWJnUbeyr8YPdEuDwuWs+bqUpzSE4tKX KdEbHV7i5fUkWD3R11Z+SPShxOsnZwk= Received: from mimecast-mx02.redhat.com (mx3-rdu2.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-668-Jom6SA2lN1mnPzx0-IYKGQ-1; Thu, 16 Feb 2023 16:48:11 -0500 X-MC-Unique: Jom6SA2lN1mnPzx0-IYKGQ-1 Received: from smtp.corp.redhat.com (int-mx01.intmail.prod.int.rdu2.redhat.com [10.11.54.1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 774593C0F686; Thu, 16 Feb 2023 21:48:10 +0000 (UTC) Received: from warthog.procyon.org.uk (unknown [10.33.36.24]) by smtp.corp.redhat.com (Postfix) with ESMTP id 81E7D401014C; Thu, 16 Feb 2023 21:47:50 +0000 (UTC) From: David Howells To: Steve French Cc: David Howells , Jens Axboe , Al Viro , Shyam Prasad N , Rohith Surabattula , Tom Talpey , Stefan Metzmacher , Christoph Hellwig , Matthew Wilcox , Jeff Layton , linux-cifs@vger.kernel.org, linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Christoph Hellwig , David Hildenbrand , John Hubbard Subject: [PATCH 01/17] mm: Pass info, not iter, into filemap_get_pages() Date: Thu, 16 Feb 2023 21:47:29 +0000 Message-Id: <20230216214745.3985496-2-dhowells@redhat.com> In-Reply-To: <20230216214745.3985496-1-dhowells@redhat.com> References: <20230216214745.3985496-1-dhowells@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.1 X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_NONE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1758025954151871880?= X-GMAIL-MSGID: =?utf-8?q?1758025954151871880?= filemap_get_pages() and a number of functions that it calls take an iterator to provide two things: the number of bytes to be got from the file specified and whether partially uptodate pages are allowed. Change these functions so that this information is passed in directly. This allows it to be called without having an iterator to hand. Signed-off-by: David Howells Reviewed-by: Christoph Hellwig Reviewed-by: Jens Axboe cc: Christoph Hellwig cc: Matthew Wilcox cc: Al Viro cc: David Hildenbrand cc: John Hubbard cc: linux-mm@kvack.org cc: linux-block@vger.kernel.org cc: linux-fsdevel@vger.kernel.org --- mm/filemap.c | 27 ++++++++++++++------------- 1 file changed, 14 insertions(+), 13 deletions(-) diff --git a/mm/filemap.c b/mm/filemap.c index c4d4ace9cc70..876e77278d2a 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -2440,21 +2440,19 @@ static int filemap_read_folio(struct file *file, filler_t filler, } static bool filemap_range_uptodate(struct address_space *mapping, - loff_t pos, struct iov_iter *iter, struct folio *folio) + loff_t pos, size_t count, struct folio *folio, + bool need_uptodate) { - int count; - if (folio_test_uptodate(folio)) return true; /* pipes can't handle partially uptodate pages */ - if (iov_iter_is_pipe(iter)) + if (need_uptodate) return false; if (!mapping->a_ops->is_partially_uptodate) return false; if (mapping->host->i_blkbits >= folio_shift(folio)) return false; - count = iter->count; if (folio_pos(folio) > pos) { count -= folio_pos(folio) - pos; pos = 0; @@ -2466,8 +2464,8 @@ static bool filemap_range_uptodate(struct address_space *mapping, } static int filemap_update_page(struct kiocb *iocb, - struct address_space *mapping, struct iov_iter *iter, - struct folio *folio) + struct address_space *mapping, size_t count, + struct folio *folio, bool need_uptodate) { int error; @@ -2501,7 +2499,8 @@ static int filemap_update_page(struct kiocb *iocb, goto unlock; error = 0; - if (filemap_range_uptodate(mapping, iocb->ki_pos, iter, folio)) + if (filemap_range_uptodate(mapping, iocb->ki_pos, count, folio, + need_uptodate)) goto unlock; error = -EAGAIN; @@ -2577,8 +2576,8 @@ static int filemap_readahead(struct kiocb *iocb, struct file *file, return 0; } -static int filemap_get_pages(struct kiocb *iocb, struct iov_iter *iter, - struct folio_batch *fbatch) +static int filemap_get_pages(struct kiocb *iocb, size_t count, + struct folio_batch *fbatch, bool need_uptodate) { struct file *filp = iocb->ki_filp; struct address_space *mapping = filp->f_mapping; @@ -2588,7 +2587,7 @@ static int filemap_get_pages(struct kiocb *iocb, struct iov_iter *iter, struct folio *folio; int err = 0; - last_index = DIV_ROUND_UP(iocb->ki_pos + iter->count, PAGE_SIZE); + last_index = DIV_ROUND_UP(iocb->ki_pos + count, PAGE_SIZE); retry: if (fatal_signal_pending(current)) return -EINTR; @@ -2621,7 +2620,8 @@ static int filemap_get_pages(struct kiocb *iocb, struct iov_iter *iter, if ((iocb->ki_flags & IOCB_WAITQ) && folio_batch_count(fbatch) > 1) iocb->ki_flags |= IOCB_NOWAIT; - err = filemap_update_page(iocb, mapping, iter, folio); + err = filemap_update_page(iocb, mapping, count, folio, + need_uptodate); if (err) goto err; } @@ -2691,7 +2691,8 @@ ssize_t filemap_read(struct kiocb *iocb, struct iov_iter *iter, if (unlikely(iocb->ki_pos >= i_size_read(inode))) break; - error = filemap_get_pages(iocb, iter, &fbatch); + error = filemap_get_pages(iocb, iter->count, &fbatch, + iov_iter_is_pipe(iter)); if (error < 0) break; From patchwork Thu Feb 16 21:47:30 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 58246 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:adf:eb09:0:0:0:0:0 with SMTP id s9csp546574wrn; Thu, 16 Feb 2023 13:50:46 -0800 (PST) X-Google-Smtp-Source: AK7set+JoKYWEcA4JEuongqtokgrIuh3cKvSntUHiLKRKxG/PZ7pDtSTM7hpj05Ah1P9IOWzabnf X-Received: by 2002:a17:906:6686:b0:87d:ed60:cb7e with SMTP id z6-20020a170906668600b0087ded60cb7emr7419170ejo.68.1676584246324; Thu, 16 Feb 2023 13:50:46 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1676584246; cv=none; d=google.com; s=arc-20160816; b=qoP9uo/7Pr+nIDo1ih6Pr4qr2dnDjVQm1RlB8PD1s04Hk+rXXZS5nJiQsNbC8N8fk1 f8n/avuB6KfaHknc/zVLi5xxNcqpQcSfDXOt0ZFVOXKo0P6a5goEQ8x86Ij8valIeY0O 2YAPNf9pK4WeNY5cPJ2LBsKsKiPOG1eSjyFpzjZ9vQEfuSJ58V1O77L+GnuHlrmlYUcr J34OpoJPqSQzRyQNyCUEdZNus9fiBCK7QsVStaqkYLu3HMLM8m9OTZeRJyA9LAWqnoZI j0PLTYWOHim5CeLTT3M5t0vnswfwWUSs21eD+e8mGZL2fb7AU0KqbBLE0PL8B7M8vnFM QB8A== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=aTLk2eEjUUGfJZpasC0aeUIJz1OFiWbQqwHM/iVEKgQ=; b=bT92zmyiQ347oVU73vtZgBR2PGXGHrk+C+JQ+ewN4RpGdk2ncKard0Wg3mEC8aawL7 l+WPdEAK8YNGCCNgW1ugA9GceFaiypDAMToJnC+v7yV4VCzVVNg9c+5XrKUd9xFkED8e JSCCwC2gOl+B+Qy36AcVoVFsKCOuisk4dydiyXpS6JSXbq66xPz2wugIK27wyBcX4wAx voUU+MJ6azj/HCS3q0bA+Gr5MwxcK6GPZ9AMw4MhPYNgDcrevNn0PJaQd3x6tFuOo3Qt BZ4LL6aHvD/5ib05qQWG8YQKf6CQ/iV9pOeuJyAOzz7jPIQtlzDtq4tIgFtsPbRLY5Gi l9aQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b="Z/9fqueI"; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id v22-20020a17090651d600b008b14b6437easi3478018ejk.771.2023.02.16.13.50.22; Thu, 16 Feb 2023 13:50:46 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b="Z/9fqueI"; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230286AbjBPVtt (ORCPT + 99 others); Thu, 16 Feb 2023 16:49:49 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36418 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230238AbjBPVt2 (ORCPT ); Thu, 16 Feb 2023 16:49:28 -0500 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 1500043464 for ; Thu, 16 Feb 2023 13:48:31 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1676584110; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=aTLk2eEjUUGfJZpasC0aeUIJz1OFiWbQqwHM/iVEKgQ=; b=Z/9fqueIA82bb/Hvq1DBniyB7tuqC+jiIWQ7T6HcInZv0A4G/lGtC3W9D8WzX4gorYsMqH Ivw76a92RlX+46bzKWa5Jqa4bGBGdbyu5raaARCft8QMNhWChUbhGFNSkrKPlfSz3G8mgo hCHY16RH/PRhdAbaLo6BKwylk12ORUE= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-460-PfIjLxiFPsiEdCok_9z79A-1; Thu, 16 Feb 2023 16:48:14 -0500 X-MC-Unique: PfIjLxiFPsiEdCok_9z79A-1 Received: from smtp.corp.redhat.com (int-mx09.intmail.prod.int.rdu2.redhat.com [10.11.54.9]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 51530811E9C; Thu, 16 Feb 2023 21:48:13 +0000 (UTC) Received: from warthog.procyon.org.uk (unknown [10.33.36.24]) by smtp.corp.redhat.com (Postfix) with ESMTP id 3518A492B10; Thu, 16 Feb 2023 21:48:11 +0000 (UTC) From: David Howells To: Steve French Cc: David Howells , Jens Axboe , Al Viro , Shyam Prasad N , Rohith Surabattula , Tom Talpey , Stefan Metzmacher , Christoph Hellwig , Matthew Wilcox , Jeff Layton , linux-cifs@vger.kernel.org, linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Christoph Hellwig , David Hildenbrand , John Hubbard Subject: [PATCH 02/17] splice: Add a func to do a splice from a buffered file without ITER_PIPE Date: Thu, 16 Feb 2023 21:47:30 +0000 Message-Id: <20230216214745.3985496-3-dhowells@redhat.com> In-Reply-To: <20230216214745.3985496-1-dhowells@redhat.com> References: <20230216214745.3985496-1-dhowells@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.9 X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_NONE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1758026003123628004?= X-GMAIL-MSGID: =?utf-8?q?1758026003123628004?= Provide a function to do splice read from a buffered file, pulling the folios out of the pagecache directly by calling filemap_get_pages() to do any required reading and then pasting the returned folios into the pipe. A helper function is provided to do the actual folio pasting and will handle multipage folios by splicing as many of the relevant subpages as will fit into the pipe. The code is loosely based on filemap_read() and might belong in mm/filemap.c with that as it needs to use filemap_get_pages(). Signed-off-by: David Howells Reviewed-by: Jens Axboe cc: Christoph Hellwig cc: Al Viro cc: David Hildenbrand cc: John Hubbard cc: linux-mm@kvack.org cc: linux-block@vger.kernel.org cc: linux-fsdevel@vger.kernel.org --- include/linux/fs.h | 3 ++ mm/filemap.c | 128 +++++++++++++++++++++++++++++++++++++++++++++ mm/internal.h | 6 +++ 3 files changed, 137 insertions(+) diff --git a/include/linux/fs.h b/include/linux/fs.h index c1769a2c5d70..28743e38df91 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -3163,6 +3163,9 @@ ssize_t vfs_iocb_iter_write(struct file *file, struct kiocb *iocb, struct iov_iter *iter); /* fs/splice.c */ +ssize_t filemap_splice_read(struct file *in, loff_t *ppos, + struct pipe_inode_info *pipe, + size_t len, unsigned int flags); extern ssize_t generic_file_splice_read(struct file *, loff_t *, struct pipe_inode_info *, size_t, unsigned int); extern ssize_t iter_file_splice_write(struct pipe_inode_info *, diff --git a/mm/filemap.c b/mm/filemap.c index 876e77278d2a..8c7b135c8e23 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -42,6 +42,8 @@ #include #include #include +#include +#include #include #include #include "internal.h" @@ -2842,6 +2844,132 @@ generic_file_read_iter(struct kiocb *iocb, struct iov_iter *iter) } EXPORT_SYMBOL(generic_file_read_iter); +/* + * Splice subpages from a folio into a pipe. + */ +size_t splice_folio_into_pipe(struct pipe_inode_info *pipe, + struct folio *folio, loff_t fpos, size_t size) +{ + struct page *page; + size_t spliced = 0, offset = offset_in_folio(folio, fpos); + + page = folio_page(folio, offset / PAGE_SIZE); + size = min(size, folio_size(folio) - offset); + offset %= PAGE_SIZE; + + while (spliced < size && + !pipe_full(pipe->head, pipe->tail, pipe->max_usage)) { + struct pipe_buffer *buf = pipe_head_buf(pipe); + size_t part = min_t(size_t, PAGE_SIZE - offset, size - spliced); + + *buf = (struct pipe_buffer) { + .ops = &page_cache_pipe_buf_ops, + .page = page, + .offset = offset, + .len = part, + }; + folio_get(folio); + pipe->head++; + page++; + spliced += part; + offset = 0; + } + + return spliced; +} + +/* + * Splice folios from the pagecache of a buffered (ie. non-O_DIRECT) file into + * a pipe. + */ +ssize_t filemap_splice_read(struct file *in, loff_t *ppos, + struct pipe_inode_info *pipe, + size_t len, unsigned int flags) +{ + struct folio_batch fbatch; + struct kiocb iocb; + size_t total_spliced = 0, used, npages; + loff_t isize, end_offset; + bool writably_mapped; + int i, error = 0; + + init_sync_kiocb(&iocb, in); + iocb.ki_pos = *ppos; + + /* Work out how much data we can actually add into the pipe */ + used = pipe_occupancy(pipe->head, pipe->tail); + npages = max_t(ssize_t, pipe->max_usage - used, 0); + len = min_t(size_t, len, npages * PAGE_SIZE); + + folio_batch_init(&fbatch); + + do { + cond_resched(); + + if (*ppos >= i_size_read(file_inode(in))) + break; + + iocb.ki_pos = *ppos; + error = filemap_get_pages(&iocb, len, &fbatch, true); + if (error < 0) + break; + + /* + * i_size must be checked after we know the pages are Uptodate. + * + * Checking i_size after the check allows us to calculate + * the correct value for "nr", which means the zero-filled + * part of the page is not copied back to userspace (unless + * another truncate extends the file - this is desired though). + */ + isize = i_size_read(file_inode(in)); + if (unlikely(*ppos >= isize)) + break; + end_offset = min_t(loff_t, isize, *ppos + len); + + /* + * Once we start copying data, we don't want to be touching any + * cachelines that might be contended: + */ + writably_mapped = mapping_writably_mapped(in->f_mapping); + + for (i = 0; i < folio_batch_count(&fbatch); i++) { + struct folio *folio = fbatch.folios[i]; + size_t n; + + if (folio_pos(folio) >= end_offset) + goto out; + folio_mark_accessed(folio); + + /* + * If users can be writing to this folio using arbitrary + * virtual addresses, take care of potential aliasing + * before reading the folio on the kernel side. + */ + if (writably_mapped) + flush_dcache_folio(folio); + + n = splice_folio_into_pipe(pipe, folio, *ppos, len); + if (!n) + goto out; + len -= n; + total_spliced += n; + *ppos += n; + in->f_ra.prev_pos = *ppos; + if (pipe_full(pipe->head, pipe->tail, pipe->max_usage)) + goto out; + } + + folio_batch_release(&fbatch); + } while (len); + +out: + folio_batch_release(&fbatch); + file_accessed(in); + + return total_spliced ? total_spliced : error; +} + static inline loff_t folio_seek_hole_data(struct xa_state *xas, struct address_space *mapping, struct folio *folio, loff_t start, loff_t end, bool seek_data) diff --git a/mm/internal.h b/mm/internal.h index bcf75a8b032d..6d4ca98f3844 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -794,6 +794,12 @@ struct migration_target_control { gfp_t gfp_mask; }; +/* + * mm/filemap.c + */ +size_t splice_folio_into_pipe(struct pipe_inode_info *pipe, + struct folio *folio, loff_t fpos, size_t size); + /* * mm/vmalloc.c */ From patchwork Thu Feb 16 21:47:31 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 58244 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:adf:eb09:0:0:0:0:0 with SMTP id s9csp546458wrn; Thu, 16 Feb 2023 13:50:25 -0800 (PST) X-Google-Smtp-Source: AK7set9VeCNV5nN+0IkkJu55dMeASfvBiV4naX62GtI/SVYdKzxQVc2UjIBy9G5UVTVUSYGrFT3V X-Received: by 2002:a17:906:8556:b0:879:ab3:93cd with SMTP id h22-20020a170906855600b008790ab393cdmr7601986ejy.46.1676584225202; Thu, 16 Feb 2023 13:50:25 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1676584225; cv=none; d=google.com; s=arc-20160816; b=o43gMiKStf2U7JjsnK+//nre6OL/KUDHVRglTHy443apNgBBbzCFhoWt7VCZKlseqT 5C01N3j5Gzd6aeztFYNYCcvFfci9Zn2Q901XIESwKJu6tGvzRYGjmTk8ALLJEJM8o5kA hCqtarf+rMcT2araJNwhqTcUgv37zM3UM/7Wg2xEKu83J82ttoJ5rC0QzMCGC/VpCs75 Zhoyqibn1tMvkfiIKYqQZopJmmQnTw2MMihAs5A3FfrxFcbxqfHy67lT/Rx+cVkuyrPX QKXmi5te5rYAnvWswl9kdCJmYfVZQV98x2+q+i2fulRjCzNMWgi9lHeo/8TcvvuamiOQ AleQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=LFENKL+WQlWvhbDc2HxenGINBq9+lQWBR5ko6ef/r4M=; b=O7QSB6jz/9HLOsLEuTqw153SLFD5cPB6xSDIoJUAUamwW6GdncftZmqxKnPZR5iJz1 zIAi79A/CnWybcJLJvgr/ekWip21V3pEiNx6LKeveqSacUEeJxEWUuVwXew9jAAnEc0/ RdrzeA/wMz7g321b6nV+5wn9cRGl9PTF1B1yr/l/uq47gb9rXuOpqovdedmxeqOQErq+ 6g5m3wc15jpJTN2ZTfTI2f0E5TEBdK2wnS470vwZlR4RAySwtd3i7w05vfWfvnEZ4Iif xoIVwvtWIpXB6Fy4IW1xhZiWHYY1jeAhuI4ef71QRS3TBaLJ6E+/fmA03UpUdVmdVpjl b9hg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=ViuFyVdS; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id s18-20020a170906221200b008b17e21e082si860765ejs.295.2023.02.16.13.50.01; Thu, 16 Feb 2023 13:50:25 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=ViuFyVdS; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230186AbjBPVtc (ORCPT + 99 others); Thu, 16 Feb 2023 16:49:32 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36328 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230138AbjBPVtU (ORCPT ); Thu, 16 Feb 2023 16:49:20 -0500 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 4F9A03B868 for ; Thu, 16 Feb 2023 13:48:21 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1676584100; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=LFENKL+WQlWvhbDc2HxenGINBq9+lQWBR5ko6ef/r4M=; b=ViuFyVdS6vSNIQM223c65lCRmxkaqQ2rmX/di7ts2xGlA5QW6RxFlqEQhv/MAfQ0Vh11qf 9D4cVaEywJQEd8mNjmLKQWFwBcgvPRCa2GWJ7c4INSvXctFgJ7Nv6x43Cpxf1PvutTWX13 iHtPn+MivIju6RlK1oC//akaN8+h+UY= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-267-_tSm9z9QMGu8nYP2_juG9w-1; Thu, 16 Feb 2023 16:48:17 -0500 X-MC-Unique: _tSm9z9QMGu8nYP2_juG9w-1 Received: from smtp.corp.redhat.com (int-mx10.intmail.prod.int.rdu2.redhat.com [10.11.54.10]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 25232802D32; Thu, 16 Feb 2023 21:48:16 +0000 (UTC) Received: from warthog.procyon.org.uk (unknown [10.33.36.24]) by smtp.corp.redhat.com (Postfix) with ESMTP id E2AC5492B15; Thu, 16 Feb 2023 21:48:13 +0000 (UTC) From: David Howells To: Steve French Cc: David Howells , Jens Axboe , Al Viro , Shyam Prasad N , Rohith Surabattula , Tom Talpey , Stefan Metzmacher , Christoph Hellwig , Matthew Wilcox , Jeff Layton , linux-cifs@vger.kernel.org, linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, syzbot+a440341a59e3b7142895@syzkaller.appspotmail.com, Christoph Hellwig , David Hildenbrand , John Hubbard Subject: [PATCH 03/17] splice: Add a func to do a splice from an O_DIRECT file without ITER_PIPE Date: Thu, 16 Feb 2023 21:47:31 +0000 Message-Id: <20230216214745.3985496-4-dhowells@redhat.com> In-Reply-To: <20230216214745.3985496-1-dhowells@redhat.com> References: <20230216214745.3985496-1-dhowells@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.10 X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_NONE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1758025980529137367?= X-GMAIL-MSGID: =?utf-8?q?1758025980529137367?= Implement a function, direct_file_splice(), that deals with this by using an ITER_BVEC iterator instead of an ITER_PIPE iterator as the former won't free its buffers when reverted. The function bulk allocates all the buffers it thinks it is going to use in advance, does the read synchronously and only then trims the buffer down. The pages we did use get pushed into the pipe. This fixes a problem with the upcoming iov_iter_extract_pages() function, whereby pages extracted from a non-user-backed iterator such as ITER_PIPE aren't pinned. __iomap_dio_rw(), however, calls iov_iter_revert() to shorten the iterator to just the bufferage it is going to use - which has the side-effect of freeing the excess pipe buffers, even though they're attached to a bio and may get written to by DMA (thanks to Hillf Danton for spotting this[1]). This then causes memory corruption that is particularly noticable when the syzbot test[2] is run. The test boils down to: out = creat(argv[1], 0666); ftruncate(out, 0x800); lseek(out, 0x200, SEEK_SET); in = open(argv[1], O_RDONLY | O_DIRECT | O_NOFOLLOW); sendfile(out, in, NULL, 0x1dd00); run repeatedly in parallel. What I think is happening is that ftruncate() occasionally shortens the DIO read that's about to be made by sendfile's splice core by reducing i_size. This should be more efficient for DIO read by virtue of doing a bulk page allocation, but slightly less efficient by ignoring any partial page in the pipe. Reported-by: syzbot+a440341a59e3b7142895@syzkaller.appspotmail.com Signed-off-by: David Howells Reviewed-by: Jens Axboe cc: Christoph Hellwig cc: Al Viro cc: David Hildenbrand cc: John Hubbard cc: linux-mm@kvack.org cc: linux-block@vger.kernel.org cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/r/20230207094731.1390-1-hdanton@sina.com/ [1] Link: https://lore.kernel.org/r/000000000000b0b3c005f3a09383@google.com/ [2] --- fs/splice.c | 92 +++++++++++++++++++++++++++++++++++++++ include/linux/fs.h | 3 ++ include/linux/pipe_fs_i.h | 20 +++++++++ lib/iov_iter.c | 6 --- 4 files changed, 115 insertions(+), 6 deletions(-) diff --git a/fs/splice.c b/fs/splice.c index 5969b7a1d353..4c6332854b63 100644 --- a/fs/splice.c +++ b/fs/splice.c @@ -282,6 +282,98 @@ void splice_shrink_spd(struct splice_pipe_desc *spd) kfree(spd->partial); } +/* + * Splice data from an O_DIRECT file into pages and then add them to the output + * pipe. + */ +ssize_t direct_splice_read(struct file *in, loff_t *ppos, + struct pipe_inode_info *pipe, + size_t len, unsigned int flags) +{ + struct iov_iter to; + struct bio_vec *bv; + struct kiocb kiocb; + struct page **pages; + ssize_t ret; + size_t used, npages, chunk, remain, reclaim; + int i; + + /* Work out how much data we can actually add into the pipe */ + used = pipe_occupancy(pipe->head, pipe->tail); + npages = max_t(ssize_t, pipe->max_usage - used, 0); + len = min_t(size_t, len, npages * PAGE_SIZE); + npages = DIV_ROUND_UP(len, PAGE_SIZE); + + bv = kzalloc(array_size(npages, sizeof(bv[0])) + + array_size(npages, sizeof(struct page *)), GFP_KERNEL); + if (!bv) + return -ENOMEM; + + pages = (void *)(bv + npages); + npages = alloc_pages_bulk_array(GFP_USER, npages, pages); + if (!npages) { + kfree(bv); + return -ENOMEM; + } + + remain = len = min_t(size_t, len, npages * PAGE_SIZE); + + for (i = 0; i < npages; i++) { + chunk = min_t(size_t, PAGE_SIZE, remain); + bv[i].bv_page = pages[i]; + bv[i].bv_offset = 0; + bv[i].bv_len = chunk; + remain -= chunk; + } + + /* Do the I/O */ + iov_iter_bvec(&to, ITER_DEST, bv, npages, len); + init_sync_kiocb(&kiocb, in); + kiocb.ki_pos = *ppos; + ret = call_read_iter(in, &kiocb, &to); + + reclaim = npages * PAGE_SIZE; + remain = 0; + if (ret > 0) { + reclaim -= ret; + remain = ret; + *ppos = kiocb.ki_pos; + file_accessed(in); + } else if (ret < 0) { + /* + * callers of ->splice_read() expect -EAGAIN on + * "can't put anything in there", rather than -EFAULT. + */ + if (ret == -EFAULT) + ret = -EAGAIN; + } + + /* Free any pages that didn't get touched at all. */ + reclaim /= PAGE_SIZE; + if (reclaim) { + npages -= reclaim; + release_pages(pages + npages, reclaim); + } + + /* Push the remaining pages into the pipe. */ + for (i = 0; i < npages; i++) { + struct pipe_buffer *buf = pipe_head_buf(pipe); + + chunk = min_t(size_t, remain, PAGE_SIZE); + *buf = (struct pipe_buffer) { + .ops = &default_pipe_buf_ops, + .page = bv[i].bv_page, + .offset = 0, + .len = chunk, + }; + pipe->head++; + remain -= chunk; + } + + kfree(bv); + return ret; +} + /** * generic_file_splice_read - splice data from file to a pipe * @in: file to splice from diff --git a/include/linux/fs.h b/include/linux/fs.h index 28743e38df91..551c9403f9b3 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -3166,6 +3166,9 @@ ssize_t vfs_iocb_iter_write(struct file *file, struct kiocb *iocb, ssize_t filemap_splice_read(struct file *in, loff_t *ppos, struct pipe_inode_info *pipe, size_t len, unsigned int flags); +ssize_t direct_splice_read(struct file *in, loff_t *ppos, + struct pipe_inode_info *pipe, + size_t len, unsigned int flags); extern ssize_t generic_file_splice_read(struct file *, loff_t *, struct pipe_inode_info *, size_t, unsigned int); extern ssize_t iter_file_splice_write(struct pipe_inode_info *, diff --git a/include/linux/pipe_fs_i.h b/include/linux/pipe_fs_i.h index 6cb65df3e3ba..d2c3f16cf6b1 100644 --- a/include/linux/pipe_fs_i.h +++ b/include/linux/pipe_fs_i.h @@ -156,6 +156,26 @@ static inline bool pipe_full(unsigned int head, unsigned int tail, return pipe_occupancy(head, tail) >= limit; } +/** + * pipe_buf - Return the pipe buffer for the specified slot in the pipe ring + * @pipe: The pipe to access + * @slot: The slot of interest + */ +static inline struct pipe_buffer *pipe_buf(const struct pipe_inode_info *pipe, + unsigned int slot) +{ + return &pipe->bufs[slot & (pipe->ring_size - 1)]; +} + +/** + * pipe_head_buf - Return the pipe buffer at the head of the pipe ring + * @pipe: The pipe to access + */ +static inline struct pipe_buffer *pipe_head_buf(const struct pipe_inode_info *pipe) +{ + return pipe_buf(pipe, pipe->head); +} + /** * pipe_buf_get - get a reference to a pipe_buffer * @pipe: the pipe that the buffer belongs to diff --git a/lib/iov_iter.c b/lib/iov_iter.c index f9a3ff37ecd1..47c484551c59 100644 --- a/lib/iov_iter.c +++ b/lib/iov_iter.c @@ -186,12 +186,6 @@ static int copyin(void *to, const void __user *from, size_t n) return res; } -static inline struct pipe_buffer *pipe_buf(const struct pipe_inode_info *pipe, - unsigned int slot) -{ - return &pipe->bufs[slot & (pipe->ring_size - 1)]; -} - #ifdef PIPE_PARANOIA static bool sanity(const struct iov_iter *i) { From patchwork Thu Feb 16 21:47:32 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 58243 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:adf:eb09:0:0:0:0:0 with SMTP id s9csp546446wrn; Thu, 16 Feb 2023 13:50:23 -0800 (PST) X-Google-Smtp-Source: AK7set8rFaoHgEZJUHUsFWjSflY7vnUt+4yippKDfoTnPxotSEDN7HPhWeDbJoy3QNX0xxTUGEgk X-Received: by 2002:a17:907:1005:b0:8b1:fc:b06d with SMTP id ox5-20020a170907100500b008b100fcb06dmr6916700ejb.77.1676584222939; Thu, 16 Feb 2023 13:50:22 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1676584222; cv=none; d=google.com; s=arc-20160816; b=TQbtTBh28dY9sCbwg3Eyd0myWcmKO7I0SaCFSPFgFqMOonG+9+SWG6v18KMWE/jy2p k6cROvznR//8JRf8vzk7CXoxx7x+sDmn4Ypi6MN6BI+QqtukVSjV/go1pslKmha6vWex q0xpSZl5mapQMZCk3LsnSB3dT8xWc3VhiCzgdBx5NV7Ot2x5p3PNY2G185tbKT6eyLbK Ds3cKi43JNSmJTfl3kxJOLCAk21hruRxKcBicbJCe+jnrVPOWamqe/hXaK1mYbUp/oxP UStjNyvqpnErVcjyo92XitKp4eYsuB7Y7/S6R7ODe+y8LqIIQjnNcC9a9eLeLs2IzxXp XcKQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=XQdifRk6LoRlY8bBJqWjSiNz8KoUNxlUmZ1XJ6Y/O6c=; b=opCxwSG22eI0Ws/TjqHcIfNLPOPdPU6cpgLtlvPc3kKjuwWwaQwtjiMtVj909ehoV6 QFsBBSC0gFCn/sE0bVs+kfCjTRq03DcN63nN0ZuvhHuYfKpd8L6qzngPJ8/rwcgqwe/g YLdv4dDWFRRLE+Oc8TL/lszVoNR9bMoSYkeziIsxvltkVwEaC++l65eo58JWk4rOT7PN 0wlXG5hrPmomd3OyaFame6zcQwqe0w2xvazlKgXdLyq/gECTSBJjBxAbWH9eE5LdlECF ILpcA4ctKw6FWAqxsYj3pEjQbk0qcUPrJQ7aN5zkVlM7Y7jV+4xvLu180fQ8adBy1lJ4 Rlrw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=fLJiXt7B; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id fc10-20020a1709073a4a00b00882f2f8c966si2801451ejc.342.2023.02.16.13.49.58; Thu, 16 Feb 2023 13:50:22 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=fLJiXt7B; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230230AbjBPVt2 (ORCPT + 99 others); Thu, 16 Feb 2023 16:49:28 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36308 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230130AbjBPVtU (ORCPT ); Thu, 16 Feb 2023 16:49:20 -0500 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id E1A1142DFB for ; Thu, 16 Feb 2023 13:48:25 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1676584105; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=XQdifRk6LoRlY8bBJqWjSiNz8KoUNxlUmZ1XJ6Y/O6c=; b=fLJiXt7BJTbwHvwH7sfyKzWmDGLOi0gbj66BY7I+XlQBn36EHA8IakyNP2b1jKsiIb2fDh zComy44VMMOsIk7iYSG5Akyejk9YhKo9R46YwmSitgyOtqklSzAcfOQSs2ZD5Ql+tLpgKI HtFtKikHCRJnACfXGC3ST579gsRGQmk= Received: from mimecast-mx02.redhat.com (mx3-rdu2.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-558-TZxIysogMpWNukeCTYhZIA-1; Thu, 16 Feb 2023 16:48:19 -0500 X-MC-Unique: TZxIysogMpWNukeCTYhZIA-1 Received: from smtp.corp.redhat.com (int-mx03.intmail.prod.int.rdu2.redhat.com [10.11.54.3]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 0DF2F3C10146; Thu, 16 Feb 2023 21:48:19 +0000 (UTC) Received: from warthog.procyon.org.uk (unknown [10.33.36.24]) by smtp.corp.redhat.com (Postfix) with ESMTP id D6ECD1121314; Thu, 16 Feb 2023 21:48:16 +0000 (UTC) From: David Howells To: Steve French Cc: David Howells , Jens Axboe , Al Viro , Shyam Prasad N , Rohith Surabattula , Tom Talpey , Stefan Metzmacher , Christoph Hellwig , Matthew Wilcox , Jeff Layton , linux-cifs@vger.kernel.org, linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Christoph Hellwig , John Hubbard , Logan Gunthorpe Subject: [PATCH 04/17] iov_iter: Define flags to qualify page extraction. Date: Thu, 16 Feb 2023 21:47:32 +0000 Message-Id: <20230216214745.3985496-5-dhowells@redhat.com> In-Reply-To: <20230216214745.3985496-1-dhowells@redhat.com> References: <20230216214745.3985496-1-dhowells@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.3 X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_NONE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1758025978321657809?= X-GMAIL-MSGID: =?utf-8?q?1758025978321657809?= Define flags to qualify page extraction to pass into iov_iter_*_pages*() rather than passing in FOLL_* flags. For now only a flag to allow peer-to-peer DMA is supported. Signed-off-by: David Howells Reviewed-by: Christoph Hellwig Reviewed-by: John Hubbard Reviewed-by: Jens Axboe cc: Al Viro cc: Logan Gunthorpe cc: linux-fsdevel@vger.kernel.org cc: linux-block@vger.kernel.org --- block/bio.c | 6 +++--- block/blk-map.c | 8 ++++---- include/linux/uio.h | 10 ++++++++-- lib/iov_iter.c | 14 ++++++++------ 4 files changed, 23 insertions(+), 15 deletions(-) diff --git a/block/bio.c b/block/bio.c index ab59a491a883..b97f3991c904 100644 --- a/block/bio.c +++ b/block/bio.c @@ -1245,11 +1245,11 @@ static int bio_iov_add_zone_append_page(struct bio *bio, struct page *page, */ static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter) { + iov_iter_extraction_t extraction_flags = 0; unsigned short nr_pages = bio->bi_max_vecs - bio->bi_vcnt; unsigned short entries_left = bio->bi_max_vecs - bio->bi_vcnt; struct bio_vec *bv = bio->bi_io_vec + bio->bi_vcnt; struct page **pages = (struct page **)bv; - unsigned int gup_flags = 0; ssize_t size, left; unsigned len, i = 0; size_t offset, trim; @@ -1264,7 +1264,7 @@ static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter) pages += entries_left * (PAGE_PTRS_PER_BVEC - 1); if (bio->bi_bdev && blk_queue_pci_p2pdma(bio->bi_bdev->bd_disk->queue)) - gup_flags |= FOLL_PCI_P2PDMA; + extraction_flags |= ITER_ALLOW_P2PDMA; /* * Each segment in the iov is required to be a block size multiple. @@ -1275,7 +1275,7 @@ static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter) */ size = iov_iter_get_pages(iter, pages, UINT_MAX - bio->bi_iter.bi_size, - nr_pages, &offset, gup_flags); + nr_pages, &offset, extraction_flags); if (unlikely(size <= 0)) return size ? size : -EFAULT; diff --git a/block/blk-map.c b/block/blk-map.c index 19940c978c73..080dd60485be 100644 --- a/block/blk-map.c +++ b/block/blk-map.c @@ -265,9 +265,9 @@ static struct bio *blk_rq_map_bio_alloc(struct request *rq, static int bio_map_user_iov(struct request *rq, struct iov_iter *iter, gfp_t gfp_mask) { + iov_iter_extraction_t extraction_flags = 0; unsigned int max_sectors = queue_max_hw_sectors(rq->q); unsigned int nr_vecs = iov_iter_npages(iter, BIO_MAX_VECS); - unsigned int gup_flags = 0; struct bio *bio; int ret; int j; @@ -280,7 +280,7 @@ static int bio_map_user_iov(struct request *rq, struct iov_iter *iter, return -ENOMEM; if (blk_queue_pci_p2pdma(rq->q)) - gup_flags |= FOLL_PCI_P2PDMA; + extraction_flags |= ITER_ALLOW_P2PDMA; while (iov_iter_count(iter)) { struct page **pages, *stack_pages[UIO_FASTIOV]; @@ -291,10 +291,10 @@ static int bio_map_user_iov(struct request *rq, struct iov_iter *iter, if (nr_vecs <= ARRAY_SIZE(stack_pages)) { pages = stack_pages; bytes = iov_iter_get_pages(iter, pages, LONG_MAX, - nr_vecs, &offs, gup_flags); + nr_vecs, &offs, extraction_flags); } else { bytes = iov_iter_get_pages_alloc(iter, &pages, - LONG_MAX, &offs, gup_flags); + LONG_MAX, &offs, extraction_flags); } if (unlikely(bytes <= 0)) { ret = bytes ? bytes : -EFAULT; diff --git a/include/linux/uio.h b/include/linux/uio.h index 9f158238edba..eec6ed8a627a 100644 --- a/include/linux/uio.h +++ b/include/linux/uio.h @@ -13,6 +13,8 @@ struct page; struct pipe_inode_info; +typedef unsigned int __bitwise iov_iter_extraction_t; + struct kvec { void *iov_base; /* and that should *never* hold a userland pointer */ size_t iov_len; @@ -252,12 +254,12 @@ void iov_iter_xarray(struct iov_iter *i, unsigned int direction, struct xarray * loff_t start, size_t count); ssize_t iov_iter_get_pages(struct iov_iter *i, struct page **pages, size_t maxsize, unsigned maxpages, size_t *start, - unsigned gup_flags); + iov_iter_extraction_t extraction_flags); ssize_t iov_iter_get_pages2(struct iov_iter *i, struct page **pages, size_t maxsize, unsigned maxpages, size_t *start); ssize_t iov_iter_get_pages_alloc(struct iov_iter *i, struct page ***pages, size_t maxsize, size_t *start, - unsigned gup_flags); + iov_iter_extraction_t extraction_flags); ssize_t iov_iter_get_pages_alloc2(struct iov_iter *i, struct page ***pages, size_t maxsize, size_t *start); int iov_iter_npages(const struct iov_iter *i, int maxpages); @@ -360,4 +362,8 @@ static inline void iov_iter_ubuf(struct iov_iter *i, unsigned int direction, }; } +/* Flags for iov_iter_get/extract_pages*() */ +/* Allow P2PDMA on the extracted pages */ +#define ITER_ALLOW_P2PDMA ((__force iov_iter_extraction_t)0x01) + #endif diff --git a/lib/iov_iter.c b/lib/iov_iter.c index 47c484551c59..9d4949ea9b27 100644 --- a/lib/iov_iter.c +++ b/lib/iov_iter.c @@ -1426,9 +1426,9 @@ static struct page *first_bvec_segment(const struct iov_iter *i, static ssize_t __iov_iter_get_pages_alloc(struct iov_iter *i, struct page ***pages, size_t maxsize, unsigned int maxpages, size_t *start, - unsigned int gup_flags) + iov_iter_extraction_t extraction_flags) { - unsigned int n; + unsigned int n, gup_flags = 0; if (maxsize > i->count) maxsize = i->count; @@ -1436,6 +1436,8 @@ static ssize_t __iov_iter_get_pages_alloc(struct iov_iter *i, return 0; if (maxsize > MAX_RW_COUNT) maxsize = MAX_RW_COUNT; + if (extraction_flags & ITER_ALLOW_P2PDMA) + gup_flags |= FOLL_PCI_P2PDMA; if (likely(user_backed_iter(i))) { unsigned long addr; @@ -1489,14 +1491,14 @@ static ssize_t __iov_iter_get_pages_alloc(struct iov_iter *i, ssize_t iov_iter_get_pages(struct iov_iter *i, struct page **pages, size_t maxsize, unsigned maxpages, - size_t *start, unsigned gup_flags) + size_t *start, iov_iter_extraction_t extraction_flags) { if (!maxpages) return 0; BUG_ON(!pages); return __iov_iter_get_pages_alloc(i, &pages, maxsize, maxpages, - start, gup_flags); + start, extraction_flags); } EXPORT_SYMBOL_GPL(iov_iter_get_pages); @@ -1509,14 +1511,14 @@ EXPORT_SYMBOL(iov_iter_get_pages2); ssize_t iov_iter_get_pages_alloc(struct iov_iter *i, struct page ***pages, size_t maxsize, - size_t *start, unsigned gup_flags) + size_t *start, iov_iter_extraction_t extraction_flags) { ssize_t len; *pages = NULL; len = __iov_iter_get_pages_alloc(i, pages, maxsize, ~0U, start, - gup_flags); + extraction_flags); if (len <= 0) { kvfree(*pages); *pages = NULL; From patchwork Thu Feb 16 21:47:33 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 58245 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:adf:eb09:0:0:0:0:0 with SMTP id s9csp546498wrn; Thu, 16 Feb 2023 13:50:32 -0800 (PST) X-Google-Smtp-Source: AK7set996G7d0FNc5PEoPqdYnG4zaduyU4IC23XlGFzmjd3/MMhRzLJLBJBb5ldtC3ivayYWTCAu X-Received: by 2002:aa7:c447:0:b0:4ac:c4c2:1c51 with SMTP id n7-20020aa7c447000000b004acc4c21c51mr8870255edr.27.1676584231862; Thu, 16 Feb 2023 13:50:31 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1676584231; cv=none; d=google.com; s=arc-20160816; b=mzlMQKWCwPM2SHqeEzDY9pENsRqeu34YMQIz8zqsG9hgpsek4YknctfM8+82wtJ9wy JbiLuCJr+zYvjeF39VA4AKQHdYTo6JJqfWJaE1odx98NdKVRsZeDlQ29QB8QnDcTpRer RamH0hs877llVKjFXouMnHR8Luh/EuVZop7ZsesebJKhXow7N1sl4y6SYJOhdgMaPfWN n5IN1mOdA7ogFdFbf5wCupG1nsFEwbR7aMHEWDwYJP6ZfhSrRJj7jK8lWAzLTHkGV+AP VREsiW6piQIYU7iN/QYaq5UCpAzivvtjp/j0jIDdJXp/3ivAprLFhOhnhjXLoO2tZyFo v8LA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=18pP9YVGNzQMD0mCsioarqrrlin+eDKe62LyFsBlk0Y=; b=jSjVVQTDQhW0s4v6ivugOi7dBi3jiqAzLPBD4QS+4286PEqa8xMv0SrD1+TRbtp5PS MApi71BmWyL+FmgiAPfNzsystr/pb8XBx7R+rrvveZ1JwnWw5NVGwtQs1tjVOaBBEhcx Ml135ZpQvpqjzzm+FJdxeN2RV3/bbZ7fV5auDpsnjT8Q4KhB2q4QAqTnOx0NBxrqy5qH ra+FiT7fzVzgKmkvvaEUF7iWkyZmXSfKLUW4S7b/oWEeZr0yJtyjjjKL5QGYK/j2pnYh w9c7fVvqa6lbp8T96HdzjZIZuZgWwlGTq00psJN4cG4UCAZ2ETpgTkErdfKR+8PUf/iz F/Iw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=XJU7rOTq; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id r26-20020aa7cfda000000b004ad7cfe06e2si289570edy.199.2023.02.16.13.50.08; Thu, 16 Feb 2023 13:50:31 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=XJU7rOTq; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230153AbjBPVte (ORCPT + 99 others); Thu, 16 Feb 2023 16:49:34 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36378 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230149AbjBPVtW (ORCPT ); Thu, 16 Feb 2023 16:49:22 -0500 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 038BD3432F for ; Thu, 16 Feb 2023 13:48:27 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1676584107; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=18pP9YVGNzQMD0mCsioarqrrlin+eDKe62LyFsBlk0Y=; b=XJU7rOTq91gYKRSuiltlbAUzkPrj3FhSCvgeAa4Pu8ss0pooALxz1soGqUqwGmsoz8Ipyi gkwPZWgrJcX06cjlwpJwCC1KAp9AwnRwbheei/Zfk0JuES/IyZmhhAKxpbg5kofNwlzZzN TkLI1wcpuLYx8XWiPSSXaAKc4oyYU10= Received: from mimecast-mx02.redhat.com (mx3-rdu2.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-569-_rauNtcuNFaFFcISeV-l4w-1; Thu, 16 Feb 2023 16:48:23 -0500 X-MC-Unique: _rauNtcuNFaFFcISeV-l4w-1 Received: from smtp.corp.redhat.com (int-mx10.intmail.prod.int.rdu2.redhat.com [10.11.54.10]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id CF9AC3C10146; Thu, 16 Feb 2023 21:48:21 +0000 (UTC) Received: from warthog.procyon.org.uk (unknown [10.33.36.24]) by smtp.corp.redhat.com (Postfix) with ESMTP id 9EA81492B15; Thu, 16 Feb 2023 21:48:19 +0000 (UTC) From: David Howells To: Steve French Cc: David Howells , Jens Axboe , Al Viro , Shyam Prasad N , Rohith Surabattula , Tom Talpey , Stefan Metzmacher , Christoph Hellwig , Matthew Wilcox , Jeff Layton , linux-cifs@vger.kernel.org, linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Christoph Hellwig , John Hubbard , David Hildenbrand Subject: [PATCH 05/17] iov_iter: Add a function to extract a page list from an iterator Date: Thu, 16 Feb 2023 21:47:33 +0000 Message-Id: <20230216214745.3985496-6-dhowells@redhat.com> In-Reply-To: <20230216214745.3985496-1-dhowells@redhat.com> References: <20230216214745.3985496-1-dhowells@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.10 X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_NONE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1758025987122497887?= X-GMAIL-MSGID: =?utf-8?q?1758025987122497887?= Add a function, iov_iter_extract_pages(), to extract a list of pages from an iterator. The pages may be returned with a pin added or nothing, depending on the type of iterator. Add a second function, iov_iter_extract_will_pin(), to determine how the cleanup should be done. There are two cases: (1) ITER_IOVEC or ITER_UBUF iterator. Extracted pages will have pins (FOLL_PIN) obtained on them so that a concurrent fork() will forcibly copy the page so that DMA is done to/from the parent's buffer and is unavailable to/unaffected by the child process. iov_iter_extract_will_pin() will return true for this case. The caller should use something like unpin_user_page() to dispose of the page. (2) Any other sort of iterator. No refs or pins are obtained on the page, the assumption is made that the caller will manage page retention. iov_iter_extract_will_pin() will return false. The pages don't need additional disposal. Signed-off-by: David Howells Reviewed-by: Christoph Hellwig Reviewed-by: Jens Axboe cc: Al Viro cc: John Hubbard cc: David Hildenbrand cc: Matthew Wilcox cc: linux-fsdevel@vger.kernel.org cc: linux-mm@kvack.org --- include/linux/uio.h | 27 ++++- lib/iov_iter.c | 264 ++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 290 insertions(+), 1 deletion(-) diff --git a/include/linux/uio.h b/include/linux/uio.h index eec6ed8a627a..514e3b7b06b8 100644 --- a/include/linux/uio.h +++ b/include/linux/uio.h @@ -361,9 +361,34 @@ static inline void iov_iter_ubuf(struct iov_iter *i, unsigned int direction, .count = count }; } - /* Flags for iov_iter_get/extract_pages*() */ /* Allow P2PDMA on the extracted pages */ #define ITER_ALLOW_P2PDMA ((__force iov_iter_extraction_t)0x01) +ssize_t iov_iter_extract_pages(struct iov_iter *i, struct page ***pages, + size_t maxsize, unsigned int maxpages, + iov_iter_extraction_t extraction_flags, + size_t *offset0); + +/** + * iov_iter_extract_will_pin - Indicate how pages from the iterator will be retained + * @iter: The iterator + * + * Examine the iterator and indicate by returning true or false as to how, if + * at all, pages extracted from the iterator will be retained by the extraction + * function. + * + * %true indicates that the pages will have a pin placed in them that the + * caller must unpin. This is must be done for DMA/async DIO to force fork() + * to forcibly copy a page for the child (the parent must retain the original + * page). + * + * %false indicates that no measures are taken and that it's up to the caller + * to retain the pages. + */ +static inline bool iov_iter_extract_will_pin(const struct iov_iter *iter) +{ + return user_backed_iter(iter); +} + #endif diff --git a/lib/iov_iter.c b/lib/iov_iter.c index 9d4949ea9b27..e53b16235385 100644 --- a/lib/iov_iter.c +++ b/lib/iov_iter.c @@ -1910,3 +1910,267 @@ void iov_iter_restore(struct iov_iter *i, struct iov_iter_state *state) i->iov -= state->nr_segs - i->nr_segs; i->nr_segs = state->nr_segs; } + +/* + * Extract a list of contiguous pages from an ITER_XARRAY iterator. This does not + * get references on the pages, nor does it get a pin on them. + */ +static ssize_t iov_iter_extract_xarray_pages(struct iov_iter *i, + struct page ***pages, size_t maxsize, + unsigned int maxpages, + iov_iter_extraction_t extraction_flags, + size_t *offset0) +{ + struct page *page, **p; + unsigned int nr = 0, offset; + loff_t pos = i->xarray_start + i->iov_offset; + pgoff_t index = pos >> PAGE_SHIFT; + XA_STATE(xas, i->xarray, index); + + offset = pos & ~PAGE_MASK; + *offset0 = offset; + + maxpages = want_pages_array(pages, maxsize, offset, maxpages); + if (!maxpages) + return -ENOMEM; + p = *pages; + + rcu_read_lock(); + for (page = xas_load(&xas); page; page = xas_next(&xas)) { + if (xas_retry(&xas, page)) + continue; + + /* Has the page moved or been split? */ + if (unlikely(page != xas_reload(&xas))) { + xas_reset(&xas); + continue; + } + + p[nr++] = find_subpage(page, xas.xa_index); + if (nr == maxpages) + break; + } + rcu_read_unlock(); + + maxsize = min_t(size_t, nr * PAGE_SIZE - offset, maxsize); + iov_iter_advance(i, maxsize); + return maxsize; +} + +/* + * Extract a list of contiguous pages from an ITER_BVEC iterator. This does + * not get references on the pages, nor does it get a pin on them. + */ +static ssize_t iov_iter_extract_bvec_pages(struct iov_iter *i, + struct page ***pages, size_t maxsize, + unsigned int maxpages, + iov_iter_extraction_t extraction_flags, + size_t *offset0) +{ + struct page **p, *page; + size_t skip = i->iov_offset, offset; + int k; + + for (;;) { + if (i->nr_segs == 0) + return 0; + maxsize = min(maxsize, i->bvec->bv_len - skip); + if (maxsize) + break; + i->iov_offset = 0; + i->nr_segs--; + i->bvec++; + skip = 0; + } + + skip += i->bvec->bv_offset; + page = i->bvec->bv_page + skip / PAGE_SIZE; + offset = skip % PAGE_SIZE; + *offset0 = offset; + + maxpages = want_pages_array(pages, maxsize, offset, maxpages); + if (!maxpages) + return -ENOMEM; + p = *pages; + for (k = 0; k < maxpages; k++) + p[k] = page + k; + + maxsize = min_t(size_t, maxsize, maxpages * PAGE_SIZE - offset); + iov_iter_advance(i, maxsize); + return maxsize; +} + +/* + * Extract a list of virtually contiguous pages from an ITER_KVEC iterator. + * This does not get references on the pages, nor does it get a pin on them. + */ +static ssize_t iov_iter_extract_kvec_pages(struct iov_iter *i, + struct page ***pages, size_t maxsize, + unsigned int maxpages, + iov_iter_extraction_t extraction_flags, + size_t *offset0) +{ + struct page **p, *page; + const void *kaddr; + size_t skip = i->iov_offset, offset, len; + int k; + + for (;;) { + if (i->nr_segs == 0) + return 0; + maxsize = min(maxsize, i->kvec->iov_len - skip); + if (maxsize) + break; + i->iov_offset = 0; + i->nr_segs--; + i->kvec++; + skip = 0; + } + + kaddr = i->kvec->iov_base + skip; + offset = (unsigned long)kaddr & ~PAGE_MASK; + *offset0 = offset; + + maxpages = want_pages_array(pages, maxsize, offset, maxpages); + if (!maxpages) + return -ENOMEM; + p = *pages; + + kaddr -= offset; + len = offset + maxsize; + for (k = 0; k < maxpages; k++) { + size_t seg = min_t(size_t, len, PAGE_SIZE); + + if (is_vmalloc_or_module_addr(kaddr)) + page = vmalloc_to_page(kaddr); + else + page = virt_to_page(kaddr); + + p[k] = page; + len -= seg; + kaddr += PAGE_SIZE; + } + + maxsize = min_t(size_t, maxsize, maxpages * PAGE_SIZE - offset); + iov_iter_advance(i, maxsize); + return maxsize; +} + +/* + * Extract a list of contiguous pages from a user iterator and get a pin on + * each of them. This should only be used if the iterator is user-backed + * (IOBUF/UBUF). + * + * It does not get refs on the pages, but the pages must be unpinned by the + * caller once the transfer is complete. + * + * This is safe to be used where background IO/DMA *is* going to be modifying + * the buffer; using a pin rather than a ref makes forces fork() to give the + * child a copy of the page. + */ +static ssize_t iov_iter_extract_user_pages(struct iov_iter *i, + struct page ***pages, + size_t maxsize, + unsigned int maxpages, + iov_iter_extraction_t extraction_flags, + size_t *offset0) +{ + unsigned long addr; + unsigned int gup_flags = 0; + size_t offset; + int res; + + if (i->data_source == ITER_DEST) + gup_flags |= FOLL_WRITE; + if (extraction_flags & ITER_ALLOW_P2PDMA) + gup_flags |= FOLL_PCI_P2PDMA; + if (i->nofault) + gup_flags |= FOLL_NOFAULT; + + addr = first_iovec_segment(i, &maxsize); + *offset0 = offset = addr % PAGE_SIZE; + addr &= PAGE_MASK; + maxpages = want_pages_array(pages, maxsize, offset, maxpages); + if (!maxpages) + return -ENOMEM; + res = pin_user_pages_fast(addr, maxpages, gup_flags, *pages); + if (unlikely(res <= 0)) + return res; + maxsize = min_t(size_t, maxsize, res * PAGE_SIZE - offset); + iov_iter_advance(i, maxsize); + return maxsize; +} + +/** + * iov_iter_extract_pages - Extract a list of contiguous pages from an iterator + * @i: The iterator to extract from + * @pages: Where to return the list of pages + * @maxsize: The maximum amount of iterator to extract + * @maxpages: The maximum size of the list of pages + * @extraction_flags: Flags to qualify request + * @offset0: Where to return the starting offset into (*@pages)[0] + * + * Extract a list of contiguous pages from the current point of the iterator, + * advancing the iterator. The maximum number of pages and the maximum amount + * of page contents can be set. + * + * If *@pages is NULL, a page list will be allocated to the required size and + * *@pages will be set to its base. If *@pages is not NULL, it will be assumed + * that the caller allocated a page list at least @maxpages in size and this + * will be filled in. + * + * @extraction_flags can have ITER_ALLOW_P2PDMA set to request peer-to-peer DMA + * be allowed on the pages extracted. + * + * The iov_iter_extract_will_pin() function can be used to query how cleanup + * should be performed. + * + * Extra refs or pins on the pages may be obtained as follows: + * + * (*) If the iterator is user-backed (ITER_IOVEC/ITER_UBUF), pins will be + * added to the pages, but refs will not be taken. + * iov_iter_extract_will_pin() will return true. + * + * (*) If the iterator is ITER_KVEC, ITER_BVEC or ITER_XARRAY, the pages are + * merely listed; no extra refs or pins are obtained. + * iov_iter_extract_will_pin() will return 0. + * + * Note also: + * + * (*) Use with ITER_DISCARD is not supported as that has no content. + * + * On success, the function sets *@pages to the new pagelist, if allocated, and + * sets *offset0 to the offset into the first page. + * + * It may also return -ENOMEM and -EFAULT. + */ +ssize_t iov_iter_extract_pages(struct iov_iter *i, + struct page ***pages, + size_t maxsize, + unsigned int maxpages, + iov_iter_extraction_t extraction_flags, + size_t *offset0) +{ + maxsize = min_t(size_t, min_t(size_t, maxsize, i->count), MAX_RW_COUNT); + if (!maxsize) + return 0; + + if (likely(user_backed_iter(i))) + return iov_iter_extract_user_pages(i, pages, maxsize, + maxpages, extraction_flags, + offset0); + if (iov_iter_is_kvec(i)) + return iov_iter_extract_kvec_pages(i, pages, maxsize, + maxpages, extraction_flags, + offset0); + if (iov_iter_is_bvec(i)) + return iov_iter_extract_bvec_pages(i, pages, maxsize, + maxpages, extraction_flags, + offset0); + if (iov_iter_is_xarray(i)) + return iov_iter_extract_xarray_pages(i, pages, maxsize, + maxpages, extraction_flags, + offset0); + return -EFAULT; +} +EXPORT_SYMBOL_GPL(iov_iter_extract_pages); From patchwork Thu Feb 16 21:47:34 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 58248 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:adf:eb09:0:0:0:0:0 with SMTP id s9csp546656wrn; Thu, 16 Feb 2023 13:50:58 -0800 (PST) X-Google-Smtp-Source: AK7set/UjB8v2L9K2DOeltgILl7bNgvnCqzpwy2RkLjryMXfFUPOuXYPBwsQ/zgMPxY6ieR7Ho/v X-Received: by 2002:a17:906:94d8:b0:8b1:3a8f:a51 with SMTP id d24-20020a17090694d800b008b13a8f0a51mr7133353ejy.15.1676584257859; Thu, 16 Feb 2023 13:50:57 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1676584257; cv=none; d=google.com; s=arc-20160816; b=Sz2d0EE8ZzKmCKNM/CeA3V4H0WhsBOZzqwFhK16xhsmMB75HNr194raVtwqKZ5VLcX p5VRBczFGVBShy5Kb+REINgLHNVD/054At2x3HxtIJLYNHGGxxqa+el/p81gEtGADg/t tDuGyUZFcehNj2nR2Jnyxl1sN9oy0Xj8zFKHHR/XCueYWLdZ8rHF5YCExVmrttTT8AxW 3NVx40xeD7WBnvBNtLCwmwZcOARKhjQa7l3wnTQt4JkAEhZqFOWpeL5nbFpzp8VtH2XI n5aVrYt3VZ5HSh6aM3u+7lWI+3eZ6demHXQ98MipOU+ymaLH+WQ/tRkyAnE57SJ/ur0h qNaw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=Gp2W96yfCpLFU4rVrbJwG/yzMjTMitm55rNRmKmTZrE=; b=x/HHggUzoIxwPItdDMYyJVO/wOpAgc2WKHbx0EvGTvHe7I9JSQ5LlL0/idD5xGXvzI jNORieVzUPw6/4GrVFdeaqAPJzlxhNKrd8uQp66yr5SxGrhlLaDODmnRnQr0beaD7Srk KArJ8/dTo1wQf5kZ/AVUmDnYYqNia7X+NP+xzQGXbbs23flP6AnqH9Vyz2JUUDcA8Ajy IDXR9HzVJF1AugmwNvs8CO21V9kfkc/PvVoW6Kr1fBMMxAOVJd32pobXIYl7AcKwMFIb +Fj3DGU3Zaipxvt2eR53kwzhPp8NGuWo7TfnP78srW/TJj+NETmxOxE6U7fHL8QPKKza lF4A== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=NCYDy7Qr; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id a12-20020aa7d74c000000b004a2758d0cc5si2926894eds.226.2023.02.16.13.50.34; Thu, 16 Feb 2023 13:50:57 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=NCYDy7Qr; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230299AbjBPVuD (ORCPT + 99 others); Thu, 16 Feb 2023 16:50:03 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36520 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230253AbjBPVtd (ORCPT ); Thu, 16 Feb 2023 16:49:33 -0500 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 8C0163B3E3 for ; Thu, 16 Feb 2023 13:48:33 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1676584112; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=Gp2W96yfCpLFU4rVrbJwG/yzMjTMitm55rNRmKmTZrE=; b=NCYDy7QrRYnQTUa2fFmBx7ioGI7fGKOqoF5JQqJk4f8S+BeTVD/Qeq42RY/HHwBe4C7n8v Vhli2fIH3s0qtdfHoCH+Wk91F/pMcY6P2Z6gnfETOjbhsL5bMzWDKta6TaCRLCMmJm5ql+ +tUHwnuD30JLNUHhUyCTMiwrkLiHh7g= Received: from mimecast-mx02.redhat.com (mx3-rdu2.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-654-t-XLGBKJN1OaorYTO7C7CA-1; Thu, 16 Feb 2023 16:48:25 -0500 X-MC-Unique: t-XLGBKJN1OaorYTO7C7CA-1 Received: from smtp.corp.redhat.com (int-mx10.intmail.prod.int.rdu2.redhat.com [10.11.54.10]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id D2CB738123AF; Thu, 16 Feb 2023 21:48:24 +0000 (UTC) Received: from warthog.procyon.org.uk (unknown [10.33.36.24]) by smtp.corp.redhat.com (Postfix) with ESMTP id 8BB9E492B15; Thu, 16 Feb 2023 21:48:22 +0000 (UTC) From: David Howells To: Steve French Cc: David Howells , Jens Axboe , Al Viro , Shyam Prasad N , Rohith Surabattula , Tom Talpey , Stefan Metzmacher , Christoph Hellwig , Matthew Wilcox , Jeff Layton , linux-cifs@vger.kernel.org, linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Steve French , Christoph Hellwig , David Hildenbrand , John Hubbard Subject: [PATCH 06/17] splice: Export filemap/direct_splice_read() Date: Thu, 16 Feb 2023 21:47:34 +0000 Message-Id: <20230216214745.3985496-7-dhowells@redhat.com> In-Reply-To: <20230216214745.3985496-1-dhowells@redhat.com> References: <20230216214745.3985496-1-dhowells@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.10 X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_NONE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1758026014602005750?= X-GMAIL-MSGID: =?utf-8?q?1758026014602005750?= filemap_splice_read() and direct_splice_read() should be exported. Signed-off-by: David Howells cc: Steve French cc: Jens Axboe cc: Christoph Hellwig cc: Al Viro cc: David Hildenbrand cc: John Hubbard cc: linux-cifs@vger.kernel.org cc: linux-mm@kvack.org cc: linux-block@vger.kernel.org cc: linux-fsdevel@vger.kernel.org --- fs/splice.c | 1 + mm/filemap.c | 1 + 2 files changed, 2 insertions(+) diff --git a/fs/splice.c b/fs/splice.c index 4c6332854b63..928c7be2f318 100644 --- a/fs/splice.c +++ b/fs/splice.c @@ -373,6 +373,7 @@ ssize_t direct_splice_read(struct file *in, loff_t *ppos, kfree(bv); return ret; } +EXPORT_SYMBOL(direct_splice_read); /** * generic_file_splice_read - splice data from file to a pipe diff --git a/mm/filemap.c b/mm/filemap.c index 8c7b135c8e23..570f86578f7c 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -2969,6 +2969,7 @@ ssize_t filemap_splice_read(struct file *in, loff_t *ppos, return total_spliced ? total_spliced : error; } +EXPORT_SYMBOL(filemap_splice_read); static inline loff_t folio_seek_hole_data(struct xa_state *xas, struct address_space *mapping, struct folio *folio, From patchwork Thu Feb 16 21:47:35 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 58247 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:adf:eb09:0:0:0:0:0 with SMTP id s9csp546620wrn; Thu, 16 Feb 2023 13:50:52 -0800 (PST) X-Google-Smtp-Source: AK7set9T0fWaroQkSc+HEu+MhqXQlKHFWVRGRgXSW27A+IAdTFwcAURGuutaqHJUboOOJjB42NZp X-Received: by 2002:a17:907:10c3:b0:8b1:bafe:6135 with SMTP id rv3-20020a17090710c300b008b1bafe6135mr363623ejb.60.1676584252711; Thu, 16 Feb 2023 13:50:52 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1676584252; cv=none; d=google.com; s=arc-20160816; b=vUIJL+sfjKC9+bmZHY2GEh57+9jJM0y0qO10c26cJ7/5jvNt5wFKGPmnitvG2MFLXK 2v2NW07l3KIdtgniwVY2tecebmX6cMqv9Z5qXxXFIR2XJdYAU13uaqzqqQn9ZKh58wDh E3UP42V/LKjFJN/w4AHJfpntow/K/xFw0wz2dy+wEz1pVqcJEOdqxZozABKNiX3IyDs0 bHNhuQzS07rOUwDmsm4Rx4rK8QWwTQ/7Sb/zm5qY3OlwOPF218eI2NMW+auv7XU1sIT+ G3HmlzYF6u3wQjaCF5GVSWKaecVZu0rN8grpZc3F0k/PdslAJ9eKLcQ03N+FifJxAVee uP0w== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=hl97XF3hOliR+/qiliOpOdWdf+h1RkO7skZ504Yz8AE=; b=kS3fuWaGKKHXPIkZ0a19aObXahIJL86z7Gfykv0sCrC1/O2A0dRmpYsoKxLED/UFrv jtb8ukH6LUV9VYfeUI3pWf4ow4d1XKfUvcbuxr321YM6m19bXPRdXFxJ1u+SwuuJJD6l EZQKHcbegJ+swZgJ1AdumAVGaEZG1tl7nC0AmCXqeaBoPA6z2zuCGrXzDvY6bAOGXJNZ Rw9J3+k1gRxqdMMBdFnOW9E8Fx/jPuLf1aMTWzM/FSuyNzBjSi5eYH1Kz8c6jzUZF+AY KC/x1yKlv8d4nfNi8uUvAf/Y1AgZMy8C6AlG18w+Pq7MhsTNXiyc1RcleVnq7etRNWb+ cJKQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=i2j3XsWY; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id 20-20020a170906025400b008b13a214bfbsi4627299ejl.893.2023.02.16.13.50.27; Thu, 16 Feb 2023 13:50:52 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=i2j3XsWY; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230227AbjBPVt7 (ORCPT + 99 others); Thu, 16 Feb 2023 16:49:59 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36482 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230185AbjBPVtc (ORCPT ); Thu, 16 Feb 2023 16:49:32 -0500 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 8383E46143 for ; Thu, 16 Feb 2023 13:48:32 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1676584111; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=hl97XF3hOliR+/qiliOpOdWdf+h1RkO7skZ504Yz8AE=; b=i2j3XsWYzY/S8K7USi890A8RDxnUMp7BevPotXYYS6MvUX6qMxk5d++Vc80pSH+S7VMNRN 1QdZNzM3SkoD4zWlYSlcooXJqQ3qui003KX7ByctsEBgLFrY9riirudVFSkFd+JX1wV66p ONFqk7D+A6wuu7BV/l3flHqKq5CYp44= Received: from mimecast-mx02.redhat.com (mx3-rdu2.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-9-ugZV4pIFMu6kB63jpxbEvQ-1; Thu, 16 Feb 2023 16:48:28 -0500 X-MC-Unique: ugZV4pIFMu6kB63jpxbEvQ-1 Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.rdu2.redhat.com [10.11.54.2]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 5FA0B38123AF; Thu, 16 Feb 2023 21:48:27 +0000 (UTC) Received: from warthog.procyon.org.uk (unknown [10.33.36.24]) by smtp.corp.redhat.com (Postfix) with ESMTP id 7765640C10FA; Thu, 16 Feb 2023 21:48:25 +0000 (UTC) From: David Howells To: Steve French Cc: David Howells , Jens Axboe , Al Viro , Shyam Prasad N , Rohith Surabattula , Tom Talpey , Stefan Metzmacher , Christoph Hellwig , Matthew Wilcox , Jeff Layton , linux-cifs@vger.kernel.org, linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Steve French Subject: [PATCH 07/17] cifs: Implement splice_read to pass down ITER_BVEC not ITER_PIPE Date: Thu, 16 Feb 2023 21:47:35 +0000 Message-Id: <20230216214745.3985496-8-dhowells@redhat.com> In-Reply-To: <20230216214745.3985496-1-dhowells@redhat.com> References: <20230216214745.3985496-1-dhowells@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.2 X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_NONE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1758026009131870306?= X-GMAIL-MSGID: =?utf-8?q?1758026009131870306?= Provide cifs_splice_read() to use a bvec rather than an pipe iterator as the latter cannot so easily be split and advanced, which is necessary to pass an iterator down to the bottom levels. Upstream cifs gets around this problem by using iov_iter_get_pages() to prefill the pipe and then passing the list of pages down. This is done by: (1) Bulk-allocate a bunch of pages to carry as much of the requested amount of data as possible, but without overrunning the available slots in the pipe and add them to an ITER_BVEC. (2) Synchronously call ->read_iter() to read into the buffer. (3) Discard any unused pages. (4) Load the remaining pages into the pipe in order and advance the head pointer. Signed-off-by: David Howells cc: Steve French cc: Shyam Prasad N cc: Rohith Surabattula cc: Jeff Layton cc: Al Viro cc: linux-cifs@vger.kernel.org Link: https://lore.kernel.org/r/166732028113.3186319.1793644937097301358.stgit@warthog.procyon.org.uk/ # rfc --- fs/cifs/cifsfs.c | 12 ++++++------ fs/cifs/cifsfs.h | 3 +++ fs/cifs/file.c | 16 ++++++++++++++++ 3 files changed, 25 insertions(+), 6 deletions(-) diff --git a/fs/cifs/cifsfs.c b/fs/cifs/cifsfs.c index 10e00c624922..4f1afcd3f8be 100644 --- a/fs/cifs/cifsfs.c +++ b/fs/cifs/cifsfs.c @@ -1358,7 +1358,7 @@ const struct file_operations cifs_file_ops = { .fsync = cifs_fsync, .flush = cifs_flush, .mmap = cifs_file_mmap, - .splice_read = generic_file_splice_read, + .splice_read = cifs_splice_read, .splice_write = iter_file_splice_write, .llseek = cifs_llseek, .unlocked_ioctl = cifs_ioctl, @@ -1378,7 +1378,7 @@ const struct file_operations cifs_file_strict_ops = { .fsync = cifs_strict_fsync, .flush = cifs_flush, .mmap = cifs_file_strict_mmap, - .splice_read = generic_file_splice_read, + .splice_read = cifs_splice_read, .splice_write = iter_file_splice_write, .llseek = cifs_llseek, .unlocked_ioctl = cifs_ioctl, @@ -1398,7 +1398,7 @@ const struct file_operations cifs_file_direct_ops = { .fsync = cifs_fsync, .flush = cifs_flush, .mmap = cifs_file_mmap, - .splice_read = generic_file_splice_read, + .splice_read = direct_splice_read, .splice_write = iter_file_splice_write, .unlocked_ioctl = cifs_ioctl, .copy_file_range = cifs_copy_file_range, @@ -1416,7 +1416,7 @@ const struct file_operations cifs_file_nobrl_ops = { .fsync = cifs_fsync, .flush = cifs_flush, .mmap = cifs_file_mmap, - .splice_read = generic_file_splice_read, + .splice_read = cifs_splice_read, .splice_write = iter_file_splice_write, .llseek = cifs_llseek, .unlocked_ioctl = cifs_ioctl, @@ -1434,7 +1434,7 @@ const struct file_operations cifs_file_strict_nobrl_ops = { .fsync = cifs_strict_fsync, .flush = cifs_flush, .mmap = cifs_file_strict_mmap, - .splice_read = generic_file_splice_read, + .splice_read = cifs_splice_read, .splice_write = iter_file_splice_write, .llseek = cifs_llseek, .unlocked_ioctl = cifs_ioctl, @@ -1452,7 +1452,7 @@ const struct file_operations cifs_file_direct_nobrl_ops = { .fsync = cifs_fsync, .flush = cifs_flush, .mmap = cifs_file_mmap, - .splice_read = generic_file_splice_read, + .splice_read = direct_splice_read, .splice_write = iter_file_splice_write, .unlocked_ioctl = cifs_ioctl, .copy_file_range = cifs_copy_file_range, diff --git a/fs/cifs/cifsfs.h b/fs/cifs/cifsfs.h index 63a0ac2b9355..25decebbc478 100644 --- a/fs/cifs/cifsfs.h +++ b/fs/cifs/cifsfs.h @@ -100,6 +100,9 @@ extern ssize_t cifs_strict_readv(struct kiocb *iocb, struct iov_iter *to); extern ssize_t cifs_user_writev(struct kiocb *iocb, struct iov_iter *from); extern ssize_t cifs_direct_writev(struct kiocb *iocb, struct iov_iter *from); extern ssize_t cifs_strict_writev(struct kiocb *iocb, struct iov_iter *from); +extern ssize_t cifs_splice_read(struct file *in, loff_t *ppos, + struct pipe_inode_info *pipe, size_t len, + unsigned int flags); extern int cifs_flock(struct file *pfile, int cmd, struct file_lock *plock); extern int cifs_lock(struct file *, int, struct file_lock *); extern int cifs_fsync(struct file *, loff_t, loff_t, int); diff --git a/fs/cifs/file.c b/fs/cifs/file.c index e216bc9b7abf..ddf6f572af81 100644 --- a/fs/cifs/file.c +++ b/fs/cifs/file.c @@ -5275,3 +5275,19 @@ const struct address_space_operations cifs_addr_ops_smallbuf = { .launder_folio = cifs_launder_folio, .migrate_folio = filemap_migrate_folio, }; + +/* + * Splice data from a file into a pipe. + */ +ssize_t cifs_splice_read(struct file *in, loff_t *ppos, + struct pipe_inode_info *pipe, size_t len, + unsigned int flags) +{ + if (unlikely(*ppos >= file_inode(in)->i_sb->s_maxbytes)) + return 0; + if (unlikely(!len)) + return 0; + if (in->f_flags & O_DIRECT) + return direct_splice_read(in, ppos, pipe, len, flags); + return filemap_splice_read(in, ppos, pipe, len, flags); +} From patchwork Thu Feb 16 21:47:36 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 58252 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:adf:eb09:0:0:0:0:0 with SMTP id s9csp546858wrn; Thu, 16 Feb 2023 13:51:29 -0800 (PST) X-Google-Smtp-Source: AK7set9KmhQ1fVAD1YYACxQqL1azBYIo1NADWEB5ELfSJRXehc1WHll6geAwHT+NG1shj1ElFBS9 X-Received: by 2002:a05:6402:684:b0:4ad:7bd9:5ebb with SMTP id f4-20020a056402068400b004ad7bd95ebbmr827371edy.15.1676584289417; Thu, 16 Feb 2023 13:51:29 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1676584289; cv=none; d=google.com; s=arc-20160816; b=U2zOzbqpiqVQXkvFjxIKylHjTrf5M4s1ClR5Qwg+wKyrDfW27IrW0RjsgHpN7eKkJb hJ9lyXQdri4e92KG1kbD4OCi2F4kh9YfXLP7yB4tD8WRRU3mC55s9AoWitxLX7VlwnHB 2P+kkCKTP0ug3oC0sHl6S6R4lkayXunry1ARQS9I15yynQY7fOb16gowsfaUmVVa/BQI QHIouj2fDS4mZdWgJiAoFeNp/8fVtHsRdI8WLIa/xqSDAsE9aSifVWRDtQSiFbRM7AqK txNeT95cuTkCQ6V9Gx/av66TntK/xtTk1WORM35bqR8QTsMqemHYZ5P67orXkKlNdExg +Mhg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=3Y4j0lnR/W/3vRUqeGPawzcVqSHyHS0pEsYo/MQFDo8=; b=YxlTc2FfX9qwXqm5dfFVC8VkbknIWLE0iBj2njiXnIM05vhI7DtBwBQ0RT19R4RXnr lsC9eEwGooNZ58nazW8uP4uwqXm1xKuuDonwT8EDL837HSMewmXtaMHi5gkOww6tgDqO CfAAFsG5xtEgWOn57bmZhE5Dd+97he/auUYSebN+XULmoyMzIH/gGIJQVE/Obi6dGBdh ZLzw/8zdE7giaKVU5ewx3FK4GoAxi+ZydAzms9snoH/i46um5bRYfPTRrN+IClpCylim HxBDKSg4J+PpdNEV0SjTKai5rBvOAO/VOILaRo3AX7R3MlYLHBnvt5CZaHEbUGLqT+Kn iPjw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=BDQJ5f0L; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id v7-20020aa7cd47000000b004acbbb30292si3087320edw.24.2023.02.16.13.51.06; Thu, 16 Feb 2023 13:51:29 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=BDQJ5f0L; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230269AbjBPVuG (ORCPT + 99 others); Thu, 16 Feb 2023 16:50:06 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36594 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230220AbjBPVte (ORCPT ); Thu, 16 Feb 2023 16:49:34 -0500 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C0094474EB for ; Thu, 16 Feb 2023 13:48:36 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1676584115; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=3Y4j0lnR/W/3vRUqeGPawzcVqSHyHS0pEsYo/MQFDo8=; b=BDQJ5f0LDXXogDX6YuhtaFdo+Sj/Vod15FNDiIQgM8BNng9vnJ/wBtY5szKNF+/YZL+Ilp MPHEZp9nJHliLD1mQTeFOBjRPgMA4QiywBsqVj9FSyNAys9RVe1d0nLtn0WK1Co8hU93z0 39Qg9Hzk/hhU+ulorHvatgVBzYwY/4A= Received: from mimecast-mx02.redhat.com (mx3-rdu2.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-573-hZ6PAfdZNkyg85S-HML7Vw-1; Thu, 16 Feb 2023 16:48:31 -0500 X-MC-Unique: hZ6PAfdZNkyg85S-HML7Vw-1 Received: from smtp.corp.redhat.com (int-mx06.intmail.prod.int.rdu2.redhat.com [10.11.54.6]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 409912800498; Thu, 16 Feb 2023 21:48:30 +0000 (UTC) Received: from warthog.procyon.org.uk (unknown [10.33.36.24]) by smtp.corp.redhat.com (Postfix) with ESMTP id 237522166B31; Thu, 16 Feb 2023 21:48:28 +0000 (UTC) From: David Howells To: Steve French Cc: David Howells , Jens Axboe , Al Viro , Shyam Prasad N , Rohith Surabattula , Tom Talpey , Stefan Metzmacher , Christoph Hellwig , Matthew Wilcox , Jeff Layton , linux-cifs@vger.kernel.org, linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Steve French , linux-cachefs@redhat.com Subject: [PATCH 08/17] netfs: Add a function to extract a UBUF or IOVEC into a BVEC iterator Date: Thu, 16 Feb 2023 21:47:36 +0000 Message-Id: <20230216214745.3985496-9-dhowells@redhat.com> In-Reply-To: <20230216214745.3985496-1-dhowells@redhat.com> References: <20230216214745.3985496-1-dhowells@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.6 X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1758026047937067899?= X-GMAIL-MSGID: =?utf-8?q?1758026047937067899?= Add a function to extract the pages from a user-space supplied iterator (UBUF- or IOVEC-type) into a BVEC-type iterator, retaining the pages by getting a pin on them (as FOLL_PIN) as we go. This is useful in three situations: (1) A userspace thread may have a sibling that unmaps or remaps the process's VM during the operation, changing the assignment of the pages and potentially causing an error. Retaining the pages keeps some pages around, even if this occurs; futher, we find out at the point of extraction if EFAULT is going to be incurred. (2) Pages might get swapped out/discarded if not retained, so we want to retain them to avoid the reload causing a deadlock due to a DIO from/to an mmapped region on the same file. (3) The iterator may get passed to sendmsg() by the filesystem. If a fault occurs, we may get a short write to a TCP stream that's then tricky to recover from. We don't deal with other types of iterator here, leaving it to other mechanisms to retain the pages (eg. PG_locked, PG_writeback and the pipe lock). Signed-off-by: David Howells cc: Jeff Layton cc: Steve French cc: Shyam Prasad N cc: Rohith Surabattula cc: linux-cachefs@redhat.com cc: linux-cifs@vger.kernel.org cc: linux-fsdevel@vger.kernel.org --- fs/netfs/Makefile | 1 + fs/netfs/iterator.c | 103 ++++++++++++++++++++++++++++++++++++++++++ include/linux/netfs.h | 4 ++ 3 files changed, 108 insertions(+) create mode 100644 fs/netfs/iterator.c diff --git a/fs/netfs/Makefile b/fs/netfs/Makefile index f684c0cd1ec5..386d6fb92793 100644 --- a/fs/netfs/Makefile +++ b/fs/netfs/Makefile @@ -3,6 +3,7 @@ netfs-y := \ buffered_read.o \ io.o \ + iterator.o \ main.o \ objects.o diff --git a/fs/netfs/iterator.c b/fs/netfs/iterator.c new file mode 100644 index 000000000000..6f0d79080abc --- /dev/null +++ b/fs/netfs/iterator.c @@ -0,0 +1,103 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* Iterator helpers. + * + * Copyright (C) 2022 Red Hat, Inc. All Rights Reserved. + * Written by David Howells (dhowells@redhat.com) + */ + +#include +#include +#include +#include +#include "internal.h" + +/** + * netfs_extract_user_iter - Extract the pages from a user iterator into a bvec + * @orig: The original iterator + * @orig_len: The amount of iterator to copy + * @new: The iterator to be set up + * @extraction_flags: Flags to qualify the request + * + * Extract the page fragments from the given amount of the source iterator and + * build up a second iterator that refers to all of those bits. This allows + * the original iterator to disposed of. + * + * @extraction_flags can have ITER_ALLOW_P2PDMA set to request peer-to-peer DMA be + * allowed on the pages extracted. + * + * On success, the number of elements in the bvec is returned, the original + * iterator will have been advanced by the amount extracted. + * + * The iov_iter_extract_mode() function should be used to query how cleanup + * should be performed. + */ +ssize_t netfs_extract_user_iter(struct iov_iter *orig, size_t orig_len, + struct iov_iter *new, + iov_iter_extraction_t extraction_flags) +{ + struct bio_vec *bv = NULL; + struct page **pages; + unsigned int cur_npages; + unsigned int max_pages; + unsigned int npages = 0; + unsigned int i; + ssize_t ret; + size_t count = orig_len, offset, len; + size_t bv_size, pg_size; + + if (WARN_ON_ONCE(!iter_is_ubuf(orig) && !iter_is_iovec(orig))) + return -EIO; + + max_pages = iov_iter_npages(orig, INT_MAX); + bv_size = array_size(max_pages, sizeof(*bv)); + bv = kvmalloc(bv_size, GFP_KERNEL); + if (!bv) + return -ENOMEM; + + /* Put the page list at the end of the bvec list storage. bvec + * elements are larger than page pointers, so as long as we work + * 0->last, we should be fine. + */ + pg_size = array_size(max_pages, sizeof(*pages)); + pages = (void *)bv + bv_size - pg_size; + + while (count && npages < max_pages) { + ret = iov_iter_extract_pages(orig, &pages, count, + max_pages - npages, extraction_flags, + &offset); + if (ret < 0) { + pr_err("Couldn't get user pages (rc=%zd)\n", ret); + break; + } + + if (ret > count) { + pr_err("get_pages rc=%zd more than %zu\n", ret, count); + break; + } + + count -= ret; + ret += offset; + cur_npages = DIV_ROUND_UP(ret, PAGE_SIZE); + + if (npages + cur_npages > max_pages) { + pr_err("Out of bvec array capacity (%u vs %u)\n", + npages + cur_npages, max_pages); + break; + } + + for (i = 0; i < cur_npages; i++) { + len = ret > PAGE_SIZE ? PAGE_SIZE : ret; + bv[npages + i].bv_page = *pages++; + bv[npages + i].bv_offset = offset; + bv[npages + i].bv_len = len - offset; + ret -= len; + offset = 0; + } + + npages += cur_npages; + } + + iov_iter_bvec(new, orig->data_source, bv, npages, orig_len - count); + return npages; +} +EXPORT_SYMBOL_GPL(netfs_extract_user_iter); diff --git a/include/linux/netfs.h b/include/linux/netfs.h index 4c76ddfb6a67..b11a84f6c32b 100644 --- a/include/linux/netfs.h +++ b/include/linux/netfs.h @@ -17,6 +17,7 @@ #include #include #include +#include enum netfs_sreq_ref_trace; @@ -296,6 +297,9 @@ void netfs_get_subrequest(struct netfs_io_subrequest *subreq, void netfs_put_subrequest(struct netfs_io_subrequest *subreq, bool was_async, enum netfs_sreq_ref_trace what); void netfs_stats_show(struct seq_file *); +ssize_t netfs_extract_user_iter(struct iov_iter *orig, size_t orig_len, + struct iov_iter *new, + iov_iter_extraction_t extraction_flags); /** * netfs_inode - Get the netfs inode context from the inode From patchwork Thu Feb 16 21:47:37 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 58249 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:adf:eb09:0:0:0:0:0 with SMTP id s9csp546742wrn; Thu, 16 Feb 2023 13:51:12 -0800 (PST) X-Google-Smtp-Source: AK7set9t9Djd5sDjOikSuZYdACME5Si9fHRucXKTY+ygySMxPP1z0qHWDH42qb3MCxQf5WXyxitA X-Received: by 2002:a17:906:6d53:b0:878:6b39:6d2a with SMTP id a19-20020a1709066d5300b008786b396d2amr8812792ejt.46.1676584271805; Thu, 16 Feb 2023 13:51:11 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1676584271; cv=none; d=google.com; s=arc-20160816; b=u44Qi1OnsW/U/qz9qNq0w03tRwigm2nIDMqzYHRVRpi3g7yBiDog9+/DFrld7LVmFt nE4HXUG0wRRrOmEySy37IMLHWc8/uZg0t191secA+qoWpCpwV8J3n9VpoYa5lYvQT1Au nl71JXxAPKLhYWUztEFquu6v79es93dkZizRhsm33w5kvKMoWBKsdVZFQuuZR3W2KvzU b3ASffWNb4qqEx3CPOhNejvY7YYA4y3/Iw9bPrqm1C2CyO50jC+TqGJ1JA/S0lVC6rhq V29f5n4BpiwlxTiqSHkU/m0cIXhCzop7PKQ2r9uEqB4s8hp2WLpIZ8HUja3yYMhZwS/C fwsg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=B56pRrcWAY22m0G05dfkHZvKQrRhHOkH7s+6u+KVVg8=; b=CWaETZEgcYBPRnRRsxoa0M7c0Dir+71KQDWf4bKD3p6SF25KLTl8EZERpT5k94NmbD 0V850i7Zrr4cooyQK5VpUakGBJEb+fzGRc/Dw99vGutzl9WIeMMhTcM8+cAM/6p3PD8P kE/akVt7IWxqHwDBrmhab/YTjKnI24G6wEG8o6/Ay+eyEGm6IofxgJZgYNUITD8UlYT6 HgANiyLDZYZyCKmKvlwmNYcGzDyVA9R+Lx+JHa54TJdFwHNZ+TtBBp09H86I/f9NZNbd 13zyAFX9jjA2Y9Pj5bn69TBFl6JiO+8HomUkPyw++s0L28gzKDfK35Mz72mCZyEZgVbk CjcA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=XurEWkQZ; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id kb7-20020a170907924700b008b14b443aaesi2577039ejb.158.2023.02.16.13.50.49; Thu, 16 Feb 2023 13:51:11 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=XurEWkQZ; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230379AbjBPVuW (ORCPT + 99 others); Thu, 16 Feb 2023 16:50:22 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36120 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230254AbjBPVt4 (ORCPT ); Thu, 16 Feb 2023 16:49:56 -0500 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id E75C33C78B for ; Thu, 16 Feb 2023 13:48:39 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1676584119; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=B56pRrcWAY22m0G05dfkHZvKQrRhHOkH7s+6u+KVVg8=; b=XurEWkQZ8VgQ9R8YekNT76VXsDuhJIHapv40hOcCvM11B4efqzSk1e4g3sxrQuUe3rHmCH B3Q56Qai1OIEqiw4g74pPmOgiSjnAGDOLt63Uihwwzb/2PqukwTOjQ+wJq3NI/xGTJ6kc6 oaoLMIN1yESaZA8E6PfQhBKOBs3CKZQ= Received: from mimecast-mx02.redhat.com (mx3-rdu2.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-497-o7QVwY4PNzGubMfK7y7HHQ-1; Thu, 16 Feb 2023 16:48:33 -0500 X-MC-Unique: o7QVwY4PNzGubMfK7y7HHQ-1 Received: from smtp.corp.redhat.com (int-mx10.intmail.prod.int.rdu2.redhat.com [10.11.54.10]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id DED7538123AF; Thu, 16 Feb 2023 21:48:32 +0000 (UTC) Received: from warthog.procyon.org.uk (unknown [10.33.36.24]) by smtp.corp.redhat.com (Postfix) with ESMTP id D9AA2492B15; Thu, 16 Feb 2023 21:48:30 +0000 (UTC) From: David Howells To: Steve French Cc: David Howells , Jens Axboe , Al Viro , Shyam Prasad N , Rohith Surabattula , Tom Talpey , Stefan Metzmacher , Christoph Hellwig , Matthew Wilcox , Jeff Layton , linux-cifs@vger.kernel.org, linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Steve French , linux-cachefs@redhat.com Subject: [PATCH 09/17] netfs: Add a function to extract an iterator into a scatterlist Date: Thu, 16 Feb 2023 21:47:37 +0000 Message-Id: <20230216214745.3985496-10-dhowells@redhat.com> In-Reply-To: <20230216214745.3985496-1-dhowells@redhat.com> References: <20230216214745.3985496-1-dhowells@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.10 X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_NONE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1758026029183639085?= X-GMAIL-MSGID: =?utf-8?q?1758026029183639085?= Provide a function for filling in a scatterlist from the list of pages contained in an iterator. If the iterator is UBUF- or IOBUF-type, the pages have a pin taken on them (as FOLL_PIN). If the iterator is BVEC-, KVEC- or XARRAY-type, no pin is taken on the pages and it is left to the caller to manage their lifetime. It cannot be assumed that a ref can be validly taken, particularly in the case of a KVEC iterator. Signed-off-by: David Howells cc: Jeff Layton cc: Steve French cc: Shyam Prasad N cc: Rohith Surabattula cc: linux-cachefs@redhat.com cc: linux-cifs@vger.kernel.org cc: linux-fsdevel@vger.kernel.org --- fs/netfs/iterator.c | 268 ++++++++++++++++++++++++++++++++++++++++++ include/linux/netfs.h | 4 + mm/vmalloc.c | 1 + 3 files changed, 273 insertions(+) diff --git a/fs/netfs/iterator.c b/fs/netfs/iterator.c index 6f0d79080abc..80d7ff440cac 100644 --- a/fs/netfs/iterator.c +++ b/fs/netfs/iterator.c @@ -7,7 +7,9 @@ #include #include +#include #include +#include #include #include "internal.h" @@ -101,3 +103,269 @@ ssize_t netfs_extract_user_iter(struct iov_iter *orig, size_t orig_len, return npages; } EXPORT_SYMBOL_GPL(netfs_extract_user_iter); + +/* + * Extract and pin a list of up to sg_max pages from UBUF- or IOVEC-class + * iterators, and add them to the scatterlist. + */ +static ssize_t netfs_extract_user_to_sg(struct iov_iter *iter, + ssize_t maxsize, + struct sg_table *sgtable, + unsigned int sg_max, + iov_iter_extraction_t extraction_flags) +{ + struct scatterlist *sg = sgtable->sgl + sgtable->nents; + struct page **pages; + unsigned int npages; + ssize_t ret = 0, res; + size_t len, off; + + /* We decant the page list into the tail of the scatterlist */ + pages = (void *)sgtable->sgl + array_size(sg_max, sizeof(struct scatterlist)); + pages -= sg_max; + + do { + res = iov_iter_extract_pages(iter, &pages, maxsize, sg_max, + extraction_flags, &off); + if (res < 0) + goto failed; + + len = res; + maxsize -= len; + ret += len; + npages = DIV_ROUND_UP(off + len, PAGE_SIZE); + sg_max -= npages; + + for (; npages < 0; npages--) { + struct page *page = *pages; + size_t seg = min_t(size_t, PAGE_SIZE - off, len); + + *pages++ = NULL; + sg_set_page(sg, page, len, off); + sgtable->nents++; + sg++; + len -= seg; + off = 0; + } + } while (maxsize > 0 && sg_max > 0); + + return ret; + +failed: + while (sgtable->nents > sgtable->orig_nents) + put_page(sg_page(&sgtable->sgl[--sgtable->nents])); + return res; +} + +/* + * Extract up to sg_max pages from a BVEC-type iterator and add them to the + * scatterlist. The pages are not pinned. + */ +static ssize_t netfs_extract_bvec_to_sg(struct iov_iter *iter, + ssize_t maxsize, + struct sg_table *sgtable, + unsigned int sg_max, + iov_iter_extraction_t extraction_flags) +{ + const struct bio_vec *bv = iter->bvec; + struct scatterlist *sg = sgtable->sgl + sgtable->nents; + unsigned long start = iter->iov_offset; + unsigned int i; + ssize_t ret = 0; + + for (i = 0; i < iter->nr_segs; i++) { + size_t off, len; + + len = bv[i].bv_len; + if (start >= len) { + start -= len; + continue; + } + + len = min_t(size_t, maxsize, len - start); + off = bv[i].bv_offset + start; + + sg_set_page(sg, bv[i].bv_page, len, off); + sgtable->nents++; + sg++; + sg_max--; + + ret += len; + maxsize -= len; + if (maxsize <= 0 || sg_max == 0) + break; + start = 0; + } + + if (ret > 0) + iov_iter_advance(iter, ret); + return ret; +} + +/* + * Extract up to sg_max pages from a KVEC-type iterator and add them to the + * scatterlist. This can deal with vmalloc'd buffers as well as kmalloc'd or + * static buffers. The pages are not pinned. + */ +static ssize_t netfs_extract_kvec_to_sg(struct iov_iter *iter, + ssize_t maxsize, + struct sg_table *sgtable, + unsigned int sg_max, + iov_iter_extraction_t extraction_flags) +{ + const struct kvec *kv = iter->kvec; + struct scatterlist *sg = sgtable->sgl + sgtable->nents; + unsigned long start = iter->iov_offset; + unsigned int i; + ssize_t ret = 0; + + for (i = 0; i < iter->nr_segs; i++) { + struct page *page; + unsigned long kaddr; + size_t off, len, seg; + + len = kv[i].iov_len; + if (start >= len) { + start -= len; + continue; + } + + kaddr = (unsigned long)kv[i].iov_base + start; + off = kaddr & ~PAGE_MASK; + len = min_t(size_t, maxsize, len - start); + kaddr &= PAGE_MASK; + + maxsize -= len; + ret += len; + do { + seg = min_t(size_t, len, PAGE_SIZE - off); + if (is_vmalloc_or_module_addr((void *)kaddr)) + page = vmalloc_to_page((void *)kaddr); + else + page = virt_to_page(kaddr); + + sg_set_page(sg, page, len, off); + sgtable->nents++; + sg++; + sg_max--; + + len -= seg; + kaddr += PAGE_SIZE; + off = 0; + } while (len > 0 && sg_max > 0); + + if (maxsize <= 0 || sg_max == 0) + break; + start = 0; + } + + if (ret > 0) + iov_iter_advance(iter, ret); + return ret; +} + +/* + * Extract up to sg_max folios from an XARRAY-type iterator and add them to + * the scatterlist. The pages are not pinned. + */ +static ssize_t netfs_extract_xarray_to_sg(struct iov_iter *iter, + ssize_t maxsize, + struct sg_table *sgtable, + unsigned int sg_max, + iov_iter_extraction_t extraction_flags) +{ + struct scatterlist *sg = sgtable->sgl + sgtable->nents; + struct xarray *xa = iter->xarray; + struct folio *folio; + loff_t start = iter->xarray_start + iter->iov_offset; + pgoff_t index = start / PAGE_SIZE; + ssize_t ret = 0; + size_t offset, len; + XA_STATE(xas, xa, index); + + rcu_read_lock(); + + xas_for_each(&xas, folio, ULONG_MAX) { + if (xas_retry(&xas, folio)) + continue; + if (WARN_ON(xa_is_value(folio))) + break; + if (WARN_ON(folio_test_hugetlb(folio))) + break; + + offset = offset_in_folio(folio, start); + len = min_t(size_t, maxsize, folio_size(folio) - offset); + + sg_set_page(sg, folio_page(folio, 0), len, offset); + sgtable->nents++; + sg++; + sg_max--; + + maxsize -= len; + ret += len; + if (maxsize <= 0 || sg_max == 0) + break; + } + + rcu_read_unlock(); + if (ret > 0) + iov_iter_advance(iter, ret); + return ret; +} + +/** + * netfs_extract_iter_to_sg - Extract pages from an iterator and add ot an sglist + * @iter: The iterator to extract from + * @maxsize: The amount of iterator to copy + * @sgtable: The scatterlist table to fill in + * @sg_max: Maximum number of elements in @sgtable that may be filled + * @extraction_flags: Flags to qualify the request + * + * Extract the page fragments from the given amount of the source iterator and + * add them to a scatterlist that refers to all of those bits, to a maximum + * addition of @sg_max elements. + * + * The pages referred to by UBUF- and IOVEC-type iterators are extracted and + * pinned; BVEC-, KVEC- and XARRAY-type are extracted but aren't pinned; PIPE- + * and DISCARD-type are not supported. + * + * No end mark is placed on the scatterlist; that's left to the caller. + * + * @extraction_flags can have ITER_ALLOW_P2PDMA set to request peer-to-peer DMA + * be allowed on the pages extracted. + * + * If successul, @sgtable->nents is updated to include the number of elements + * added and the number of bytes added is returned. @sgtable->orig_nents is + * left unaltered. + * + * The iov_iter_extract_mode() function should be used to query how cleanup + * should be performed. + */ +ssize_t netfs_extract_iter_to_sg(struct iov_iter *iter, size_t maxsize, + struct sg_table *sgtable, unsigned int sg_max, + iov_iter_extraction_t extraction_flags) +{ + if (maxsize == 0) + return 0; + + switch (iov_iter_type(iter)) { + case ITER_UBUF: + case ITER_IOVEC: + return netfs_extract_user_to_sg(iter, maxsize, sgtable, sg_max, + extraction_flags); + case ITER_BVEC: + return netfs_extract_bvec_to_sg(iter, maxsize, sgtable, sg_max, + extraction_flags); + case ITER_KVEC: + return netfs_extract_kvec_to_sg(iter, maxsize, sgtable, sg_max, + extraction_flags); + case ITER_XARRAY: + return netfs_extract_xarray_to_sg(iter, maxsize, sgtable, sg_max, + extraction_flags); + default: + pr_err("%s(%u) unsupported\n", __func__, iov_iter_type(iter)); + WARN_ON_ONCE(1); + return -EIO; + } +} +EXPORT_SYMBOL_GPL(netfs_extract_iter_to_sg); diff --git a/include/linux/netfs.h b/include/linux/netfs.h index b11a84f6c32b..a1f3522daa69 100644 --- a/include/linux/netfs.h +++ b/include/linux/netfs.h @@ -300,6 +300,10 @@ void netfs_stats_show(struct seq_file *); ssize_t netfs_extract_user_iter(struct iov_iter *orig, size_t orig_len, struct iov_iter *new, iov_iter_extraction_t extraction_flags); +struct sg_table; +ssize_t netfs_extract_iter_to_sg(struct iov_iter *iter, size_t len, + struct sg_table *sgtable, unsigned int sg_max, + iov_iter_extraction_t extraction_flags); /** * netfs_inode - Get the netfs inode context from the inode diff --git a/mm/vmalloc.c b/mm/vmalloc.c index ca71de7c9d77..61f5bec0f2b6 100644 --- a/mm/vmalloc.c +++ b/mm/vmalloc.c @@ -656,6 +656,7 @@ int is_vmalloc_or_module_addr(const void *x) #endif return is_vmalloc_addr(x); } +EXPORT_SYMBOL_GPL(is_vmalloc_or_module_addr); /* * Walk a vmap address to the struct page it maps. Huge vmap mappings will From patchwork Thu Feb 16 21:47:38 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 58250 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:adf:eb09:0:0:0:0:0 with SMTP id s9csp546779wrn; Thu, 16 Feb 2023 13:51:15 -0800 (PST) X-Google-Smtp-Source: AK7set+y7KgUB+ddEVfGlqTuRxAU0VUwvGe7xyRC/CU6cgZHAWHM9e0+WISR2a+JsEtQbcRPy/QV X-Received: by 2002:a17:907:78c9:b0:87b:3d29:2982 with SMTP id kv9-20020a17090778c900b0087b3d292982mr7754099ejc.11.1676584275364; Thu, 16 Feb 2023 13:51:15 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1676584275; cv=none; d=google.com; s=arc-20160816; b=s0yWO1XlW4iO0xUwWbV0blbZ0+f49owgtO7ktfTb3qWxxwd9S7bWqrwLlZW3BjRSFO GJR9OX1kb1WOECd7ePRRvgSNoa6E3MvKsFoEfvVjL3hlISTAaX6bHAHlk7A6ZQdPo9M8 Ror+2bP6iVUA9S5a9Jhz9REwOSZa/8Kbdf2zQQ1y3Ft19NTA5MapvKDDcx62Kzsz2Tb5 kCWGbH2YymUHD6/6+7Tc/Z2Ts5nF3gxcdRhkL2KoGr5YLmPG9yQVJ0pIWPBGPdwAN+mx CwvzbpXNxI50olRis71+RoouVXMI4iiuzHAReJ3MdIUXlWXFLC/MxGeTaGQadCSHSiXq EpxA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=e8eQEU7H5KI+mR6DaJwt19VwGfbirTBzTPLMp2PHuNY=; b=hP1UjAGG/L3YyGWjBlaj7fY44BpZ62l8uZPracmNSzjHCDus5Sj+8D0r2Jhud1BiWo e2r5oMVukgqNTl9TqNzrKcwl/jD5BRmXazLtQzJ3ajEsL5qnJsTusqxpxBvSt94ySyBL fEOpL135BsQVgzT9xLeRAy1GyEa4ImqE/8JfdWxRpPi3F/+1UE+nJ3y897HVd3LeHZPs PkxZiy9+neUeD66RNfrDXS6ix9BTQxF30QJmtM7RVpHJozfZasE9BVBCdV/msqkkujx/ jx9oLt1H494VsTXnDDi6TtbPOd7ZsQ/OSJVZjwKlVt0GQkYKgYSIky84Dx+XRcqovduq LKHQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=g3Aep2oo; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id jr8-20020a170906514800b008b1329a1af2si2305124ejc.167.2023.02.16.13.50.52; Thu, 16 Feb 2023 13:51:15 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=g3Aep2oo; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230332AbjBPVuZ (ORCPT + 99 others); Thu, 16 Feb 2023 16:50:25 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36956 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230210AbjBPVt6 (ORCPT ); Thu, 16 Feb 2023 16:49:58 -0500 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id DE2E25034A for ; Thu, 16 Feb 2023 13:48:42 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1676584122; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=e8eQEU7H5KI+mR6DaJwt19VwGfbirTBzTPLMp2PHuNY=; b=g3Aep2ooaqIB08u7Bes8t/1R6Cz4MMzv2/0mPiwwSLIF4P2nAMhKrxkUmDPw8FRTUbp4sk O4c7hLPXVbEqaQicvNsRfJ3xNc6tdv52tEBnF9A1+0rKuIz3sH25E8xxvoO3hY7P0I/PLD BPekBkwiE8rYjtt+fBhm7kCkQaSR/nM= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-671-9yLGMC7UPNqociWeikV9Bw-1; Thu, 16 Feb 2023 16:48:36 -0500 X-MC-Unique: 9yLGMC7UPNqociWeikV9Bw-1 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.rdu2.redhat.com [10.11.54.8]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 9E522800B24; Thu, 16 Feb 2023 21:48:35 +0000 (UTC) Received: from warthog.procyon.org.uk (unknown [10.33.36.24]) by smtp.corp.redhat.com (Postfix) with ESMTP id 99D1EC15BA0; Thu, 16 Feb 2023 21:48:33 +0000 (UTC) From: David Howells To: Steve French Cc: David Howells , Jens Axboe , Al Viro , Shyam Prasad N , Rohith Surabattula , Tom Talpey , Stefan Metzmacher , Christoph Hellwig , Matthew Wilcox , Jeff Layton , linux-cifs@vger.kernel.org, linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Steve French , linux-rdma@vger.kernel.org Subject: [PATCH 10/17] cifs: Add a function to build an RDMA SGE list from an iterator Date: Thu, 16 Feb 2023 21:47:38 +0000 Message-Id: <20230216214745.3985496-11-dhowells@redhat.com> In-Reply-To: <20230216214745.3985496-1-dhowells@redhat.com> References: <20230216214745.3985496-1-dhowells@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.8 X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_NONE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1758026033078488445?= X-GMAIL-MSGID: =?utf-8?q?1758026033078488445?= Add a function to add elements onto an RDMA SGE list representing page fragments extracted from a BVEC-, KVEC- or XARRAY-type iterator and DMA mapped until the maximum number of elements is reached. Nothing is done to make sure the pages remain present - that must be done by the caller. Signed-off-by: David Howells cc: Steve French cc: Shyam Prasad N cc: Rohith Surabattula cc: Tom Talpey cc: Jeff Layton cc: linux-cifs@vger.kernel.org cc: linux-fsdevel@vger.kernel.org cc: linux-rdma@vger.kernel.org Link: https://lore.kernel.org/r/166697256704.61150.17388516338310645808.stgit@warthog.procyon.org.uk/ # rfc Link: https://lore.kernel.org/r/166732028840.3186319.8512284239779728860.stgit@warthog.procyon.org.uk/ # rfc --- fs/cifs/smbdirect.c | 214 ++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 214 insertions(+) diff --git a/fs/cifs/smbdirect.c b/fs/cifs/smbdirect.c index 8c816b25ce7c..3e0aacddc291 100644 --- a/fs/cifs/smbdirect.c +++ b/fs/cifs/smbdirect.c @@ -44,6 +44,17 @@ static int smbd_post_send_page(struct smbd_connection *info, static void destroy_mr_list(struct smbd_connection *info); static int allocate_mr_list(struct smbd_connection *info); +struct smb_extract_to_rdma { + struct ib_sge *sge; + unsigned int nr_sge; + unsigned int max_sge; + struct ib_device *device; + u32 local_dma_lkey; + enum dma_data_direction direction; +}; +static ssize_t smb_extract_iter_to_rdma(struct iov_iter *iter, size_t len, + struct smb_extract_to_rdma *rdma); + /* SMBD version number */ #define SMBD_V1 0x0100 @@ -2490,3 +2501,206 @@ int smbd_deregister_mr(struct smbd_mr *smbdirect_mr) return rc; } + +static bool smb_set_sge(struct smb_extract_to_rdma *rdma, + struct page *lowest_page, size_t off, size_t len) +{ + struct ib_sge *sge = &rdma->sge[rdma->nr_sge]; + u64 addr; + + addr = ib_dma_map_page(rdma->device, lowest_page, + off, len, rdma->direction); + if (ib_dma_mapping_error(rdma->device, addr)) + return false; + + sge->addr = addr; + sge->length = len; + sge->lkey = rdma->local_dma_lkey; + rdma->nr_sge++; + return true; +} + +/* + * Extract page fragments from a BVEC-class iterator and add them to an RDMA + * element list. The pages are not pinned. + */ +static ssize_t smb_extract_bvec_to_rdma(struct iov_iter *iter, + struct smb_extract_to_rdma *rdma, + ssize_t maxsize) +{ + const struct bio_vec *bv = iter->bvec; + unsigned long start = iter->iov_offset; + unsigned int i; + ssize_t ret = 0; + + for (i = 0; i < iter->nr_segs; i++) { + size_t off, len; + + len = bv[i].bv_len; + if (start >= len) { + start -= len; + continue; + } + + len = min_t(size_t, maxsize, len - start); + off = bv[i].bv_offset + start; + + if (!smb_set_sge(rdma, bv[i].bv_page, off, len)) + return -EIO; + + ret += len; + maxsize -= len; + if (rdma->nr_sge >= rdma->max_sge || maxsize <= 0) + break; + start = 0; + } + + return ret; +} + +/* + * Extract fragments from a KVEC-class iterator and add them to an RDMA list. + * This can deal with vmalloc'd buffers as well as kmalloc'd or static buffers. + * The pages are not pinned. + */ +static ssize_t smb_extract_kvec_to_rdma(struct iov_iter *iter, + struct smb_extract_to_rdma *rdma, + ssize_t maxsize) +{ + const struct kvec *kv = iter->kvec; + unsigned long start = iter->iov_offset; + unsigned int i; + ssize_t ret = 0; + + for (i = 0; i < iter->nr_segs; i++) { + struct page *page; + unsigned long kaddr; + size_t off, len, seg; + + len = kv[i].iov_len; + if (start >= len) { + start -= len; + continue; + } + + kaddr = (unsigned long)kv[i].iov_base + start; + off = kaddr & ~PAGE_MASK; + len = min_t(size_t, maxsize, len - start); + kaddr &= PAGE_MASK; + + maxsize -= len; + do { + seg = min_t(size_t, len, PAGE_SIZE - off); + + if (is_vmalloc_or_module_addr((void *)kaddr)) + page = vmalloc_to_page((void *)kaddr); + else + page = virt_to_page(kaddr); + + if (!smb_set_sge(rdma, page, off, seg)) + return -EIO; + + ret += seg; + len -= seg; + kaddr += PAGE_SIZE; + off = 0; + } while (len > 0 && rdma->nr_sge < rdma->max_sge); + + if (rdma->nr_sge >= rdma->max_sge || maxsize <= 0) + break; + start = 0; + } + + return ret; +} + +/* + * Extract folio fragments from an XARRAY-class iterator and add them to an + * RDMA list. The folios are not pinned. + */ +static ssize_t smb_extract_xarray_to_rdma(struct iov_iter *iter, + struct smb_extract_to_rdma *rdma, + ssize_t maxsize) +{ + struct xarray *xa = iter->xarray; + struct folio *folio; + loff_t start = iter->xarray_start + iter->iov_offset; + pgoff_t index = start / PAGE_SIZE; + ssize_t ret = 0; + size_t off, len; + XA_STATE(xas, xa, index); + + rcu_read_lock(); + + xas_for_each(&xas, folio, ULONG_MAX) { + if (xas_retry(&xas, folio)) + continue; + if (WARN_ON(xa_is_value(folio))) + break; + if (WARN_ON(folio_test_hugetlb(folio))) + break; + + off = offset_in_folio(folio, start); + len = min_t(size_t, maxsize, folio_size(folio) - off); + + if (!smb_set_sge(rdma, folio_page(folio, 0), off, len)) { + rcu_read_lock(); + return -EIO; + } + + maxsize -= len; + ret += len; + if (rdma->nr_sge >= rdma->max_sge || maxsize <= 0) + break; + } + + rcu_read_unlock(); + return ret; +} + +/* + * Extract page fragments from up to the given amount of the source iterator + * and build up an RDMA list that refers to all of those bits. The RDMA list + * is appended to, up to the maximum number of elements set in the parameter + * block. + * + * The extracted page fragments are not pinned or ref'd in any way; if an + * IOVEC/UBUF-type iterator is to be used, it should be converted to a + * BVEC-type iterator and the pages pinned, ref'd or otherwise held in some + * way. + */ +static ssize_t smb_extract_iter_to_rdma(struct iov_iter *iter, size_t len, + struct smb_extract_to_rdma *rdma) +{ + ssize_t ret; + int before = rdma->nr_sge; + + switch (iov_iter_type(iter)) { + case ITER_BVEC: + ret = smb_extract_bvec_to_rdma(iter, rdma, len); + break; + case ITER_KVEC: + ret = smb_extract_kvec_to_rdma(iter, rdma, len); + break; + case ITER_XARRAY: + ret = smb_extract_xarray_to_rdma(iter, rdma, len); + break; + default: + WARN_ON_ONCE(1); + return -EIO; + } + + if (ret > 0) { + iov_iter_advance(iter, ret); + } else if (ret < 0) { + while (rdma->nr_sge > before) { + struct ib_sge *sge = &rdma->sge[rdma->nr_sge--]; + + ib_dma_unmap_single(rdma->device, sge->addr, sge->length, + rdma->direction); + sge->addr = 0; + } + } + + return ret; +} From patchwork Thu Feb 16 21:47:39 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 58253 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:adf:eb09:0:0:0:0:0 with SMTP id s9csp547036wrn; Thu, 16 Feb 2023 13:51:54 -0800 (PST) X-Google-Smtp-Source: AK7set9ErMCvw+4VS0XjFYh6sQ1MyGkU71xqwPbJqgQ6CUN7zYZbGyDbDsxM+Ytpm/Wp3Jg7DuXV X-Received: by 2002:a17:906:d0d1:b0:8b1:7b56:c4b8 with SMTP id bq17-20020a170906d0d100b008b17b56c4b8mr1358424ejb.60.1676584314551; Thu, 16 Feb 2023 13:51:54 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1676584314; cv=none; d=google.com; s=arc-20160816; b=zR+jwKy25LkJdIEZnfNIDYP0D8QVY2K/WMXfqBw3deyrtLnscZ6ogl0PvfnA0MU4VN S9DX8sdAsChNvpYnzC+L1yS1k2/FAQwMfOmidp/OnlDW0t3AJluH5eW7mmReDzFceJC8 W6wU3mdjB/Whh8oIPI53aQKcdbLoO1q1JTGCmbp2cYL4lko9iBSsl6TllKAaj7HMz6aQ o55JV/GbiU2p2/guI8J5P5/4lbdO/uV684q3p61A4YYyp+ZPOp48n6dojOZkKE0DuZu8 b15u4ekcGNZUWArXrGd594PdWCqYSNHzqog3UDxzxD+0vgHKYlUDKQkFFHhS+4qN+cME 1uzQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=jlR4JcCg1XC8eTF3I+rHcc7NxJf4yOAtTyxChrY5i+Y=; b=zhV9ZxDYE/5LAGgWzBos8LO5svKCAn1e0v1QxUQMDSF0INByK0SgF+2Fh9F3q2jGeR qYxnlOIK2JoOuw6ofoJAqDgHqikJEZBZF6Gb8bYhRMAZ7V3uS+/rN1K+OtG+8XVT12Jk 9EAHq621ZXIDlffcv9GGhTiidxNbIRvHDcmfIBasIhFqID6zrW9ytLRQ1bGpWdLqboaH CrcOpu1uJrdyba06ohUltexG8iiFg8A/SPypKxzCRits5m9xYlSdncWZogPvad8J3Hav udfX6PLS6S7rG92NjFD+u7ccGPgJUs+8/oskoJJliZPe/iOHHGRI1Z+OqlgQZTYyydcH 2i1Q== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=VT2rfh7p; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id jr23-20020a170906515700b008b135479ae3si2850422ejc.69.2023.02.16.13.51.31; Thu, 16 Feb 2023 13:51:54 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=VT2rfh7p; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230397AbjBPVuc (ORCPT + 99 others); Thu, 16 Feb 2023 16:50:32 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:37034 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230268AbjBPVuD (ORCPT ); Thu, 16 Feb 2023 16:50:03 -0500 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 443C4505E5 for ; Thu, 16 Feb 2023 13:48:43 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1676584122; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=jlR4JcCg1XC8eTF3I+rHcc7NxJf4yOAtTyxChrY5i+Y=; b=VT2rfh7p893K3pJAWXnPOkIWNv6LQzhfKaLkFff+/jMS0XTO4x4e+r2S+1zm7rg3gWBnXL Z3NcNYAkEVX9djgoJ7T9FDt/EIPV1xZfbVxOWMiBA1e1As4HmFd0jgigyaB22CALrvDg+B Wfns/iC2MCWuue0RDdmWhVY9IR5hHVE= Received: from mimecast-mx02.redhat.com (mx3-rdu2.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-260-Qs6jji0IPnqiPKBLR7YHAw-1; Thu, 16 Feb 2023 16:48:39 -0500 X-MC-Unique: Qs6jji0IPnqiPKBLR7YHAw-1 Received: from smtp.corp.redhat.com (int-mx05.intmail.prod.int.rdu2.redhat.com [10.11.54.5]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 3C9431C05AF6; Thu, 16 Feb 2023 21:48:38 +0000 (UTC) Received: from warthog.procyon.org.uk (unknown [10.33.36.24]) by smtp.corp.redhat.com (Postfix) with ESMTP id 3E0D751FF; Thu, 16 Feb 2023 21:48:36 +0000 (UTC) From: David Howells To: Steve French Cc: David Howells , Jens Axboe , Al Viro , Shyam Prasad N , Rohith Surabattula , Tom Talpey , Stefan Metzmacher , Christoph Hellwig , Matthew Wilcox , Jeff Layton , linux-cifs@vger.kernel.org, linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Steve French , linux-crypto@vger.kernel.org Subject: [PATCH 11/17] cifs: Add a function to Hash the contents of an iterator Date: Thu, 16 Feb 2023 21:47:39 +0000 Message-Id: <20230216214745.3985496-12-dhowells@redhat.com> In-Reply-To: <20230216214745.3985496-1-dhowells@redhat.com> References: <20230216214745.3985496-1-dhowells@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.5 X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_NONE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1758026074554815861?= X-GMAIL-MSGID: =?utf-8?q?1758026074554815861?= Add a function to push the contents of a BVEC-, KVEC- or XARRAY-type iterator into a symmetric hash algorithm. UBUF- and IOBUF-type iterators are not supported on the assumption that either we're doing buffered I/O, in which case we won't see them, or we're doing direct I/O, in which case the iterator will have been extracted into a BVEC-type iterator higher up. Signed-off-by: David Howells cc: Steve French cc: Shyam Prasad N cc: Rohith Surabattula cc: Jeff Layton cc: linux-cifs@vger.kernel.org cc: linux-fsdevel@vger.kernel.org cc: linux-crypto@vger.kernel.org Link: https://lore.kernel.org/r/166697257423.61150.12070648579830206483.stgit@warthog.procyon.org.uk/ # rfc Link: https://lore.kernel.org/r/166732029577.3186319.17162612653237909961.stgit@warthog.procyon.org.uk/ # rfc --- fs/cifs/cifsencrypt.c | 144 ++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 144 insertions(+) diff --git a/fs/cifs/cifsencrypt.c b/fs/cifs/cifsencrypt.c index cbc18b4a9cb2..7be589aeb520 100644 --- a/fs/cifs/cifsencrypt.c +++ b/fs/cifs/cifsencrypt.c @@ -24,6 +24,150 @@ #include "../smbfs_common/arc4.h" #include +/* + * Hash data from a BVEC-type iterator. + */ +static int cifs_shash_bvec(const struct iov_iter *iter, ssize_t maxsize, + struct shash_desc *shash) +{ + const struct bio_vec *bv = iter->bvec; + unsigned long start = iter->iov_offset; + unsigned int i; + void *p; + int ret; + + for (i = 0; i < iter->nr_segs; i++) { + size_t off, len; + + len = bv[i].bv_len; + if (start >= len) { + start -= len; + continue; + } + + len = min_t(size_t, maxsize, len - start); + off = bv[i].bv_offset + start; + + p = kmap_local_page(bv[i].bv_page); + ret = crypto_shash_update(shash, p + off, len); + kunmap_local(p); + if (ret < 0) + return ret; + + maxsize -= len; + if (maxsize <= 0) + break; + start = 0; + } + + return 0; +} + +/* + * Hash data from a KVEC-type iterator. + */ +static int cifs_shash_kvec(const struct iov_iter *iter, ssize_t maxsize, + struct shash_desc *shash) +{ + const struct kvec *kv = iter->kvec; + unsigned long start = iter->iov_offset; + unsigned int i; + int ret; + + for (i = 0; i < iter->nr_segs; i++) { + size_t len; + + len = kv[i].iov_len; + if (start >= len) { + start -= len; + continue; + } + + len = min_t(size_t, maxsize, len - start); + ret = crypto_shash_update(shash, kv[i].iov_base + start, len); + if (ret < 0) + return ret; + maxsize -= len; + + if (maxsize <= 0) + break; + start = 0; + } + + return 0; +} + +/* + * Hash data from an XARRAY-type iterator. + */ +static ssize_t cifs_shash_xarray(const struct iov_iter *iter, ssize_t maxsize, + struct shash_desc *shash) +{ + struct folio *folios[16], *folio; + unsigned int nr, i, j, npages; + loff_t start = iter->xarray_start + iter->iov_offset; + pgoff_t last, index = start / PAGE_SIZE; + ssize_t ret = 0; + size_t len, offset, foffset; + void *p; + + if (maxsize == 0) + return 0; + + last = (start + maxsize - 1) / PAGE_SIZE; + do { + nr = xa_extract(iter->xarray, (void **)folios, index, last, + ARRAY_SIZE(folios), XA_PRESENT); + if (nr == 0) + return -EIO; + + for (i = 0; i < nr; i++) { + folio = folios[i]; + npages = folio_nr_pages(folio); + foffset = start - folio_pos(folio); + offset = foffset % PAGE_SIZE; + for (j = foffset / PAGE_SIZE; j < npages; j++) { + len = min_t(size_t, maxsize, PAGE_SIZE - offset); + p = kmap_local_page(folio_page(folio, j)); + ret = crypto_shash_update(shash, p, len); + kunmap_local(p); + if (ret < 0) + return ret; + maxsize -= len; + if (maxsize <= 0) + return 0; + start += len; + offset = 0; + index++; + } + } + } while (nr == ARRAY_SIZE(folios)); + return 0; +} + +/* + * Pass the data from an iterator into a hash. + */ +static int cifs_shash_iter(const struct iov_iter *iter, size_t maxsize, + struct shash_desc *shash) +{ + if (maxsize == 0) + return 0; + + switch (iov_iter_type(iter)) { + case ITER_BVEC: + return cifs_shash_bvec(iter, maxsize, shash); + case ITER_KVEC: + return cifs_shash_kvec(iter, maxsize, shash); + case ITER_XARRAY: + return cifs_shash_xarray(iter, maxsize, shash); + default: + pr_err("cifs_shash_iter(%u) unsupported\n", iov_iter_type(iter)); + WARN_ON_ONCE(1); + return -EIO; + } +} + int __cifs_calc_signature(struct smb_rqst *rqst, struct TCP_Server_Info *server, char *signature, struct shash_desc *shash) From patchwork Thu Feb 16 21:47:40 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 58251 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:adf:eb09:0:0:0:0:0 with SMTP id s9csp546786wrn; Thu, 16 Feb 2023 13:51:18 -0800 (PST) X-Google-Smtp-Source: AK7set9Yc98+8iTGc2pxExK2KXdnx41QIQjiq/cQsO2zgTQVxkvQkklHZliVcItHUSFgwHwR4jbe X-Received: by 2002:aa7:d393:0:b0:4ac:c5d4:cfe7 with SMTP id x19-20020aa7d393000000b004acc5d4cfe7mr7162674edq.27.1676584278670; Thu, 16 Feb 2023 13:51:18 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1676584278; cv=none; d=google.com; s=arc-20160816; b=OL8514HuJkuRLbIrB+4X1tGYVFda8lGHo807AoNVW1MyJBqMnGTN3KoSG6WIOfa4Hn iBxW9vmfFcUWH9rQ22ypM9xot0VyXw0OwXBr2Vm5AQIexk4DVMIAn+BwGfyugQME5VNu VPm9SuuV/rKvfOkdMivTWfLnt52FyMnmrLEW7Pd3ITFfKZ+vSS+ClJynMLNEPpy4rkLH B022Gjqs0ofEgcBkUHKq1KQIdku00+OoXGuifYz6SAoGrHlOITpBLmYLfzEZbA2oYcFU n1i4jGCccddPow+tae/Zi3kXoUfXG3/3PtZs9zn0VWInGfIsr3aLPesiXRFIzWdT/fqM EgiQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=jQcJ17qVNUKIVAthm+EjBPaJCPdilCzc5b5eGZTuD8w=; b=hXzdt5H0fZP5YrS73BSQ756KcMrs0kuIfPkp7keggGaLmAk4DJw9GHEit4MNyaPqpK yfcGeE7zumyQ1+35vY+xwxFqUiETl6LCxRoclfVJ/ftWqBfNponyMXqIAnQDX8Pb48HX O68LMIAbKX9OsiGO54lTIl91SDd91wYfAJVvCFrgraY0L9yucq+1rqFe/5D9YDWcuEc7 5MTyskyjGYBNdnnNm6FF4KSWNgkPELAXoYBPzJGF/zsQZ3cfjIT/pQlVaT98Whpxq04D GUXBT9RPvgXqcdQsrIix/kaF2dTukL8NX1pgEm5SySgNEnEppmz/1nizC9hLLtse3yQv 79nA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=e8qLQTDy; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id m11-20020a50ef0b000000b004ad797532easi917669eds.34.2023.02.16.13.50.55; Thu, 16 Feb 2023 13:51:18 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=e8qLQTDy; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230330AbjBPVu2 (ORCPT + 99 others); Thu, 16 Feb 2023 16:50:28 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:37070 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230302AbjBPVuD (ORCPT ); Thu, 16 Feb 2023 16:50:03 -0500 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id F015D528B7 for ; Thu, 16 Feb 2023 13:48:45 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1676584125; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=jQcJ17qVNUKIVAthm+EjBPaJCPdilCzc5b5eGZTuD8w=; b=e8qLQTDyVaziXiCmHxnQTHqkyCS1OZ0O0XN1G4saybiNw+f+4i4QoaHMfHJee1su7GI3w2 6PKAVvIcoAQUc8RA/G4sppE6yS4NhlcqQO/dMKIpCV8pKt7rSj0Lxl2OAjra9+mClRYJor vPhoZtreE4MpKHYKLDiOn+uzeaJhCiE= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-167-P9DRTr1UPGurnYjvUQcAQg-1; Thu, 16 Feb 2023 16:48:41 -0500 X-MC-Unique: P9DRTr1UPGurnYjvUQcAQg-1 Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.rdu2.redhat.com [10.11.54.2]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id CF965185A7A4; Thu, 16 Feb 2023 21:48:40 +0000 (UTC) Received: from warthog.procyon.org.uk (unknown [10.33.36.24]) by smtp.corp.redhat.com (Postfix) with ESMTP id EBF2A40C945A; Thu, 16 Feb 2023 21:48:38 +0000 (UTC) From: David Howells To: Steve French Cc: David Howells , Jens Axboe , Al Viro , Shyam Prasad N , Rohith Surabattula , Tom Talpey , Stefan Metzmacher , Christoph Hellwig , Matthew Wilcox , Jeff Layton , linux-cifs@vger.kernel.org, linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Steve French Subject: [PATCH 12/17] cifs: Add some helper functions Date: Thu, 16 Feb 2023 21:47:40 +0000 Message-Id: <20230216214745.3985496-13-dhowells@redhat.com> In-Reply-To: <20230216214745.3985496-1-dhowells@redhat.com> References: <20230216214745.3985496-1-dhowells@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.2 X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_NONE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1758026036609426768?= X-GMAIL-MSGID: =?utf-8?q?1758026036609426768?= Add some helper functions to manipulate the folio marks by iterating through a list of folios held in an xarray rather than using a page list. Signed-off-by: David Howells cc: Steve French cc: Shyam Prasad N cc: Rohith Surabattula cc: Jeff Layton cc: linux-cifs@vger.kernel.org Link: https://lore.kernel.org/r/164928616583.457102.15157033997163988344.stgit@warthog.procyon.org.uk/ # v1 Link: https://lore.kernel.org/r/165211418840.3154751.3090684430628501879.stgit@warthog.procyon.org.uk/ # v1 Link: https://lore.kernel.org/r/165348878940.2106726.204291614267188735.stgit@warthog.procyon.org.uk/ # v1 Link: https://lore.kernel.org/r/165364825674.3334034.3356201708659748648.stgit@warthog.procyon.org.uk/ # v3 Link: https://lore.kernel.org/r/166126394799.708021.10637797063862600488.stgit@warthog.procyon.org.uk/ # v1 Link: https://lore.kernel.org/r/166697258147.61150.9940790486999562110.stgit@warthog.procyon.org.uk/ # rfc Link: https://lore.kernel.org/r/166732030314.3186319.9209944805565413627.stgit@warthog.procyon.org.uk/ # rfc --- fs/cifs/cifsfs.h | 3 ++ fs/cifs/file.c | 93 ++++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 96 insertions(+) diff --git a/fs/cifs/cifsfs.h b/fs/cifs/cifsfs.h index 25decebbc478..ea628da503c6 100644 --- a/fs/cifs/cifsfs.h +++ b/fs/cifs/cifsfs.h @@ -113,6 +113,9 @@ extern int cifs_file_strict_mmap(struct file *file, struct vm_area_struct *vma); extern const struct file_operations cifs_dir_ops; extern int cifs_dir_open(struct inode *inode, struct file *file); extern int cifs_readdir(struct file *file, struct dir_context *ctx); +extern void cifs_pages_written_back(struct inode *inode, loff_t start, unsigned int len); +extern void cifs_pages_write_failed(struct inode *inode, loff_t start, unsigned int len); +extern void cifs_pages_write_redirty(struct inode *inode, loff_t start, unsigned int len); /* Functions related to dir entries */ extern const struct dentry_operations cifs_dentry_ops; diff --git a/fs/cifs/file.c b/fs/cifs/file.c index ddf6f572af81..09240b8b018a 100644 --- a/fs/cifs/file.c +++ b/fs/cifs/file.c @@ -36,6 +36,99 @@ #include "cifs_ioctl.h" #include "cached_dir.h" +/* + * Completion of write to server. + */ +void cifs_pages_written_back(struct inode *inode, loff_t start, unsigned int len) +{ + struct address_space *mapping = inode->i_mapping; + struct folio *folio; + pgoff_t end; + + XA_STATE(xas, &mapping->i_pages, start / PAGE_SIZE); + + if (!len) + return; + + rcu_read_lock(); + + end = (start + len - 1) / PAGE_SIZE; + xas_for_each(&xas, folio, end) { + if (!folio_test_writeback(folio)) { + WARN_ONCE(1, "bad %x @%llx page %lx %lx\n", + len, start, folio_index(folio), end); + continue; + } + + folio_detach_private(folio); + folio_end_writeback(folio); + } + + rcu_read_unlock(); +} + +/* + * Failure of write to server. + */ +void cifs_pages_write_failed(struct inode *inode, loff_t start, unsigned int len) +{ + struct address_space *mapping = inode->i_mapping; + struct folio *folio; + pgoff_t end; + + XA_STATE(xas, &mapping->i_pages, start / PAGE_SIZE); + + if (!len) + return; + + rcu_read_lock(); + + end = (start + len - 1) / PAGE_SIZE; + xas_for_each(&xas, folio, end) { + if (!folio_test_writeback(folio)) { + WARN_ONCE(1, "bad %x @%llx page %lx %lx\n", + len, start, folio_index(folio), end); + continue; + } + + folio_set_error(folio); + folio_end_writeback(folio); + } + + rcu_read_unlock(); +} + +/* + * Redirty pages after a temporary failure. + */ +void cifs_pages_write_redirty(struct inode *inode, loff_t start, unsigned int len) +{ + struct address_space *mapping = inode->i_mapping; + struct folio *folio; + pgoff_t end; + + XA_STATE(xas, &mapping->i_pages, start / PAGE_SIZE); + + if (!len) + return; + + rcu_read_lock(); + + end = (start + len - 1) / PAGE_SIZE; + xas_for_each(&xas, folio, end) { + if (!folio_test_writeback(folio)) { + WARN_ONCE(1, "bad %x @%llx page %lx %lx\n", + len, start, folio_index(folio), end); + continue; + } + + filemap_dirty_folio(folio->mapping, folio); + folio_end_writeback(folio); + } + + rcu_read_unlock(); +} + /* * Mark as invalid, all open files on tree connections since they * were closed when session to server was lost. From patchwork Thu Feb 16 21:47:41 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 58256 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:adf:eb09:0:0:0:0:0 with SMTP id s9csp548148wrn; Thu, 16 Feb 2023 13:55:11 -0800 (PST) X-Google-Smtp-Source: AK7set80HN85oeyiCDZlv5SyweBmk/ZZP5NgZ/AJNJ4va5EUrvQK1FvEl1jniWExX6VSB027Ard5 X-Received: by 2002:a17:906:c28f:b0:889:5686:486a with SMTP id r15-20020a170906c28f00b008895686486amr7933239ejz.30.1676584511729; Thu, 16 Feb 2023 13:55:11 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1676584511; cv=none; d=google.com; s=arc-20160816; b=H/dU/evwt+x3TZritXUauN03FR5o/BUKePRdEll8WsiUwPnPXV7JCLYFq8PosdNl+j /AJYvL897+e9Kt24hd4XHHoyD0ZJgog29BPSRbIDzu0FVOkXy7g3yNN2dxuFzlzs3jFV fMHlv6l1REyBvpF+RNQH7+pvOklJzy9oTEU0g/PZnF8hWGhxc3pf0P7KXReVze+Eatir x7rs+CREaftWYV/uarxThFtuK0LTg7yUHC5CHHfk44GTT2//5/JANPh1w7et1w2t9Gos ZgHs8nvlPoSY/076T+WiEHswPKmWjr5Mb6SWt0LMV1q4hapgWdLqKcjiMz1SF56Awn0v 805w== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=ZbR1ERx60swPmHLlvc4O0e0pHhwkZikBmUg0OVIfTvo=; b=FWnapDWxNvVBWjL1zh2SQp4e5cDZqn7c4HD/ww3kKkCRPiAP3hFSkI85qJktx6yDpO ndXhd0q6xAqrj7Ejy5bxtFMxZZ8gwholpCr/HE+gMYbujf4gaxeJeHKz6/dSJzLMbV9q kdqBeyCD+sxdRMS9dKnP1FvBjiWg6qtiVrWBGLSNQ/dScb9C6ZqBWuOfJbmSJsEm8qRP Ov8NBlvXhP9XBe6wpMHa4wR1YH8UtlHz1C5syatXsg44fk4XkJn2stMED3E7biI8/kXx s/pqzBCaojT5h7MqqueOqlrGhlD4ns1GN/jn5mmgrPeQ4WNKC9QQwc4pkHrEAHF1Qr5z Tpaw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b="UQ5q/TTO"; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id wj8-20020a170907050800b008b1289de5basi1224156ejb.737.2023.02.16.13.54.47; Thu, 16 Feb 2023 13:55:11 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b="UQ5q/TTO"; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230298AbjBPVuy (ORCPT + 99 others); Thu, 16 Feb 2023 16:50:54 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36328 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230344AbjBPVuP (ORCPT ); Thu, 16 Feb 2023 16:50:15 -0500 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 3A11253803 for ; Thu, 16 Feb 2023 13:48:48 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1676584127; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=ZbR1ERx60swPmHLlvc4O0e0pHhwkZikBmUg0OVIfTvo=; b=UQ5q/TTO/xk7WWehvahW6vpMttAH2jFOy8p9ZwR8I1WTwvKxM4dFYpFvDVXriK3qJKN0p4 6oTZ6zlRmq8+v/U8JpQVzGPBa241oaqV1OwQZjjzWR3x+j6pTwrhRQAYVUz9DHlQQqzhjH xjG2kEDbZuVDmHhyd4q41kCul1wa1Us= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-515-O-HRLX8QP9akrW_MwZnTmA-1; Thu, 16 Feb 2023 16:48:44 -0500 X-MC-Unique: O-HRLX8QP9akrW_MwZnTmA-1 Received: from smtp.corp.redhat.com (int-mx05.intmail.prod.int.rdu2.redhat.com [10.11.54.5]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 6773985A5B1; Thu, 16 Feb 2023 21:48:43 +0000 (UTC) Received: from warthog.procyon.org.uk (unknown [10.33.36.24]) by smtp.corp.redhat.com (Postfix) with ESMTP id 7979251FF; Thu, 16 Feb 2023 21:48:41 +0000 (UTC) From: David Howells To: Steve French Cc: David Howells , Jens Axboe , Al Viro , Shyam Prasad N , Rohith Surabattula , Tom Talpey , Stefan Metzmacher , Christoph Hellwig , Matthew Wilcox , Jeff Layton , linux-cifs@vger.kernel.org, linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Steve French Subject: [PATCH 13/17] cifs: Add a function to read into an iter from a socket Date: Thu, 16 Feb 2023 21:47:41 +0000 Message-Id: <20230216214745.3985496-14-dhowells@redhat.com> In-Reply-To: <20230216214745.3985496-1-dhowells@redhat.com> References: <20230216214745.3985496-1-dhowells@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.5 X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_NONE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1758026280944139158?= X-GMAIL-MSGID: =?utf-8?q?1758026280944139158?= Add a helper function to read data from a socket into the given iterator. Signed-off-by: David Howells cc: Steve French cc: Shyam Prasad N cc: Rohith Surabattula cc: Jeff Layton cc: linux-cifs@vger.kernel.org Link: https://lore.kernel.org/r/164928617874.457102.10021662143234315566.stgit@warthog.procyon.org.uk/ # v1 Link: https://lore.kernel.org/r/165211419563.3154751.18431990381145195050.stgit@warthog.procyon.org.uk/ # v1 Link: https://lore.kernel.org/r/165348879662.2106726.16881134187242702351.stgit@warthog.procyon.org.uk/ # v1 Link: https://lore.kernel.org/r/165364826398.3334034.12541600783145647319.stgit@warthog.procyon.org.uk/ # v3 Link: https://lore.kernel.org/r/166126395495.708021.12328677373159554478.stgit@warthog.procyon.org.uk/ # v1 Link: https://lore.kernel.org/r/166697258876.61150.3530237818849429372.stgit@warthog.procyon.org.uk/ # rfc Link: https://lore.kernel.org/r/166732031039.3186319.10691316510079412635.stgit@warthog.procyon.org.uk/ # rfc --- fs/cifs/cifsproto.h | 3 +++ fs/cifs/connect.c | 14 ++++++++++++++ 2 files changed, 17 insertions(+) diff --git a/fs/cifs/cifsproto.h b/fs/cifs/cifsproto.h index 1207b39686fb..cb7a3fe89278 100644 --- a/fs/cifs/cifsproto.h +++ b/fs/cifs/cifsproto.h @@ -244,6 +244,9 @@ extern int cifs_read_page_from_socket(struct TCP_Server_Info *server, struct page *page, unsigned int page_offset, unsigned int to_read); +int cifs_read_iter_from_socket(struct TCP_Server_Info *server, + struct iov_iter *iter, + unsigned int to_read); extern int cifs_setup_cifs_sb(struct cifs_sb_info *cifs_sb); void cifs_mount_put_conns(struct cifs_mount_ctx *mnt_ctx); int cifs_mount_get_session(struct cifs_mount_ctx *mnt_ctx); diff --git a/fs/cifs/connect.c b/fs/cifs/connect.c index b2a04b4e89a5..152b457b849f 100644 --- a/fs/cifs/connect.c +++ b/fs/cifs/connect.c @@ -765,6 +765,20 @@ cifs_read_page_from_socket(struct TCP_Server_Info *server, struct page *page, return cifs_readv_from_socket(server, &smb_msg); } +int +cifs_read_iter_from_socket(struct TCP_Server_Info *server, struct iov_iter *iter, + unsigned int to_read) +{ + struct msghdr smb_msg = { .msg_iter = *iter }; + int ret; + + iov_iter_truncate(&smb_msg.msg_iter, to_read); + ret = cifs_readv_from_socket(server, &smb_msg); + if (ret > 0) + iov_iter_advance(iter, ret); + return ret; +} + static bool is_smb_response(struct TCP_Server_Info *server, unsigned char type) { From patchwork Thu Feb 16 21:47:42 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 58257 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:adf:eb09:0:0:0:0:0 with SMTP id s9csp548243wrn; Thu, 16 Feb 2023 13:55:27 -0800 (PST) X-Google-Smtp-Source: AK7set8kK+0KlNrPG6vC6ACBdiTYFpU0EK9HGExICjiGt7KUgLaT+Es0BtaAwF6l2RKcjICi+5YU X-Received: by 2002:a05:6402:b38:b0:4ad:7203:9b2a with SMTP id bo24-20020a0564020b3800b004ad72039b2amr1912636edb.19.1676584527026; Thu, 16 Feb 2023 13:55:27 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1676584527; cv=none; d=google.com; s=arc-20160816; b=b1BcjEO/cJqNUQnFmq4qFhpzkxCng0Wlrmw5w/SZBXZWUOyDQ9mBsNzavpLQLRxq0U qwr0yOY7GF0aQDH6wXPMZK1VddPD/HfllUUJFrLAyNDYQqqQxI981M3JRZ7EXTK+YDMs eeGkHeHnoaUBtWP4AT+NBfe1+2rIItGnRISF5QQMkPjfxPsgxPELBVd0gqo1XIW18oo2 VgspbHSdJ8EUFc/aI7fBvSDfPa6D+hkmcAmUJGxN10zjcHMuhG8c2tstuL0lIIVDmRQL FJpud20Ol/31jQfcvBr/1wQomESsohlJ6ZZcSHTfCyZI7fljtlWrHFlvMI7Se7vgsMcf 7AgQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=w8LQxWnkBgqvz4cIMil1irq6J1Qn5Tx24/1K6rR/sq0=; b=ZaVbkEDFbUFXXITTdvfDzY7ZGDxW4T6VLqaVRkP8EIOygCgeGuy51kk22kytW2Lhe1 qGQ0R+ksuJoejboXfv0YrXMONIStYCm6Vo4NVlLuL1e4GbKlZzaZjwRHYOhvGsu8Tvg/ uqYrtkpGGj8P4lf8Fr4V1uC7VrZxYHGGseFhMsG1ln/hH1DmqL5O1yeD4I6GgJgteeFG iGZDmHOytgsJS0ywikac+LTLGz7nEm5Je89wdN85pL5jnkYSNrVD3OBBhnbvo7ZlXKqm QaI5IdQ8FuzyUowsqnJ2MnoEg3iv6HmqG5TEcAJzsqtStWnGbIt9Bu4kGrZgNALGRmD/ xUbw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=JkEBUu8U; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id c18-20020aa7d612000000b004ab2521aed1si3294696edr.293.2023.02.16.13.55.03; Thu, 16 Feb 2023 13:55:26 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=JkEBUu8U; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230309AbjBPVvM (ORCPT + 99 others); Thu, 16 Feb 2023 16:51:12 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36366 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230351AbjBPVuQ (ORCPT ); Thu, 16 Feb 2023 16:50:16 -0500 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 0287C53837 for ; Thu, 16 Feb 2023 13:48:52 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1676584132; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=w8LQxWnkBgqvz4cIMil1irq6J1Qn5Tx24/1K6rR/sq0=; b=JkEBUu8UkcNX8lzgiuvF0gImE1TR90QbwDjxFh5wBm/V/C68jOkm+Ccf9W9gsjJsgr0Wzk hJu1+qXXUhdiX2BLnxyWZKDRkrh0JFBERrH0YkEj3WznNuT7MrarBqZdDny2Ll++t7psCF EBLacdGbNYpc68Frd1IDDuU3RUHz8W4= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-133-xMcY1n1TN-SlWPvCfB6VVg-1; Thu, 16 Feb 2023 16:48:47 -0500 X-MC-Unique: xMcY1n1TN-SlWPvCfB6VVg-1 Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.rdu2.redhat.com [10.11.54.2]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 7C73F183B3C6; Thu, 16 Feb 2023 21:48:46 +0000 (UTC) Received: from warthog.procyon.org.uk (unknown [10.33.36.24]) by smtp.corp.redhat.com (Postfix) with ESMTP id 2712A40C945A; Thu, 16 Feb 2023 21:48:44 +0000 (UTC) From: David Howells To: Steve French Cc: David Howells , Jens Axboe , Al Viro , Shyam Prasad N , Rohith Surabattula , Tom Talpey , Stefan Metzmacher , Christoph Hellwig , Matthew Wilcox , Jeff Layton , linux-cifs@vger.kernel.org, linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Steve French , Paulo Alcantara Subject: [PATCH 14/17] cifs: Change the I/O paths to use an iterator rather than a page list Date: Thu, 16 Feb 2023 21:47:42 +0000 Message-Id: <20230216214745.3985496-15-dhowells@redhat.com> In-Reply-To: <20230216214745.3985496-1-dhowells@redhat.com> References: <20230216214745.3985496-1-dhowells@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.2 X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_NONE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1758026296960953658?= X-GMAIL-MSGID: =?utf-8?q?1758026296960953658?= Currently, the cifs I/O paths hand lists of pages from the VM interface routines at the top all the way through the intervening layers to the socket interface at the bottom. This is a problem, however, for interfacing with netfslib which passes an iterator through to the ->issue_read() method (and will pass an iterator through to the ->issue_write() method in future). Netfslib takes over bounce buffering for direct I/O, async I/O and encrypted content, so cifs doesn't need to do that. Netfslib also converts IOVEC-type iterators into BVEC-type iterators if necessary. Further, cifs needs foliating - and folios may come in a variety of sizes, so a page list pointing to an array of heterogeneous pages may cause problems in places such as where crypto is done. Change the cifs I/O paths to hand iov_iter iterators all the way through instead. Notes: (1) Some old routines are #if'd out to be removed in a follow up patch so as to avoid confusing diff, thereby making the diff output easier to follow. I've removed functions that don't overlap with anything added. (2) struct smb_rqst loses rq_pages, rq_offset, rq_npages, rq_pagesz and rq_tailsz which describe the pages forming the buffer; instead there's an rq_iter describing the source buffer and an rq_buffer which is used to hold the buffer for encryption. (3) struct cifs_readdata and cifs_writedata are similarly modified to smb_rqst. The ->read_into_pages() and ->copy_into_pages() are then replaced with passing the iterator directly to the socket. The iterators are stored in these structs so that they are persistent and don't get deallocated when the function returns (unlike if they were stack variables). (4) Buffered writeback is overhauled, borrowing the code from the afs filesystem to gather up contiguous runs of folios. The XARRAY-type iterator is then used to refer directly to the pagecache and can be passed to the socket to transmit data directly from there. This includes: cifs_extend_writeback() cifs_write_back_from_locked_folio() cifs_writepages_region() cifs_writepages() (5) Pages are converted to folios. (6) Direct I/O uses netfs_extract_user_iter() to create a BVEC-type iterator from an IOBUF/UBUF-type source iterator. (7) smb2_get_aead_req() uses netfs_extract_iter_to_sg() to extract page fragments from the iterator into the scatterlists that the crypto layer prefers. (8) smb2_init_transform_rq() attached pages to smb_rqst::rq_buffer, an xarray, to use as a bounce buffer for encryption. An XARRAY-type iterator can then be used to pass the bounce buffer to lower layers. Signed-off-by: David Howells cc: Steve French cc: Shyam Prasad N cc: Rohith Surabattula cc: Paulo Alcantara cc: Jeff Layton cc: linux-cifs@vger.kernel.org Link: https://lore.kernel.org/r/164311907995.2806745.400147335497304099.stgit@warthog.procyon.org.uk/ # rfc Link: https://lore.kernel.org/r/164928620163.457102.11602306234438271112.stgit@warthog.procyon.org.uk/ # v1 Link: https://lore.kernel.org/r/165211420279.3154751.15923591172438186144.stgit@warthog.procyon.org.uk/ # v1 Link: https://lore.kernel.org/r/165348880385.2106726.3220789453472800240.stgit@warthog.procyon.org.uk/ # v1 Link: https://lore.kernel.org/r/165364827111.3334034.934805882842932881.stgit@warthog.procyon.org.uk/ # v3 Link: https://lore.kernel.org/r/166126396180.708021.271013668175370826.stgit@warthog.procyon.org.uk/ # v1 Link: https://lore.kernel.org/r/166697259595.61150.5982032408321852414.stgit@warthog.procyon.org.uk/ # rfc Link: https://lore.kernel.org/r/166732031756.3186319.12528413619888902872.stgit@warthog.procyon.org.uk/ # rfc --- fs/cifs/Kconfig | 1 + fs/cifs/cifsencrypt.c | 28 +- fs/cifs/cifsglob.h | 66 +-- fs/cifs/cifsproto.h | 8 +- fs/cifs/cifssmb.c | 15 +- fs/cifs/file.c | 1197 ++++++++++++++++++++++++++--------------- fs/cifs/fscache.c | 22 +- fs/cifs/fscache.h | 10 +- fs/cifs/misc.c | 128 +---- fs/cifs/smb2ops.c | 362 ++++++------- fs/cifs/smb2pdu.c | 53 +- fs/cifs/smbdirect.c | 262 ++++----- fs/cifs/smbdirect.h | 4 +- fs/cifs/transport.c | 54 +- 14 files changed, 1122 insertions(+), 1088 deletions(-) diff --git a/fs/cifs/Kconfig b/fs/cifs/Kconfig index bbf63a9eb927..4c0d53bf931a 100644 --- a/fs/cifs/Kconfig +++ b/fs/cifs/Kconfig @@ -18,6 +18,7 @@ config CIFS select DNS_RESOLVER select ASN1 select OID_REGISTRY + select NETFS_SUPPORT help This is the client VFS module for the SMB3 family of network file protocols (including the most recent, most secure dialect SMB3.1.1). diff --git a/fs/cifs/cifsencrypt.c b/fs/cifs/cifsencrypt.c index 7be589aeb520..357bd27a7fd1 100644 --- a/fs/cifs/cifsencrypt.c +++ b/fs/cifs/cifsencrypt.c @@ -169,11 +169,11 @@ static int cifs_shash_iter(const struct iov_iter *iter, size_t maxsize, } int __cifs_calc_signature(struct smb_rqst *rqst, - struct TCP_Server_Info *server, char *signature, - struct shash_desc *shash) + struct TCP_Server_Info *server, char *signature, + struct shash_desc *shash) { int i; - int rc; + ssize_t rc; struct kvec *iov = rqst->rq_iov; int n_vec = rqst->rq_nvec; @@ -205,25 +205,9 @@ int __cifs_calc_signature(struct smb_rqst *rqst, } } - /* now hash over the rq_pages array */ - for (i = 0; i < rqst->rq_npages; i++) { - void *kaddr; - unsigned int len, offset; - - rqst_page_get_length(rqst, i, &len, &offset); - - kaddr = (char *) kmap(rqst->rq_pages[i]) + offset; - - rc = crypto_shash_update(shash, kaddr, len); - if (rc) { - cifs_dbg(VFS, "%s: Could not update with payload\n", - __func__); - kunmap(rqst->rq_pages[i]); - return rc; - } - - kunmap(rqst->rq_pages[i]); - } + rc = cifs_shash_iter(&rqst->rq_iter, iov_iter_count(&rqst->rq_iter), shash); + if (rc < 0) + return rc; rc = crypto_shash_final(shash, signature); if (rc) diff --git a/fs/cifs/cifsglob.h b/fs/cifs/cifsglob.h index 1d893bea4723..893c2e21eb8e 100644 --- a/fs/cifs/cifsglob.h +++ b/fs/cifs/cifsglob.h @@ -216,11 +216,9 @@ static inline void cifs_free_open_info(struct cifs_open_info_data *data) struct smb_rqst { struct kvec *rq_iov; /* array of kvecs */ unsigned int rq_nvec; /* number of kvecs in array */ - struct page **rq_pages; /* pointer to array of page ptrs */ - unsigned int rq_offset; /* the offset to the 1st page */ - unsigned int rq_npages; /* number pages in array */ - unsigned int rq_pagesz; /* page size to use */ - unsigned int rq_tailsz; /* length of last page */ + size_t rq_iter_size; /* Amount of data in ->rq_iter */ + struct iov_iter rq_iter; /* Data iterator */ + struct xarray rq_buffer; /* Page buffer for encryption */ }; struct mid_q_entry; @@ -1428,10 +1426,11 @@ struct cifs_aio_ctx { struct cifsFileInfo *cfile; struct bio_vec *bv; loff_t pos; - unsigned int npages; + unsigned int nr_pinned_pages; ssize_t rc; unsigned int len; unsigned int total_len; + unsigned int bv_need_unpin; /* If ->bv[] needs unpinning */ bool should_dirty; /* * Indicates if this aio_ctx is for direct_io, @@ -1449,28 +1448,18 @@ struct cifs_readdata { struct address_space *mapping; struct cifs_aio_ctx *ctx; __u64 offset; + ssize_t got_bytes; unsigned int bytes; - unsigned int got_bytes; pid_t pid; int result; struct work_struct work; - int (*read_into_pages)(struct TCP_Server_Info *server, - struct cifs_readdata *rdata, - unsigned int len); - int (*copy_into_pages)(struct TCP_Server_Info *server, - struct cifs_readdata *rdata, - struct iov_iter *iter); + struct iov_iter iter; struct kvec iov[2]; struct TCP_Server_Info *server; #ifdef CONFIG_CIFS_SMB_DIRECT struct smbd_mr *mr; #endif - unsigned int pagesz; - unsigned int page_offset; - unsigned int tailsz; struct cifs_credits credits; - unsigned int nr_pages; - struct page **pages; }; /* asynchronous write support */ @@ -1482,6 +1471,8 @@ struct cifs_writedata { struct work_struct work; struct cifsFileInfo *cfile; struct cifs_aio_ctx *ctx; + struct iov_iter iter; + struct bio_vec *bv; __u64 offset; pid_t pid; unsigned int bytes; @@ -1490,12 +1481,7 @@ struct cifs_writedata { #ifdef CONFIG_CIFS_SMB_DIRECT struct smbd_mr *mr; #endif - unsigned int pagesz; - unsigned int page_offset; - unsigned int tailsz; struct cifs_credits credits; - unsigned int nr_pages; - struct page **pages; }; /* @@ -2155,9 +2141,9 @@ static inline void move_cifs_info_to_smb2(struct smb2_file_all_info *dst, const dst->FileNameLength = src->FileNameLength; } -static inline unsigned int cifs_get_num_sgs(const struct smb_rqst *rqst, - int num_rqst, - const u8 *sig) +static inline int cifs_get_num_sgs(const struct smb_rqst *rqst, + int num_rqst, + const u8 *sig) { unsigned int len, skip; unsigned int nents = 0; @@ -2177,6 +2163,19 @@ static inline unsigned int cifs_get_num_sgs(const struct smb_rqst *rqst, * rqst[1+].rq_iov[0+] data to be encrypted/decrypted */ for (i = 0; i < num_rqst; i++) { + /* We really don't want a mixture of pinned and unpinned pages + * in the sglist. It's hard to keep track of which is what. + * Instead, we convert to a BVEC-type iterator higher up. + */ + if (WARN_ON_ONCE(user_backed_iter(&rqst[i].rq_iter))) + return -EIO; + + /* We also don't want to have any extra refs or pins to clean + * up in the sglist. + */ + if (WARN_ON_ONCE(iov_iter_extract_will_pin(&rqst[i].rq_iter))) + return -EIO; + for (j = 0; j < rqst[i].rq_nvec; j++) { struct kvec *iov = &rqst[i].rq_iov[j]; @@ -2190,7 +2189,7 @@ static inline unsigned int cifs_get_num_sgs(const struct smb_rqst *rqst, } skip = 0; } - nents += rqst[i].rq_npages; + nents += iov_iter_npages(&rqst[i].rq_iter, INT_MAX); } nents += DIV_ROUND_UP(offset_in_page(sig) + SMB2_SIGNATURE_SIZE, PAGE_SIZE); return nents; @@ -2199,9 +2198,9 @@ static inline unsigned int cifs_get_num_sgs(const struct smb_rqst *rqst, /* We can not use the normal sg_set_buf() as we will sometimes pass a * stack object as buf. */ -static inline struct scatterlist *cifs_sg_set_buf(struct scatterlist *sg, - const void *buf, - unsigned int buflen) +static inline void cifs_sg_set_buf(struct sg_table *sgtable, + const void *buf, + unsigned int buflen) { unsigned long addr = (unsigned long)buf; unsigned int off = offset_in_page(addr); @@ -2211,16 +2210,17 @@ static inline struct scatterlist *cifs_sg_set_buf(struct scatterlist *sg, do { unsigned int len = min_t(unsigned int, buflen, PAGE_SIZE - off); - sg_set_page(sg++, vmalloc_to_page((void *)addr), len, off); + sg_set_page(&sgtable->sgl[sgtable->nents++], + vmalloc_to_page((void *)addr), len, off); off = 0; addr += PAGE_SIZE; buflen -= len; } while (buflen); } else { - sg_set_page(sg++, virt_to_page(addr), buflen, off); + sg_set_page(&sgtable->sgl[sgtable->nents++], + virt_to_page(addr), buflen, off); } - return sg; } #endif /* _CIFS_GLOB_H */ diff --git a/fs/cifs/cifsproto.h b/fs/cifs/cifsproto.h index cb7a3fe89278..2873f68a051c 100644 --- a/fs/cifs/cifsproto.h +++ b/fs/cifs/cifsproto.h @@ -584,10 +584,7 @@ int cifs_readv_receive(struct TCP_Server_Info *server, struct mid_q_entry *mid); int cifs_async_writev(struct cifs_writedata *wdata, void (*release)(struct kref *kref)); void cifs_writev_complete(struct work_struct *work); -struct cifs_writedata *cifs_writedata_alloc(unsigned int nr_pages, - work_func_t complete); -struct cifs_writedata *cifs_writedata_direct_alloc(struct page **pages, - work_func_t complete); +struct cifs_writedata *cifs_writedata_alloc(work_func_t complete); void cifs_writedata_release(struct kref *refcount); int cifs_query_mf_symlink(unsigned int xid, struct cifs_tcon *tcon, struct cifs_sb_info *cifs_sb, @@ -604,13 +601,10 @@ enum securityEnum cifs_select_sectype(struct TCP_Server_Info *, enum securityEnum); struct cifs_aio_ctx *cifs_aio_ctx_alloc(void); void cifs_aio_ctx_release(struct kref *refcount); -int setup_aio_ctx_iter(struct cifs_aio_ctx *ctx, struct iov_iter *iter, int rw); int cifs_alloc_hash(const char *name, struct shash_desc **sdesc); void cifs_free_hash(struct shash_desc **sdesc); -void rqst_page_get_length(const struct smb_rqst *rqst, unsigned int page, - unsigned int *len, unsigned int *offset); struct cifs_chan * cifs_ses_find_chan(struct cifs_ses *ses, struct TCP_Server_Info *server); int cifs_try_adding_channels(struct cifs_sb_info *cifs_sb, struct cifs_ses *ses); diff --git a/fs/cifs/cifssmb.c b/fs/cifs/cifssmb.c index 8c014a3ff9e0..730ae3273698 100644 --- a/fs/cifs/cifssmb.c +++ b/fs/cifs/cifssmb.c @@ -24,6 +24,7 @@ #include #include #include "cifspdu.h" +#include "cifsfs.h" #include "cifsglob.h" #include "cifsacl.h" #include "cifsproto.h" @@ -1294,11 +1295,8 @@ cifs_readv_callback(struct mid_q_entry *mid) struct TCP_Server_Info *server = tcon->ses->server; struct smb_rqst rqst = { .rq_iov = rdata->iov, .rq_nvec = 2, - .rq_pages = rdata->pages, - .rq_offset = rdata->page_offset, - .rq_npages = rdata->nr_pages, - .rq_pagesz = rdata->pagesz, - .rq_tailsz = rdata->tailsz }; + .rq_iter_size = iov_iter_count(&rdata->iter), + .rq_iter = rdata->iter }; struct cifs_credits credits = { .value = 1, .instance = 0 }; cifs_dbg(FYI, "%s: mid=%llu state=%d result=%d bytes=%u\n", @@ -1737,11 +1735,8 @@ cifs_async_writev(struct cifs_writedata *wdata, rqst.rq_iov = iov; rqst.rq_nvec = 2; - rqst.rq_pages = wdata->pages; - rqst.rq_offset = wdata->page_offset; - rqst.rq_npages = wdata->nr_pages; - rqst.rq_pagesz = wdata->pagesz; - rqst.rq_tailsz = wdata->tailsz; + rqst.rq_iter = wdata->iter; + rqst.rq_iter_size = iov_iter_count(&wdata->iter); cifs_dbg(FYI, "async write at %llu %u bytes\n", wdata->offset, wdata->bytes); diff --git a/fs/cifs/file.c b/fs/cifs/file.c index 09240b8b018a..33779d184692 100644 --- a/fs/cifs/file.c +++ b/fs/cifs/file.c @@ -36,6 +36,32 @@ #include "cifs_ioctl.h" #include "cached_dir.h" +/* + * Remove the dirty flags from a span of pages. + */ +static void cifs_undirty_folios(struct inode *inode, loff_t start, unsigned int len) +{ + struct address_space *mapping = inode->i_mapping; + struct folio *folio; + pgoff_t end; + + XA_STATE(xas, &mapping->i_pages, start / PAGE_SIZE); + + rcu_read_lock(); + + end = (start + len - 1) / PAGE_SIZE; + xas_for_each_marked(&xas, folio, end, PAGECACHE_TAG_DIRTY) { + xas_pause(&xas); + rcu_read_unlock(); + folio_lock(folio); + folio_clear_dirty_for_io(folio); + folio_unlock(folio); + rcu_read_lock(); + } + + rcu_read_unlock(); +} + /* * Completion of write to server. */ @@ -2391,7 +2417,6 @@ cifs_writedata_release(struct kref *refcount) if (wdata->cfile) cifsFileInfo_put(wdata->cfile); - kvfree(wdata->pages); kfree(wdata); } @@ -2402,51 +2427,49 @@ cifs_writedata_release(struct kref *refcount) static void cifs_writev_requeue(struct cifs_writedata *wdata) { - int i, rc = 0; + int rc = 0; struct inode *inode = d_inode(wdata->cfile->dentry); struct TCP_Server_Info *server; - unsigned int rest_len; + unsigned int rest_len = wdata->bytes; + loff_t fpos = wdata->offset; server = tlink_tcon(wdata->cfile->tlink)->ses->server; - i = 0; - rest_len = wdata->bytes; do { struct cifs_writedata *wdata2; - unsigned int j, nr_pages, wsize, tailsz, cur_len; + unsigned int wsize, cur_len; wsize = server->ops->wp_retry_size(inode); if (wsize < rest_len) { - nr_pages = wsize / PAGE_SIZE; - if (!nr_pages) { - rc = -EOPNOTSUPP; + if (wsize < PAGE_SIZE) { + rc = -ENOTSUPP; break; } - cur_len = nr_pages * PAGE_SIZE; - tailsz = PAGE_SIZE; + cur_len = min(round_down(wsize, PAGE_SIZE), rest_len); } else { - nr_pages = DIV_ROUND_UP(rest_len, PAGE_SIZE); cur_len = rest_len; - tailsz = rest_len - (nr_pages - 1) * PAGE_SIZE; } - wdata2 = cifs_writedata_alloc(nr_pages, cifs_writev_complete); + wdata2 = cifs_writedata_alloc(cifs_writev_complete); if (!wdata2) { rc = -ENOMEM; break; } - for (j = 0; j < nr_pages; j++) { - wdata2->pages[j] = wdata->pages[i + j]; - lock_page(wdata2->pages[j]); - clear_page_dirty_for_io(wdata2->pages[j]); - } - wdata2->sync_mode = wdata->sync_mode; - wdata2->nr_pages = nr_pages; - wdata2->offset = page_offset(wdata2->pages[0]); - wdata2->pagesz = PAGE_SIZE; - wdata2->tailsz = tailsz; - wdata2->bytes = cur_len; + wdata2->offset = fpos; + wdata2->bytes = cur_len; + wdata2->iter = wdata->iter; + + iov_iter_advance(&wdata2->iter, fpos - wdata->offset); + iov_iter_truncate(&wdata2->iter, wdata2->bytes); + + if (iov_iter_is_xarray(&wdata2->iter)) + /* Check for pages having been redirtied and clean + * them. We can do this by walking the xarray. If + * it's not an xarray, then it's a DIO and we shouldn't + * be mucking around with the page bits. + */ + cifs_undirty_folios(inode, fpos, cur_len); rc = cifs_get_writable_file(CIFS_I(inode), FIND_WR_ANY, &wdata2->cfile); @@ -2461,33 +2484,22 @@ cifs_writev_requeue(struct cifs_writedata *wdata) cifs_writedata_release); } - for (j = 0; j < nr_pages; j++) { - unlock_page(wdata2->pages[j]); - if (rc != 0 && !is_retryable_error(rc)) { - SetPageError(wdata2->pages[j]); - end_page_writeback(wdata2->pages[j]); - put_page(wdata2->pages[j]); - } - } - kref_put(&wdata2->refcount, cifs_writedata_release); if (rc) { if (is_retryable_error(rc)) continue; - i += nr_pages; + fpos += cur_len; + rest_len -= cur_len; break; } + fpos += cur_len; rest_len -= cur_len; - i += nr_pages; - } while (i < wdata->nr_pages); + } while (rest_len > 0); - /* cleanup remaining pages from the original wdata */ - for (; i < wdata->nr_pages; i++) { - SetPageError(wdata->pages[i]); - end_page_writeback(wdata->pages[i]); - put_page(wdata->pages[i]); - } + /* Clean up remaining pages from the original wdata */ + if (iov_iter_is_xarray(&wdata->iter)) + cifs_pages_write_failed(inode, fpos, rest_len); if (rc != 0 && !is_retryable_error(rc)) mapping_set_error(inode->i_mapping, rc); @@ -2500,7 +2512,6 @@ cifs_writev_complete(struct work_struct *work) struct cifs_writedata *wdata = container_of(work, struct cifs_writedata, work); struct inode *inode = d_inode(wdata->cfile->dentry); - int i = 0; if (wdata->result == 0) { spin_lock(&inode->i_lock); @@ -2511,45 +2522,24 @@ cifs_writev_complete(struct work_struct *work) } else if (wdata->sync_mode == WB_SYNC_ALL && wdata->result == -EAGAIN) return cifs_writev_requeue(wdata); - for (i = 0; i < wdata->nr_pages; i++) { - struct page *page = wdata->pages[i]; + if (wdata->result == -EAGAIN) + cifs_pages_write_redirty(inode, wdata->offset, wdata->bytes); + else if (wdata->result < 0) + cifs_pages_write_failed(inode, wdata->offset, wdata->bytes); + else + cifs_pages_written_back(inode, wdata->offset, wdata->bytes); - if (wdata->result == -EAGAIN) - __set_page_dirty_nobuffers(page); - else if (wdata->result < 0) - SetPageError(page); - end_page_writeback(page); - cifs_readpage_to_fscache(inode, page); - put_page(page); - } if (wdata->result != -EAGAIN) mapping_set_error(inode->i_mapping, wdata->result); kref_put(&wdata->refcount, cifs_writedata_release); } -struct cifs_writedata * -cifs_writedata_alloc(unsigned int nr_pages, work_func_t complete) -{ - struct cifs_writedata *writedata = NULL; - struct page **pages = - kcalloc(nr_pages, sizeof(struct page *), GFP_NOFS); - if (pages) { - writedata = cifs_writedata_direct_alloc(pages, complete); - if (!writedata) - kvfree(pages); - } - - return writedata; -} - -struct cifs_writedata * -cifs_writedata_direct_alloc(struct page **pages, work_func_t complete) +struct cifs_writedata *cifs_writedata_alloc(work_func_t complete) { struct cifs_writedata *wdata; wdata = kzalloc(sizeof(*wdata), GFP_NOFS); if (wdata != NULL) { - wdata->pages = pages; kref_init(&wdata->refcount); INIT_LIST_HEAD(&wdata->list); init_completion(&wdata->done); @@ -2558,7 +2548,6 @@ cifs_writedata_direct_alloc(struct page **pages, work_func_t complete) return wdata; } - static int cifs_partialpagewrite(struct page *page, unsigned from, unsigned to) { struct address_space *mapping = page->mapping; @@ -2617,6 +2606,7 @@ static int cifs_partialpagewrite(struct page *page, unsigned from, unsigned to) return rc; } +#if 0 // TODO: Remove for iov_iter support static struct cifs_writedata * wdata_alloc_and_fillpages(pgoff_t tofind, struct address_space *mapping, pgoff_t end, pgoff_t *index, @@ -2922,6 +2912,375 @@ static int cifs_writepages(struct address_space *mapping, set_bit(CIFS_INO_MODIFIED_ATTR, &CIFS_I(inode)->flags); return rc; } +#endif + +/* + * Extend the region to be written back to include subsequent contiguously + * dirty pages if possible, but don't sleep while doing so. + */ +static void cifs_extend_writeback(struct address_space *mapping, + long *_count, + loff_t start, + int max_pages, + size_t max_len, + unsigned int *_len) +{ + struct folio_batch batch; + struct folio *folio; + unsigned int psize, nr_pages; + size_t len = *_len; + pgoff_t index = (start + len) / PAGE_SIZE; + bool stop = true; + unsigned int i; + + XA_STATE(xas, &mapping->i_pages, index); + folio_batch_init(&batch); + + do { + /* Firstly, we gather up a batch of contiguous dirty pages + * under the RCU read lock - but we can't clear the dirty flags + * there if any of those pages are mapped. + */ + rcu_read_lock(); + + xas_for_each(&xas, folio, ULONG_MAX) { + stop = true; + if (xas_retry(&xas, folio)) + continue; + if (xa_is_value(folio)) + break; + if (folio_index(folio) != index) + break; + if (!folio_try_get_rcu(folio)) { + xas_reset(&xas); + continue; + } + nr_pages = folio_nr_pages(folio); + if (nr_pages > max_pages) + break; + + /* Has the page moved or been split? */ + if (unlikely(folio != xas_reload(&xas))) { + folio_put(folio); + break; + } + + if (!folio_trylock(folio)) { + folio_put(folio); + break; + } + if (!folio_test_dirty(folio) || folio_test_writeback(folio)) { + folio_unlock(folio); + folio_put(folio); + break; + } + + max_pages -= nr_pages; + psize = folio_size(folio); + len += psize; + stop = false; + if (max_pages <= 0 || len >= max_len || *_count <= 0) + stop = true; + + index += nr_pages; + if (!folio_batch_add(&batch, folio)) + break; + if (stop) + break; + } + + if (!stop) + xas_pause(&xas); + rcu_read_unlock(); + + /* Now, if we obtained any pages, we can shift them to being + * writable and mark them for caching. + */ + if (!folio_batch_count(&batch)) + break; + + for (i = 0; i < folio_batch_count(&batch); i++) { + folio = batch.folios[i]; + /* The folio should be locked, dirty and not undergoing + * writeback from the loop above. + */ + if (!folio_clear_dirty_for_io(folio)) + WARN_ON(1); + if (folio_start_writeback(folio)) + WARN_ON(1); + + *_count -= folio_nr_pages(folio); + folio_unlock(folio); + } + + folio_batch_release(&batch); + cond_resched(); + } while (!stop); + + *_len = len; +} + +/* + * Write back the locked page and any subsequent non-locked dirty pages. + */ +static ssize_t cifs_write_back_from_locked_folio(struct address_space *mapping, + struct writeback_control *wbc, + struct folio *folio, + loff_t start, loff_t end) +{ + struct inode *inode = mapping->host; + struct TCP_Server_Info *server; + struct cifs_writedata *wdata; + struct cifs_sb_info *cifs_sb = CIFS_SB(inode->i_sb); + struct cifs_credits credits_on_stack; + struct cifs_credits *credits = &credits_on_stack; + struct cifsFileInfo *cfile = NULL; + unsigned int xid, wsize, len; + loff_t i_size = i_size_read(inode); + size_t max_len; + long count = wbc->nr_to_write; + int rc; + + /* The folio should be locked, dirty and not undergoing writeback. */ + if (folio_start_writeback(folio)) + WARN_ON(1); + + count -= folio_nr_pages(folio); + len = folio_size(folio); + + xid = get_xid(); + server = cifs_pick_channel(cifs_sb_master_tcon(cifs_sb)->ses); + + rc = cifs_get_writable_file(CIFS_I(inode), FIND_WR_ANY, &cfile); + if (rc) { + cifs_dbg(VFS, "No writable handle in writepages rc=%d\n", rc); + goto err_xid; + } + + rc = server->ops->wait_mtu_credits(server, cifs_sb->ctx->wsize, + &wsize, credits); + if (rc != 0) + goto err_close; + + wdata = cifs_writedata_alloc(cifs_writev_complete); + if (!wdata) { + rc = -ENOMEM; + goto err_uncredit; + } + + wdata->sync_mode = wbc->sync_mode; + wdata->offset = folio_pos(folio); + wdata->pid = cfile->pid; + wdata->credits = credits_on_stack; + wdata->cfile = cfile; + wdata->server = server; + cfile = NULL; + + /* Find all consecutive lockable dirty pages, stopping when we find a + * page that is not immediately lockable, is not dirty or is missing, + * or we reach the end of the range. + */ + if (start < i_size) { + /* Trim the write to the EOF; the extra data is ignored. Also + * put an upper limit on the size of a single storedata op. + */ + max_len = wsize; + max_len = min_t(unsigned long long, max_len, end - start + 1); + max_len = min_t(unsigned long long, max_len, i_size - start); + + if (len < max_len) { + int max_pages = INT_MAX; + +#ifdef CONFIG_CIFS_SMB_DIRECT + if (server->smbd_conn) + max_pages = server->smbd_conn->max_frmr_depth; +#endif + max_pages -= folio_nr_pages(folio); + + if (max_pages > 0) + cifs_extend_writeback(mapping, &count, start, + max_pages, max_len, &len); + } + len = min_t(loff_t, len, max_len); + } + + wdata->bytes = len; + + /* We now have a contiguous set of dirty pages, each with writeback + * set; the first page is still locked at this point, but all the rest + * have been unlocked. + */ + folio_unlock(folio); + + if (start < i_size) { + iov_iter_xarray(&wdata->iter, ITER_SOURCE, &mapping->i_pages, + start, len); + + rc = adjust_credits(wdata->server, &wdata->credits, wdata->bytes); + if (rc) + goto err_wdata; + + if (wdata->cfile->invalidHandle) + rc = -EAGAIN; + else + rc = wdata->server->ops->async_writev(wdata, + cifs_writedata_release); + if (rc >= 0) { + kref_put(&wdata->refcount, cifs_writedata_release); + goto err_close; + } + } else { + /* The dirty region was entirely beyond the EOF. */ + cifs_pages_written_back(inode, start, len); + rc = 0; + } + +err_wdata: + kref_put(&wdata->refcount, cifs_writedata_release); +err_uncredit: + add_credits_and_wake_if(server, credits, 0); +err_close: + if (cfile) + cifsFileInfo_put(cfile); +err_xid: + free_xid(xid); + if (rc == 0) { + wbc->nr_to_write = count; + } else if (is_retryable_error(rc)) { + cifs_pages_write_redirty(inode, start, len); + } else { + cifs_pages_write_failed(inode, start, len); + mapping_set_error(mapping, rc); + } + /* Indication to update ctime and mtime as close is deferred */ + set_bit(CIFS_INO_MODIFIED_ATTR, &CIFS_I(inode)->flags); + return rc; +} + +/* + * write a region of pages back to the server + */ +static int cifs_writepages_region(struct address_space *mapping, + struct writeback_control *wbc, + loff_t start, loff_t end, loff_t *_next) +{ + struct folio *folio; + struct page *head_page; + ssize_t ret; + int n, skips = 0; + + do { + pgoff_t index = start / PAGE_SIZE; + + n = find_get_pages_range_tag(mapping, &index, end / PAGE_SIZE, + PAGECACHE_TAG_DIRTY, 1, &head_page); + if (!n) + break; + + folio = page_folio(head_page); + start = folio_pos(folio); /* May regress with THPs */ + + /* At this point we hold neither the i_pages lock nor the + * page lock: the page may be truncated or invalidated + * (changing page->mapping to NULL), or even swizzled + * back from swapper_space to tmpfs file mapping + */ + if (wbc->sync_mode != WB_SYNC_NONE) { + ret = folio_lock_killable(folio); + if (ret < 0) { + folio_put(folio); + return ret; + } + } else { + if (!folio_trylock(folio)) { + folio_put(folio); + return 0; + } + } + + if (folio_mapping(folio) != mapping || + !folio_test_dirty(folio)) { + start += folio_size(folio); + folio_unlock(folio); + folio_put(folio); + continue; + } + + if (folio_test_writeback(folio) || + folio_test_fscache(folio)) { + folio_unlock(folio); + if (wbc->sync_mode != WB_SYNC_NONE) { + folio_wait_writeback(folio); +#ifdef CONFIG_CIFS_FSCACHE + folio_wait_fscache(folio); +#endif + } else { + start += folio_size(folio); + } + folio_put(folio); + if (wbc->sync_mode == WB_SYNC_NONE) { + if (skips >= 5 || need_resched()) + break; + skips++; + } + continue; + } + + if (!folio_clear_dirty_for_io(folio)) + /* We hold the page lock - it should've been dirty. */ + WARN_ON(1); + + ret = cifs_write_back_from_locked_folio(mapping, wbc, folio, start, end); + folio_put(folio); + if (ret < 0) + return ret; + + start += ret; + cond_resched(); + } while (wbc->nr_to_write > 0); + + *_next = start; + return 0; +} + +/* + * Write some of the pending data back to the server + */ +static int cifs_writepages(struct address_space *mapping, + struct writeback_control *wbc) +{ + loff_t start, next; + int ret; + + /* We have to be careful as we can end up racing with setattr() + * truncating the pagecache since the caller doesn't take a lock here + * to prevent it. + */ + + if (wbc->range_cyclic) { + start = mapping->writeback_index * PAGE_SIZE; + ret = cifs_writepages_region(mapping, wbc, start, LLONG_MAX, &next); + if (ret == 0) { + mapping->writeback_index = next / PAGE_SIZE; + if (start > 0 && wbc->nr_to_write > 0) { + ret = cifs_writepages_region(mapping, wbc, 0, + start, &next); + if (ret == 0) + mapping->writeback_index = + next / PAGE_SIZE; + } + } + } else if (wbc->range_start == 0 && wbc->range_end == LLONG_MAX) { + ret = cifs_writepages_region(mapping, wbc, 0, LLONG_MAX, &next); + if (wbc->nr_to_write > 0 && ret == 0) + mapping->writeback_index = next / PAGE_SIZE; + } else { + ret = cifs_writepages_region(mapping, wbc, + wbc->range_start, wbc->range_end, &next); + } + + return ret; +} static int cifs_writepage_locked(struct page *page, struct writeback_control *wbc) @@ -2972,6 +3331,7 @@ static int cifs_write_end(struct file *file, struct address_space *mapping, struct inode *inode = mapping->host; struct cifsFileInfo *cfile = file->private_data; struct cifs_sb_info *cifs_sb = CIFS_SB(cfile->dentry->d_sb); + struct folio *folio = page_folio(page); __u32 pid; if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_RWPIDFORWARD) @@ -2982,14 +3342,14 @@ static int cifs_write_end(struct file *file, struct address_space *mapping, cifs_dbg(FYI, "write_end for page %p from pos %lld with %d bytes\n", page, pos, copied); - if (PageChecked(page)) { + if (folio_test_checked(folio)) { if (copied == len) - SetPageUptodate(page); - ClearPageChecked(page); - } else if (!PageUptodate(page) && copied == PAGE_SIZE) - SetPageUptodate(page); + folio_mark_uptodate(folio); + folio_clear_checked(folio); + } else if (!folio_test_uptodate(folio) && copied == PAGE_SIZE) + folio_mark_uptodate(folio); - if (!PageUptodate(page)) { + if (!folio_test_uptodate(folio)) { char *page_data; unsigned offset = pos & (PAGE_SIZE - 1); unsigned int xid; @@ -3149,6 +3509,7 @@ int cifs_flush(struct file *file, fl_owner_t id) return rc; } +#if 0 // TODO: Remove for iov_iter support static int cifs_write_allocate_pages(struct page **pages, unsigned long num_pages) { @@ -3189,17 +3550,15 @@ size_t get_numpages(const size_t wsize, const size_t len, size_t *cur_len) return num_pages; } +#endif static void cifs_uncached_writedata_release(struct kref *refcount) { - int i; struct cifs_writedata *wdata = container_of(refcount, struct cifs_writedata, refcount); kref_put(&wdata->ctx->refcount, cifs_aio_ctx_release); - for (i = 0; i < wdata->nr_pages; i++) - put_page(wdata->pages[i]); cifs_writedata_release(refcount); } @@ -3225,6 +3584,7 @@ cifs_uncached_writev_complete(struct work_struct *work) kref_put(&wdata->refcount, cifs_uncached_writedata_release); } +#if 0 // TODO: Remove for iov_iter support static int wdata_fill_from_iovec(struct cifs_writedata *wdata, struct iov_iter *from, size_t *len, unsigned long *num_pages) @@ -3266,6 +3626,7 @@ wdata_fill_from_iovec(struct cifs_writedata *wdata, struct iov_iter *from, *num_pages = i + 1; return 0; } +#endif static int cifs_resend_wdata(struct cifs_writedata *wdata, struct list_head *wdata_list, @@ -3337,23 +3698,57 @@ cifs_resend_wdata(struct cifs_writedata *wdata, struct list_head *wdata_list, return rc; } +/* + * Select span of a bvec iterator we're going to use. Limit it by both maximum + * size and maximum number of segments. + */ +static size_t cifs_limit_bvec_subset(const struct iov_iter *iter, size_t max_size, + size_t max_segs, unsigned int *_nsegs) +{ + const struct bio_vec *bvecs = iter->bvec; + unsigned int nbv = iter->nr_segs, ix = 0, nsegs = 0; + size_t len, span = 0, n = iter->count; + size_t skip = iter->iov_offset; + + if (WARN_ON(!iov_iter_is_bvec(iter)) || n == 0) + return 0; + + while (n && ix < nbv && skip) { + len = bvecs[ix].bv_len; + if (skip < len) + break; + skip -= len; + n -= len; + ix++; + } + + while (n && ix < nbv) { + len = min3(n, bvecs[ix].bv_len - skip, max_size); + span += len; + nsegs++; + ix++; + if (span >= max_size || nsegs >= max_segs) + break; + skip = 0; + n -= len; + } + + *_nsegs = nsegs; + return span; +} + static int -cifs_write_from_iter(loff_t offset, size_t len, struct iov_iter *from, +cifs_write_from_iter(loff_t fpos, size_t len, struct iov_iter *from, struct cifsFileInfo *open_file, struct cifs_sb_info *cifs_sb, struct list_head *wdata_list, struct cifs_aio_ctx *ctx) { int rc = 0; - size_t cur_len; - unsigned long nr_pages, num_pages, i; + size_t cur_len, max_len; struct cifs_writedata *wdata; - struct iov_iter saved_from = *from; - loff_t saved_offset = offset; pid_t pid; struct TCP_Server_Info *server; - struct page **pagevec; - size_t start; - unsigned int xid; + unsigned int xid, max_segs = INT_MAX; if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_RWPIDFORWARD) pid = open_file->pid; @@ -3363,10 +3758,20 @@ cifs_write_from_iter(loff_t offset, size_t len, struct iov_iter *from, server = cifs_pick_channel(tlink_tcon(open_file->tlink)->ses); xid = get_xid(); +#ifdef CONFIG_CIFS_SMB_DIRECT + if (server->smbd_conn) + max_segs = server->smbd_conn->max_frmr_depth; +#endif + do { - unsigned int wsize; struct cifs_credits credits_on_stack; struct cifs_credits *credits = &credits_on_stack; + unsigned int wsize, nsegs = 0; + + if (signal_pending(current)) { + rc = -EINTR; + break; + } if (open_file->invalidHandle) { rc = cifs_reopen_file(open_file, false); @@ -3381,99 +3786,42 @@ cifs_write_from_iter(loff_t offset, size_t len, struct iov_iter *from, if (rc) break; - cur_len = min_t(const size_t, len, wsize); - - if (ctx->direct_io) { - ssize_t result; - - result = iov_iter_get_pages_alloc2( - from, &pagevec, cur_len, &start); - if (result < 0) { - cifs_dbg(VFS, - "direct_writev couldn't get user pages (rc=%zd) iter type %d iov_offset %zd count %zd\n", - result, iov_iter_type(from), - from->iov_offset, from->count); - dump_stack(); - - rc = result; - add_credits_and_wake_if(server, credits, 0); - break; - } - cur_len = (size_t)result; - - nr_pages = - (cur_len + start + PAGE_SIZE - 1) / PAGE_SIZE; - - wdata = cifs_writedata_direct_alloc(pagevec, - cifs_uncached_writev_complete); - if (!wdata) { - rc = -ENOMEM; - for (i = 0; i < nr_pages; i++) - put_page(pagevec[i]); - kvfree(pagevec); - add_credits_and_wake_if(server, credits, 0); - break; - } - - - wdata->page_offset = start; - wdata->tailsz = - nr_pages > 1 ? - cur_len - (PAGE_SIZE - start) - - (nr_pages - 2) * PAGE_SIZE : - cur_len; - } else { - nr_pages = get_numpages(wsize, len, &cur_len); - wdata = cifs_writedata_alloc(nr_pages, - cifs_uncached_writev_complete); - if (!wdata) { - rc = -ENOMEM; - add_credits_and_wake_if(server, credits, 0); - break; - } - - rc = cifs_write_allocate_pages(wdata->pages, nr_pages); - if (rc) { - kvfree(wdata->pages); - kfree(wdata); - add_credits_and_wake_if(server, credits, 0); - break; - } - - num_pages = nr_pages; - rc = wdata_fill_from_iovec( - wdata, from, &cur_len, &num_pages); - if (rc) { - for (i = 0; i < nr_pages; i++) - put_page(wdata->pages[i]); - kvfree(wdata->pages); - kfree(wdata); - add_credits_and_wake_if(server, credits, 0); - break; - } + max_len = min_t(const size_t, len, wsize); + if (!max_len) { + rc = -EAGAIN; + add_credits_and_wake_if(server, credits, 0); + break; + } - /* - * Bring nr_pages down to the number of pages we - * actually used, and free any pages that we didn't use. - */ - for ( ; nr_pages > num_pages; nr_pages--) - put_page(wdata->pages[nr_pages - 1]); + cur_len = cifs_limit_bvec_subset(from, max_len, max_segs, &nsegs); + cifs_dbg(FYI, "write_from_iter len=%zx/%zx nsegs=%u/%lu/%u\n", + cur_len, max_len, nsegs, from->nr_segs, max_segs); + if (cur_len == 0) { + rc = -EIO; + add_credits_and_wake_if(server, credits, 0); + break; + } - wdata->tailsz = cur_len - ((nr_pages - 1) * PAGE_SIZE); + wdata = cifs_writedata_alloc(cifs_uncached_writev_complete); + if (!wdata) { + rc = -ENOMEM; + add_credits_and_wake_if(server, credits, 0); + break; } wdata->sync_mode = WB_SYNC_ALL; - wdata->nr_pages = nr_pages; - wdata->offset = (__u64)offset; - wdata->cfile = cifsFileInfo_get(open_file); - wdata->server = server; - wdata->pid = pid; - wdata->bytes = cur_len; - wdata->pagesz = PAGE_SIZE; - wdata->credits = credits_on_stack; - wdata->ctx = ctx; + wdata->offset = (__u64)fpos; + wdata->cfile = cifsFileInfo_get(open_file); + wdata->server = server; + wdata->pid = pid; + wdata->bytes = cur_len; + wdata->credits = credits_on_stack; + wdata->iter = *from; + wdata->ctx = ctx; kref_get(&ctx->refcount); + iov_iter_truncate(&wdata->iter, cur_len); + rc = adjust_credits(server, &wdata->credits, wdata->bytes); if (!rc) { @@ -3488,16 +3836,14 @@ cifs_write_from_iter(loff_t offset, size_t len, struct iov_iter *from, add_credits_and_wake_if(server, &wdata->credits, 0); kref_put(&wdata->refcount, cifs_uncached_writedata_release); - if (rc == -EAGAIN) { - *from = saved_from; - iov_iter_advance(from, offset - saved_offset); + if (rc == -EAGAIN) continue; - } break; } list_add_tail(&wdata->list, wdata_list); - offset += cur_len; + iov_iter_advance(from, cur_len); + fpos += cur_len; len -= cur_len; } while (len > 0); @@ -3596,8 +3942,6 @@ static ssize_t __cifs_writev( struct cifs_tcon *tcon; struct cifs_sb_info *cifs_sb; struct cifs_aio_ctx *ctx; - struct iov_iter saved_from = *from; - size_t len = iov_iter_count(from); int rc; /* @@ -3631,23 +3975,54 @@ static ssize_t __cifs_writev( ctx->iocb = iocb; ctx->pos = iocb->ki_pos; + ctx->direct_io = direct; + ctx->nr_pinned_pages = 0; - if (direct) { - ctx->direct_io = true; - ctx->iter = *from; - ctx->len = len; - } else { - rc = setup_aio_ctx_iter(ctx, from, ITER_SOURCE); - if (rc) { + if (user_backed_iter(from)) { + /* + * Extract IOVEC/UBUF-type iterators to a BVEC-type iterator as + * they contain references to the calling process's virtual + * memory layout which won't be available in an async worker + * thread. This also takes a pin on every folio involved. + */ + rc = netfs_extract_user_iter(from, iov_iter_count(from), + &ctx->iter, 0); + if (rc < 0) { kref_put(&ctx->refcount, cifs_aio_ctx_release); return rc; } + + ctx->nr_pinned_pages = rc; + ctx->bv = (void *)ctx->iter.bvec; + ctx->bv_need_unpin = iov_iter_extract_will_pin(&ctx->iter); + } else if ((iov_iter_is_bvec(from) || iov_iter_is_kvec(from)) && + !is_sync_kiocb(iocb)) { + /* + * If the op is asynchronous, we need to copy the list attached + * to a BVEC/KVEC-type iterator, but we assume that the storage + * will be pinned by the caller; in any case, we may or may not + * be able to pin the pages, so we don't try. + */ + ctx->bv = (void *)dup_iter(&ctx->iter, from, GFP_KERNEL); + if (!ctx->bv) { + kref_put(&ctx->refcount, cifs_aio_ctx_release); + return -ENOMEM; + } + } else { + /* + * Otherwise, we just pass the iterator down as-is and rely on + * the caller to make sure the pages referred to by the + * iterator don't evaporate. + */ + ctx->iter = *from; } + ctx->len = iov_iter_count(&ctx->iter); + /* grab a lock here due to read response handlers can access ctx */ mutex_lock(&ctx->aio_mutex); - rc = cifs_write_from_iter(iocb->ki_pos, ctx->len, &saved_from, + rc = cifs_write_from_iter(iocb->ki_pos, ctx->len, &ctx->iter, cfile, cifs_sb, &ctx->list, ctx); /* @@ -3790,14 +4165,12 @@ cifs_strict_writev(struct kiocb *iocb, struct iov_iter *from) return written; } -static struct cifs_readdata * -cifs_readdata_direct_alloc(struct page **pages, work_func_t complete) +static struct cifs_readdata *cifs_readdata_alloc(work_func_t complete) { struct cifs_readdata *rdata; rdata = kzalloc(sizeof(*rdata), GFP_KERNEL); - if (rdata != NULL) { - rdata->pages = pages; + if (rdata) { kref_init(&rdata->refcount); INIT_LIST_HEAD(&rdata->list); init_completion(&rdata->done); @@ -3807,27 +4180,14 @@ cifs_readdata_direct_alloc(struct page **pages, work_func_t complete) return rdata; } -static struct cifs_readdata * -cifs_readdata_alloc(unsigned int nr_pages, work_func_t complete) -{ - struct page **pages = - kcalloc(nr_pages, sizeof(struct page *), GFP_KERNEL); - struct cifs_readdata *ret = NULL; - - if (pages) { - ret = cifs_readdata_direct_alloc(pages, complete); - if (!ret) - kfree(pages); - } - - return ret; -} - void cifs_readdata_release(struct kref *refcount) { struct cifs_readdata *rdata = container_of(refcount, struct cifs_readdata, refcount); + + if (rdata->ctx) + kref_put(&rdata->ctx->refcount, cifs_aio_ctx_release); #ifdef CONFIG_CIFS_SMB_DIRECT if (rdata->mr) { smbd_deregister_mr(rdata->mr); @@ -3837,85 +4197,9 @@ cifs_readdata_release(struct kref *refcount) if (rdata->cfile) cifsFileInfo_put(rdata->cfile); - kvfree(rdata->pages); kfree(rdata); } -static int -cifs_read_allocate_pages(struct cifs_readdata *rdata, unsigned int nr_pages) -{ - int rc = 0; - struct page *page; - unsigned int i; - - for (i = 0; i < nr_pages; i++) { - page = alloc_page(GFP_KERNEL|__GFP_HIGHMEM); - if (!page) { - rc = -ENOMEM; - break; - } - rdata->pages[i] = page; - } - - if (rc) { - unsigned int nr_page_failed = i; - - for (i = 0; i < nr_page_failed; i++) { - put_page(rdata->pages[i]); - rdata->pages[i] = NULL; - } - } - return rc; -} - -static void -cifs_uncached_readdata_release(struct kref *refcount) -{ - struct cifs_readdata *rdata = container_of(refcount, - struct cifs_readdata, refcount); - unsigned int i; - - kref_put(&rdata->ctx->refcount, cifs_aio_ctx_release); - for (i = 0; i < rdata->nr_pages; i++) { - put_page(rdata->pages[i]); - } - cifs_readdata_release(refcount); -} - -/** - * cifs_readdata_to_iov - copy data from pages in response to an iovec - * @rdata: the readdata response with list of pages holding data - * @iter: destination for our data - * - * This function copies data from a list of pages in a readdata response into - * an array of iovecs. It will first calculate where the data should go - * based on the info in the readdata and then copy the data into that spot. - */ -static int -cifs_readdata_to_iov(struct cifs_readdata *rdata, struct iov_iter *iter) -{ - size_t remaining = rdata->got_bytes; - unsigned int i; - - for (i = 0; i < rdata->nr_pages; i++) { - struct page *page = rdata->pages[i]; - size_t copy = min_t(size_t, remaining, PAGE_SIZE); - size_t written; - - if (unlikely(iov_iter_is_pipe(iter))) { - void *addr = kmap_atomic(page); - - written = copy_to_iter(addr, copy, iter); - kunmap_atomic(addr); - } else - written = copy_page_to_iter(page, 0, copy, iter); - remaining -= written; - if (written < copy && iov_iter_count(iter) > 0) - break; - } - return remaining ? -EFAULT : 0; -} - static void collect_uncached_read_data(struct cifs_aio_ctx *ctx); static void @@ -3927,9 +4211,11 @@ cifs_uncached_readv_complete(struct work_struct *work) complete(&rdata->done); collect_uncached_read_data(rdata->ctx); /* the below call can possibly free the last ref to aio ctx */ - kref_put(&rdata->refcount, cifs_uncached_readdata_release); + kref_put(&rdata->refcount, cifs_readdata_release); } +#if 0 // TODO: Remove for iov_iter support + static int uncached_fill_pages(struct TCP_Server_Info *server, struct cifs_readdata *rdata, struct iov_iter *iter, @@ -4003,6 +4289,7 @@ cifs_uncached_copy_into_pages(struct TCP_Server_Info *server, { return uncached_fill_pages(server, rdata, iter, iter->count); } +#endif static int cifs_resend_rdata(struct cifs_readdata *rdata, struct list_head *rdata_list, @@ -4072,37 +4359,36 @@ static int cifs_resend_rdata(struct cifs_readdata *rdata, } while (rc == -EAGAIN); fail: - kref_put(&rdata->refcount, cifs_uncached_readdata_release); + kref_put(&rdata->refcount, cifs_readdata_release); return rc; } static int -cifs_send_async_read(loff_t offset, size_t len, struct cifsFileInfo *open_file, +cifs_send_async_read(loff_t fpos, size_t len, struct cifsFileInfo *open_file, struct cifs_sb_info *cifs_sb, struct list_head *rdata_list, struct cifs_aio_ctx *ctx) { struct cifs_readdata *rdata; - unsigned int npages, rsize; + unsigned int rsize, nsegs, max_segs = INT_MAX; struct cifs_credits credits_on_stack; struct cifs_credits *credits = &credits_on_stack; - size_t cur_len; + size_t cur_len, max_len; int rc; pid_t pid; struct TCP_Server_Info *server; - struct page **pagevec; - size_t start; - struct iov_iter direct_iov = ctx->iter; server = cifs_pick_channel(tlink_tcon(open_file->tlink)->ses); +#ifdef CONFIG_CIFS_SMB_DIRECT + if (server->smbd_conn) + max_segs = server->smbd_conn->max_frmr_depth; +#endif + if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_RWPIDFORWARD) pid = open_file->pid; else pid = current->tgid; - if (ctx->direct_io) - iov_iter_advance(&direct_iov, offset - ctx->pos); - do { if (open_file->invalidHandle) { rc = cifs_reopen_file(open_file, true); @@ -4122,78 +4408,37 @@ cifs_send_async_read(loff_t offset, size_t len, struct cifsFileInfo *open_file, if (rc) break; - cur_len = min_t(const size_t, len, rsize); - - if (ctx->direct_io) { - ssize_t result; - - result = iov_iter_get_pages_alloc2( - &direct_iov, &pagevec, - cur_len, &start); - if (result < 0) { - cifs_dbg(VFS, - "Couldn't get user pages (rc=%zd) iter type %d iov_offset %zd count %zd\n", - result, iov_iter_type(&direct_iov), - direct_iov.iov_offset, - direct_iov.count); - dump_stack(); - - rc = result; - add_credits_and_wake_if(server, credits, 0); - break; - } - cur_len = (size_t)result; - - rdata = cifs_readdata_direct_alloc( - pagevec, cifs_uncached_readv_complete); - if (!rdata) { - add_credits_and_wake_if(server, credits, 0); - rc = -ENOMEM; - break; - } - - npages = (cur_len + start + PAGE_SIZE-1) / PAGE_SIZE; - rdata->page_offset = start; - rdata->tailsz = npages > 1 ? - cur_len-(PAGE_SIZE-start)-(npages-2)*PAGE_SIZE : - cur_len; - - } else { - - npages = DIV_ROUND_UP(cur_len, PAGE_SIZE); - /* allocate a readdata struct */ - rdata = cifs_readdata_alloc(npages, - cifs_uncached_readv_complete); - if (!rdata) { - add_credits_and_wake_if(server, credits, 0); - rc = -ENOMEM; - break; - } + max_len = min_t(size_t, len, rsize); - rc = cifs_read_allocate_pages(rdata, npages); - if (rc) { - kvfree(rdata->pages); - kfree(rdata); - add_credits_and_wake_if(server, credits, 0); - break; - } + cur_len = cifs_limit_bvec_subset(&ctx->iter, max_len, + max_segs, &nsegs); + cifs_dbg(FYI, "read-to-iter len=%zx/%zx nsegs=%u/%lu/%u\n", + cur_len, max_len, nsegs, ctx->iter.nr_segs, max_segs); + if (cur_len == 0) { + rc = -EIO; + add_credits_and_wake_if(server, credits, 0); + break; + } - rdata->tailsz = PAGE_SIZE; + rdata = cifs_readdata_alloc(cifs_uncached_readv_complete); + if (!rdata) { + add_credits_and_wake_if(server, credits, 0); + rc = -ENOMEM; + break; } - rdata->server = server; - rdata->cfile = cifsFileInfo_get(open_file); - rdata->nr_pages = npages; - rdata->offset = offset; - rdata->bytes = cur_len; - rdata->pid = pid; - rdata->pagesz = PAGE_SIZE; - rdata->read_into_pages = cifs_uncached_read_into_pages; - rdata->copy_into_pages = cifs_uncached_copy_into_pages; - rdata->credits = credits_on_stack; - rdata->ctx = ctx; + rdata->server = server; + rdata->cfile = cifsFileInfo_get(open_file); + rdata->offset = fpos; + rdata->bytes = cur_len; + rdata->pid = pid; + rdata->credits = credits_on_stack; + rdata->ctx = ctx; kref_get(&ctx->refcount); + rdata->iter = ctx->iter; + iov_iter_truncate(&rdata->iter, cur_len); + rc = adjust_credits(server, &rdata->credits, rdata->bytes); if (!rc) { @@ -4205,17 +4450,15 @@ cifs_send_async_read(loff_t offset, size_t len, struct cifsFileInfo *open_file, if (rc) { add_credits_and_wake_if(server, &rdata->credits, 0); - kref_put(&rdata->refcount, - cifs_uncached_readdata_release); - if (rc == -EAGAIN) { - iov_iter_revert(&direct_iov, cur_len); + kref_put(&rdata->refcount, cifs_readdata_release); + if (rc == -EAGAIN) continue; - } break; } list_add_tail(&rdata->list, rdata_list); - offset += cur_len; + iov_iter_advance(&ctx->iter, cur_len); + fpos += cur_len; len -= cur_len; } while (len > 0); @@ -4257,22 +4500,6 @@ collect_uncached_read_data(struct cifs_aio_ctx *ctx) list_del_init(&rdata->list); INIT_LIST_HEAD(&tmp_list); - /* - * Got a part of data and then reconnect has - * happened -- fill the buffer and continue - * reading. - */ - if (got_bytes && got_bytes < rdata->bytes) { - rc = 0; - if (!ctx->direct_io) - rc = cifs_readdata_to_iov(rdata, to); - if (rc) { - kref_put(&rdata->refcount, - cifs_uncached_readdata_release); - continue; - } - } - if (ctx->direct_io) { /* * Re-use rdata as this is a @@ -4289,7 +4516,7 @@ collect_uncached_read_data(struct cifs_aio_ctx *ctx) &tmp_list, ctx); kref_put(&rdata->refcount, - cifs_uncached_readdata_release); + cifs_readdata_release); } list_splice(&tmp_list, &ctx->list); @@ -4297,8 +4524,6 @@ collect_uncached_read_data(struct cifs_aio_ctx *ctx) goto again; } else if (rdata->result) rc = rdata->result; - else if (!ctx->direct_io) - rc = cifs_readdata_to_iov(rdata, to); /* if there was a short read -- discard anything left */ if (rdata->got_bytes && rdata->got_bytes < rdata->bytes) @@ -4307,7 +4532,7 @@ collect_uncached_read_data(struct cifs_aio_ctx *ctx) ctx->total_len += rdata->got_bytes; } list_del_init(&rdata->list); - kref_put(&rdata->refcount, cifs_uncached_readdata_release); + kref_put(&rdata->refcount, cifs_readdata_release); } if (!ctx->direct_io) @@ -4367,26 +4592,53 @@ static ssize_t __cifs_readv( if (!ctx) return -ENOMEM; - ctx->cfile = cifsFileInfo_get(cfile); + ctx->pos = offset; + ctx->direct_io = direct; + ctx->len = len; + ctx->cfile = cifsFileInfo_get(cfile); + ctx->nr_pinned_pages = 0; if (!is_sync_kiocb(iocb)) ctx->iocb = iocb; - if (user_backed_iter(to)) - ctx->should_dirty = true; - - if (direct) { - ctx->pos = offset; - ctx->direct_io = true; - ctx->iter = *to; - ctx->len = len; - } else { - rc = setup_aio_ctx_iter(ctx, to, ITER_DEST); - if (rc) { + if (user_backed_iter(to)) { + /* + * Extract IOVEC/UBUF-type iterators to a BVEC-type iterator as + * they contain references to the calling process's virtual + * memory layout which won't be available in an async worker + * thread. This also takes a pin on every folio involved. + */ + rc = netfs_extract_user_iter(to, iov_iter_count(to), + &ctx->iter, 0); + if (rc < 0) { kref_put(&ctx->refcount, cifs_aio_ctx_release); return rc; } - len = ctx->len; + + ctx->nr_pinned_pages = rc; + ctx->bv = (void *)ctx->iter.bvec; + ctx->bv_need_unpin = iov_iter_extract_will_pin(&ctx->iter); + ctx->should_dirty = true; + } else if ((iov_iter_is_bvec(to) || iov_iter_is_kvec(to)) && + !is_sync_kiocb(iocb)) { + /* + * If the op is asynchronous, we need to copy the list attached + * to a BVEC/KVEC-type iterator, but we assume that the storage + * will be retained by the caller; in any case, we may or may + * not be able to pin the pages, so we don't try. + */ + ctx->bv = (void *)dup_iter(&ctx->iter, to, GFP_KERNEL); + if (!ctx->bv) { + kref_put(&ctx->refcount, cifs_aio_ctx_release); + return -ENOMEM; + } + } else { + /* + * Otherwise, we just pass the iterator down as-is and rely on + * the caller to make sure the pages referred to by the + * iterator don't evaporate. + */ + ctx->iter = *to; } if (direct) { @@ -4648,6 +4900,8 @@ int cifs_file_mmap(struct file *file, struct vm_area_struct *vma) return rc; } +#if 0 // TODO: Remove for iov_iter support + static void cifs_readv_complete(struct work_struct *work) { @@ -4778,19 +5032,74 @@ cifs_readpages_copy_into_pages(struct TCP_Server_Info *server, { return readpages_fill_pages(server, rdata, iter, iter->count); } +#endif + +/* + * Unlock a bunch of folios in the pagecache. + */ +static void cifs_unlock_folios(struct address_space *mapping, pgoff_t first, pgoff_t last) +{ + struct folio *folio; + XA_STATE(xas, &mapping->i_pages, first); + + rcu_read_lock(); + xas_for_each(&xas, folio, last) { + folio_unlock(folio); + } + rcu_read_unlock(); +} + +static void cifs_readahead_complete(struct work_struct *work) +{ + struct cifs_readdata *rdata = container_of(work, + struct cifs_readdata, work); + struct folio *folio; + pgoff_t last; + bool good = rdata->result == 0 || (rdata->result == -EAGAIN && rdata->got_bytes); + + XA_STATE(xas, &rdata->mapping->i_pages, rdata->offset / PAGE_SIZE); + + if (good) + cifs_readahead_to_fscache(rdata->mapping->host, + rdata->offset, rdata->bytes); + + if (iov_iter_count(&rdata->iter) > 0) + iov_iter_zero(iov_iter_count(&rdata->iter), &rdata->iter); + + last = (rdata->offset + rdata->bytes - 1) / PAGE_SIZE; + + rcu_read_lock(); + xas_for_each(&xas, folio, last) { + if (good) { + flush_dcache_folio(folio); + folio_mark_uptodate(folio); + } + folio_unlock(folio); + } + rcu_read_unlock(); + + kref_put(&rdata->refcount, cifs_readdata_release); +} static void cifs_readahead(struct readahead_control *ractl) { - int rc; struct cifsFileInfo *open_file = ractl->file->private_data; struct cifs_sb_info *cifs_sb = CIFS_FILE_SB(ractl->file); struct TCP_Server_Info *server; - pid_t pid; - unsigned int xid, nr_pages, last_batch_size = 0, cache_nr_pages = 0; - pgoff_t next_cached = ULONG_MAX; + unsigned int xid, nr_pages, cache_nr_pages = 0; + unsigned int ra_pages; + pgoff_t next_cached = ULONG_MAX, ra_index; bool caching = fscache_cookie_enabled(cifs_inode_cookie(ractl->mapping->host)) && cifs_inode_cookie(ractl->mapping->host)->cache_priv; bool check_cache = caching; + pid_t pid; + int rc = 0; + + /* Note that readahead_count() lags behind our dequeuing of pages from + * the ractl, wo we have to keep track for ourselves. + */ + ra_pages = readahead_count(ractl); + ra_index = readahead_index(ractl); xid = get_xid(); @@ -4799,22 +5108,21 @@ static void cifs_readahead(struct readahead_control *ractl) else pid = current->tgid; - rc = 0; server = cifs_pick_channel(tlink_tcon(open_file->tlink)->ses); cifs_dbg(FYI, "%s: file=%p mapping=%p num_pages=%u\n", - __func__, ractl->file, ractl->mapping, readahead_count(ractl)); + __func__, ractl->file, ractl->mapping, ra_pages); /* * Chop the readahead request up into rsize-sized read requests. */ - while ((nr_pages = readahead_count(ractl) - last_batch_size)) { - unsigned int i, got, rsize; - struct page *page; + while ((nr_pages = ra_pages)) { + unsigned int i, rsize; struct cifs_readdata *rdata; struct cifs_credits credits_on_stack; struct cifs_credits *credits = &credits_on_stack; - pgoff_t index = readahead_index(ractl) + last_batch_size; + struct folio *folio; + pgoff_t fsize; /* * Find out if we have anything cached in the range of @@ -4823,21 +5131,22 @@ static void cifs_readahead(struct readahead_control *ractl) if (caching) { if (check_cache) { rc = cifs_fscache_query_occupancy( - ractl->mapping->host, index, nr_pages, + ractl->mapping->host, ra_index, nr_pages, &next_cached, &cache_nr_pages); if (rc < 0) caching = false; check_cache = false; } - if (index == next_cached) { + if (ra_index == next_cached) { /* * TODO: Send a whole batch of pages to be read * by the cache. */ - struct folio *folio = readahead_folio(ractl); - - last_batch_size = folio_nr_pages(folio); + folio = readahead_folio(ractl); + fsize = folio_nr_pages(folio); + ra_pages -= fsize; + ra_index += fsize; if (cifs_readpage_from_fscache(ractl->mapping->host, &folio->page) < 0) { /* @@ -4848,8 +5157,8 @@ static void cifs_readahead(struct readahead_control *ractl) caching = false; } folio_unlock(folio); - next_cached++; - cache_nr_pages--; + next_cached += fsize; + cache_nr_pages -= fsize; if (cache_nr_pages == 0) check_cache = true; continue; @@ -4874,8 +5183,9 @@ static void cifs_readahead(struct readahead_control *ractl) &rsize, credits); if (rc) break; - nr_pages = min_t(size_t, rsize / PAGE_SIZE, readahead_count(ractl)); - nr_pages = min_t(size_t, nr_pages, next_cached - index); + nr_pages = min_t(size_t, rsize / PAGE_SIZE, ra_pages); + if (next_cached != ULONG_MAX) + nr_pages = min_t(size_t, nr_pages, next_cached - ra_index); /* * Give up immediately if rsize is too small to read an entire @@ -4888,33 +5198,31 @@ static void cifs_readahead(struct readahead_control *ractl) break; } - rdata = cifs_readdata_alloc(nr_pages, cifs_readv_complete); + rdata = cifs_readdata_alloc(cifs_readahead_complete); if (!rdata) { /* best to give up if we're out of mem */ add_credits_and_wake_if(server, credits, 0); break; } - got = __readahead_batch(ractl, rdata->pages, nr_pages); - if (got != nr_pages) { - pr_warn("__readahead_batch() returned %u/%u\n", - got, nr_pages); - nr_pages = got; - } - - rdata->nr_pages = nr_pages; - rdata->bytes = readahead_batch_length(ractl); + rdata->offset = ra_index * PAGE_SIZE; + rdata->bytes = nr_pages * PAGE_SIZE; rdata->cfile = cifsFileInfo_get(open_file); rdata->server = server; rdata->mapping = ractl->mapping; - rdata->offset = readahead_pos(ractl); rdata->pid = pid; - rdata->pagesz = PAGE_SIZE; - rdata->tailsz = PAGE_SIZE; - rdata->read_into_pages = cifs_readpages_read_into_pages; - rdata->copy_into_pages = cifs_readpages_copy_into_pages; rdata->credits = credits_on_stack; + for (i = 0; i < nr_pages; i++) { + if (!readahead_folio(ractl)) + WARN_ON(1); + } + ra_pages -= nr_pages; + ra_index += nr_pages; + + iov_iter_xarray(&rdata->iter, ITER_DEST, &rdata->mapping->i_pages, + rdata->offset, rdata->bytes); + rc = adjust_credits(server, &rdata->credits, rdata->bytes); if (!rc) { if (rdata->cfile->invalidHandle) @@ -4925,18 +5233,15 @@ static void cifs_readahead(struct readahead_control *ractl) if (rc) { add_credits_and_wake_if(server, &rdata->credits, 0); - for (i = 0; i < rdata->nr_pages; i++) { - page = rdata->pages[i]; - unlock_page(page); - put_page(page); - } + cifs_unlock_folios(rdata->mapping, + rdata->offset / PAGE_SIZE, + (rdata->offset + rdata->bytes - 1) / PAGE_SIZE); /* Fallback to the readpage in error/reconnect cases */ kref_put(&rdata->refcount, cifs_readdata_release); break; } kref_put(&rdata->refcount, cifs_readdata_release); - last_batch_size = nr_pages; } free_xid(xid); @@ -4978,10 +5283,6 @@ static int cifs_readpage_worker(struct file *file, struct page *page, flush_dcache_page(page); SetPageUptodate(page); - - /* send this page to the cache */ - cifs_readpage_to_fscache(file_inode(file), page); - rc = 0; io_error: diff --git a/fs/cifs/fscache.c b/fs/cifs/fscache.c index f6f3a6b75601..47c9f36c11fb 100644 --- a/fs/cifs/fscache.c +++ b/fs/cifs/fscache.c @@ -165,22 +165,16 @@ static int fscache_fallback_read_page(struct inode *inode, struct page *page) /* * Fallback page writing interface. */ -static int fscache_fallback_write_page(struct inode *inode, struct page *page, - bool no_space_allocated_yet) +static int fscache_fallback_write_pages(struct inode *inode, loff_t start, size_t len, + bool no_space_allocated_yet) { struct netfs_cache_resources cres; struct fscache_cookie *cookie = cifs_inode_cookie(inode); struct iov_iter iter; - struct bio_vec bvec[1]; - loff_t start = page_offset(page); - size_t len = PAGE_SIZE; int ret; memset(&cres, 0, sizeof(cres)); - bvec[0].bv_page = page; - bvec[0].bv_offset = 0; - bvec[0].bv_len = PAGE_SIZE; - iov_iter_bvec(&iter, ITER_SOURCE, bvec, ARRAY_SIZE(bvec), PAGE_SIZE); + iov_iter_xarray(&iter, ITER_SOURCE, &inode->i_mapping->i_pages, start, len); ret = fscache_begin_write_operation(&cres, cookie); if (ret < 0) @@ -189,7 +183,7 @@ static int fscache_fallback_write_page(struct inode *inode, struct page *page, ret = cres.ops->prepare_write(&cres, &start, &len, i_size_read(inode), no_space_allocated_yet); if (ret == 0) - ret = fscache_write(&cres, page_offset(page), &iter, NULL, NULL); + ret = fscache_write(&cres, start, &iter, NULL, NULL); fscache_end_operation(&cres); return ret; } @@ -213,12 +207,12 @@ int __cifs_readpage_from_fscache(struct inode *inode, struct page *page) return 0; } -void __cifs_readpage_to_fscache(struct inode *inode, struct page *page) +void __cifs_readahead_to_fscache(struct inode *inode, loff_t pos, size_t len) { - cifs_dbg(FYI, "%s: (fsc: %p, p: %p, i: %p)\n", - __func__, cifs_inode_cookie(inode), page, inode); + cifs_dbg(FYI, "%s: (fsc: %p, p: %llx, l: %zx, i: %p)\n", + __func__, cifs_inode_cookie(inode), pos, len, inode); - fscache_fallback_write_page(inode, page, true); + fscache_fallback_write_pages(inode, pos, len, true); } /* diff --git a/fs/cifs/fscache.h b/fs/cifs/fscache.h index 67b601041f0a..173999610997 100644 --- a/fs/cifs/fscache.h +++ b/fs/cifs/fscache.h @@ -90,7 +90,7 @@ static inline int cifs_fscache_query_occupancy(struct inode *inode, } extern int __cifs_readpage_from_fscache(struct inode *pinode, struct page *ppage); -extern void __cifs_readpage_to_fscache(struct inode *pinode, struct page *ppage); +extern void __cifs_readahead_to_fscache(struct inode *pinode, loff_t pos, size_t len); static inline int cifs_readpage_from_fscache(struct inode *inode, @@ -101,11 +101,11 @@ static inline int cifs_readpage_from_fscache(struct inode *inode, return -ENOBUFS; } -static inline void cifs_readpage_to_fscache(struct inode *inode, - struct page *page) +static inline void cifs_readahead_to_fscache(struct inode *inode, + loff_t pos, size_t len) { if (cifs_inode_cookie(inode)) - __cifs_readpage_to_fscache(inode, page); + __cifs_readahead_to_fscache(inode, pos, len); } #else /* CONFIG_CIFS_FSCACHE */ @@ -141,7 +141,7 @@ cifs_readpage_from_fscache(struct inode *inode, struct page *page) } static inline -void cifs_readpage_to_fscache(struct inode *inode, struct page *page) {} +void cifs_readahead_to_fscache(struct inode *inode, loff_t pos, size_t len) {} #endif /* CONFIG_CIFS_FSCACHE */ diff --git a/fs/cifs/misc.c b/fs/cifs/misc.c index 2a19c7987c5b..967bc3b74def 100644 --- a/fs/cifs/misc.c +++ b/fs/cifs/misc.c @@ -966,16 +966,22 @@ cifs_aio_ctx_release(struct kref *refcount) /* * ctx->bv is only set if setup_aio_ctx_iter() was call successfuly - * which means that iov_iter_get_pages() was a success and thus that - * we have taken reference on pages. + * which means that iov_iter_extract_pages() was a success and thus + * that we may have references or pins on pages that we need to + * release. */ if (ctx->bv) { - unsigned i; + if (ctx->should_dirty || ctx->bv_need_unpin) { + unsigned i; - for (i = 0; i < ctx->npages; i++) { - if (ctx->should_dirty) - set_page_dirty(ctx->bv[i].bv_page); - put_page(ctx->bv[i].bv_page); + for (i = 0; i < ctx->nr_pinned_pages; i++) { + struct page *page = ctx->bv[i].bv_page; + + if (ctx->should_dirty) + set_page_dirty(page); + if (ctx->bv_need_unpin) + unpin_user_page(page); + } } kvfree(ctx->bv); } @@ -983,95 +989,6 @@ cifs_aio_ctx_release(struct kref *refcount) kfree(ctx); } -#define CIFS_AIO_KMALLOC_LIMIT (1024 * 1024) - -int -setup_aio_ctx_iter(struct cifs_aio_ctx *ctx, struct iov_iter *iter, int rw) -{ - ssize_t rc; - unsigned int cur_npages; - unsigned int npages = 0; - unsigned int i; - size_t len; - size_t count = iov_iter_count(iter); - unsigned int saved_len; - size_t start; - unsigned int max_pages = iov_iter_npages(iter, INT_MAX); - struct page **pages = NULL; - struct bio_vec *bv = NULL; - - if (iov_iter_is_kvec(iter)) { - memcpy(&ctx->iter, iter, sizeof(*iter)); - ctx->len = count; - iov_iter_advance(iter, count); - return 0; - } - - if (array_size(max_pages, sizeof(*bv)) <= CIFS_AIO_KMALLOC_LIMIT) - bv = kmalloc_array(max_pages, sizeof(*bv), GFP_KERNEL); - - if (!bv) { - bv = vmalloc(array_size(max_pages, sizeof(*bv))); - if (!bv) - return -ENOMEM; - } - - if (array_size(max_pages, sizeof(*pages)) <= CIFS_AIO_KMALLOC_LIMIT) - pages = kmalloc_array(max_pages, sizeof(*pages), GFP_KERNEL); - - if (!pages) { - pages = vmalloc(array_size(max_pages, sizeof(*pages))); - if (!pages) { - kvfree(bv); - return -ENOMEM; - } - } - - saved_len = count; - - while (count && npages < max_pages) { - rc = iov_iter_get_pages2(iter, pages, count, max_pages, &start); - if (rc < 0) { - cifs_dbg(VFS, "Couldn't get user pages (rc=%zd)\n", rc); - break; - } - - if (rc > count) { - cifs_dbg(VFS, "get pages rc=%zd more than %zu\n", rc, - count); - break; - } - - count -= rc; - rc += start; - cur_npages = DIV_ROUND_UP(rc, PAGE_SIZE); - - if (npages + cur_npages > max_pages) { - cifs_dbg(VFS, "out of vec array capacity (%u vs %u)\n", - npages + cur_npages, max_pages); - break; - } - - for (i = 0; i < cur_npages; i++) { - len = rc > PAGE_SIZE ? PAGE_SIZE : rc; - bv[npages + i].bv_page = pages[i]; - bv[npages + i].bv_offset = start; - bv[npages + i].bv_len = len - start; - rc -= len; - start = 0; - } - - npages += cur_npages; - } - - kvfree(pages); - ctx->bv = bv; - ctx->len = saved_len - count; - ctx->npages = npages; - iov_iter_bvec(&ctx->iter, rw, ctx->bv, npages, ctx->len); - return 0; -} - /** * cifs_alloc_hash - allocate hash and hash context together * @name: The name of the crypto hash algo @@ -1129,25 +1046,6 @@ cifs_free_hash(struct shash_desc **sdesc) *sdesc = NULL; } -/** - * rqst_page_get_length - obtain the length and offset for a page in smb_rqst - * @rqst: The request descriptor - * @page: The index of the page to query - * @len: Where to store the length for this page: - * @offset: Where to store the offset for this page - */ -void rqst_page_get_length(const struct smb_rqst *rqst, unsigned int page, - unsigned int *len, unsigned int *offset) -{ - *len = rqst->rq_pagesz; - *offset = (page == 0) ? rqst->rq_offset : 0; - - if (rqst->rq_npages == 1 || page == rqst->rq_npages-1) - *len = rqst->rq_tailsz; - else if (page == 0) - *len = rqst->rq_pagesz - rqst->rq_offset; -} - void extract_unc_hostname(const char *unc, const char **h, size_t *len) { const char *end; diff --git a/fs/cifs/smb2ops.c b/fs/cifs/smb2ops.c index 665ccf8d979d..121faf3b2900 100644 --- a/fs/cifs/smb2ops.c +++ b/fs/cifs/smb2ops.c @@ -4244,7 +4244,7 @@ fill_transform_hdr(struct smb2_transform_hdr *tr_hdr, unsigned int orig_len, static void *smb2_aead_req_alloc(struct crypto_aead *tfm, const struct smb_rqst *rqst, int num_rqst, const u8 *sig, u8 **iv, - struct aead_request **req, struct scatterlist **sgl, + struct aead_request **req, struct sg_table *sgt, unsigned int *num_sgs) { unsigned int req_size = sizeof(**req) + crypto_aead_reqsize(tfm); @@ -4253,43 +4253,42 @@ static void *smb2_aead_req_alloc(struct crypto_aead *tfm, const struct smb_rqst u8 *p; *num_sgs = cifs_get_num_sgs(rqst, num_rqst, sig); + if (IS_ERR_VALUE((long)(int)*num_sgs)) + return ERR_PTR(*num_sgs); len = iv_size; len += crypto_aead_alignmask(tfm) & ~(crypto_tfm_ctx_alignment() - 1); len = ALIGN(len, crypto_tfm_ctx_alignment()); len += req_size; len = ALIGN(len, __alignof__(struct scatterlist)); - len += *num_sgs * sizeof(**sgl); + len += array_size(*num_sgs, sizeof(struct scatterlist)); - p = kmalloc(len, GFP_ATOMIC); + p = kvzalloc(len, GFP_NOFS); if (!p) - return NULL; + return ERR_PTR(-ENOMEM); *iv = (u8 *)PTR_ALIGN(p, crypto_aead_alignmask(tfm) + 1); *req = (struct aead_request *)PTR_ALIGN(*iv + iv_size, crypto_tfm_ctx_alignment()); - *sgl = (struct scatterlist *)PTR_ALIGN((u8 *)*req + req_size, - __alignof__(struct scatterlist)); + sgt->sgl = (struct scatterlist *)PTR_ALIGN((u8 *)*req + req_size, + __alignof__(struct scatterlist)); return p; } -static void *smb2_get_aead_req(struct crypto_aead *tfm, const struct smb_rqst *rqst, +static void *smb2_get_aead_req(struct crypto_aead *tfm, struct smb_rqst *rqst, int num_rqst, const u8 *sig, u8 **iv, struct aead_request **req, struct scatterlist **sgl) { - unsigned int off, len, skip; - struct scatterlist *sg; - unsigned int num_sgs; - unsigned long addr; - int i, j; + struct sg_table sgtable = {}; + unsigned int skip, num_sgs, i, j; + ssize_t rc; void *p; - p = smb2_aead_req_alloc(tfm, rqst, num_rqst, sig, iv, req, sgl, &num_sgs); - if (!p) - return NULL; + p = smb2_aead_req_alloc(tfm, rqst, num_rqst, sig, iv, req, &sgtable, &num_sgs); + if (IS_ERR(p)) + return ERR_CAST(p); - sg_init_table(*sgl, num_sgs); - sg = *sgl; + sg_init_marker(sgtable.sgl, num_sgs); /* * The first rqst has a transform header where the @@ -4297,30 +4296,29 @@ static void *smb2_get_aead_req(struct crypto_aead *tfm, const struct smb_rqst *r */ skip = 20; - /* Assumes the first rqst has a transform header as the first iov. - * I.e. - * rqst[0].rq_iov[0] is transform header - * rqst[0].rq_iov[1+] data to be encrypted/decrypted - * rqst[1+].rq_iov[0+] data to be encrypted/decrypted - */ for (i = 0; i < num_rqst; i++) { - for (j = 0; j < rqst[i].rq_nvec; j++) { - struct kvec *iov = &rqst[i].rq_iov[j]; + struct iov_iter *iter = &rqst[i].rq_iter; + size_t count = iov_iter_count(iter); - addr = (unsigned long)iov->iov_base + skip; - len = iov->iov_len - skip; - sg = cifs_sg_set_buf(sg, (void *)addr, len); + for (j = 0; j < rqst[i].rq_nvec; j++) { + cifs_sg_set_buf(&sgtable, + rqst[i].rq_iov[j].iov_base + skip, + rqst[i].rq_iov[j].iov_len - skip); /* See the above comment on the 'skip' assignment */ skip = 0; } - for (j = 0; j < rqst[i].rq_npages; j++) { - rqst_page_get_length(&rqst[i], j, &len, &off); - sg_set_page(sg++, rqst[i].rq_pages[j], len, off); - } + sgtable.orig_nents = sgtable.nents; + + rc = netfs_extract_iter_to_sg(iter, count, &sgtable, + num_sgs - sgtable.nents, 0); + iov_iter_revert(iter, rc); + sgtable.orig_nents = sgtable.nents; } - cifs_sg_set_buf(sg, sig, SMB2_SIGNATURE_SIZE); + cifs_sg_set_buf(&sgtable, sig, SMB2_SIGNATURE_SIZE); + sg_mark_end(&sgtable.sgl[sgtable.nents - 1]); + *sgl = sgtable.sgl; return p; } @@ -4408,8 +4406,8 @@ crypt_message(struct TCP_Server_Info *server, int num_rqst, } creq = smb2_get_aead_req(tfm, rqst, num_rqst, sign, &iv, &req, &sg); - if (unlikely(!creq)) - return -ENOMEM; + if (unlikely(IS_ERR(creq))) + return PTR_ERR(creq); if (!enc) { memcpy(sign, &tr_hdr->Signature, SMB2_SIGNATURE_SIZE); @@ -4441,18 +4439,31 @@ crypt_message(struct TCP_Server_Info *server, int num_rqst, return rc; } +/* + * Clear a read buffer, discarding the folios which have XA_MARK_0 set. + */ +static void cifs_clear_xarray_buffer(struct xarray *buffer) +{ + struct folio *folio; + + XA_STATE(xas, buffer, 0); + + rcu_read_lock(); + xas_for_each_marked(&xas, folio, ULONG_MAX, XA_MARK_0) { + folio_put(folio); + } + rcu_read_unlock(); + xa_destroy(buffer); +} + void smb3_free_compound_rqst(int num_rqst, struct smb_rqst *rqst) { - int i, j; + int i; - for (i = 0; i < num_rqst; i++) { - if (rqst[i].rq_pages) { - for (j = rqst[i].rq_npages - 1; j >= 0; j--) - put_page(rqst[i].rq_pages[j]); - kfree(rqst[i].rq_pages); - } - } + for (i = 0; i < num_rqst; i++) + if (!xa_empty(&rqst[i].rq_buffer)) + cifs_clear_xarray_buffer(&rqst[i].rq_buffer); } /* @@ -4472,9 +4483,8 @@ static int smb3_init_transform_rq(struct TCP_Server_Info *server, int num_rqst, struct smb_rqst *new_rq, struct smb_rqst *old_rq) { - struct page **pages; struct smb2_transform_hdr *tr_hdr = new_rq[0].rq_iov[0].iov_base; - unsigned int npages; + struct page *page; unsigned int orig_len = 0; int i, j; int rc = -ENOMEM; @@ -4482,40 +4492,43 @@ smb3_init_transform_rq(struct TCP_Server_Info *server, int num_rqst, for (i = 1; i < num_rqst; i++) { struct smb_rqst *old = &old_rq[i - 1]; struct smb_rqst *new = &new_rq[i]; + struct xarray *buffer = &new->rq_buffer; + size_t size = iov_iter_count(&old->rq_iter), seg, copied = 0; orig_len += smb_rqst_len(server, old); new->rq_iov = old->rq_iov; new->rq_nvec = old->rq_nvec; - npages = old->rq_npages; - if (!npages) - continue; - - pages = kmalloc_array(npages, sizeof(struct page *), - GFP_KERNEL); - if (!pages) - goto err_free; - - new->rq_pages = pages; - new->rq_npages = npages; - new->rq_offset = old->rq_offset; - new->rq_pagesz = old->rq_pagesz; - new->rq_tailsz = old->rq_tailsz; - - for (j = 0; j < npages; j++) { - pages[j] = alloc_page(GFP_KERNEL|__GFP_HIGHMEM); - if (!pages[j]) - goto err_free; - } + xa_init(buffer); - /* copy pages form the old */ - for (j = 0; j < npages; j++) { - unsigned int offset, len; + if (size > 0) { + unsigned int npages = DIV_ROUND_UP(size, PAGE_SIZE); - rqst_page_get_length(new, j, &len, &offset); + for (j = 0; j < npages; j++) { + void *o; - memcpy_page(new->rq_pages[j], offset, - old->rq_pages[j], offset, len); + rc = -ENOMEM; + page = alloc_page(GFP_KERNEL|__GFP_HIGHMEM); + if (!page) + goto err_free; + page->index = j; + o = xa_store(buffer, j, page, GFP_KERNEL); + if (xa_is_err(o)) { + rc = xa_err(o); + put_page(page); + goto err_free; + } + + seg = min_t(size_t, size - copied, PAGE_SIZE); + if (copy_page_from_iter(page, 0, seg, &old->rq_iter) != seg) { + rc = -EFAULT; + goto err_free; + } + copied += seg; + } + iov_iter_xarray(&new->rq_iter, ITER_SOURCE, + buffer, 0, size); + new->rq_iter_size = size; } } @@ -4544,12 +4557,12 @@ smb3_is_transform_hdr(void *buf) static int decrypt_raw_data(struct TCP_Server_Info *server, char *buf, - unsigned int buf_data_size, struct page **pages, - unsigned int npages, unsigned int page_data_size, + unsigned int buf_data_size, struct iov_iter *iter, bool is_offloaded) { struct kvec iov[2]; struct smb_rqst rqst = {NULL}; + size_t iter_size = 0; int rc; iov[0].iov_base = buf; @@ -4559,10 +4572,11 @@ decrypt_raw_data(struct TCP_Server_Info *server, char *buf, rqst.rq_iov = iov; rqst.rq_nvec = 2; - rqst.rq_pages = pages; - rqst.rq_npages = npages; - rqst.rq_pagesz = PAGE_SIZE; - rqst.rq_tailsz = (page_data_size % PAGE_SIZE) ? : PAGE_SIZE; + if (iter) { + rqst.rq_iter = *iter; + rqst.rq_iter_size = iov_iter_count(iter); + iter_size = iov_iter_count(iter); + } rc = crypt_message(server, 1, &rqst, 0); cifs_dbg(FYI, "Decrypt message returned %d\n", rc); @@ -4573,73 +4587,37 @@ decrypt_raw_data(struct TCP_Server_Info *server, char *buf, memmove(buf, iov[1].iov_base, buf_data_size); if (!is_offloaded) - server->total_read = buf_data_size + page_data_size; + server->total_read = buf_data_size + iter_size; return rc; } static int -read_data_into_pages(struct TCP_Server_Info *server, struct page **pages, - unsigned int npages, unsigned int len) +cifs_copy_pages_to_iter(struct xarray *pages, unsigned int data_size, + unsigned int skip, struct iov_iter *iter) { - int i; - int length; + struct page *page; + unsigned long index; - for (i = 0; i < npages; i++) { - struct page *page = pages[i]; - size_t n; + xa_for_each(pages, index, page) { + size_t n, len = min_t(unsigned int, PAGE_SIZE - skip, data_size); - n = len; - if (len >= PAGE_SIZE) { - /* enough data to fill the page */ - n = PAGE_SIZE; - len -= n; - } else { - zero_user(page, len, PAGE_SIZE - len); - len = 0; + n = copy_page_to_iter(page, skip, len, iter); + if (n != len) { + cifs_dbg(VFS, "%s: something went wrong\n", __func__); + return -EIO; } - length = cifs_read_page_from_socket(server, page, 0, n); - if (length < 0) - return length; - server->total_read += length; - } - - return 0; -} - -static int -init_read_bvec(struct page **pages, unsigned int npages, unsigned int data_size, - unsigned int cur_off, struct bio_vec **page_vec) -{ - struct bio_vec *bvec; - int i; - - bvec = kcalloc(npages, sizeof(struct bio_vec), GFP_KERNEL); - if (!bvec) - return -ENOMEM; - - for (i = 0; i < npages; i++) { - bvec[i].bv_page = pages[i]; - bvec[i].bv_offset = (i == 0) ? cur_off : 0; - bvec[i].bv_len = min_t(unsigned int, PAGE_SIZE, data_size); - data_size -= bvec[i].bv_len; - } - - if (data_size != 0) { - cifs_dbg(VFS, "%s: something went wrong\n", __func__); - kfree(bvec); - return -EIO; + data_size -= n; + skip = 0; } - *page_vec = bvec; return 0; } static int handle_read_data(struct TCP_Server_Info *server, struct mid_q_entry *mid, - char *buf, unsigned int buf_len, struct page **pages, - unsigned int npages, unsigned int page_data_size, - bool is_offloaded) + char *buf, unsigned int buf_len, struct xarray *pages, + unsigned int pages_len, bool is_offloaded) { unsigned int data_offset; unsigned int data_len; @@ -4648,9 +4626,6 @@ handle_read_data(struct TCP_Server_Info *server, struct mid_q_entry *mid, unsigned int pad_len; struct cifs_readdata *rdata = mid->callback_data; struct smb2_hdr *shdr = (struct smb2_hdr *)buf; - struct bio_vec *bvec = NULL; - struct iov_iter iter; - struct kvec iov; int length; bool use_rdma_mr = false; @@ -4739,7 +4714,7 @@ handle_read_data(struct TCP_Server_Info *server, struct mid_q_entry *mid, return 0; } - if (data_len > page_data_size - pad_len) { + if (data_len > pages_len - pad_len) { /* data_len is corrupt -- discard frame */ rdata->result = -EIO; if (is_offloaded) @@ -4749,8 +4724,9 @@ handle_read_data(struct TCP_Server_Info *server, struct mid_q_entry *mid, return 0; } - rdata->result = init_read_bvec(pages, npages, page_data_size, - cur_off, &bvec); + /* Copy the data to the output I/O iterator. */ + rdata->result = cifs_copy_pages_to_iter(pages, pages_len, + cur_off, &rdata->iter); if (rdata->result != 0) { if (is_offloaded) mid->mid_state = MID_RESPONSE_MALFORMED; @@ -4758,14 +4734,16 @@ handle_read_data(struct TCP_Server_Info *server, struct mid_q_entry *mid, dequeue_mid(mid, rdata->result); return 0; } + rdata->got_bytes = pages_len; - iov_iter_bvec(&iter, ITER_SOURCE, bvec, npages, data_len); } else if (buf_len >= data_offset + data_len) { /* read response payload is in buf */ - WARN_ONCE(npages > 0, "read data can be either in buf or in pages"); - iov.iov_base = buf + data_offset; - iov.iov_len = data_len; - iov_iter_kvec(&iter, ITER_SOURCE, &iov, 1, data_len); + WARN_ONCE(pages && !xa_empty(pages), + "read data can be either in buf or in pages"); + length = copy_to_iter(buf + data_offset, data_len, &rdata->iter); + if (length < 0) + return length; + rdata->got_bytes = data_len; } else { /* read response payload cannot be in both buf and pages */ WARN_ONCE(1, "buf can not contain only a part of read data"); @@ -4777,26 +4755,18 @@ handle_read_data(struct TCP_Server_Info *server, struct mid_q_entry *mid, return 0; } - length = rdata->copy_into_pages(server, rdata, &iter); - - kfree(bvec); - - if (length < 0) - return length; - if (is_offloaded) mid->mid_state = MID_RESPONSE_RECEIVED; else dequeue_mid(mid, false); - return length; + return 0; } struct smb2_decrypt_work { struct work_struct decrypt; struct TCP_Server_Info *server; - struct page **ppages; + struct xarray buffer; char *buf; - unsigned int npages; unsigned int len; }; @@ -4805,11 +4775,13 @@ static void smb2_decrypt_offload(struct work_struct *work) { struct smb2_decrypt_work *dw = container_of(work, struct smb2_decrypt_work, decrypt); - int i, rc; + int rc; struct mid_q_entry *mid; + struct iov_iter iter; + iov_iter_xarray(&iter, ITER_DEST, &dw->buffer, 0, dw->len); rc = decrypt_raw_data(dw->server, dw->buf, dw->server->vals->read_rsp_size, - dw->ppages, dw->npages, dw->len, true); + &iter, true); if (rc) { cifs_dbg(VFS, "error decrypting rc=%d\n", rc); goto free_pages; @@ -4823,7 +4795,7 @@ static void smb2_decrypt_offload(struct work_struct *work) mid->decrypted = true; rc = handle_read_data(dw->server, mid, dw->buf, dw->server->vals->read_rsp_size, - dw->ppages, dw->npages, dw->len, + &dw->buffer, dw->len, true); if (rc >= 0) { #ifdef CONFIG_CIFS_STATS2 @@ -4856,10 +4828,7 @@ static void smb2_decrypt_offload(struct work_struct *work) } free_pages: - for (i = dw->npages-1; i >= 0; i--) - put_page(dw->ppages[i]); - - kfree(dw->ppages); + cifs_clear_xarray_buffer(&dw->buffer); cifs_small_buf_release(dw->buf); kfree(dw); } @@ -4869,47 +4838,65 @@ static int receive_encrypted_read(struct TCP_Server_Info *server, struct mid_q_entry **mid, int *num_mids) { + struct page *page; char *buf = server->smallbuf; struct smb2_transform_hdr *tr_hdr = (struct smb2_transform_hdr *)buf; - unsigned int npages; - struct page **pages; - unsigned int len; + struct iov_iter iter; + unsigned int len, npages; unsigned int buflen = server->pdu_size; int rc; int i = 0; struct smb2_decrypt_work *dw; + dw = kzalloc(sizeof(struct smb2_decrypt_work), GFP_KERNEL); + if (!dw) + return -ENOMEM; + xa_init(&dw->buffer); + INIT_WORK(&dw->decrypt, smb2_decrypt_offload); + dw->server = server; + *num_mids = 1; len = min_t(unsigned int, buflen, server->vals->read_rsp_size + sizeof(struct smb2_transform_hdr)) - HEADER_SIZE(server) + 1; rc = cifs_read_from_socket(server, buf + HEADER_SIZE(server) - 1, len); if (rc < 0) - return rc; + goto free_dw; server->total_read += rc; len = le32_to_cpu(tr_hdr->OriginalMessageSize) - server->vals->read_rsp_size; + dw->len = len; npages = DIV_ROUND_UP(len, PAGE_SIZE); - pages = kmalloc_array(npages, sizeof(struct page *), GFP_KERNEL); - if (!pages) { - rc = -ENOMEM; - goto discard_data; - } - + rc = -ENOMEM; for (; i < npages; i++) { - pages[i] = alloc_page(GFP_KERNEL|__GFP_HIGHMEM); - if (!pages[i]) { - rc = -ENOMEM; + void *old; + + page = alloc_page(GFP_KERNEL|__GFP_HIGHMEM); + if (!page) + goto discard_data; + page->index = i; + old = xa_store(&dw->buffer, i, page, GFP_KERNEL); + if (xa_is_err(old)) { + rc = xa_err(old); + put_page(page); goto discard_data; } } - /* read read data into pages */ - rc = read_data_into_pages(server, pages, npages, len); - if (rc) - goto free_pages; + iov_iter_xarray(&iter, ITER_DEST, &dw->buffer, 0, npages * PAGE_SIZE); + + /* Read the data into the buffer and clear excess bufferage. */ + rc = cifs_read_iter_from_socket(server, &iter, dw->len); + if (rc < 0) + goto discard_data; + + server->total_read += rc; + if (rc < npages * PAGE_SIZE) + iov_iter_zero(npages * PAGE_SIZE - rc, &iter); + iov_iter_revert(&iter, npages * PAGE_SIZE); + iov_iter_truncate(&iter, dw->len); rc = cifs_discard_remaining_data(server); if (rc) @@ -4922,39 +4909,28 @@ receive_encrypted_read(struct TCP_Server_Info *server, struct mid_q_entry **mid, if ((server->min_offload) && (server->in_flight > 1) && (server->pdu_size >= server->min_offload)) { - dw = kmalloc(sizeof(struct smb2_decrypt_work), GFP_KERNEL); - if (dw == NULL) - goto non_offloaded_decrypt; - dw->buf = server->smallbuf; server->smallbuf = (char *)cifs_small_buf_get(); - INIT_WORK(&dw->decrypt, smb2_decrypt_offload); - - dw->npages = npages; - dw->server = server; - dw->ppages = pages; - dw->len = len; queue_work(decrypt_wq, &dw->decrypt); *num_mids = 0; /* worker thread takes care of finding mid */ return -1; } -non_offloaded_decrypt: rc = decrypt_raw_data(server, buf, server->vals->read_rsp_size, - pages, npages, len, false); + &iter, false); if (rc) goto free_pages; *mid = smb2_find_mid(server, buf); - if (*mid == NULL) + if (*mid == NULL) { cifs_dbg(FYI, "mid not found\n"); - else { + } else { cifs_dbg(FYI, "mid found\n"); (*mid)->decrypted = true; rc = handle_read_data(server, *mid, buf, server->vals->read_rsp_size, - pages, npages, len, false); + &dw->buffer, dw->len, false); if (rc >= 0) { if (server->ops->is_network_name_deleted) { server->ops->is_network_name_deleted(buf, @@ -4964,9 +4940,9 @@ receive_encrypted_read(struct TCP_Server_Info *server, struct mid_q_entry **mid, } free_pages: - for (i = i - 1; i >= 0; i--) - put_page(pages[i]); - kfree(pages); + cifs_clear_xarray_buffer(&dw->buffer); +free_dw: + kfree(dw); return rc; discard_data: cifs_discard_remaining_data(server); @@ -5004,7 +4980,7 @@ receive_encrypted_standard(struct TCP_Server_Info *server, server->total_read += length; buf_size = pdu_length - sizeof(struct smb2_transform_hdr); - length = decrypt_raw_data(server, buf, buf_size, NULL, 0, 0, false); + length = decrypt_raw_data(server, buf, buf_size, NULL, false); if (length) return length; @@ -5103,7 +5079,7 @@ smb3_handle_read_data(struct TCP_Server_Info *server, struct mid_q_entry *mid) char *buf = server->large_buf ? server->bigbuf : server->smallbuf; return handle_read_data(server, mid, buf, server->pdu_size, - NULL, 0, 0, false); + NULL, 0, false); } static int diff --git a/fs/cifs/smb2pdu.c b/fs/cifs/smb2pdu.c index b16b41d35560..541d8174afb9 100644 --- a/fs/cifs/smb2pdu.c +++ b/fs/cifs/smb2pdu.c @@ -4140,10 +4140,8 @@ smb2_new_read_req(void **buf, unsigned int *total_len, struct smbd_buffer_descriptor_v1 *v1; bool need_invalidate = server->dialect == SMB30_PROT_ID; - rdata->mr = smbd_register_mr( - server->smbd_conn, rdata->pages, - rdata->nr_pages, rdata->page_offset, - rdata->tailsz, true, need_invalidate); + rdata->mr = smbd_register_mr(server->smbd_conn, &rdata->iter, + true, need_invalidate); if (!rdata->mr) return -EAGAIN; @@ -4200,15 +4198,9 @@ smb2_readv_callback(struct mid_q_entry *mid) (struct smb2_hdr *)rdata->iov[0].iov_base; struct cifs_credits credits = { .value = 0, .instance = 0 }; struct smb_rqst rqst = { .rq_iov = &rdata->iov[1], - .rq_nvec = 1, }; - - if (rdata->got_bytes) { - rqst.rq_pages = rdata->pages; - rqst.rq_offset = rdata->page_offset; - rqst.rq_npages = rdata->nr_pages; - rqst.rq_pagesz = rdata->pagesz; - rqst.rq_tailsz = rdata->tailsz; - } + .rq_nvec = 1, + .rq_iter = rdata->iter, + .rq_iter_size = iov_iter_count(&rdata->iter), }; WARN_ONCE(rdata->server != mid->server, "rdata server %p != mid server %p", @@ -4226,6 +4218,8 @@ smb2_readv_callback(struct mid_q_entry *mid) if (server->sign && !mid->decrypted) { int rc; + iov_iter_revert(&rqst.rq_iter, rdata->got_bytes); + iov_iter_truncate(&rqst.rq_iter, rdata->got_bytes); rc = smb2_verify_signature(&rqst, server); if (rc) cifs_tcon_dbg(VFS, "SMB signature verification returned error = %d\n", @@ -4568,7 +4562,7 @@ smb2_async_writev(struct cifs_writedata *wdata, req->VolatileFileId = io_parms->volatile_fid; req->WriteChannelInfoOffset = 0; req->WriteChannelInfoLength = 0; - req->Channel = 0; + req->Channel = SMB2_CHANNEL_NONE; req->Offset = cpu_to_le64(io_parms->offset); req->DataOffset = cpu_to_le16( offsetof(struct smb2_write_req, Buffer)); @@ -4588,26 +4582,18 @@ smb2_async_writev(struct cifs_writedata *wdata, */ if (smb3_use_rdma_offload(io_parms)) { struct smbd_buffer_descriptor_v1 *v1; + size_t data_size = iov_iter_count(&wdata->iter); bool need_invalidate = server->dialect == SMB30_PROT_ID; - wdata->mr = smbd_register_mr( - server->smbd_conn, wdata->pages, - wdata->nr_pages, wdata->page_offset, - wdata->tailsz, false, need_invalidate); + wdata->mr = smbd_register_mr(server->smbd_conn, &wdata->iter, + false, need_invalidate); if (!wdata->mr) { rc = -EAGAIN; goto async_writev_out; } req->Length = 0; req->DataOffset = 0; - if (wdata->nr_pages > 1) - req->RemainingBytes = - cpu_to_le32( - (wdata->nr_pages - 1) * wdata->pagesz - - wdata->page_offset + wdata->tailsz - ); - else - req->RemainingBytes = cpu_to_le32(wdata->tailsz); + req->RemainingBytes = cpu_to_le32(data_size); req->Channel = SMB2_CHANNEL_RDMA_V1_INVALIDATE; if (need_invalidate) req->Channel = SMB2_CHANNEL_RDMA_V1; @@ -4626,19 +4612,14 @@ smb2_async_writev(struct cifs_writedata *wdata, rqst.rq_iov = iov; rqst.rq_nvec = 1; - rqst.rq_pages = wdata->pages; - rqst.rq_offset = wdata->page_offset; - rqst.rq_npages = wdata->nr_pages; - rqst.rq_pagesz = wdata->pagesz; - rqst.rq_tailsz = wdata->tailsz; + rqst.rq_iter = wdata->iter; + rqst.rq_iter_size = iov_iter_count(&rqst.rq_iter); #ifdef CONFIG_CIFS_SMB_DIRECT - if (wdata->mr) { + if (wdata->mr) iov[0].iov_len += sizeof(struct smbd_buffer_descriptor_v1); - rqst.rq_npages = 0; - } #endif - cifs_dbg(FYI, "async write at %llu %u bytes\n", - io_parms->offset, io_parms->length); + cifs_dbg(FYI, "async write at %llu %u bytes iter=%zx\n", + io_parms->offset, io_parms->length, iov_iter_count(&rqst.rq_iter)); #ifdef CONFIG_CIFS_SMB_DIRECT /* For RDMA read, I/O size is in RemainingBytes not in Length */ diff --git a/fs/cifs/smbdirect.c b/fs/cifs/smbdirect.c index 3e0aacddc291..0eb32bbfc467 100644 --- a/fs/cifs/smbdirect.c +++ b/fs/cifs/smbdirect.c @@ -34,12 +34,6 @@ static int smbd_post_recv( struct smbd_response *response); static int smbd_post_send_empty(struct smbd_connection *info); -static int smbd_post_send_data( - struct smbd_connection *info, - struct kvec *iov, int n_vec, int remaining_data_length); -static int smbd_post_send_page(struct smbd_connection *info, - struct page *page, unsigned long offset, - size_t size, int remaining_data_length); static void destroy_mr_list(struct smbd_connection *info); static int allocate_mr_list(struct smbd_connection *info); @@ -986,24 +980,6 @@ static int smbd_post_send_sgl(struct smbd_connection *info, return rc; } -/* - * Send a page - * page: the page to send - * offset: offset in the page to send - * size: length in the page to send - * remaining_data_length: remaining data to send in this payload - */ -static int smbd_post_send_page(struct smbd_connection *info, struct page *page, - unsigned long offset, size_t size, int remaining_data_length) -{ - struct scatterlist sgl; - - sg_init_table(&sgl, 1); - sg_set_page(&sgl, page, size, offset); - - return smbd_post_send_sgl(info, &sgl, size, remaining_data_length); -} - /* * Send an empty message * Empty message is used to extend credits to peer to for keep live @@ -1015,35 +991,6 @@ static int smbd_post_send_empty(struct smbd_connection *info) return smbd_post_send_sgl(info, NULL, 0, 0); } -/* - * Send a data buffer - * iov: the iov array describing the data buffers - * n_vec: number of iov array - * remaining_data_length: remaining data to send following this packet - * in segmented SMBD packet - */ -static int smbd_post_send_data( - struct smbd_connection *info, struct kvec *iov, int n_vec, - int remaining_data_length) -{ - int i; - u32 data_length = 0; - struct scatterlist sgl[SMBDIRECT_MAX_SEND_SGE - 1]; - - if (n_vec > SMBDIRECT_MAX_SEND_SGE - 1) { - cifs_dbg(VFS, "Can't fit data to SGL, n_vec=%d\n", n_vec); - return -EINVAL; - } - - sg_init_table(sgl, n_vec); - for (i = 0; i < n_vec; i++) { - data_length += iov[i].iov_len; - sg_set_buf(&sgl[i], iov[i].iov_base, iov[i].iov_len); - } - - return smbd_post_send_sgl(info, sgl, data_length, remaining_data_length); -} - /* * Post a receive request to the transport * The remote peer can only send data when a receive request is posted @@ -1986,6 +1933,42 @@ int smbd_recv(struct smbd_connection *info, struct msghdr *msg) return rc; } +/* + * Send the contents of an iterator + * @iter: The iterator to send + * @_remaining_data_length: remaining data to send in this payload + */ +static int smbd_post_send_iter(struct smbd_connection *info, + struct iov_iter *iter, + int *_remaining_data_length) +{ + struct scatterlist sgl[SMBDIRECT_MAX_SEND_SGE - 1]; + unsigned int max_payload = info->max_send_size - sizeof(struct smbd_data_transfer); + ssize_t rc; + + /* We're not expecting a user-backed iter */ + WARN_ON(iov_iter_extract_will_pin(iter)); + + do { + struct sg_table sgtable = { .sgl = sgl }; + size_t maxlen = min_t(size_t, *_remaining_data_length, max_payload); + + sg_init_table(sgtable.sgl, ARRAY_SIZE(sgl)); + rc = netfs_extract_iter_to_sg(iter, maxlen, + &sgtable, ARRAY_SIZE(sgl), 0); + if (rc < 0) + break; + if (WARN_ON_ONCE(sgtable.nents == 0)) + return -EIO; + + sg_mark_end(&sgl[sgtable.nents - 1]); + *_remaining_data_length -= rc; + rc = smbd_post_send_sgl(info, sgl, rc, *_remaining_data_length); + } while (rc == 0 && iov_iter_count(iter) > 0); + + return rc; +} + /* * Send data to transport * Each rqst is transported as a SMBDirect payload @@ -1996,18 +1979,10 @@ int smbd_send(struct TCP_Server_Info *server, int num_rqst, struct smb_rqst *rqst_array) { struct smbd_connection *info = server->smbd_conn; - struct kvec vecs[SMBDIRECT_MAX_SEND_SGE - 1]; - int nvecs; - int size; - unsigned int buflen, remaining_data_length; - unsigned int offset, remaining_vec_data_length; - int start, i, j; - int max_iov_size = - info->max_send_size - sizeof(struct smbd_data_transfer); - struct kvec *iov; - int rc; struct smb_rqst *rqst; - int rqst_idx; + struct iov_iter iter; + unsigned int remaining_data_length, klen; + int rc, i, rqst_idx; if (info->transport_status != SMBD_CONNECTED) return -EAGAIN; @@ -2034,84 +2009,36 @@ int smbd_send(struct TCP_Server_Info *server, rqst_idx = 0; do { rqst = &rqst_array[rqst_idx]; - iov = rqst->rq_iov; cifs_dbg(FYI, "Sending smb (RDMA): idx=%d smb_len=%lu\n", - rqst_idx, smb_rqst_len(server, rqst)); - remaining_vec_data_length = 0; - for (i = 0; i < rqst->rq_nvec; i++) { - remaining_vec_data_length += iov[i].iov_len; - dump_smb(iov[i].iov_base, iov[i].iov_len); - } - - log_write(INFO, "rqst_idx=%d nvec=%d rqst->rq_npages=%d rq_pagesz=%d rq_tailsz=%d buflen=%lu\n", - rqst_idx, rqst->rq_nvec, - rqst->rq_npages, rqst->rq_pagesz, - rqst->rq_tailsz, smb_rqst_len(server, rqst)); - - start = 0; - offset = 0; - do { - buflen = 0; - i = start; - j = 0; - while (i < rqst->rq_nvec && - j < SMBDIRECT_MAX_SEND_SGE - 1 && - buflen < max_iov_size) { - - vecs[j].iov_base = iov[i].iov_base + offset; - if (buflen + iov[i].iov_len > max_iov_size) { - vecs[j].iov_len = - max_iov_size - iov[i].iov_len; - buflen = max_iov_size; - offset = vecs[j].iov_len; - } else { - vecs[j].iov_len = - iov[i].iov_len - offset; - buflen += vecs[j].iov_len; - offset = 0; - ++i; - } - ++j; - } + rqst_idx, smb_rqst_len(server, rqst)); + for (i = 0; i < rqst->rq_nvec; i++) + dump_smb(rqst->rq_iov[i].iov_base, rqst->rq_iov[i].iov_len); + + log_write(INFO, "RDMA-WR[%u] nvec=%d len=%u iter=%zu rqlen=%lu\n", + rqst_idx, rqst->rq_nvec, remaining_data_length, + iov_iter_count(&rqst->rq_iter), smb_rqst_len(server, rqst)); + + /* Send the metadata pages. */ + klen = 0; + for (i = 0; i < rqst->rq_nvec; i++) + klen += rqst->rq_iov[i].iov_len; + iov_iter_kvec(&iter, ITER_SOURCE, rqst->rq_iov, rqst->rq_nvec, klen); + + rc = smbd_post_send_iter(info, &iter, &remaining_data_length); + if (rc < 0) + break; - remaining_vec_data_length -= buflen; - remaining_data_length -= buflen; - log_write(INFO, "sending %s iov[%d] from start=%d nvecs=%d remaining_data_length=%d\n", - remaining_vec_data_length > 0 ? - "partial" : "complete", - rqst->rq_nvec, start, j, - remaining_data_length); - - start = i; - rc = smbd_post_send_data(info, vecs, j, remaining_data_length); - if (rc) - goto done; - } while (remaining_vec_data_length > 0); - - /* now sending pages if there are any */ - for (i = 0; i < rqst->rq_npages; i++) { - rqst_page_get_length(rqst, i, &buflen, &offset); - nvecs = (buflen + max_iov_size - 1) / max_iov_size; - log_write(INFO, "sending pages buflen=%d nvecs=%d\n", - buflen, nvecs); - for (j = 0; j < nvecs; j++) { - size = min_t(unsigned int, max_iov_size, remaining_data_length); - remaining_data_length -= size; - log_write(INFO, "sending pages i=%d offset=%d size=%d remaining_data_length=%d\n", - i, j * max_iov_size + offset, size, - remaining_data_length); - rc = smbd_post_send_page( - info, rqst->rq_pages[i], - j*max_iov_size + offset, - size, remaining_data_length); - if (rc) - goto done; - } + if (iov_iter_count(&rqst->rq_iter) > 0) { + /* And then the data pages if there are any */ + rc = smbd_post_send_iter(info, &rqst->rq_iter, + &remaining_data_length); + if (rc < 0) + break; } + } while (++rqst_idx < num_rqst); -done: /* * As an optimization, we don't wait for individual I/O to finish * before sending the next one. @@ -2315,27 +2242,48 @@ static struct smbd_mr *get_mr(struct smbd_connection *info) goto again; } +/* + * Transcribe the pages from an iterator into an MR scatterlist. + * @iter: The iterator to transcribe + * @_remaining_data_length: remaining data to send in this payload + */ +static int smbd_iter_to_mr(struct smbd_connection *info, + struct iov_iter *iter, + struct scatterlist *sgl, + unsigned int num_pages) +{ + struct sg_table sgtable = { .sgl = sgl }; + int ret; + + sg_init_table(sgl, num_pages); + + ret = netfs_extract_iter_to_sg(iter, iov_iter_count(iter), + &sgtable, num_pages, 0); + WARN_ON(ret < 0); + return ret; +} + /* * Register memory for RDMA read/write - * pages[]: the list of pages to register memory with - * num_pages: the number of pages to register - * tailsz: if non-zero, the bytes to register in the last page + * iter: the buffer to register memory with * writing: true if this is a RDMA write (SMB read), false for RDMA read * need_invalidate: true if this MR needs to be locally invalidated after I/O * return value: the MR registered, NULL if failed. */ -struct smbd_mr *smbd_register_mr( - struct smbd_connection *info, struct page *pages[], int num_pages, - int offset, int tailsz, bool writing, bool need_invalidate) +struct smbd_mr *smbd_register_mr(struct smbd_connection *info, + struct iov_iter *iter, + bool writing, bool need_invalidate) { struct smbd_mr *smbdirect_mr; - int rc, i; + int rc, num_pages; enum dma_data_direction dir; struct ib_reg_wr *reg_wr; + num_pages = iov_iter_npages(iter, info->max_frmr_depth + 1); if (num_pages > info->max_frmr_depth) { log_rdma_mr(ERR, "num_pages=%d max_frmr_depth=%d\n", num_pages, info->max_frmr_depth); + WARN_ON_ONCE(1); return NULL; } @@ -2344,32 +2292,16 @@ struct smbd_mr *smbd_register_mr( log_rdma_mr(ERR, "get_mr returning NULL\n"); return NULL; } + + dir = writing ? DMA_FROM_DEVICE : DMA_TO_DEVICE; + smbdirect_mr->dir = dir; smbdirect_mr->need_invalidate = need_invalidate; smbdirect_mr->sgl_count = num_pages; - sg_init_table(smbdirect_mr->sgl, num_pages); - - log_rdma_mr(INFO, "num_pages=0x%x offset=0x%x tailsz=0x%x\n", - num_pages, offset, tailsz); - - if (num_pages == 1) { - sg_set_page(&smbdirect_mr->sgl[0], pages[0], tailsz, offset); - goto skip_multiple_pages; - } - /* We have at least two pages to register */ - sg_set_page( - &smbdirect_mr->sgl[0], pages[0], PAGE_SIZE - offset, offset); - i = 1; - while (i < num_pages - 1) { - sg_set_page(&smbdirect_mr->sgl[i], pages[i], PAGE_SIZE, 0); - i++; - } - sg_set_page(&smbdirect_mr->sgl[i], pages[i], - tailsz ? tailsz : PAGE_SIZE, 0); + log_rdma_mr(INFO, "num_pages=0x%x count=0x%zx\n", + num_pages, iov_iter_count(iter)); + smbd_iter_to_mr(info, iter, smbdirect_mr->sgl, num_pages); -skip_multiple_pages: - dir = writing ? DMA_FROM_DEVICE : DMA_TO_DEVICE; - smbdirect_mr->dir = dir; rc = ib_dma_map_sg(info->id->device, smbdirect_mr->sgl, num_pages, dir); if (!rc) { log_rdma_mr(ERR, "ib_dma_map_sg num_pages=%x dir=%x rc=%x\n", diff --git a/fs/cifs/smbdirect.h b/fs/cifs/smbdirect.h index 207ef979cd51..be2cf18b7fec 100644 --- a/fs/cifs/smbdirect.h +++ b/fs/cifs/smbdirect.h @@ -302,8 +302,8 @@ struct smbd_mr { /* Interfaces to register and deregister MR for RDMA read/write */ struct smbd_mr *smbd_register_mr( - struct smbd_connection *info, struct page *pages[], int num_pages, - int offset, int tailsz, bool writing, bool need_invalidate); + struct smbd_connection *info, struct iov_iter *iter, + bool writing, bool need_invalidate); int smbd_deregister_mr(struct smbd_mr *mr); #else diff --git a/fs/cifs/transport.c b/fs/cifs/transport.c index 83e931824bf2..7ff67a27b361 100644 --- a/fs/cifs/transport.c +++ b/fs/cifs/transport.c @@ -270,26 +270,7 @@ smb_rqst_len(struct TCP_Server_Info *server, struct smb_rqst *rqst) for (i = 0; i < nvec; i++) buflen += iov[i].iov_len; - /* - * Add in the page array if there is one. The caller needs to make - * sure rq_offset and rq_tailsz are set correctly. If a buffer of - * multiple pages ends at page boundary, rq_tailsz needs to be set to - * PAGE_SIZE. - */ - if (rqst->rq_npages) { - if (rqst->rq_npages == 1) - buflen += rqst->rq_tailsz; - else { - /* - * If there is more than one page, calculate the - * buffer length based on rq_offset and rq_tailsz - */ - buflen += rqst->rq_pagesz * (rqst->rq_npages - 1) - - rqst->rq_offset; - buflen += rqst->rq_tailsz; - } - } - + buflen += iov_iter_count(&rqst->rq_iter); return buflen; } @@ -376,23 +357,15 @@ __smb_send_rqst(struct TCP_Server_Info *server, int num_rqst, total_len += sent; - /* now walk the page array and send each page in it */ - for (i = 0; i < rqst[j].rq_npages; i++) { - struct bio_vec bvec; - - bvec.bv_page = rqst[j].rq_pages[i]; - rqst_page_get_length(&rqst[j], i, &bvec.bv_len, - &bvec.bv_offset); - - iov_iter_bvec(&smb_msg.msg_iter, ITER_SOURCE, - &bvec, 1, bvec.bv_len); + if (iov_iter_count(&rqst[j].rq_iter) > 0) { + smb_msg.msg_iter = rqst[j].rq_iter; rc = smb_send_kvec(server, &smb_msg, &sent); if (rc < 0) break; - total_len += sent; } - } + +} unmask: sigprocmask(SIG_SETMASK, &oldmask, NULL); @@ -1654,11 +1627,11 @@ int cifs_discard_remaining_data(struct TCP_Server_Info *server) { unsigned int rfclen = server->pdu_size; - int remaining = rfclen + HEADER_PREAMBLE_SIZE(server) - + size_t remaining = rfclen + HEADER_PREAMBLE_SIZE(server) - server->total_read; while (remaining > 0) { - int length; + ssize_t length; length = cifs_discard_from_socket(server, min_t(size_t, remaining, @@ -1804,10 +1777,15 @@ cifs_readv_receive(struct TCP_Server_Info *server, struct mid_q_entry *mid) return cifs_readv_discard(server, mid); } - length = rdata->read_into_pages(server, rdata, data_len); - if (length < 0) - return length; - +#ifdef CONFIG_CIFS_SMB_DIRECT + if (rdata->mr) + length = data_len; /* An RDMA read is already done. */ + else +#endif + length = cifs_read_iter_from_socket(server, &rdata->iter, + data_len); + if (length > 0) + rdata->got_bytes += length; server->total_read += length; cifs_dbg(FYI, "total_read=%u buflen=%u remaining=%u\n", From patchwork Thu Feb 16 21:47:43 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 58255 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:adf:eb09:0:0:0:0:0 with SMTP id s9csp547317wrn; Thu, 16 Feb 2023 13:52:34 -0800 (PST) X-Google-Smtp-Source: AK7set/wyCRJYShsL2Zwo1z6nsS0T9GgAvchzRyHlByWSnWrn0AESKzmADb8JMreZTVh9gALJF/h X-Received: by 2002:a17:906:ce45:b0:87b:dac0:b23b with SMTP id se5-20020a170906ce4500b0087bdac0b23bmr7093904ejb.55.1676584353988; Thu, 16 Feb 2023 13:52:33 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1676584353; cv=none; d=google.com; s=arc-20160816; b=AabtWu+DNoxyh1kx3EflDAe+amGgCv5J/hnp8Hfan/CcquH7HwkA4aTVUkMovzO3jF FqB6L0pz/SntBBUwSVfCR8ZeEqgyGlmlgC6Fw+nrYBqdaiH1RAKx65MllgPiEv85kYZf XXPPTVi8xwgZwBcXXREmOg0uNeF3pJZFjAnzAuldK/a/pyyj0HTBJgGHzsUzzPdXXENr JLvpgf1XO8MCtY4GDedSYkTe2w0/9UO/raoEF9CJJeY/oYsgPBro4tkdf5rnzbEm22Ks maY163bYefc7UrTRMGvzOSVfUO63Q6dpN0kiXoliX2lMB1VwWAI627hXJG6Vodd9YBIc F28w== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=G8bLco6RK/TV/x/zhGZ3kDinA9d0Eqcu5AZrfy3jfiY=; b=yUmLPRkNJIUjqV9Wvc3n63sr07aHe4gBkyoYdd2Ad4mNvs9wbHXlh9acV6KIRWdT7U K/Tf+Rxa1w+GHXmFmvV0pM+3VIU0ed1eNCiCzzTlsd7gplEevvIPOwacZa7nXJYXWe5Y 7zZ0WUKmZRy3CrS4EujkZJ5AyxTWRZbMSiRsCh126qpQfuJV5REGqpnxmjJgdcnnLXE3 9Vv+kFzg0PwtMyIcJzdh/MXGmjfzJFQpg8DdtE+CR5WW3+A1iVSbSWF76cXR55dZQIfS IwZO/mWg3KklOALJLoNuTRyUjGkmK8GgbWI+FPa1by3Fnu1lytE6lO3oQ7pPfXuHjaeq MVsg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=YgcXyLvf; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id jr8-20020a170906514800b008b1329a1af2si2305124ejc.167.2023.02.16.13.52.10; Thu, 16 Feb 2023 13:52:33 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=YgcXyLvf; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230435AbjBPVuu (ORCPT + 99 others); Thu, 16 Feb 2023 16:50:50 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36302 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230346AbjBPVuQ (ORCPT ); Thu, 16 Feb 2023 16:50:16 -0500 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 4F61B5381D for ; Thu, 16 Feb 2023 13:48:53 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1676584133; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=G8bLco6RK/TV/x/zhGZ3kDinA9d0Eqcu5AZrfy3jfiY=; b=YgcXyLvfAQcgKcJxyBMqlaxHCy1cdrzVfWx+Q3ER4ilsarAf/cMROrwAOORYr7VQmqLoJd Jm+FxiwByjMcsv9DxDvCnkIj+/zxtmq2IPTJCvEbGHtV4lNWxpEDD8BvhAA2wEI97z6Trw Rxys7doYT4eFRazoK4qqef091mI4EL8= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-654-RayOGYqxP7eYvFXjA8177w-1; Thu, 16 Feb 2023 16:48:50 -0500 X-MC-Unique: RayOGYqxP7eYvFXjA8177w-1 Received: from smtp.corp.redhat.com (int-mx10.intmail.prod.int.rdu2.redhat.com [10.11.54.10]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 2D97085A5A3; Thu, 16 Feb 2023 21:48:49 +0000 (UTC) Received: from warthog.procyon.org.uk (unknown [10.33.36.24]) by smtp.corp.redhat.com (Postfix) with ESMTP id 1A5F2492B17; Thu, 16 Feb 2023 21:48:47 +0000 (UTC) From: David Howells To: Steve French Cc: David Howells , Jens Axboe , Al Viro , Shyam Prasad N , Rohith Surabattula , Tom Talpey , Stefan Metzmacher , Christoph Hellwig , Matthew Wilcox , Jeff Layton , linux-cifs@vger.kernel.org, linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Steve French , linux-rdma@vger.kernel.org Subject: [PATCH 15/17] cifs: Build the RDMA SGE list directly from an iterator Date: Thu, 16 Feb 2023 21:47:43 +0000 Message-Id: <20230216214745.3985496-16-dhowells@redhat.com> In-Reply-To: <20230216214745.3985496-1-dhowells@redhat.com> References: <20230216214745.3985496-1-dhowells@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.10 X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_NONE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1758026115780339636?= X-GMAIL-MSGID: =?utf-8?q?1758026115780339636?= In the depths of the cifs RDMA code, extract part of an iov iterator directly into an SGE list without going through an intermediate scatterlist. Note that this doesn't support extraction from an IOBUF- or UBUF-type iterator (ie. user-supplied buffer). The assumption is that the higher layers will extract those to a BVEC-type iterator first and do whatever is required to stop the pages from going away. Signed-off-by: David Howells cc: Steve French cc: Shyam Prasad N cc: Rohith Surabattula cc: Tom Talpey cc: Jeff Layton cc: linux-cifs@vger.kernel.org cc: linux-rdma@vger.kernel.org Link: https://lore.kernel.org/r/166697260361.61150.5064013393408112197.stgit@warthog.procyon.org.uk/ # rfc Link: https://lore.kernel.org/r/166732032518.3186319.1859601819981624629.stgit@warthog.procyon.org.uk/ # rfc --- fs/cifs/smbdirect.c | 153 ++++++++++++++++++-------------------------- fs/cifs/smbdirect.h | 3 +- 2 files changed, 63 insertions(+), 93 deletions(-) diff --git a/fs/cifs/smbdirect.c b/fs/cifs/smbdirect.c index 0eb32bbfc467..31c4dc8212c3 100644 --- a/fs/cifs/smbdirect.c +++ b/fs/cifs/smbdirect.c @@ -828,16 +828,16 @@ static int smbd_post_send(struct smbd_connection *info, return rc; } -static int smbd_post_send_sgl(struct smbd_connection *info, - struct scatterlist *sgl, int data_length, int remaining_data_length) +static int smbd_post_send_iter(struct smbd_connection *info, + struct iov_iter *iter, + int *_remaining_data_length) { - int num_sgs; int i, rc; int header_length; + int data_length; struct smbd_request *request; struct smbd_data_transfer *packet; int new_credits; - struct scatterlist *sg; wait_credit: /* Wait for send credits. A SMBD packet needs one credit */ @@ -881,6 +881,30 @@ static int smbd_post_send_sgl(struct smbd_connection *info, } request->info = info; + memset(request->sge, 0, sizeof(request->sge)); + + /* Fill in the data payload to find out how much data we can add */ + if (iter) { + struct smb_extract_to_rdma extract = { + .nr_sge = 1, + .max_sge = SMBDIRECT_MAX_SEND_SGE, + .sge = request->sge, + .device = info->id->device, + .local_dma_lkey = info->pd->local_dma_lkey, + .direction = DMA_TO_DEVICE, + }; + + rc = smb_extract_iter_to_rdma(iter, *_remaining_data_length, + &extract); + if (rc < 0) + goto err_dma; + data_length = rc; + request->num_sge = extract.nr_sge; + *_remaining_data_length -= data_length; + } else { + data_length = 0; + request->num_sge = 1; + } /* Fill in the packet header */ packet = smbd_request_payload(request); @@ -902,7 +926,7 @@ static int smbd_post_send_sgl(struct smbd_connection *info, else packet->data_offset = cpu_to_le32(24); packet->data_length = cpu_to_le32(data_length); - packet->remaining_data_length = cpu_to_le32(remaining_data_length); + packet->remaining_data_length = cpu_to_le32(*_remaining_data_length); packet->padding = 0; log_outgoing(INFO, "credits_requested=%d credits_granted=%d data_offset=%d data_length=%d remaining_data_length=%d\n", @@ -918,7 +942,6 @@ static int smbd_post_send_sgl(struct smbd_connection *info, if (!data_length) header_length = offsetof(struct smbd_data_transfer, padding); - request->num_sge = 1; request->sge[0].addr = ib_dma_map_single(info->id->device, (void *)packet, header_length, @@ -932,23 +955,6 @@ static int smbd_post_send_sgl(struct smbd_connection *info, request->sge[0].length = header_length; request->sge[0].lkey = info->pd->local_dma_lkey; - /* Fill in the packet data payload */ - num_sgs = sgl ? sg_nents(sgl) : 0; - for_each_sg(sgl, sg, num_sgs, i) { - request->sge[i+1].addr = - ib_dma_map_page(info->id->device, sg_page(sg), - sg->offset, sg->length, DMA_TO_DEVICE); - if (ib_dma_mapping_error( - info->id->device, request->sge[i+1].addr)) { - rc = -EIO; - request->sge[i+1].addr = 0; - goto err_dma; - } - request->sge[i+1].length = sg->length; - request->sge[i+1].lkey = info->pd->local_dma_lkey; - request->num_sge++; - } - rc = smbd_post_send(info, request); if (!rc) return 0; @@ -987,8 +993,10 @@ static int smbd_post_send_sgl(struct smbd_connection *info, */ static int smbd_post_send_empty(struct smbd_connection *info) { + int remaining_data_length = 0; + info->count_send_empty++; - return smbd_post_send_sgl(info, NULL, 0, 0); + return smbd_post_send_iter(info, NULL, &remaining_data_length); } /* @@ -1933,42 +1941,6 @@ int smbd_recv(struct smbd_connection *info, struct msghdr *msg) return rc; } -/* - * Send the contents of an iterator - * @iter: The iterator to send - * @_remaining_data_length: remaining data to send in this payload - */ -static int smbd_post_send_iter(struct smbd_connection *info, - struct iov_iter *iter, - int *_remaining_data_length) -{ - struct scatterlist sgl[SMBDIRECT_MAX_SEND_SGE - 1]; - unsigned int max_payload = info->max_send_size - sizeof(struct smbd_data_transfer); - ssize_t rc; - - /* We're not expecting a user-backed iter */ - WARN_ON(iov_iter_extract_will_pin(iter)); - - do { - struct sg_table sgtable = { .sgl = sgl }; - size_t maxlen = min_t(size_t, *_remaining_data_length, max_payload); - - sg_init_table(sgtable.sgl, ARRAY_SIZE(sgl)); - rc = netfs_extract_iter_to_sg(iter, maxlen, - &sgtable, ARRAY_SIZE(sgl), 0); - if (rc < 0) - break; - if (WARN_ON_ONCE(sgtable.nents == 0)) - return -EIO; - - sg_mark_end(&sgl[sgtable.nents - 1]); - *_remaining_data_length -= rc; - rc = smbd_post_send_sgl(info, sgl, rc, *_remaining_data_length); - } while (rc == 0 && iov_iter_count(iter) > 0); - - return rc; -} - /* * Send data to transport * Each rqst is transported as a SMBDirect payload @@ -2129,10 +2101,10 @@ static void destroy_mr_list(struct smbd_connection *info) cancel_work_sync(&info->mr_recovery_work); list_for_each_entry_safe(mr, tmp, &info->mr_list, list) { if (mr->state == MR_INVALIDATED) - ib_dma_unmap_sg(info->id->device, mr->sgl, - mr->sgl_count, mr->dir); + ib_dma_unmap_sg(info->id->device, mr->sgt.sgl, + mr->sgt.nents, mr->dir); ib_dereg_mr(mr->mr); - kfree(mr->sgl); + kfree(mr->sgt.sgl); kfree(mr); } } @@ -2167,11 +2139,10 @@ static int allocate_mr_list(struct smbd_connection *info) info->mr_type, info->max_frmr_depth); goto out; } - smbdirect_mr->sgl = kcalloc( - info->max_frmr_depth, - sizeof(struct scatterlist), - GFP_KERNEL); - if (!smbdirect_mr->sgl) { + smbdirect_mr->sgt.sgl = kcalloc(info->max_frmr_depth, + sizeof(struct scatterlist), + GFP_KERNEL); + if (!smbdirect_mr->sgt.sgl) { log_rdma_mr(ERR, "failed to allocate sgl\n"); ib_dereg_mr(smbdirect_mr->mr); goto out; @@ -2190,7 +2161,7 @@ static int allocate_mr_list(struct smbd_connection *info) list_for_each_entry_safe(smbdirect_mr, tmp, &info->mr_list, list) { ib_dereg_mr(smbdirect_mr->mr); - kfree(smbdirect_mr->sgl); + kfree(smbdirect_mr->sgt.sgl); kfree(smbdirect_mr); } return -ENOMEM; @@ -2244,22 +2215,20 @@ static struct smbd_mr *get_mr(struct smbd_connection *info) /* * Transcribe the pages from an iterator into an MR scatterlist. - * @iter: The iterator to transcribe - * @_remaining_data_length: remaining data to send in this payload */ static int smbd_iter_to_mr(struct smbd_connection *info, struct iov_iter *iter, - struct scatterlist *sgl, - unsigned int num_pages) + struct sg_table *sgt, + unsigned int max_sg) { - struct sg_table sgtable = { .sgl = sgl }; int ret; - sg_init_table(sgl, num_pages); + memset(sgt->sgl, 0, max_sg * sizeof(struct scatterlist)); - ret = netfs_extract_iter_to_sg(iter, iov_iter_count(iter), - &sgtable, num_pages, 0); + ret = netfs_extract_iter_to_sg(iter, iov_iter_count(iter), sgt, max_sg, 0); WARN_ON(ret < 0); + if (sgt->nents > 0) + sg_mark_end(&sgt->sgl[sgt->nents - 1]); return ret; } @@ -2296,25 +2265,27 @@ struct smbd_mr *smbd_register_mr(struct smbd_connection *info, dir = writing ? DMA_FROM_DEVICE : DMA_TO_DEVICE; smbdirect_mr->dir = dir; smbdirect_mr->need_invalidate = need_invalidate; - smbdirect_mr->sgl_count = num_pages; + smbdirect_mr->sgt.nents = 0; + smbdirect_mr->sgt.orig_nents = 0; - log_rdma_mr(INFO, "num_pages=0x%x count=0x%zx\n", - num_pages, iov_iter_count(iter)); - smbd_iter_to_mr(info, iter, smbdirect_mr->sgl, num_pages); + log_rdma_mr(INFO, "num_pages=0x%x count=0x%zx depth=%u\n", + num_pages, iov_iter_count(iter), info->max_frmr_depth); + smbd_iter_to_mr(info, iter, &smbdirect_mr->sgt, info->max_frmr_depth); - rc = ib_dma_map_sg(info->id->device, smbdirect_mr->sgl, num_pages, dir); + rc = ib_dma_map_sg(info->id->device, smbdirect_mr->sgt.sgl, + smbdirect_mr->sgt.nents, dir); if (!rc) { log_rdma_mr(ERR, "ib_dma_map_sg num_pages=%x dir=%x rc=%x\n", num_pages, dir, rc); goto dma_map_error; } - rc = ib_map_mr_sg(smbdirect_mr->mr, smbdirect_mr->sgl, num_pages, - NULL, PAGE_SIZE); - if (rc != num_pages) { + rc = ib_map_mr_sg(smbdirect_mr->mr, smbdirect_mr->sgt.sgl, + smbdirect_mr->sgt.nents, NULL, PAGE_SIZE); + if (rc != smbdirect_mr->sgt.nents) { log_rdma_mr(ERR, - "ib_map_mr_sg failed rc = %d num_pages = %x\n", - rc, num_pages); + "ib_map_mr_sg failed rc = %d nents = %x\n", + rc, smbdirect_mr->sgt.nents); goto map_mr_error; } @@ -2346,8 +2317,8 @@ struct smbd_mr *smbd_register_mr(struct smbd_connection *info, /* If all failed, attempt to recover this MR by setting it MR_ERROR*/ map_mr_error: - ib_dma_unmap_sg(info->id->device, smbdirect_mr->sgl, - smbdirect_mr->sgl_count, smbdirect_mr->dir); + ib_dma_unmap_sg(info->id->device, smbdirect_mr->sgt.sgl, + smbdirect_mr->sgt.nents, smbdirect_mr->dir); dma_map_error: smbdirect_mr->state = MR_ERROR; @@ -2414,8 +2385,8 @@ int smbd_deregister_mr(struct smbd_mr *smbdirect_mr) if (smbdirect_mr->state == MR_INVALIDATED) { ib_dma_unmap_sg( - info->id->device, smbdirect_mr->sgl, - smbdirect_mr->sgl_count, + info->id->device, smbdirect_mr->sgt.sgl, + smbdirect_mr->sgt.nents, smbdirect_mr->dir); smbdirect_mr->state = MR_READY; if (atomic_inc_return(&info->mr_ready_count) == 1) diff --git a/fs/cifs/smbdirect.h b/fs/cifs/smbdirect.h index be2cf18b7fec..83f239f376f0 100644 --- a/fs/cifs/smbdirect.h +++ b/fs/cifs/smbdirect.h @@ -288,8 +288,7 @@ struct smbd_mr { struct list_head list; enum mr_state state; struct ib_mr *mr; - struct scatterlist *sgl; - int sgl_count; + struct sg_table sgt; enum dma_data_direction dir; union { struct ib_reg_wr wr; From patchwork Thu Feb 16 21:47:44 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 58254 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:adf:eb09:0:0:0:0:0 with SMTP id s9csp547151wrn; Thu, 16 Feb 2023 13:52:10 -0800 (PST) X-Google-Smtp-Source: AK7set+UBXClkWJ7zywhF2cjyM+aa1XHAM1vO81xYc9j/L3UGluSDr1Ws/4u9gQIWUfhYgXS3hwR X-Received: by 2002:a05:6402:2c9:b0:4ab:4c36:463c with SMTP id b9-20020a05640202c900b004ab4c36463cmr8281338edx.16.1676584329977; Thu, 16 Feb 2023 13:52:09 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1676584329; cv=none; d=google.com; s=arc-20160816; b=slW9ekxkZi0+80mwDLspxrykGiCRjq9qOm9WUH9cgIqiUY2GSho94tXnffTHEyHMF/ 5BwAOQ75V/dLEdw1gk/Iw9IXQxL09TM2/7wR+97XZKnOCbMWN5PbrXXI0NcUAQrGE0NN xtFHuJDh6ZvMwcF+8Np3IINi3zYBw0eO1E9B8Quovtnqyv6BNndY7tVNe3H5kRHb5g1O rHVx0OtlY3igQEsrOJv/1aRBN3P5p8f1mKnQ8qSAc6I2LTkiaWdaV0sOGduN/NW6OFtp v0qUCvkSj0gC9+W7cStcIDYo0Ww+fAf8lo7HYF178dNSt7TAmmb54a/0KellmdrYCp7G o0vA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=y8N2Hvl8ceKkhdFGfd9u1dmLYX9njBqFAAbBAETPUuY=; b=vL8fOxJ0hIXqYBKRySd367mXt0jFmFS293MitNkUIgEoSPHzIOSYMqwNgdWnC3CAka 8L2kPVo2lWBQQXqd5J9sXrEukbu25VuCiMm1tPR5a29kXEI1OmVWtECwpplUCMw0eFzg X8YBBFG+nnuKeChoE2HkYmmejzIy0nwkd9e0NiDLC9ew8Y9ttTDn+weJ9euunqTCTmDU Qx64/4teZ4FR1C3Pn29wRsF/66CLJLV8lOKiXcStgXcNiDuHTJYG8nV/Jg0oGKs+Y1Kz 7v/oMtSUt362CvbAITd/awQmmcalBkkicMR8Mfvdm4fGoexU3lyB+FtmJ0tBYiTRQ2zI 6xPA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b="Qmom/f9s"; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id i18-20020aa7c712000000b004ab4f62171asi3515293edq.279.2023.02.16.13.51.46; Thu, 16 Feb 2023 13:52:09 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b="Qmom/f9s"; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230429AbjBPVus (ORCPT + 99 others); Thu, 16 Feb 2023 16:50:48 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:37360 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230341AbjBPVuP (ORCPT ); Thu, 16 Feb 2023 16:50:15 -0500 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 1EB9C53813 for ; Thu, 16 Feb 2023 13:48:57 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1676584136; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=y8N2Hvl8ceKkhdFGfd9u1dmLYX9njBqFAAbBAETPUuY=; b=Qmom/f9s6D7Pf01q8+nldt90ieKA9dmkccpgVu8U2ZGJy7nXRZ7l741j/KCLwcow6Q058C Fr6Vr4NKxdfkCSyQ1ieAjUZiH5luwjDLwvZndiWvmp1OLKQ1Vx9vWfkDfDCtKOhoa8DwO5 E7TeIjSKlKdSMrbdkjSXh735k7wkq1g= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-487-1Vb9B6dEMhSH75lsjU4fxA-1; Thu, 16 Feb 2023 16:48:52 -0500 X-MC-Unique: 1Vb9B6dEMhSH75lsjU4fxA-1 Received: from smtp.corp.redhat.com (int-mx04.intmail.prod.int.rdu2.redhat.com [10.11.54.4]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id DD6A0811E9C; Thu, 16 Feb 2023 21:48:51 +0000 (UTC) Received: from warthog.procyon.org.uk (unknown [10.33.36.24]) by smtp.corp.redhat.com (Postfix) with ESMTP id DD28F2026D4B; Thu, 16 Feb 2023 21:48:49 +0000 (UTC) From: David Howells To: Steve French Cc: David Howells , Jens Axboe , Al Viro , Shyam Prasad N , Rohith Surabattula , Tom Talpey , Stefan Metzmacher , Christoph Hellwig , Matthew Wilcox , Jeff Layton , linux-cifs@vger.kernel.org, linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Steve French Subject: [PATCH 16/17] cifs: Remove unused code Date: Thu, 16 Feb 2023 21:47:44 +0000 Message-Id: <20230216214745.3985496-17-dhowells@redhat.com> In-Reply-To: <20230216214745.3985496-1-dhowells@redhat.com> References: <20230216214745.3985496-1-dhowells@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.4 X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_NONE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1758026090191871012?= X-GMAIL-MSGID: =?utf-8?q?1758026090191871012?= Remove a bunch of functions that are no longer used and are commented out after the conversion to use iterators throughout the I/O path. Signed-off-by: David Howells cc: Steve French cc: Shyam Prasad N cc: Rohith Surabattula cc: Jeff Layton cc: linux-cifs@vger.kernel.org Link: https://lore.kernel.org/r/164928621823.457102.8777804402615654773.stgit@warthog.procyon.org.uk/ # v1 Link: https://lore.kernel.org/r/165211421039.3154751.15199634443157779005.stgit@warthog.procyon.org.uk/ # v1 Link: https://lore.kernel.org/r/165348881165.2106726.2993852968344861224.stgit@warthog.procyon.org.uk/ # v1 Link: https://lore.kernel.org/r/165364827876.3334034.9331465096417303889.stgit@warthog.procyon.org.uk/ # v3 Link: https://lore.kernel.org/r/166126396915.708021.2010212654244139442.stgit@warthog.procyon.org.uk/ # v1 Link: https://lore.kernel.org/r/166697261080.61150.17513116912567922274.stgit@warthog.procyon.org.uk/ # rfc Link: https://lore.kernel.org/r/166732033255.3186319.5527423437137895940.stgit@warthog.procyon.org.uk/ # rfc --- fs/cifs/file.c | 606 ------------------------------------------------- 1 file changed, 606 deletions(-) diff --git a/fs/cifs/file.c b/fs/cifs/file.c index 33779d184692..60949fc352ed 100644 --- a/fs/cifs/file.c +++ b/fs/cifs/file.c @@ -2606,314 +2606,6 @@ static int cifs_partialpagewrite(struct page *page, unsigned from, unsigned to) return rc; } -#if 0 // TODO: Remove for iov_iter support -static struct cifs_writedata * -wdata_alloc_and_fillpages(pgoff_t tofind, struct address_space *mapping, - pgoff_t end, pgoff_t *index, - unsigned int *found_pages) -{ - struct cifs_writedata *wdata; - - wdata = cifs_writedata_alloc((unsigned int)tofind, - cifs_writev_complete); - if (!wdata) - return NULL; - - *found_pages = find_get_pages_range_tag(mapping, index, end, - PAGECACHE_TAG_DIRTY, tofind, wdata->pages); - return wdata; -} - -static unsigned int -wdata_prepare_pages(struct cifs_writedata *wdata, unsigned int found_pages, - struct address_space *mapping, - struct writeback_control *wbc, - pgoff_t end, pgoff_t *index, pgoff_t *next, bool *done) -{ - unsigned int nr_pages = 0, i; - struct page *page; - - for (i = 0; i < found_pages; i++) { - page = wdata->pages[i]; - /* - * At this point we hold neither the i_pages lock nor the - * page lock: the page may be truncated or invalidated - * (changing page->mapping to NULL), or even swizzled - * back from swapper_space to tmpfs file mapping - */ - - if (nr_pages == 0) - lock_page(page); - else if (!trylock_page(page)) - break; - - if (unlikely(page->mapping != mapping)) { - unlock_page(page); - break; - } - - if (!wbc->range_cyclic && page->index > end) { - *done = true; - unlock_page(page); - break; - } - - if (*next && (page->index != *next)) { - /* Not next consecutive page */ - unlock_page(page); - break; - } - - if (wbc->sync_mode != WB_SYNC_NONE) - wait_on_page_writeback(page); - - if (PageWriteback(page) || - !clear_page_dirty_for_io(page)) { - unlock_page(page); - break; - } - - /* - * This actually clears the dirty bit in the radix tree. - * See cifs_writepage() for more commentary. - */ - set_page_writeback(page); - if (page_offset(page) >= i_size_read(mapping->host)) { - *done = true; - unlock_page(page); - end_page_writeback(page); - break; - } - - wdata->pages[i] = page; - *next = page->index + 1; - ++nr_pages; - } - - /* reset index to refind any pages skipped */ - if (nr_pages == 0) - *index = wdata->pages[0]->index + 1; - - /* put any pages we aren't going to use */ - for (i = nr_pages; i < found_pages; i++) { - put_page(wdata->pages[i]); - wdata->pages[i] = NULL; - } - - return nr_pages; -} - -static int -wdata_send_pages(struct cifs_writedata *wdata, unsigned int nr_pages, - struct address_space *mapping, struct writeback_control *wbc) -{ - int rc; - - wdata->sync_mode = wbc->sync_mode; - wdata->nr_pages = nr_pages; - wdata->offset = page_offset(wdata->pages[0]); - wdata->pagesz = PAGE_SIZE; - wdata->tailsz = min(i_size_read(mapping->host) - - page_offset(wdata->pages[nr_pages - 1]), - (loff_t)PAGE_SIZE); - wdata->bytes = ((nr_pages - 1) * PAGE_SIZE) + wdata->tailsz; - wdata->pid = wdata->cfile->pid; - - rc = adjust_credits(wdata->server, &wdata->credits, wdata->bytes); - if (rc) - return rc; - - if (wdata->cfile->invalidHandle) - rc = -EAGAIN; - else - rc = wdata->server->ops->async_writev(wdata, - cifs_writedata_release); - - return rc; -} - -static int -cifs_writepage_locked(struct page *page, struct writeback_control *wbc); - -static int cifs_write_one_page(struct page *page, struct writeback_control *wbc, - void *data) -{ - struct address_space *mapping = data; - int ret; - - ret = cifs_writepage_locked(page, wbc); - unlock_page(page); - mapping_set_error(mapping, ret); - return ret; -} - -static int cifs_writepages(struct address_space *mapping, - struct writeback_control *wbc) -{ - struct inode *inode = mapping->host; - struct cifs_sb_info *cifs_sb = CIFS_SB(inode->i_sb); - struct TCP_Server_Info *server; - bool done = false, scanned = false, range_whole = false; - pgoff_t end, index; - struct cifs_writedata *wdata; - struct cifsFileInfo *cfile = NULL; - int rc = 0; - int saved_rc = 0; - unsigned int xid; - - /* - * If wsize is smaller than the page cache size, default to writing - * one page at a time. - */ - if (cifs_sb->ctx->wsize < PAGE_SIZE) - return write_cache_pages(mapping, wbc, cifs_write_one_page, - mapping); - - xid = get_xid(); - if (wbc->range_cyclic) { - index = mapping->writeback_index; /* Start from prev offset */ - end = -1; - } else { - index = wbc->range_start >> PAGE_SHIFT; - end = wbc->range_end >> PAGE_SHIFT; - if (wbc->range_start == 0 && wbc->range_end == LLONG_MAX) - range_whole = true; - scanned = true; - } - server = cifs_pick_channel(cifs_sb_master_tcon(cifs_sb)->ses); - -retry: - while (!done && index <= end) { - unsigned int i, nr_pages, found_pages, wsize; - pgoff_t next = 0, tofind, saved_index = index; - struct cifs_credits credits_on_stack; - struct cifs_credits *credits = &credits_on_stack; - int get_file_rc = 0; - - if (cfile) - cifsFileInfo_put(cfile); - - rc = cifs_get_writable_file(CIFS_I(inode), FIND_WR_ANY, &cfile); - - /* in case of an error store it to return later */ - if (rc) - get_file_rc = rc; - - rc = server->ops->wait_mtu_credits(server, cifs_sb->ctx->wsize, - &wsize, credits); - if (rc != 0) { - done = true; - break; - } - - tofind = min((wsize / PAGE_SIZE) - 1, end - index) + 1; - - wdata = wdata_alloc_and_fillpages(tofind, mapping, end, &index, - &found_pages); - if (!wdata) { - rc = -ENOMEM; - done = true; - add_credits_and_wake_if(server, credits, 0); - break; - } - - if (found_pages == 0) { - kref_put(&wdata->refcount, cifs_writedata_release); - add_credits_and_wake_if(server, credits, 0); - break; - } - - nr_pages = wdata_prepare_pages(wdata, found_pages, mapping, wbc, - end, &index, &next, &done); - - /* nothing to write? */ - if (nr_pages == 0) { - kref_put(&wdata->refcount, cifs_writedata_release); - add_credits_and_wake_if(server, credits, 0); - continue; - } - - wdata->credits = credits_on_stack; - wdata->cfile = cfile; - wdata->server = server; - cfile = NULL; - - if (!wdata->cfile) { - cifs_dbg(VFS, "No writable handle in writepages rc=%d\n", - get_file_rc); - if (is_retryable_error(get_file_rc)) - rc = get_file_rc; - else - rc = -EBADF; - } else - rc = wdata_send_pages(wdata, nr_pages, mapping, wbc); - - for (i = 0; i < nr_pages; ++i) - unlock_page(wdata->pages[i]); - - /* send failure -- clean up the mess */ - if (rc != 0) { - add_credits_and_wake_if(server, &wdata->credits, 0); - for (i = 0; i < nr_pages; ++i) { - if (is_retryable_error(rc)) - redirty_page_for_writepage(wbc, - wdata->pages[i]); - else - SetPageError(wdata->pages[i]); - end_page_writeback(wdata->pages[i]); - put_page(wdata->pages[i]); - } - if (!is_retryable_error(rc)) - mapping_set_error(mapping, rc); - } - kref_put(&wdata->refcount, cifs_writedata_release); - - if (wbc->sync_mode == WB_SYNC_ALL && rc == -EAGAIN) { - index = saved_index; - continue; - } - - /* Return immediately if we received a signal during writing */ - if (is_interrupt_error(rc)) { - done = true; - break; - } - - if (rc != 0 && saved_rc == 0) - saved_rc = rc; - - wbc->nr_to_write -= nr_pages; - if (wbc->nr_to_write <= 0) - done = true; - - index = next; - } - - if (!scanned && !done) { - /* - * We hit the last page and there is more work to be done: wrap - * back to the start of the file - */ - scanned = true; - index = 0; - goto retry; - } - - if (saved_rc != 0) - rc = saved_rc; - - if (wbc->range_cyclic || (range_whole && wbc->nr_to_write > 0)) - mapping->writeback_index = index; - - if (cfile) - cifsFileInfo_put(cfile); - free_xid(xid); - /* Indication to update ctime and mtime as close is deferred */ - set_bit(CIFS_INO_MODIFIED_ATTR, &CIFS_I(inode)->flags); - return rc; -} -#endif - /* * Extend the region to be written back to include subsequent contiguously * dirty pages if possible, but don't sleep while doing so. @@ -3509,49 +3201,6 @@ int cifs_flush(struct file *file, fl_owner_t id) return rc; } -#if 0 // TODO: Remove for iov_iter support -static int -cifs_write_allocate_pages(struct page **pages, unsigned long num_pages) -{ - int rc = 0; - unsigned long i; - - for (i = 0; i < num_pages; i++) { - pages[i] = alloc_page(GFP_KERNEL|__GFP_HIGHMEM); - if (!pages[i]) { - /* - * save number of pages we have already allocated and - * return with ENOMEM error - */ - num_pages = i; - rc = -ENOMEM; - break; - } - } - - if (rc) { - for (i = 0; i < num_pages; i++) - put_page(pages[i]); - } - return rc; -} - -static inline -size_t get_numpages(const size_t wsize, const size_t len, size_t *cur_len) -{ - size_t num_pages; - size_t clen; - - clen = min_t(const size_t, len, wsize); - num_pages = DIV_ROUND_UP(clen, PAGE_SIZE); - - if (cur_len) - *cur_len = clen; - - return num_pages; -} -#endif - static void cifs_uncached_writedata_release(struct kref *refcount) { @@ -3584,50 +3233,6 @@ cifs_uncached_writev_complete(struct work_struct *work) kref_put(&wdata->refcount, cifs_uncached_writedata_release); } -#if 0 // TODO: Remove for iov_iter support -static int -wdata_fill_from_iovec(struct cifs_writedata *wdata, struct iov_iter *from, - size_t *len, unsigned long *num_pages) -{ - size_t save_len, copied, bytes, cur_len = *len; - unsigned long i, nr_pages = *num_pages; - - save_len = cur_len; - for (i = 0; i < nr_pages; i++) { - bytes = min_t(const size_t, cur_len, PAGE_SIZE); - copied = copy_page_from_iter(wdata->pages[i], 0, bytes, from); - cur_len -= copied; - /* - * If we didn't copy as much as we expected, then that - * may mean we trod into an unmapped area. Stop copying - * at that point. On the next pass through the big - * loop, we'll likely end up getting a zero-length - * write and bailing out of it. - */ - if (copied < bytes) - break; - } - cur_len = save_len - cur_len; - *len = cur_len; - - /* - * If we have no data to send, then that probably means that - * the copy above failed altogether. That's most likely because - * the address in the iovec was bogus. Return -EFAULT and let - * the caller free anything we allocated and bail out. - */ - if (!cur_len) - return -EFAULT; - - /* - * i + 1 now represents the number of pages we actually used in - * the copy phase above. - */ - *num_pages = i + 1; - return 0; -} -#endif - static int cifs_resend_wdata(struct cifs_writedata *wdata, struct list_head *wdata_list, struct cifs_aio_ctx *ctx) @@ -4214,83 +3819,6 @@ cifs_uncached_readv_complete(struct work_struct *work) kref_put(&rdata->refcount, cifs_readdata_release); } -#if 0 // TODO: Remove for iov_iter support - -static int -uncached_fill_pages(struct TCP_Server_Info *server, - struct cifs_readdata *rdata, struct iov_iter *iter, - unsigned int len) -{ - int result = 0; - unsigned int i; - unsigned int nr_pages = rdata->nr_pages; - unsigned int page_offset = rdata->page_offset; - - rdata->got_bytes = 0; - rdata->tailsz = PAGE_SIZE; - for (i = 0; i < nr_pages; i++) { - struct page *page = rdata->pages[i]; - size_t n; - unsigned int segment_size = rdata->pagesz; - - if (i == 0) - segment_size -= page_offset; - else - page_offset = 0; - - - if (len <= 0) { - /* no need to hold page hostage */ - rdata->pages[i] = NULL; - rdata->nr_pages--; - put_page(page); - continue; - } - - n = len; - if (len >= segment_size) - /* enough data to fill the page */ - n = segment_size; - else - rdata->tailsz = len; - len -= n; - - if (iter) - result = copy_page_from_iter( - page, page_offset, n, iter); -#ifdef CONFIG_CIFS_SMB_DIRECT - else if (rdata->mr) - result = n; -#endif - else - result = cifs_read_page_from_socket( - server, page, page_offset, n); - if (result < 0) - break; - - rdata->got_bytes += result; - } - - return result != -ECONNABORTED && rdata->got_bytes > 0 ? - rdata->got_bytes : result; -} - -static int -cifs_uncached_read_into_pages(struct TCP_Server_Info *server, - struct cifs_readdata *rdata, unsigned int len) -{ - return uncached_fill_pages(server, rdata, NULL, len); -} - -static int -cifs_uncached_copy_into_pages(struct TCP_Server_Info *server, - struct cifs_readdata *rdata, - struct iov_iter *iter) -{ - return uncached_fill_pages(server, rdata, iter, iter->count); -} -#endif - static int cifs_resend_rdata(struct cifs_readdata *rdata, struct list_head *rdata_list, struct cifs_aio_ctx *ctx) @@ -4900,140 +4428,6 @@ int cifs_file_mmap(struct file *file, struct vm_area_struct *vma) return rc; } -#if 0 // TODO: Remove for iov_iter support - -static void -cifs_readv_complete(struct work_struct *work) -{ - unsigned int i, got_bytes; - struct cifs_readdata *rdata = container_of(work, - struct cifs_readdata, work); - - got_bytes = rdata->got_bytes; - for (i = 0; i < rdata->nr_pages; i++) { - struct page *page = rdata->pages[i]; - - if (rdata->result == 0 || - (rdata->result == -EAGAIN && got_bytes)) { - flush_dcache_page(page); - SetPageUptodate(page); - } else - SetPageError(page); - - if (rdata->result == 0 || - (rdata->result == -EAGAIN && got_bytes)) - cifs_readpage_to_fscache(rdata->mapping->host, page); - - unlock_page(page); - - got_bytes -= min_t(unsigned int, PAGE_SIZE, got_bytes); - - put_page(page); - rdata->pages[i] = NULL; - } - kref_put(&rdata->refcount, cifs_readdata_release); -} - -static int -readpages_fill_pages(struct TCP_Server_Info *server, - struct cifs_readdata *rdata, struct iov_iter *iter, - unsigned int len) -{ - int result = 0; - unsigned int i; - u64 eof; - pgoff_t eof_index; - unsigned int nr_pages = rdata->nr_pages; - unsigned int page_offset = rdata->page_offset; - - /* determine the eof that the server (probably) has */ - eof = CIFS_I(rdata->mapping->host)->server_eof; - eof_index = eof ? (eof - 1) >> PAGE_SHIFT : 0; - cifs_dbg(FYI, "eof=%llu eof_index=%lu\n", eof, eof_index); - - rdata->got_bytes = 0; - rdata->tailsz = PAGE_SIZE; - for (i = 0; i < nr_pages; i++) { - struct page *page = rdata->pages[i]; - unsigned int to_read = rdata->pagesz; - size_t n; - - if (i == 0) - to_read -= page_offset; - else - page_offset = 0; - - n = to_read; - - if (len >= to_read) { - len -= to_read; - } else if (len > 0) { - /* enough for partial page, fill and zero the rest */ - zero_user(page, len + page_offset, to_read - len); - n = rdata->tailsz = len; - len = 0; - } else if (page->index > eof_index) { - /* - * The VFS will not try to do readahead past the - * i_size, but it's possible that we have outstanding - * writes with gaps in the middle and the i_size hasn't - * caught up yet. Populate those with zeroed out pages - * to prevent the VFS from repeatedly attempting to - * fill them until the writes are flushed. - */ - zero_user(page, 0, PAGE_SIZE); - flush_dcache_page(page); - SetPageUptodate(page); - unlock_page(page); - put_page(page); - rdata->pages[i] = NULL; - rdata->nr_pages--; - continue; - } else { - /* no need to hold page hostage */ - unlock_page(page); - put_page(page); - rdata->pages[i] = NULL; - rdata->nr_pages--; - continue; - } - - if (iter) - result = copy_page_from_iter( - page, page_offset, n, iter); -#ifdef CONFIG_CIFS_SMB_DIRECT - else if (rdata->mr) - result = n; -#endif - else - result = cifs_read_page_from_socket( - server, page, page_offset, n); - if (result < 0) - break; - - rdata->got_bytes += result; - } - - return result != -ECONNABORTED && rdata->got_bytes > 0 ? - rdata->got_bytes : result; -} - -static int -cifs_readpages_read_into_pages(struct TCP_Server_Info *server, - struct cifs_readdata *rdata, unsigned int len) -{ - return readpages_fill_pages(server, rdata, NULL, len); -} - -static int -cifs_readpages_copy_into_pages(struct TCP_Server_Info *server, - struct cifs_readdata *rdata, - struct iov_iter *iter) -{ - return readpages_fill_pages(server, rdata, iter, iter->count); -} -#endif - /* * Unlock a bunch of folios in the pagecache. */ From patchwork Thu Feb 16 21:47:45 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 58258 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:adf:eb09:0:0:0:0:0 with SMTP id s9csp548434wrn; Thu, 16 Feb 2023 13:56:01 -0800 (PST) X-Google-Smtp-Source: AK7set8TeZxvgybX/9bBs/rAuRZY5rNvKEeGall2mWGrR/EeCzmcfWfCmlr/TiAY0yJCwLEroSmI X-Received: by 2002:a17:906:5490:b0:885:fb8a:7c3f with SMTP id r16-20020a170906549000b00885fb8a7c3fmr9221927ejo.65.1676584560917; Thu, 16 Feb 2023 13:56:00 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1676584560; cv=none; d=google.com; s=arc-20160816; b=fHc3nbdgSDTstmZc7BMtm3EIQUhAXpLh3CDGo5s78VUvOgyhRMG3k13fw9WYh/I/7I wSmFLHLy/HWShyeCvjQG+f/7saN1zoZS6hL2SCaUxbOlKDkt1eSb05nHIgym8MHoFlvL 0YQL32kok03OcYbrzd3YNW2SOEQ6HCdZ/sG0WWJqNHQooLEAGAyZou7orcSGpnEKg03W DsNvhXJIsp/eFR/Q9vuQ9bt6njzgPw849JCdot5mKbGUAhP8ce9vuW0DRgWGd6i80D2X tLnrad09CUDVhDDH9rffq0zvTWl01HW1xDy7I+I762r6sjtkMZR0wFrxzsLR+pfYrvSt jn2g== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=41+Q30ZI9D1dRm8dDNs28BfljdF5S2Rwn6LqsfU1dM4=; b=i98/PGnWZeshR4M0iZh78DnmvfydD4bS496tSBrIIJmBzI1V8Yo3fOP5dxA6CbZZSn up4bGhV6Qm5foLcl0oWxIoURRVSUyOkbHlllbWWQA+0ajulXLktmYWRx8IzMQ8zzEJOG oVQPm6/VsS3d524NcJ8Y6/wnS48cKF2FVR86BG0RJgwy0qfpus0WsyzTPDTkvJC3GO1D hbGvEEgDPza1BAwOLoeC9Id7WA/5yDEQgkQxNB5qqnWE8nIpaT0uzr9d3SLUmkbuzosS nzwVSDUE+YNMNUZ+yGjN4UlNhmUPkV2Na2oqg8RuFpmJbZs2/Tkoy8l/d2gYG6eAM0/z 4NGA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=NCdTurLt; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id mt30-20020a170907619e00b008aac189f502si1282937ejc.504.2023.02.16.13.55.37; Thu, 16 Feb 2023 13:56:00 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=NCdTurLt; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230385AbjBPVv2 (ORCPT + 99 others); Thu, 16 Feb 2023 16:51:28 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:37556 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230375AbjBPVuV (ORCPT ); Thu, 16 Feb 2023 16:50:21 -0500 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 76D3352895 for ; Thu, 16 Feb 2023 13:48:59 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1676584138; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=41+Q30ZI9D1dRm8dDNs28BfljdF5S2Rwn6LqsfU1dM4=; b=NCdTurLtiDgfk74Q2rf5Wkm/vaYKSWpz8tETrhdf9MUqR8PuLMtqCzqq6soXVT9Eouk5Hd RyKhsR7FqCv33Vojb/G3iShoxgUmevwpPC+gQI8LrYFDovzHEpgvG2TuTyKYyZ9t2r7wae z2IEZXlnGRSELL1bGsCtnT/iv2iLNJA= Received: from mimecast-mx02.redhat.com (mx3-rdu2.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-608-VO6ytfueMLuoZZiqci4i-A-1; Thu, 16 Feb 2023 16:48:55 -0500 X-MC-Unique: VO6ytfueMLuoZZiqci4i-A-1 Received: from smtp.corp.redhat.com (int-mx06.intmail.prod.int.rdu2.redhat.com [10.11.54.6]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 632A71C05EB1; Thu, 16 Feb 2023 21:48:54 +0000 (UTC) Received: from warthog.procyon.org.uk (unknown [10.33.36.24]) by smtp.corp.redhat.com (Postfix) with ESMTP id 8077F2166B30; Thu, 16 Feb 2023 21:48:52 +0000 (UTC) From: David Howells To: Steve French Cc: David Howells , Jens Axboe , Al Viro , Shyam Prasad N , Rohith Surabattula , Tom Talpey , Stefan Metzmacher , Christoph Hellwig , Matthew Wilcox , Jeff Layton , linux-cifs@vger.kernel.org, linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Steve French Subject: [PATCH 17/17] cifs: DIO to/from KVEC-type iterators should now work Date: Thu, 16 Feb 2023 21:47:45 +0000 Message-Id: <20230216214745.3985496-18-dhowells@redhat.com> In-Reply-To: <20230216214745.3985496-1-dhowells@redhat.com> References: <20230216214745.3985496-1-dhowells@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.6 X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_NONE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1758026332765291532?= X-GMAIL-MSGID: =?utf-8?q?1758026332765291532?= DIO to/from KVEC-type iterators should now work as the iterator is passed down to the socket in non-RDMA/non-crypto mode and in RDMA or crypto mode care is taken to handle vmap/vmalloc correctly and not take page refs when building a scatterlist. Signed-off-by: David Howells cc: Steve French cc: Shyam Prasad N cc: Rohith Surabattula cc: Tom Talpey cc: Jeff Layton cc: linux-cifs@vger.kernel.org --- fs/cifs/file.c | 20 -------------------- 1 file changed, 20 deletions(-) diff --git a/fs/cifs/file.c b/fs/cifs/file.c index 60949fc352ed..6969699632dc 100644 --- a/fs/cifs/file.c +++ b/fs/cifs/file.c @@ -3549,16 +3549,6 @@ static ssize_t __cifs_writev( struct cifs_aio_ctx *ctx; int rc; - /* - * iov_iter_get_pages_alloc doesn't work with ITER_KVEC. - * In this case, fall back to non-direct write function. - * this could be improved by getting pages directly in ITER_KVEC - */ - if (direct && iov_iter_is_kvec(from)) { - cifs_dbg(FYI, "use non-direct cifs_writev for kvec I/O\n"); - direct = false; - } - rc = generic_write_checks(iocb, from); if (rc <= 0) return rc; @@ -4092,16 +4082,6 @@ static ssize_t __cifs_readv( loff_t offset = iocb->ki_pos; struct cifs_aio_ctx *ctx; - /* - * iov_iter_get_pages_alloc() doesn't work with ITER_KVEC, - * fall back to data copy read path - * this could be improved by getting pages directly in ITER_KVEC - */ - if (direct && iov_iter_is_kvec(to)) { - cifs_dbg(FYI, "use non-direct cifs_user_readv for kvec I/O\n"); - direct = false; - } - len = iov_iter_count(to); if (!len) return 0;