From patchwork Mon Jan 16 23:08:09 2023
X-Patchwork-Submitter: David Howells
X-Patchwork-Id: 44352
Subject: [PATCH v6 01/34] vfs: Unconditionally set IOCB_WRITE in call_write_iter()
From: David Howells
To: Al Viro
Cc: Christoph Hellwig, Jens Axboe, linux-block@vger.kernel.org,
 linux-fsdevel@vger.kernel.org, dhowells@redhat.com, Christoph Hellwig,
 Matthew Wilcox, Jens Axboe, Jan Kara, Jeff Layton, Logan Gunthorpe,
 linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org,
 linux-kernel@vger.kernel.org
Date: Mon, 16 Jan 2023 23:08:09 +0000
Message-ID: <167391048988.2311931.1567396746365286847.stgit@warthog.procyon.org.uk>
In-Reply-To: <167391047703.2311931.8115712773222260073.stgit@warthog.procyon.org.uk>
References: <167391047703.2311931.8115712773222260073.stgit@warthog.procyon.org.uk>

IOCB_WRITE is set by aio, io_uring and cachefiles before submitting a write
operation to the VFS, but it isn't set by, say, the write() system call.

Fix this by setting IOCB_WRITE unconditionally in call_write_iter().

This will allow drivers to use IOCB_WRITE instead of the iterator data
source to determine the I/O direction.

Signed-off-by: David Howells
cc: Alexander Viro
cc: Christoph Hellwig
cc: Jens Axboe
cc: linux-block@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
---
 include/linux/fs.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index 066555ad1bf8..649ff061440e 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2183,6 +2183,7 @@ static inline ssize_t call_read_iter(struct file *file, struct kiocb *kio,
 static inline ssize_t call_write_iter(struct file *file, struct kiocb *kio,
                                       struct iov_iter *iter)
 {
+    kio->ki_flags |= IOCB_WRITE;
     return file->f_op->write_iter(kio, iter);
 }
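As an illustration (a sketch, not part of this patch): once IOCB_WRITE is set
unconditionally, a driver's common submission path can derive the direction
from the kiocb alone instead of inspecting the iterator.
mydrv_submit_read()/mydrv_submit_write() below are hypothetical stand-ins for
a driver's read and write paths:

    #include <linux/fs.h>
    #include <linux/uio.h>

    /* Hypothetical driver code: direction from the kiocb, not the iterator. */
    static ssize_t mydrv_rw_iter(struct kiocb *iocb, struct iov_iter *iter)
    {
        if (iocb->ki_flags & IOCB_WRITE)    /* set by call_write_iter() */
            return mydrv_submit_write(iocb, iter);
        return mydrv_submit_read(iocb, iter);
    }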
From patchwork Mon Jan 16 23:08:17 2023
X-Patchwork-Submitter: David Howells
X-Patchwork-Id: 44353
Subject: [PATCH v6 02/34] iov_iter: Use IOCB/IOMAP_WRITE/op_is_write rather than iterator direction
From: David Howells
To: Al Viro
Cc: dhowells@redhat.com, Christoph Hellwig, Matthew Wilcox, Jens Axboe,
 Jan Kara, Jeff Layton, Logan Gunthorpe, linux-fsdevel@vger.kernel.org,
 linux-block@vger.kernel.org, linux-kernel@vger.kernel.org
Date: Mon, 16 Jan 2023 23:08:17 +0000
Message-ID: <167391049698.2311931.13641162904441620555.stgit@warthog.procyon.org.uk>
In-Reply-To: <167391047703.2311931.8115712773222260073.stgit@warthog.procyon.org.uk>
References: <167391047703.2311931.8115712773222260073.stgit@warthog.procyon.org.uk>

Use information other than the iterator direction to determine the direction
of the I/O:

 (*) If a kiocb is available, use the IOCB_WRITE flag.

 (*) If an iomap_iter is available, use the IOMAP_WRITE flag.

 (*) If a request is available, use op_is_write().

Drop the check on the iterator in smbd_recv() and its warning.

This leaves __iov_iter_get_pages_alloc() the only user of iov_iter_rw(), so
move it there and uninline it.

Changes:
========
ver #6)
 - Move to the front of the patchset.
 - Added iocb_is_read() and iocb_is_write() to check IOCB_WRITE.
 - Use op_is_write() in bio_copy_user_iov().
 - Drop the checks from smbd_recv().
Signed-off-by: David Howells
cc: Al Viro
Link: https://lore.kernel.org/r/167305163159.1521586.9460968250704377087.stgit@warthog.procyon.org.uk/ # v4
Link: https://lore.kernel.org/r/167344727810.2425628.4715663653893036683.stgit@warthog.procyon.org.uk/ # v5
---
 block/blk-map.c      |  2 +-
 block/fops.c         |  8 ++++----
 fs/9p/vfs_addr.c     |  2 +-
 fs/affs/file.c       |  4 ++--
 fs/ceph/file.c       |  2 +-
 fs/cifs/smbdirect.c  |  9 ---------
 fs/dax.c             |  6 +++---
 fs/direct-io.c       | 22 +++++++++++-----------
 fs/exfat/inode.c     |  6 +++---
 fs/ext2/inode.c      |  2 +-
 fs/f2fs/file.c       | 10 +++++-----
 fs/fat/inode.c       |  4 ++--
 fs/fuse/dax.c        |  2 +-
 fs/fuse/file.c       |  8 ++++----
 fs/hfs/inode.c       |  2 +-
 fs/hfsplus/inode.c   |  2 +-
 fs/iomap/direct-io.c |  6 +++---
 fs/jfs/inode.c       |  2 +-
 fs/nfs/direct.c      |  2 +-
 fs/nilfs2/inode.c    |  2 +-
 fs/ntfs3/inode.c     |  2 +-
 fs/ocfs2/aops.c      |  2 +-
 fs/orangefs/inode.c  |  2 +-
 fs/reiserfs/inode.c  |  2 +-
 fs/udf/inode.c       |  2 +-
 include/linux/fs.h   | 10 ++++++++++
 include/linux/uio.h  |  5 -----
 lib/iov_iter.c       |  5 +++++
 28 files changed, 67 insertions(+), 66 deletions(-)

diff --git a/block/blk-map.c b/block/blk-map.c
index 19940c978c73..08cbb7ff3b19 100644
--- a/block/blk-map.c
+++ b/block/blk-map.c
@@ -203,7 +203,7 @@ static int bio_copy_user_iov(struct request *rq, struct rq_map_data *map_data,
     /*
      * success
      */
-    if ((iov_iter_rw(iter) == WRITE &&
+    if ((op_is_write(rq->cmd_flags) &&
          (!map_data || !map_data->null_mapped)) ||
         (map_data && map_data->from_user)) {
         ret = bio_copy_from_iter(bio, iter);
diff --git a/block/fops.c b/block/fops.c
index 50d245e8c913..5d376285edde 100644
--- a/block/fops.c
+++ b/block/fops.c
@@ -73,7 +73,7 @@ static ssize_t __blkdev_direct_IO_simple(struct kiocb *iocb,
             return -ENOMEM;
     }

-    if (iov_iter_rw(iter) == READ) {
+    if (iocb_is_read(iocb)) {
         bio_init(&bio, bdev, vecs, nr_pages, REQ_OP_READ);
         if (user_backed_iter(iter))
             should_dirty = true;
@@ -88,7 +88,7 @@ static ssize_t __blkdev_direct_IO_simple(struct kiocb *iocb,
         goto out;
     ret = bio.bi_iter.bi_size;

-    if (iov_iter_rw(iter) == WRITE)
+    if (iocb_is_write(iocb))
         task_io_account_write(ret);

     if (iocb->ki_flags & IOCB_NOWAIT)
@@ -174,7 +174,7 @@ static ssize_t __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter,
     struct blk_plug plug;
     struct blkdev_dio *dio;
     struct bio *bio;
-    bool is_read = (iov_iter_rw(iter) == READ), is_sync;
+    bool is_read = iocb_is_read(iocb), is_sync;
     blk_opf_t opf = is_read ? REQ_OP_READ : dio_bio_write_op(iocb);
     loff_t pos = iocb->ki_pos;
     int ret = 0;
@@ -296,7 +296,7 @@ static ssize_t __blkdev_direct_IO_async(struct kiocb *iocb,
                     unsigned int nr_pages)
 {
     struct block_device *bdev = iocb->ki_filp->private_data;
-    bool is_read = iov_iter_rw(iter) == READ;
+    bool is_read = iocb_is_read(iocb);
     blk_opf_t opf = is_read ? REQ_OP_READ : dio_bio_write_op(iocb);
     struct blkdev_dio *dio;
     struct bio *bio;
diff --git a/fs/9p/vfs_addr.c b/fs/9p/vfs_addr.c
index 97599edbc300..080be076b7b6 100644
--- a/fs/9p/vfs_addr.c
+++ b/fs/9p/vfs_addr.c
@@ -254,7 +254,7 @@ v9fs_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
     ssize_t n;
     int err = 0;

-    if (iov_iter_rw(iter) == WRITE) {
+    if (iocb_is_write(iocb)) {
         n = p9_client_write(file->private_data, pos, iter, &err);
         if (n) {
             struct inode *inode = file_inode(file);
diff --git a/fs/affs/file.c b/fs/affs/file.c
index cefa222f7881..0dc67fc5d6cb 100644
--- a/fs/affs/file.c
+++ b/fs/affs/file.c
@@ -400,7 +400,7 @@ affs_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
     loff_t offset = iocb->ki_pos;
     ssize_t ret;

-    if (iov_iter_rw(iter) == WRITE) {
+    if (iocb_is_write(iocb)) {
         loff_t size = offset + count;

         if (AFFS_I(inode)->mmu_private < size)
@@ -408,7 +408,7 @@ affs_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
     }

     ret = blockdev_direct_IO(iocb, inode, iter, affs_get_block);
-    if (ret < 0 && iov_iter_rw(iter) == WRITE)
+    if (ret < 0 && iocb_is_write(iocb))
         affs_write_failed(mapping, offset + count);
     return ret;
 }
diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index 764598e1efd9..27c72a2f6af5 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -1284,7 +1284,7 @@ ceph_direct_read_write(struct kiocb *iocb, struct iov_iter *iter,
     struct timespec64 mtime = current_time(inode);
     size_t count = iov_iter_count(iter);
     loff_t pos = iocb->ki_pos;
-    bool write = iov_iter_rw(iter) == WRITE;
+    bool write = iocb_is_write(iocb);
     bool should_dirty = !write && user_backed_iter(iter);

     if (write && ceph_snap(file_inode(file)) != CEPH_NOSNAP)
diff --git a/fs/cifs/smbdirect.c b/fs/cifs/smbdirect.c
index 90789aaa6567..3e693ffd0662 100644
--- a/fs/cifs/smbdirect.c
+++ b/fs/cifs/smbdirect.c
@@ -1938,14 +1938,6 @@ int smbd_recv(struct smbd_connection *info, struct msghdr *msg)
     unsigned int to_read, page_offset;
     int rc;

-    if (iov_iter_rw(&msg->msg_iter) == WRITE) {
-        /* It's a bug in upper layer to get there */
-        cifs_dbg(VFS, "Invalid msg iter dir %u\n",
-             iov_iter_rw(&msg->msg_iter));
-        rc = -EINVAL;
-        goto out;
-    }
-
     switch (iov_iter_type(&msg->msg_iter)) {
     case ITER_KVEC:
         buf = msg->msg_iter.kvec->iov_base;
@@ -1967,7 +1959,6 @@ int smbd_recv(struct smbd_connection *info, struct msghdr *msg)
         rc = -EINVAL;
     }

-out:
     /* SMBDirect will read it all or nothing */
     if (rc > 0)
         msg->msg_iter.count = 0;
diff --git a/fs/dax.c b/fs/dax.c
index c48a3a93ab29..b538a2ab7b66 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -1405,7 +1405,7 @@ static loff_t dax_iomap_iter(const struct iomap_iter *iomi,
     loff_t pos = iomi->pos;
     struct dax_device *dax_dev = iomap->dax_dev;
     loff_t end = pos + length, done = 0;
-    bool write = iov_iter_rw(iter) == WRITE;
+    bool write = iomi->flags & IOMAP_WRITE;
     bool cow = write && iomap->flags & IOMAP_F_SHARED;
     ssize_t ret = 0;
     size_t xfer;
@@ -1455,7 +1455,7 @@ static loff_t dax_iomap_iter(const struct iomap_iter *iomi,
         map_len = dax_direct_access(dax_dev, pgoff, PHYS_PFN(size),
                 DAX_ACCESS, &kaddr, NULL);
-        if (map_len == -EIO && iov_iter_rw(iter) == WRITE) {
+        if (map_len == -EIO && write) {
             map_len = dax_direct_access(dax_dev, pgoff,
                     PHYS_PFN(size), DAX_RECOVERY_WRITE,
                     &kaddr, NULL);
@@ -1530,7 +1530,7 @@ dax_iomap_rw(struct kiocb *iocb, struct iov_iter *iter,
     if (!iomi.len)
         return 0;

-    if (iov_iter_rw(iter) == WRITE) {
+    if (iocb_is_write(iocb)) {
         lockdep_assert_held_write(&iomi.inode->i_rwsem);
         iomi.flags |= IOMAP_WRITE;
     } else {
diff --git a/fs/direct-io.c b/fs/direct-io.c
index 03d381377ae1..cf196f2a211e 100644
--- a/fs/direct-io.c
+++ b/fs/direct-io.c
@@ -1143,7 +1143,7 @@ ssize_t __blockdev_direct_IO(struct kiocb *iocb, struct inode *inode,
      */

     /* watch out for a 0 len io from a tricksy fs */
-    if (iov_iter_rw(iter) == READ && !count)
+    if (iocb_is_read(iocb) && !count)
         return 0;

     dio = kmem_cache_alloc(dio_cache, GFP_KERNEL);
@@ -1157,14 +1157,14 @@ ssize_t __blockdev_direct_IO(struct kiocb *iocb, struct inode *inode,
     memset(dio, 0, offsetof(struct dio, pages));

     dio->flags = flags;
-    if (dio->flags & DIO_LOCKING && iov_iter_rw(iter) == READ) {
+    if (dio->flags & DIO_LOCKING && iocb_is_read(iocb)) {
         /* will be released by direct_io_worker */
         inode_lock(inode);
     }

     /* Once we sampled i_size check for reads beyond EOF */
     dio->i_size = i_size_read(inode);
-    if (iov_iter_rw(iter) == READ && offset >= dio->i_size) {
+    if (iocb_is_read(iocb) && offset >= dio->i_size) {
         retval = 0;
         goto fail_dio;
     }
@@ -1177,7 +1177,7 @@ ssize_t __blockdev_direct_IO(struct kiocb *iocb, struct inode *inode,
             goto fail_dio;
     }

-    if (dio->flags & DIO_LOCKING && iov_iter_rw(iter) == READ) {
+    if (dio->flags & DIO_LOCKING && iocb_is_read(iocb)) {
         struct address_space *mapping = iocb->ki_filp->f_mapping;

         retval = filemap_write_and_wait_range(mapping, offset, end - 1);
@@ -1193,13 +1193,13 @@ ssize_t __blockdev_direct_IO(struct kiocb *iocb, struct inode *inode,
      */
     if (is_sync_kiocb(iocb))
         dio->is_async = false;
-    else if (iov_iter_rw(iter) == WRITE && end > i_size_read(inode))
+    else if (iocb_is_write(iocb) && end > i_size_read(inode))
         dio->is_async = false;
     else
         dio->is_async = true;

     dio->inode = inode;
-    if (iov_iter_rw(iter) == WRITE) {
+    if (iocb_is_write(iocb)) {
         dio->opf = REQ_OP_WRITE | REQ_SYNC | REQ_IDLE;
         if (iocb->ki_flags & IOCB_NOWAIT)
             dio->opf |= REQ_NOWAIT;
@@ -1211,7 +1211,7 @@ ssize_t __blockdev_direct_IO(struct kiocb *iocb, struct inode *inode,
      * For AIO O_(D)SYNC writes we need to defer completions to a workqueue
      * so that we can call ->fsync.
      */
-    if (dio->is_async && iov_iter_rw(iter) == WRITE) {
+    if (dio->is_async && iocb_is_write(iocb)) {
         retval = 0;
         if (iocb_is_dsync(iocb))
             retval = dio_set_defer_completion(dio);
@@ -1248,7 +1248,7 @@ ssize_t __blockdev_direct_IO(struct kiocb *iocb, struct inode *inode,
     spin_lock_init(&dio->bio_lock);
     dio->refcount = 1;

-    dio->should_dirty = user_backed_iter(iter) && iov_iter_rw(iter) == READ;
+    dio->should_dirty = user_backed_iter(iter) && iocb_is_read(iocb);
     sdio.iter = iter;
     sdio.final_block_in_request = end >> blkbits;

@@ -1305,7 +1305,7 @@ ssize_t __blockdev_direct_IO(struct kiocb *iocb, struct inode *inode,
      * we can let i_mutex go now that its achieved its purpose
      * of protecting us from looking up uninitialized blocks.
      */
-    if (iov_iter_rw(iter) == READ && (dio->flags & DIO_LOCKING))
+    if (iocb_is_read(iocb) && (dio->flags & DIO_LOCKING))
         inode_unlock(dio->inode);

     /*
@@ -1317,7 +1317,7 @@ ssize_t __blockdev_direct_IO(struct kiocb *iocb, struct inode *inode,
      */
     BUG_ON(retval == -EIOCBQUEUED);
     if (dio->is_async && retval == 0 && dio->result &&
-        (iov_iter_rw(iter) == READ || dio->result == count))
+        (iocb_is_read(iocb) || dio->result == count))
         retval = -EIOCBQUEUED;
     else
         dio_await_completion(dio);
@@ -1330,7 +1330,7 @@ ssize_t __blockdev_direct_IO(struct kiocb *iocb, struct inode *inode,
     return retval;

 fail_dio:
-    if (dio->flags & DIO_LOCKING && iov_iter_rw(iter) == READ)
+    if (dio->flags & DIO_LOCKING && iocb_is_read(iocb))
         inode_unlock(inode);
     kmem_cache_free(dio_cache, dio);
diff --git a/fs/exfat/inode.c b/fs/exfat/inode.c
index 5b644cb057fa..82554aaf4fd0 100644
--- a/fs/exfat/inode.c
+++ b/fs/exfat/inode.c
@@ -412,10 +412,10 @@ static ssize_t exfat_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
     struct address_space *mapping = iocb->ki_filp->f_mapping;
     struct inode *inode = mapping->host;
     loff_t size = iocb->ki_pos + iov_iter_count(iter);
-    int rw = iov_iter_rw(iter);
+    bool writing = iocb_is_write(iocb);
     ssize_t ret;

-    if (rw == WRITE) {
+    if (writing) {
         /*
          * FIXME: blockdev_direct_IO() doesn't use ->write_begin(),
          * so we need to update the ->i_size_aligned to block boundary.
@@ -434,7 +434,7 @@ static ssize_t exfat_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
      * condition of exfat_get_block() and ->truncate().
      */
     ret = blockdev_direct_IO(iocb, inode, iter, exfat_get_block);
-    if (ret < 0 && (rw & WRITE))
+    if (ret < 0 && writing)
         exfat_write_failed(mapping, size);
     return ret;
 }
diff --git a/fs/ext2/inode.c b/fs/ext2/inode.c
index 69aed9e2359e..26a61f886844 100644
--- a/fs/ext2/inode.c
+++ b/fs/ext2/inode.c
@@ -919,7 +919,7 @@ ext2_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
     ssize_t ret;

     ret = blockdev_direct_IO(iocb, inode, iter, ext2_get_block);
-    if (ret < 0 && iov_iter_rw(iter) == WRITE)
+    if (ret < 0 && iocb_is_write(iocb))
         ext2_write_failed(mapping, offset + count);
     return ret;
 }
diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index ecbc8c135b49..51a24580cfec 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -809,7 +809,7 @@ int f2fs_truncate(struct inode *inode)
     return 0;
 }

-static bool f2fs_force_buffered_io(struct inode *inode, int rw)
+static bool f2fs_force_buffered_io(struct inode *inode, bool writing)
 {
     struct f2fs_sb_info *sbi = F2FS_I_SB(inode);

@@ -827,9 +827,9 @@ static bool f2fs_force_buffered_io(struct inode *inode, int rw)
      * for blkzoned device, fallback direct IO to buffered IO, so
      * all IOs can be serialized by log-structured write.
      */
-    if (f2fs_sb_has_blkzoned(sbi) && (rw == WRITE))
+    if (f2fs_sb_has_blkzoned(sbi) && writing)
         return true;
-    if (f2fs_lfs_mode(sbi) && rw == WRITE && F2FS_IO_ALIGNED(sbi))
+    if (f2fs_lfs_mode(sbi) && writing && F2FS_IO_ALIGNED(sbi))
         return true;
     if (is_sbi_flag_set(sbi, SBI_CP_DISABLED))
         return true;
@@ -865,7 +865,7 @@ int f2fs_getattr(struct user_namespace *mnt_userns, const struct path *path,
         unsigned int bsize = i_blocksize(inode);

         stat->result_mask |= STATX_DIOALIGN;
-        if (!f2fs_force_buffered_io(inode, WRITE)) {
+        if (!f2fs_force_buffered_io(inode, true)) {
             stat->dio_mem_align = bsize;
             stat->dio_offset_align = bsize;
         }
@@ -4254,7 +4254,7 @@ static bool f2fs_should_use_dio(struct inode *inode, struct kiocb *iocb,
     if (!(iocb->ki_flags & IOCB_DIRECT))
         return false;

-    if (f2fs_force_buffered_io(inode, iov_iter_rw(iter)))
+    if (f2fs_force_buffered_io(inode, iocb_is_write(iocb)))
         return false;

     /*
diff --git a/fs/fat/inode.c b/fs/fat/inode.c
index d99b8549ec8f..237e20891df2 100644
--- a/fs/fat/inode.c
+++ b/fs/fat/inode.c
@@ -261,7 +261,7 @@ static ssize_t fat_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
     loff_t offset = iocb->ki_pos;
     ssize_t ret;

-    if (iov_iter_rw(iter) == WRITE) {
+    if (iocb_is_write(iocb)) {
         /*
          * FIXME: blockdev_direct_IO() doesn't use ->write_begin(),
          * so we need to update the ->mmu_private to block boundary.
@@ -281,7 +281,7 @@ static ssize_t fat_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
      * condition of fat_get_block() and ->truncate().
      */
     ret = blockdev_direct_IO(iocb, inode, iter, fat_get_block);
-    if (ret < 0 && iov_iter_rw(iter) == WRITE)
+    if (ret < 0 && iocb_is_write(iocb))
         fat_write_failed(mapping, offset + count);

     return ret;
diff --git a/fs/fuse/dax.c b/fs/fuse/dax.c
index e23e802a8013..4351376db4a1 100644
--- a/fs/fuse/dax.c
+++ b/fs/fuse/dax.c
@@ -720,7 +720,7 @@ static bool file_extending_write(struct kiocb *iocb, struct iov_iter *from)
 {
     struct inode *inode = file_inode(iocb->ki_filp);

-    return (iov_iter_rw(from) == WRITE &&
+    return (iocb_is_write(iocb) &&
         ((iocb->ki_pos) >= i_size_read(inode) ||
           (iocb->ki_pos + iov_iter_count(from) > i_size_read(inode))));
 }
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 875314ee6f59..d68b45f8b3ae 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -2897,7 +2897,7 @@ fuse_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
     inode = file->f_mapping->host;
     i_size = i_size_read(inode);

-    if ((iov_iter_rw(iter) == READ) && (offset >= i_size))
+    if (iocb_is_read(iocb) && (offset >= i_size))
         return 0;

     io = kmalloc(sizeof(struct fuse_io_priv), GFP_KERNEL);
@@ -2909,7 +2909,7 @@ fuse_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
     io->bytes = -1;
     io->size = 0;
     io->offset = offset;
-    io->write = (iov_iter_rw(iter) == WRITE);
+    io->write = iocb_is_write(iocb);
     io->err = 0;
     /*
      * By default, we want to optimize all I/Os with async request
@@ -2942,7 +2942,7 @@ fuse_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
         io->done = &wait;
     }

-    if (iov_iter_rw(iter) == WRITE) {
+    if (iocb_is_write(iocb)) {
         ret = fuse_direct_io(io, iter, &pos, FUSE_DIO_WRITE);
         fuse_invalidate_attr_mask(inode, FUSE_STATX_MODSIZE);
     } else {
@@ -2965,7 +2965,7 @@ fuse_direct_IO(struct kiocb *iocb, struct iov_iter *iter)

     kref_put(&io->refcnt, fuse_io_release);

-    if (iov_iter_rw(iter) == WRITE) {
+    if (iocb_is_write(iocb)) {
         fuse_write_update_attr(inode, pos, ret);
         /* For extending writes we already hold exclusive lock */
         if (ret < 0 && offset + count > i_size)
diff --git a/fs/hfs/inode.c b/fs/hfs/inode.c
index 9c329a365e75..eec166e039d5 100644
--- a/fs/hfs/inode.c
+++ b/fs/hfs/inode.c
@@ -141,7 +141,7 @@ static ssize_t hfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
      * In case of error extending write may have instantiated a few
      * blocks outside i_size. Trim these off again.
      */
-    if (unlikely(iov_iter_rw(iter) == WRITE && ret < 0)) {
+    if (unlikely(iocb_is_write(iocb) && ret < 0)) {
         loff_t isize = i_size_read(inode);
         loff_t end = iocb->ki_pos + count;
diff --git a/fs/hfsplus/inode.c b/fs/hfsplus/inode.c
index 840577a0c1e7..2b4effb6ca3e 100644
--- a/fs/hfsplus/inode.c
+++ b/fs/hfsplus/inode.c
@@ -138,7 +138,7 @@ static ssize_t hfsplus_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
      * In case of error extending write may have instantiated a few
      * blocks outside i_size. Trim these off again.
      */
-    if (unlikely(iov_iter_rw(iter) == WRITE && ret < 0)) {
+    if (unlikely(iocb_is_write(iocb) && ret < 0)) {
         loff_t isize = i_size_read(inode);
         loff_t end = iocb->ki_pos + count;
diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index 9804714b1751..b03d87f116fc 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -519,7 +519,7 @@ __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
     dio->submit.waiter = current;
     dio->submit.poll_bio = NULL;

-    if (iov_iter_rw(iter) == READ) {
+    if (iocb_is_read(iocb)) {
         if (iomi.pos >= dio->i_size)
             goto out_free_dio;

@@ -573,7 +573,7 @@ __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
     if (ret)
         goto out_free_dio;

-    if (iov_iter_rw(iter) == WRITE) {
+    if (iomi.flags & IOMAP_WRITE) {
         /*
          * Try to invalidate cache pages for the range we are writing.
          * If this invalidation fails, let the caller fall back to
@@ -613,7 +613,7 @@ __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
      * Revert iter to a state corresponding to that as some callers (such
      * as the splice code) rely on it.
      */
-    if (iov_iter_rw(iter) == READ && iomi.pos >= dio->i_size)
+    if (!(iomi.flags & IOMAP_WRITE) && iomi.pos >= dio->i_size)
         iov_iter_revert(iter, iomi.pos - dio->i_size);

     if (ret == -EFAULT && dio->size && (dio_flags & IOMAP_DIO_PARTIAL)) {
diff --git a/fs/jfs/inode.c b/fs/jfs/inode.c
index 8ac10e396050..0d1f94ac9488 100644
--- a/fs/jfs/inode.c
+++ b/fs/jfs/inode.c
@@ -334,7 +334,7 @@ static ssize_t jfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
      * In case of error extending write may have instantiated a few
      * blocks outside i_size. Trim these off again.
      */
-    if (unlikely(iov_iter_rw(iter) == WRITE && ret < 0)) {
+    if (unlikely(iocb_is_write(iocb) && ret < 0)) {
         loff_t isize = i_size_read(inode);
         loff_t end = iocb->ki_pos + count;
diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c
index 1707f46b1335..d865945f2a63 100644
--- a/fs/nfs/direct.c
+++ b/fs/nfs/direct.c
@@ -133,7 +133,7 @@ int nfs_swap_rw(struct kiocb *iocb, struct iov_iter *iter)

     VM_BUG_ON(iov_iter_count(iter) != PAGE_SIZE);

-    if (iov_iter_rw(iter) == READ)
+    if (iocb_is_read(iocb))
         ret = nfs_file_direct_read(iocb, iter, true);
     else
         ret = nfs_file_direct_write(iocb, iter, true);
diff --git a/fs/nilfs2/inode.c b/fs/nilfs2/inode.c
index 232dd7b6cca1..496801507083 100644
--- a/fs/nilfs2/inode.c
+++ b/fs/nilfs2/inode.c
@@ -289,7 +289,7 @@ nilfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
 {
     struct inode *inode = file_inode(iocb->ki_filp);

-    if (iov_iter_rw(iter) == WRITE)
+    if (iocb_is_write(iocb))
         return 0;

     /* Needs synchronization with the cleaner */
diff --git a/fs/ntfs3/inode.c b/fs/ntfs3/inode.c
index 20b953871574..675be8d629fc 100644
--- a/fs/ntfs3/inode.c
+++ b/fs/ntfs3/inode.c
@@ -761,7 +761,7 @@ static ssize_t ntfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
     struct ntfs_inode *ni = ntfs_i(inode);
     loff_t vbo = iocb->ki_pos;
     loff_t end;
-    int wr = iov_iter_rw(iter) & WRITE;
+    bool wr = iocb_is_write(iocb);
     size_t iter_count = iov_iter_count(iter);
     loff_t valid;
     ssize_t ret;
diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
index 1d65f6ef00ca..b741068a0a7e 100644
--- a/fs/ocfs2/aops.c
+++ b/fs/ocfs2/aops.c
@@ -2441,7 +2441,7 @@ static ssize_t ocfs2_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
         !ocfs2_supports_append_dio(osb))
         return 0;

-    if (iov_iter_rw(iter) == READ)
+    if (iocb_is_read(iocb))
         get_block = ocfs2_lock_get_block;
     else
         get_block = ocfs2_dio_wr_get_block;
diff --git a/fs/orangefs/inode.c b/fs/orangefs/inode.c
index 4df560894386..ece65907ff83 100644
--- a/fs/orangefs/inode.c
+++ b/fs/orangefs/inode.c
@@ -521,7 +521,7 @@ static ssize_t orangefs_direct_IO(struct kiocb *iocb,
      */
     struct file *file = iocb->ki_filp;
     loff_t pos = iocb->ki_pos;
-    enum ORANGEFS_io_type type = iov_iter_rw(iter) == WRITE ?
+    enum ORANGEFS_io_type type = iocb_is_write(iocb) ?
                  ORANGEFS_IO_WRITE : ORANGEFS_IO_READ;
     loff_t *offset = &pos;
     struct inode *inode = file->f_mapping->host;
diff --git a/fs/reiserfs/inode.c b/fs/reiserfs/inode.c
index c7d1fa526dea..0ed65feda193 100644
--- a/fs/reiserfs/inode.c
+++ b/fs/reiserfs/inode.c
@@ -3249,7 +3249,7 @@ static ssize_t reiserfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
      * In case of error extending write may have instantiated a few
      * blocks outside i_size. Trim these off again.
      */
-    if (unlikely(iov_iter_rw(iter) == WRITE && ret < 0)) {
+    if (unlikely(iocb_is_write(iocb) && ret < 0)) {
         loff_t isize = i_size_read(inode);
         loff_t end = iocb->ki_pos + count;
diff --git a/fs/udf/inode.c b/fs/udf/inode.c
index 1d7c2a812fc1..66a1b9e85cb2 100644
--- a/fs/udf/inode.c
+++ b/fs/udf/inode.c
@@ -219,7 +219,7 @@ static ssize_t udf_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
     ssize_t ret;

     ret = blockdev_direct_IO(iocb, inode, iter, udf_get_block);
-    if (unlikely(ret < 0 && iov_iter_rw(iter) == WRITE))
+    if (unlikely(ret < 0 && iocb_is_write(iocb)))
         udf_write_failed(mapping, iocb->ki_pos + count);
     return ret;
 }
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 649ff061440e..6a488ae69f5d 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -353,6 +353,16 @@ static inline bool is_sync_kiocb(struct kiocb *kiocb)
     return kiocb->ki_complete == NULL;
 }

+static inline bool iocb_is_write(const struct kiocb *kiocb)
+{
+    return kiocb->ki_flags & IOCB_WRITE;
+}
+
+static inline bool iocb_is_read(const struct kiocb *kiocb)
+{
+    return !iocb_is_write(kiocb);
+}
+
 struct address_space_operations {
     int (*writepage)(struct page *page, struct writeback_control *wbc);
     int (*read_folio)(struct file *, struct folio *);
diff --git a/include/linux/uio.h b/include/linux/uio.h
index 9f158238edba..6f4dfa96324d 100644
--- a/include/linux/uio.h
+++ b/include/linux/uio.h
@@ -114,11 +114,6 @@ static inline bool iov_iter_is_xarray(const struct iov_iter *i)
     return iov_iter_type(i) == ITER_XARRAY;
 }

-static inline unsigned char iov_iter_rw(const struct iov_iter *i)
-{
-    return i->data_source ? WRITE : READ;
-}
-
 static inline bool user_backed_iter(const struct iov_iter *i)
 {
     return i->user_backed;
diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index f9a3ff37ecd1..68497d9c1452 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -1429,6 +1429,11 @@ static struct page *first_bvec_segment(const struct iov_iter *i,
     return page;
 }

+static unsigned char iov_iter_rw(const struct iov_iter *i)
+{
+    return i->data_source ? WRITE : READ;
+}
+
 static ssize_t __iov_iter_get_pages_alloc(struct iov_iter *i,
            struct page ***pages, size_t maxsize,
            unsigned int maxpages, size_t *start,
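To make the new convention concrete, here is a purely illustrative sketch
(not part of the patch) that gathers the three direction sources named above
into one hypothetical helper; no real caller has all three contexts at once,
and mydrv_is_write() is invented for this example:

    #include <linux/fs.h>      /* iocb_is_write() added by this patch */
    #include <linux/iomap.h>   /* struct iomap_iter, IOMAP_WRITE */
    #include <linux/blk-mq.h>  /* struct request, op_is_write() */

    /* Hypothetical: pick the direction from whichever context exists. */
    static bool mydrv_is_write(const struct kiocb *iocb,
                               const struct iomap_iter *iomi,
                               const struct request *rq)
    {
        if (iocb)                               /* kiocb available */
            return iocb_is_write(iocb);
        if (iomi)                               /* iomap_iter available */
            return iomi->flags & IOMAP_WRITE;
        return op_is_write(rq->cmd_flags);      /* block request */
    }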
From patchwork Mon Jan 16 23:08:24 2023
X-Patchwork-Submitter: David Howells
X-Patchwork-Id: 44354
3798903 Subject: [PATCH v6 03/34] iov_iter: Pass I/O direction into iov_iter_get_pages*() From: David Howells To: Al Viro Cc: dhowells@redhat.com, Christoph Hellwig , Matthew Wilcox , Jens Axboe , Jan Kara , Jeff Layton , Logan Gunthorpe , linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org, linux-kernel@vger.kernel.org Date: Mon, 16 Jan 2023 23:08:24 +0000 Message-ID: <167391050409.2311931.7103784292954267373.stgit@warthog.procyon.org.uk> In-Reply-To: <167391047703.2311931.8115712773222260073.stgit@warthog.procyon.org.uk> References: <167391047703.2311931.8115712773222260073.stgit@warthog.procyon.org.uk> User-Agent: StGit/1.5 MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.2 X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1755222568448610124?= X-GMAIL-MSGID: =?utf-8?q?1755222568448610124?= Define FOLL_SOURCE_BUF and FOLL_DEST_BUF to indicate to get_user_pages*() and iov_iter_get_pages*() how the buffer is intended to be used in an I/O operation. Don't use READ and WRITE as a read I/O writes to memory and vice versa - which causes confusion. The direction is checked against the iterator's data_source. Signed-off-by: David Howells --- block/bio.c | 6 ++++++ block/blk-map.c | 2 ++ crypto/af_alg.c | 9 ++++++--- crypto/algif_hash.c | 3 ++- drivers/vhost/scsi.c | 9 ++++++--- fs/ceph/addr.c | 2 +- fs/ceph/file.c | 14 ++++++++------ fs/cifs/file.c | 8 ++++---- fs/cifs/misc.c | 3 ++- fs/direct-io.c | 6 ++++-- fs/fuse/dev.c | 3 ++- fs/fuse/file.c | 8 ++++---- fs/nfs/direct.c | 10 ++++++---- fs/splice.c | 3 ++- include/crypto/if_alg.h | 3 ++- include/linux/bio.h | 18 ++++++++++++++++-- include/linux/mm.h | 10 ++++++++++ lib/iov_iter.c | 14 +++++++------- net/9p/trans_virtio.c | 12 ++++++++---- net/core/datagram.c | 5 +++-- net/core/skmsg.c | 4 ++-- net/rds/message.c | 4 ++-- net/tls/tls_sw.c | 5 ++--- 23 files changed, 107 insertions(+), 54 deletions(-) diff --git a/block/bio.c b/block/bio.c index 5f96fcae3f75..867cf4db87ea 100644 --- a/block/bio.c +++ b/block/bio.c @@ -1242,6 +1242,8 @@ static int bio_iov_add_zone_append_page(struct bio *bio, struct page *page, * pages will have to be released using put_page() when done. * For multi-segment *iter, this function only adds pages from the * next non-empty segment of the iov iterator. + * + * The I/O direction is determined from the bio operation type. */ static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter) { @@ -1263,6 +1265,8 @@ static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter) BUILD_BUG_ON(PAGE_PTRS_PER_BVEC < 2); pages += entries_left * (PAGE_PTRS_PER_BVEC - 1); + gup_flags |= bio_is_write(bio) ? FOLL_SOURCE_BUF : FOLL_DEST_BUF; + if (bio->bi_bdev && blk_queue_pci_p2pdma(bio->bi_bdev->bd_disk->queue)) gup_flags |= FOLL_PCI_P2PDMA; @@ -1332,6 +1336,8 @@ static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter) * fit into the bio, or are requested in @iter, whatever is smaller. If * MM encounters an error pinning the requested pages, it stops. Error * is returned only if 0 pages could be pinned. + * + * The bio operation indicates the data direction. 
*/ int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter) { diff --git a/block/blk-map.c b/block/blk-map.c index 08cbb7ff3b19..c30be529fb55 100644 --- a/block/blk-map.c +++ b/block/blk-map.c @@ -279,6 +279,8 @@ static int bio_map_user_iov(struct request *rq, struct iov_iter *iter, if (bio == NULL) return -ENOMEM; + gup_flags |= bio_is_write(bio) ? FOLL_SOURCE_BUF : FOLL_DEST_BUF; + if (blk_queue_pci_p2pdma(rq->q)) gup_flags |= FOLL_PCI_P2PDMA; diff --git a/crypto/af_alg.c b/crypto/af_alg.c index 0a4fa2a429e2..7a68db157fae 100644 --- a/crypto/af_alg.c +++ b/crypto/af_alg.c @@ -531,13 +531,15 @@ static const struct net_proto_family alg_family = { .owner = THIS_MODULE, }; -int af_alg_make_sg(struct af_alg_sgl *sgl, struct iov_iter *iter, int len) +int af_alg_make_sg(struct af_alg_sgl *sgl, struct iov_iter *iter, int len, + unsigned int gup_flags) { size_t off; ssize_t n; int npages, i; - n = iov_iter_get_pages2(iter, sgl->pages, len, ALG_MAX_PAGES, &off); + n = iov_iter_get_pages(iter, sgl->pages, len, ALG_MAX_PAGES, &off, + gup_flags); if (n < 0) return n; @@ -1310,7 +1312,8 @@ int af_alg_get_rsgl(struct sock *sk, struct msghdr *msg, int flags, list_add_tail(&rsgl->list, &areq->rsgl_list); /* make one iovec available as scatterlist */ - err = af_alg_make_sg(&rsgl->sgl, &msg->msg_iter, seglen); + err = af_alg_make_sg(&rsgl->sgl, &msg->msg_iter, seglen, + FOLL_DEST_BUF); if (err < 0) { rsgl->sg_num_bytes = 0; return err; diff --git a/crypto/algif_hash.c b/crypto/algif_hash.c index 1d017ec5c63c..fe3d2258145f 100644 --- a/crypto/algif_hash.c +++ b/crypto/algif_hash.c @@ -91,7 +91,8 @@ static int hash_sendmsg(struct socket *sock, struct msghdr *msg, if (len > limit) len = limit; - len = af_alg_make_sg(&ctx->sgl, &msg->msg_iter, len); + len = af_alg_make_sg(&ctx->sgl, &msg->msg_iter, len, + FOLL_SOURCE_BUF); if (len < 0) { err = copied ? 0 : len; goto unlock; diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c index dca6346d75b3..5d10837d19ec 100644 --- a/drivers/vhost/scsi.c +++ b/drivers/vhost/scsi.c @@ -646,10 +646,13 @@ vhost_scsi_map_to_sgl(struct vhost_scsi_cmd *cmd, struct scatterlist *sg = sgl; ssize_t bytes; size_t offset; - unsigned int npages = 0; + unsigned int npages = 0, gup_flags = 0; - bytes = iov_iter_get_pages2(iter, pages, LONG_MAX, - VHOST_SCSI_PREALLOC_UPAGES, &offset); + gup_flags |= write ? FOLL_SOURCE_BUF : FOLL_DEST_BUF; + + bytes = iov_iter_get_pages(iter, pages, LONG_MAX, + VHOST_SCSI_PREALLOC_UPAGES, &offset, + gup_flags); /* No pages were pinned */ if (bytes <= 0) return bytes < 0 ? 
bytes : -EFAULT; diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c index 8c74871e37c9..cfc3353e5604 100644 --- a/fs/ceph/addr.c +++ b/fs/ceph/addr.c @@ -328,7 +328,7 @@ static void ceph_netfs_issue_read(struct netfs_io_subrequest *subreq) dout("%s: pos=%llu orig_len=%zu len=%llu\n", __func__, subreq->start, subreq->len, len); iov_iter_xarray(&iter, ITER_DEST, &rreq->mapping->i_pages, subreq->start, len); - err = iov_iter_get_pages_alloc2(&iter, &pages, len, &page_off); + err = iov_iter_get_pages_alloc(&iter, &pages, len, &page_off, FOLL_DEST_BUF); if (err < 0) { dout("%s: iov_ter_get_pages_alloc returned %d\n", __func__, err); goto out; diff --git a/fs/ceph/file.c b/fs/ceph/file.c index 27c72a2f6af5..ffd36eeea186 100644 --- a/fs/ceph/file.c +++ b/fs/ceph/file.c @@ -81,7 +81,7 @@ static __le32 ceph_flags_sys2wire(u32 flags) #define ITER_GET_BVECS_PAGES 64 static ssize_t __iter_get_bvecs(struct iov_iter *iter, size_t maxsize, - struct bio_vec *bvecs) + struct bio_vec *bvecs, bool write) { size_t size = 0; int bvec_idx = 0; @@ -95,8 +95,9 @@ static ssize_t __iter_get_bvecs(struct iov_iter *iter, size_t maxsize, size_t start; int idx = 0; - bytes = iov_iter_get_pages2(iter, pages, maxsize - size, - ITER_GET_BVECS_PAGES, &start); + bytes = iov_iter_get_pages(iter, pages, maxsize - size, + ITER_GET_BVECS_PAGES, &start, + write ? FOLL_SOURCE_BUF : FOLL_DEST_BUF); if (bytes < 0) return size ?: bytes; @@ -127,7 +128,8 @@ static ssize_t __iter_get_bvecs(struct iov_iter *iter, size_t maxsize, * Return the number of bytes in the created bio_vec array, or an error. */ static ssize_t iter_get_bvecs_alloc(struct iov_iter *iter, size_t maxsize, - struct bio_vec **bvecs, int *num_bvecs) + struct bio_vec **bvecs, int *num_bvecs, + bool write) { struct bio_vec *bv; size_t orig_count = iov_iter_count(iter); @@ -146,7 +148,7 @@ static ssize_t iter_get_bvecs_alloc(struct iov_iter *iter, size_t maxsize, if (!bv) return -ENOMEM; - bytes = __iter_get_bvecs(iter, maxsize, bv); + bytes = __iter_get_bvecs(iter, maxsize, bv, write); if (bytes < 0) { /* * No pages were pinned -- just free the array. 
@@ -1334,7 +1336,7 @@ ceph_direct_read_write(struct kiocb *iocb, struct iov_iter *iter, break; } - len = iter_get_bvecs_alloc(iter, size, &bvecs, &num_pages); + len = iter_get_bvecs_alloc(iter, size, &bvecs, &num_pages, write); if (len < 0) { ceph_osdc_put_request(req); ret = len; diff --git a/fs/cifs/file.c b/fs/cifs/file.c index 22dfc1f8b4f1..d100b9cb8682 100644 --- a/fs/cifs/file.c +++ b/fs/cifs/file.c @@ -3290,8 +3290,8 @@ cifs_write_from_iter(loff_t offset, size_t len, struct iov_iter *from, if (ctx->direct_io) { ssize_t result; - result = iov_iter_get_pages_alloc2( - from, &pagevec, cur_len, &start); + result = iov_iter_get_pages_alloc( + from, &pagevec, cur_len, &start, FOLL_SOURCE_BUF); if (result < 0) { cifs_dbg(VFS, "direct_writev couldn't get user pages (rc=%zd) iter type %d iov_offset %zd count %zd\n", @@ -4031,9 +4031,9 @@ cifs_send_async_read(loff_t offset, size_t len, struct cifsFileInfo *open_file, if (ctx->direct_io) { ssize_t result; - result = iov_iter_get_pages_alloc2( + result = iov_iter_get_pages_alloc( &direct_iov, &pagevec, - cur_len, &start); + cur_len, &start, FOLL_DEST_BUF); if (result < 0) { cifs_dbg(VFS, "Couldn't get user pages (rc=%zd) iter type %d iov_offset %zd count %zd\n", diff --git a/fs/cifs/misc.c b/fs/cifs/misc.c index 4d3c586785a5..9655cf359ab9 100644 --- a/fs/cifs/misc.c +++ b/fs/cifs/misc.c @@ -1030,7 +1030,8 @@ setup_aio_ctx_iter(struct cifs_aio_ctx *ctx, struct iov_iter *iter, int rw) saved_len = count; while (count && npages < max_pages) { - rc = iov_iter_get_pages2(iter, pages, count, max_pages, &start); + rc = iov_iter_get_pages(iter, pages, count, max_pages, &start, + rw == WRITE ? FOLL_SOURCE_BUF : FOLL_DEST_BUF); if (rc < 0) { cifs_dbg(VFS, "Couldn't get user pages (rc=%zd)\n", rc); break; diff --git a/fs/direct-io.c b/fs/direct-io.c index cf196f2a211e..b1e26a706e31 100644 --- a/fs/direct-io.c +++ b/fs/direct-io.c @@ -169,8 +169,10 @@ static inline int dio_refill_pages(struct dio *dio, struct dio_submit *sdio) const enum req_op dio_op = dio->opf & REQ_OP_MASK; ssize_t ret; - ret = iov_iter_get_pages2(sdio->iter, dio->pages, LONG_MAX, DIO_PAGES, - &sdio->from); + ret = iov_iter_get_pages(sdio->iter, dio->pages, LONG_MAX, DIO_PAGES, + &sdio->from, + op_is_write(dio_op) ? + FOLL_SOURCE_BUF : FOLL_DEST_BUF); if (ret < 0 && sdio->blocks_available && dio_op == REQ_OP_WRITE) { struct page *page = ZERO_PAGE(0); diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c index e8b60ce72c9a..e3d8443e24a6 100644 --- a/fs/fuse/dev.c +++ b/fs/fuse/dev.c @@ -730,7 +730,8 @@ static int fuse_copy_fill(struct fuse_copy_state *cs) } } else { size_t off; - err = iov_iter_get_pages2(cs->iter, &page, PAGE_SIZE, 1, &off); + err = iov_iter_get_pages(cs->iter, &page, PAGE_SIZE, 1, &off, + cs->write ? FOLL_SOURCE_BUF : FOLL_DEST_BUF); if (err < 0) return err; BUG_ON(!err); diff --git a/fs/fuse/file.c b/fs/fuse/file.c index d68b45f8b3ae..68c196437306 100644 --- a/fs/fuse/file.c +++ b/fs/fuse/file.c @@ -1414,10 +1414,10 @@ static int fuse_get_user_pages(struct fuse_args_pages *ap, struct iov_iter *ii, while (nbytes < *nbytesp && ap->num_pages < max_pages) { unsigned npages; size_t start; - ret = iov_iter_get_pages2(ii, &ap->pages[ap->num_pages], - *nbytesp - nbytes, - max_pages - ap->num_pages, - &start); + ret = iov_iter_get_pages(ii, &ap->pages[ap->num_pages], + *nbytesp - nbytes, + max_pages - ap->num_pages, + &start, write ? 
FOLL_SOURCE_BUF : FOLL_DEST_BUF); if (ret < 0) break; diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c index d865945f2a63..42af84685f20 100644 --- a/fs/nfs/direct.c +++ b/fs/nfs/direct.c @@ -332,8 +332,9 @@ static ssize_t nfs_direct_read_schedule_iovec(struct nfs_direct_req *dreq, size_t pgbase; unsigned npages, i; - result = iov_iter_get_pages_alloc2(iter, &pagevec, - rsize, &pgbase); + result = iov_iter_get_pages_alloc(iter, &pagevec, + rsize, &pgbase, + FOLL_DEST_BUF); if (result < 0) break; @@ -791,8 +792,9 @@ static ssize_t nfs_direct_write_schedule_iovec(struct nfs_direct_req *dreq, size_t pgbase; unsigned npages, i; - result = iov_iter_get_pages_alloc2(iter, &pagevec, - wsize, &pgbase); + result = iov_iter_get_pages_alloc(iter, &pagevec, + wsize, &pgbase, + FOLL_SOURCE_BUF); if (result < 0) break; diff --git a/fs/splice.c b/fs/splice.c index 5969b7a1d353..19c5b5adc548 100644 --- a/fs/splice.c +++ b/fs/splice.c @@ -1165,7 +1165,8 @@ static int iter_to_pipe(struct iov_iter *from, size_t start; int i, n; - left = iov_iter_get_pages2(from, pages, ~0UL, 16, &start); + left = iov_iter_get_pages(from, pages, ~0UL, 16, &start, + FOLL_SOURCE_BUF); if (left <= 0) { ret = left; break; diff --git a/include/crypto/if_alg.h b/include/crypto/if_alg.h index a5db86670bdf..12058ab6cad9 100644 --- a/include/crypto/if_alg.h +++ b/include/crypto/if_alg.h @@ -165,7 +165,8 @@ int af_alg_release(struct socket *sock); void af_alg_release_parent(struct sock *sk); int af_alg_accept(struct sock *sk, struct socket *newsock, bool kern); -int af_alg_make_sg(struct af_alg_sgl *sgl, struct iov_iter *iter, int len); +int af_alg_make_sg(struct af_alg_sgl *sgl, struct iov_iter *iter, int len, + unsigned int gup_flags); void af_alg_free_sg(struct af_alg_sgl *sgl); static inline struct alg_sock *alg_sk(struct sock *sk) diff --git a/include/linux/bio.h b/include/linux/bio.h index 22078a28d7cb..3f7ba7fe48ac 100644 --- a/include/linux/bio.h +++ b/include/linux/bio.h @@ -40,11 +40,25 @@ static inline unsigned int bio_max_segs(unsigned int nr_segs) #define bio_sectors(bio) bvec_iter_sectors((bio)->bi_iter) #define bio_end_sector(bio) bvec_iter_end_sector((bio)->bi_iter) +/** + * bio_is_write - Query if the I/O direction is towards the disk + * @bio: The bio to query + * + * Return true if this is some sort of write operation - ie. the data is going + * towards the disk. + */ +static inline bool bio_is_write(const struct bio *bio) +{ + return op_is_write(bio_op(bio)); +} + /* * Return the data direction, READ or WRITE. */ -#define bio_data_dir(bio) \ - (op_is_write(bio_op(bio)) ? WRITE : READ) +static inline int bio_data_dir(const struct bio *bio) +{ + return bio_is_write(bio) ? WRITE : READ; +} /* * Check whether this bio carries any data or not. A NULL bio is allowed. diff --git a/include/linux/mm.h b/include/linux/mm.h index f3f196e4d66d..3af4ca8b1fe7 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -3090,6 +3090,10 @@ struct page *follow_page(struct vm_area_struct *vma, unsigned long address, #define FOLL_PCI_P2PDMA 0x100000 /* allow returning PCI P2PDMA pages */ #define FOLL_INTERRUPTIBLE 0x200000 /* allow interrupts from generic signals */ +#define FOLL_SOURCE_BUF 0 /* Memory will be read from by I/O */ +#define FOLL_DEST_BUF FOLL_WRITE /* Memory will be written to by I/O */ +#define FOLL_BUF_MASK FOLL_WRITE + /* * FOLL_PIN and FOLL_LONGTERM may be used in various combinations with each * other. 
Here is what they mean, and how to use them: @@ -3143,6 +3147,12 @@ struct page *follow_page(struct vm_area_struct *vma, unsigned long address, * releasing pages: get_user_pages*() pages must be released via put_page(), * while pin_user_pages*() pages must be released via unpin_user_page(). * + * FOLL_SOURCE_BUF and FOLL_DEST_BUF are indicators to get_user_pages*() and + * iov_iter_*_pages*() as to how the pages obtained are going to be used. + * FOLL_SOURCE_BUF indicates that I/O op is going to transfer from memory to + * device; FOLL_DEST_BUF that the op is going to transfer from device to + * memory. + * * Please see Documentation/core-api/pin_user_pages.rst for more information. */ diff --git a/lib/iov_iter.c b/lib/iov_iter.c index 68497d9c1452..f53583836009 100644 --- a/lib/iov_iter.c +++ b/lib/iov_iter.c @@ -1429,11 +1429,6 @@ static struct page *first_bvec_segment(const struct iov_iter *i, return page; } -static unsigned char iov_iter_rw(const struct iov_iter *i) -{ - return i->data_source ? WRITE : READ; -} - static ssize_t __iov_iter_get_pages_alloc(struct iov_iter *i, struct page ***pages, size_t maxsize, unsigned int maxpages, size_t *start, @@ -1448,12 +1443,17 @@ static ssize_t __iov_iter_get_pages_alloc(struct iov_iter *i, if (maxsize > MAX_RW_COUNT) maxsize = MAX_RW_COUNT; + if (WARN_ON_ONCE((gup_flags & FOLL_BUF_MASK) == FOLL_SOURCE_BUF && + i->data_source == ITER_DEST)) + return -EIO; + if (WARN_ON_ONCE((gup_flags & FOLL_BUF_MASK) == FOLL_DEST_BUF && + i->data_source == ITER_SOURCE)) + return -EIO; + if (likely(user_backed_iter(i))) { unsigned long addr; int res; - if (iov_iter_rw(i) != WRITE) - gup_flags |= FOLL_WRITE; if (i->nofault) gup_flags |= FOLL_NOFAULT; diff --git a/net/9p/trans_virtio.c b/net/9p/trans_virtio.c index 3c27ffb781e3..eb28b54fe5f6 100644 --- a/net/9p/trans_virtio.c +++ b/net/9p/trans_virtio.c @@ -310,7 +310,8 @@ static int p9_get_mapped_pages(struct virtio_chan *chan, struct iov_iter *data, int count, size_t *offs, - int *need_drop) + int *need_drop, + unsigned int gup_flags) { int nr_pages; int err; @@ -330,7 +331,8 @@ static int p9_get_mapped_pages(struct virtio_chan *chan, if (err == -ERESTARTSYS) return err; } - n = iov_iter_get_pages_alloc2(data, pages, count, offs); + n = iov_iter_get_pages_alloc(data, pages, count, offs, + gup_flags); if (n < 0) return n; *need_drop = 1; @@ -437,7 +439,8 @@ p9_virtio_zc_request(struct p9_client *client, struct p9_req_t *req, if (uodata) { __le32 sz; int n = p9_get_mapped_pages(chan, &out_pages, uodata, - outlen, &offs, &need_drop); + outlen, &offs, &need_drop, + FOLL_DEST_BUF); if (n < 0) { err = n; goto err_out; @@ -456,7 +459,8 @@ p9_virtio_zc_request(struct p9_client *client, struct p9_req_t *req, memcpy(&req->tc.sdata[0], &sz, sizeof(sz)); } else if (uidata) { int n = p9_get_mapped_pages(chan, &in_pages, uidata, - inlen, &offs, &need_drop); + inlen, &offs, &need_drop, + FOLL_SOURCE_BUF); if (n < 0) { err = n; goto err_out; diff --git a/net/core/datagram.c b/net/core/datagram.c index e4ff2db40c98..9f0914b781ad 100644 --- a/net/core/datagram.c +++ b/net/core/datagram.c @@ -632,8 +632,9 @@ int __zerocopy_sg_from_iter(struct msghdr *msg, struct sock *sk, if (frag == MAX_SKB_FRAGS) return -EMSGSIZE; - copied = iov_iter_get_pages2(from, pages, length, - MAX_SKB_FRAGS - frag, &start); + copied = iov_iter_get_pages(from, pages, length, + MAX_SKB_FRAGS - frag, &start, + FOLL_SOURCE_BUF); if (copied < 0) return -EFAULT; diff --git a/net/core/skmsg.c b/net/core/skmsg.c index 53d0251788aa..f63a13690712 100644 --- 
a/net/core/skmsg.c +++ b/net/core/skmsg.c @@ -324,8 +324,8 @@ int sk_msg_zerocopy_from_iter(struct sock *sk, struct iov_iter *from, goto out; } - copied = iov_iter_get_pages2(from, pages, bytes, maxpages, - &offset); + copied = iov_iter_get_pages(from, pages, bytes, maxpages, + &offset, FOLL_SOURCE_BUF); if (copied <= 0) { ret = -EFAULT; goto out;
diff --git a/net/rds/message.c b/net/rds/message.c index b47e4f0a1639..fcfd406b97af 100644 --- a/net/rds/message.c +++ b/net/rds/message.c @@ -390,8 +390,8 @@ static int rds_message_zcopy_from_user(struct rds_message *rm, struct iov_iter * size_t start; ssize_t copied; - copied = iov_iter_get_pages2(from, &pages, PAGE_SIZE, - 1, &start); + copied = iov_iter_get_pages(from, &pages, PAGE_SIZE, + 1, &start, FOLL_SOURCE_BUF); if (copied < 0) { struct mmpin *mmp; int i;
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c index 9ed978634125..59acaeb24f54 100644 --- a/net/tls/tls_sw.c +++ b/net/tls/tls_sw.c @@ -1354,9 +1354,8 @@ static int tls_setup_from_iter(struct iov_iter *from, rc = -EFAULT; goto out; } - copied = iov_iter_get_pages2(from, pages, - length, - maxpages, &offset); + copied = iov_iter_get_pages(from, pages, length, + maxpages, &offset, FOLL_DEST_BUF); if (copied <= 0) { rc = -EFAULT; goto out;
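To illustrate the pattern that all of these call-site conversions follow, here is a minimal sketch, assuming the iov_iter_get_pages() signature introduced by this series; the example_map_buf() helper is hypothetical:

#include <linux/uio.h>
#include <linux/mm.h>

/* Sketch only: pick the gup flag from the direction of the transfer. */
static ssize_t example_map_buf(struct iov_iter *iter, bool to_device,
			       struct page **pages, size_t maxsize,
			       unsigned int maxpages, size_t *start)
{
	/* Data flowing out of memory is a source buffer; data flowing
	 * into memory is a destination buffer. */
	unsigned int gup_flags = to_device ? FOLL_SOURCE_BUF : FOLL_DEST_BUF;

	/* The flag must agree with the iterator's ITER_SOURCE/ITER_DEST
	 * role, or __iov_iter_get_pages_alloc() will WARN and return -EIO. */
	return iov_iter_get_pages(iter, pages, maxsize, maxpages, start,
				  gup_flags);
}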
From patchwork Mon Jan 16 23:08:31 2023
Subject: [PATCH v6 04/34] iov_iter: Remove iov_iter_get_pages2/pages_alloc2()
From: David Howells
To: Al Viro
Cc: dhowells@redhat.com, Christoph Hellwig, Matthew Wilcox, Jens Axboe, Jan Kara, Jeff Layton, Logan Gunthorpe, linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org, linux-kernel@vger.kernel.org
Date: Mon, 16 Jan 2023 23:08:31 +0000
Message-ID: <167391051122.2311931.14824492646435673046.stgit@warthog.procyon.org.uk>

There are now no users of iov_iter_get_pages2() and iov_iter_get_pages_alloc2(), so remove them.

Signed-off-by: David Howells
---
include/linux/uio.h | 4 ---- lib/iov_iter.c | 14 -------------- 2 files changed, 18 deletions(-)
diff --git a/include/linux/uio.h b/include/linux/uio.h index 6f4dfa96324d..365e26c405f2 100644 --- a/include/linux/uio.h +++ b/include/linux/uio.h @@ -248,13 +248,9 @@ void iov_iter_xarray(struct iov_iter *i, unsigned int direction, struct xarray * ssize_t iov_iter_get_pages(struct iov_iter *i, struct page **pages, size_t maxsize, unsigned maxpages, size_t *start, unsigned gup_flags); -ssize_t iov_iter_get_pages2(struct iov_iter *i, struct page **pages, - size_t maxsize, unsigned maxpages, size_t *start); ssize_t iov_iter_get_pages_alloc(struct iov_iter *i, struct page ***pages, size_t maxsize, size_t *start, unsigned gup_flags); -ssize_t iov_iter_get_pages_alloc2(struct iov_iter *i, struct page ***pages, - size_t maxsize, size_t *start); int iov_iter_npages(const struct iov_iter *i, int maxpages); void iov_iter_restore(struct iov_iter *i, struct iov_iter_state *state);
diff --git a/lib/iov_iter.c b/lib/iov_iter.c index f53583836009..ca89ffa9d6e1 100644 --- a/lib/iov_iter.c +++ b/lib/iov_iter.c @@ -1511,13 +1511,6 @@ ssize_t iov_iter_get_pages(struct iov_iter *i, } EXPORT_SYMBOL_GPL(iov_iter_get_pages); -ssize_t iov_iter_get_pages2(struct iov_iter *i, struct page **pages, - size_t maxsize, unsigned maxpages, size_t *start) -{ - return iov_iter_get_pages(i, pages, maxsize, maxpages, start, 0); -} -EXPORT_SYMBOL(iov_iter_get_pages2); - ssize_t iov_iter_get_pages_alloc(struct iov_iter *i, struct page ***pages, size_t maxsize, size_t *start, unsigned gup_flags) @@ -1536,13 +1529,6 @@ ssize_t iov_iter_get_pages_alloc(struct iov_iter *i, } EXPORT_SYMBOL_GPL(iov_iter_get_pages_alloc); -ssize_t iov_iter_get_pages_alloc2(struct iov_iter *i, - struct page ***pages, size_t maxsize, size_t *start) -{ - return iov_iter_get_pages_alloc(i, pages, maxsize, start, 0); -} -EXPORT_SYMBOL(iov_iter_get_pages_alloc2); - size_t csum_and_copy_from_iter(void *addr, size_t bytes, __wsum *csum, struct iov_iter *i) {
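Any remaining caller of the removed wrappers would be converted in the same way as the call sites in the preceding patch; a hypothetical before/after sketch, assuming the flag matches how the iterator was constructed:

	/* Before: the direction is implied by the iterator alone. */
	n = iov_iter_get_pages_alloc2(i, &pages, maxsize, &start);

	/* After: the caller states the buffer's role explicitly. */
	n = iov_iter_get_pages_alloc(i, &pages, maxsize, &start,
				     FOLL_SOURCE_BUF);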
From patchwork Mon Jan 16 23:08:38 2023
Subject: [PATCH v6 05/34] iov_iter: Change the direction macros into an enum
From: David Howells
To: Al Viro
Cc: dhowells@redhat.com, Christoph Hellwig, Matthew Wilcox, Jens Axboe, Jan Kara, Jeff Layton, Logan Gunthorpe, linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org, linux-kernel@vger.kernel.org
Date: Mon, 16 Jan 2023 23:08:38 +0000
Message-ID: <167391051810.2311931.8545361041888737395.stgit@warthog.procyon.org.uk>

Change the ITER_SOURCE and ITER_DEST direction macros into an enum and provide four new helper functions:

 iov_iter_dir() - returns the iterator direction
 iov_iter_is_dest() - returns true if it's an ITER_DEST iterator
 iov_iter_is_source() - returns true if it's an ITER_SOURCE iterator
 iov_iter_dir_valid() - returns true if the direction is a valid enum value

Signed-off-by: David Howells
cc: Al Viro
Link: https://lore.kernel.org/r/167305161763.1521586.6593798818336440133.stgit@warthog.procyon.org.uk/ # v4
Link: https://lore.kernel.org/r/167344726413.2425628.317218805692680763.stgit@warthog.procyon.org.uk/ # v5
---
include/linux/uio.h | 30 ++++++++++++++++++++++++++---- 1 file changed, 26 insertions(+), 4 deletions(-)
diff --git a/include/linux/uio.h b/include/linux/uio.h index 365e26c405f2..8d0dabfcb2fe 100644 --- a/include/linux/uio.h +++ b/include/linux/uio.h @@ -29,8 +29,10 @@ enum iter_type { ITER_UBUF, }; -#define ITER_SOURCE 1 // == WRITE -#define ITER_DEST 0 // == READ +enum iter_dir { + ITER_DEST = 0, // == READ + ITER_SOURCE = 1, // == WRITE +} __mode(byte); struct iov_iter_state { size_t iov_offset; @@ -39,9 +41,9 @@ }; struct iov_iter { - u8 iter_type; + enum iter_type iter_type __mode(byte); bool nofault; - bool data_source; + enum iter_dir data_source; bool user_backed; union { size_t iov_offset; @@ -114,6 +116,26 @@ static inline bool iov_iter_is_xarray(const struct iov_iter *i) { return iov_iter_type(i) == ITER_XARRAY; } +static inline enum iter_dir iov_iter_dir(const struct iov_iter *i) +{ + return i->data_source; +} + +static inline bool iov_iter_is_source(const struct iov_iter *i) +{ + return iov_iter_dir(i) == ITER_SOURCE; /* ie. WRITE */ +} + +static inline bool iov_iter_is_dest(const struct iov_iter *i) +{ + return iov_iter_dir(i) == ITER_DEST; /* ie. READ */ +} + +static inline bool iov_iter_dir_valid(enum iter_dir direction) +{ + return direction == ITER_DEST || direction == ITER_SOURCE; +} + static inline bool user_backed_iter(const struct iov_iter *i) { return i->user_backed;
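A minimal sketch of how the new helpers read at a call site, assuming a kvec-backed source iterator set up as elsewhere in this series; example_direction_helpers() is a hypothetical name:

#include <linux/uio.h>

static void example_direction_helpers(void *buf, size_t len)
{
	struct kvec kv = { .iov_base = buf, .iov_len = len };
	struct iov_iter iter;

	/* ITER_SOURCE: the buffer supplies data (the old WRITE sense). */
	iov_iter_kvec(&iter, ITER_SOURCE, &kv, 1, len);

	WARN_ON(!iov_iter_dir_valid(iov_iter_dir(&iter)));
	WARN_ON(!iov_iter_is_source(&iter));
	WARN_ON(iov_iter_is_dest(&iter));
}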
From patchwork Mon Jan 16 23:08:44 2023
Subject: [PATCH v6 06/34] iov_iter: Use the direction in the iterator functions
From: David Howells
To: Al Viro
Cc: dhowells@redhat.com, Christoph Hellwig, Matthew Wilcox, Jens Axboe, Jan Kara, Jeff Layton, Logan Gunthorpe, linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org, linux-kernel@vger.kernel.org
Date: Mon, 16 Jan 2023 23:08:44 +0000
Message-ID: <167391052497.2311931.9463379582932734164.stgit@warthog.procyon.org.uk>

Use the direction in the iterator functions rather than READ/WRITE. Add a check into __iov_iter_get_pages_alloc() that the supplied FOLL_SOURCE/DEST_BUF gup_flag matches the ITER_SOURCE/DEST flag on the iterator.

Changes
=======
ver #6)
 - Add a check on FOLL_SOURCE/DEST_BUF into __iov_iter_get_pages_alloc()

Signed-off-by: David Howells
cc: Al Viro
Link: https://lore.kernel.org/r/167305162465.1521586.18077838937455153675.stgit@warthog.procyon.org.uk/ # v4
Link: https://lore.kernel.org/r/167344727112.2425628.995771894170560721.stgit@warthog.procyon.org.uk/ # v5
---
include/linux/uio.h | 22 +-- lib/iov_iter.c | 409 ++++++++++++++++++++++++++++++++++++++++++++++++--- 2 files changed, 396 insertions(+), 35 deletions(-)
diff --git a/include/linux/uio.h b/include/linux/uio.h index 8d0dabfcb2fe..18b64068cc6d 100644 --- a/include/linux/uio.h +++ b/include/linux/uio.h @@ -256,16 +256,16 @@ bool iov_iter_is_aligned(const struct iov_iter *i, unsigned addr_mask, unsigned len_mask); unsigned long iov_iter_alignment(const struct iov_iter *i); unsigned long iov_iter_gap_alignment(const struct iov_iter *i); -void iov_iter_init(struct iov_iter *i, unsigned int direction, const struct iovec *iov, +void iov_iter_init(struct iov_iter *i, enum iter_dir direction, const struct iovec *iov, unsigned long nr_segs, size_t count); -void iov_iter_kvec(struct iov_iter *i, unsigned int direction, const struct kvec *kvec, +void iov_iter_kvec(struct iov_iter *i, enum iter_dir direction, const struct kvec *kvec, unsigned long nr_segs, size_t count); -void iov_iter_bvec(struct iov_iter *i, unsigned int direction, const struct bio_vec *bvec, +void iov_iter_bvec(struct iov_iter *i, enum iter_dir direction, const struct bio_vec *bvec, unsigned long nr_segs, size_t count); -void iov_iter_pipe(struct iov_iter *i, unsigned int direction, struct pipe_inode_info *pipe, +void iov_iter_pipe(struct iov_iter *i, enum iter_dir direction, struct pipe_inode_info *pipe, size_t count); -void iov_iter_discard(struct iov_iter *i, unsigned int direction, size_t count); -void iov_iter_xarray(struct iov_iter *i, unsigned int direction, struct xarray +void iov_iter_discard(struct iov_iter *i, enum iter_dir direction, size_t count); +void iov_iter_xarray(struct iov_iter *i, enum iter_dir direction, struct xarray
*xarray, loff_t start, size_t count); ssize_t iov_iter_get_pages(struct iov_iter *i, struct page **pages, size_t maxsize, unsigned maxpages, size_t *start, @@ -351,19 +351,19 @@ size_t hash_and_copy_to_iter(const void *addr, size_t bytes, void *hashp, struct iovec *iovec_from_user(const struct iovec __user *uvector, unsigned long nr_segs, unsigned long fast_segs, struct iovec *fast_iov, bool compat); -ssize_t import_iovec(int type, const struct iovec __user *uvec, +ssize_t import_iovec(enum iter_dir direction, const struct iovec __user *uvec, unsigned nr_segs, unsigned fast_segs, struct iovec **iovp, struct iov_iter *i); -ssize_t __import_iovec(int type, const struct iovec __user *uvec, +ssize_t __import_iovec(enum iter_dir direction, const struct iovec __user *uvec, unsigned nr_segs, unsigned fast_segs, struct iovec **iovp, struct iov_iter *i, bool compat); -int import_single_range(int type, void __user *buf, size_t len, +int import_single_range(enum iter_dir direction, void __user *buf, size_t len, struct iovec *iov, struct iov_iter *i); -static inline void iov_iter_ubuf(struct iov_iter *i, unsigned int direction, +static inline void iov_iter_ubuf(struct iov_iter *i, enum iter_dir direction, void __user *buf, size_t count) { - WARN_ON(direction & ~(READ | WRITE)); + WARN_ON(!iov_iter_dir_valid(direction)); *i = (struct iov_iter) { .iter_type = ITER_UBUF, .user_backed = true, diff --git a/lib/iov_iter.c b/lib/iov_iter.c index ca89ffa9d6e1..6436438bf46b 100644 --- a/lib/iov_iter.c +++ b/lib/iov_iter.c @@ -421,11 +421,11 @@ size_t fault_in_iov_iter_writeable(const struct iov_iter *i, size_t size) } EXPORT_SYMBOL(fault_in_iov_iter_writeable); -void iov_iter_init(struct iov_iter *i, unsigned int direction, +void iov_iter_init(struct iov_iter *i, enum iter_dir direction, const struct iovec *iov, unsigned long nr_segs, size_t count) { - WARN_ON(direction & ~(READ | WRITE)); + WARN_ON(!iov_iter_dir_valid(direction)); *i = (struct iov_iter) { .iter_type = ITER_IOVEC, .nofault = false, @@ -994,11 +994,11 @@ size_t iov_iter_single_seg_count(const struct iov_iter *i) } EXPORT_SYMBOL(iov_iter_single_seg_count); -void iov_iter_kvec(struct iov_iter *i, unsigned int direction, +void iov_iter_kvec(struct iov_iter *i, enum iter_dir direction, const struct kvec *kvec, unsigned long nr_segs, size_t count) { - WARN_ON(direction & ~(READ | WRITE)); + WARN_ON(!iov_iter_dir_valid(direction)); *i = (struct iov_iter){ .iter_type = ITER_KVEC, .data_source = direction, @@ -1010,11 +1010,11 @@ void iov_iter_kvec(struct iov_iter *i, unsigned int direction, } EXPORT_SYMBOL(iov_iter_kvec); -void iov_iter_bvec(struct iov_iter *i, unsigned int direction, +void iov_iter_bvec(struct iov_iter *i, enum iter_dir direction, const struct bio_vec *bvec, unsigned long nr_segs, size_t count) { - WARN_ON(direction & ~(READ | WRITE)); + WARN_ON(!iov_iter_dir_valid(direction)); *i = (struct iov_iter){ .iter_type = ITER_BVEC, .data_source = direction, @@ -1026,15 +1026,15 @@ void iov_iter_bvec(struct iov_iter *i, unsigned int direction, } EXPORT_SYMBOL(iov_iter_bvec); -void iov_iter_pipe(struct iov_iter *i, unsigned int direction, +void iov_iter_pipe(struct iov_iter *i, enum iter_dir direction, struct pipe_inode_info *pipe, size_t count) { - BUG_ON(direction != READ); + BUG_ON(direction != ITER_DEST); WARN_ON(pipe_full(pipe->head, pipe->tail, pipe->ring_size)); *i = (struct iov_iter){ .iter_type = ITER_PIPE, - .data_source = false, + .data_source = ITER_DEST, .pipe = pipe, .head = pipe->head, .start_head = pipe->head, @@ -1057,10 
+1057,10 @@ EXPORT_SYMBOL(iov_iter_pipe); * from evaporation, either by taking a ref on them or locking them by the * caller. */ -void iov_iter_xarray(struct iov_iter *i, unsigned int direction, +void iov_iter_xarray(struct iov_iter *i, enum iter_dir direction, struct xarray *xarray, loff_t start, size_t count) { - BUG_ON(direction & ~1); + WARN_ON(!iov_iter_dir_valid(direction)); *i = (struct iov_iter) { .iter_type = ITER_XARRAY, .data_source = direction, @@ -1079,14 +1079,14 @@ EXPORT_SYMBOL(iov_iter_xarray); * @count: The size of the I/O buffer in bytes. * * Set up an I/O iterator that just discards everything that's written to it. - * It's only available as a READ iterator. + * It's only available as a destination iterator. */ -void iov_iter_discard(struct iov_iter *i, unsigned int direction, size_t count) +void iov_iter_discard(struct iov_iter *i, enum iter_dir direction, size_t count) { - BUG_ON(direction != READ); + BUG_ON(direction != ITER_DEST); *i = (struct iov_iter){ .iter_type = ITER_DISCARD, - .data_source = false, + .data_source = ITER_DEST, .count = count, .iov_offset = 0 }; @@ -1444,10 +1444,10 @@ static ssize_t __iov_iter_get_pages_alloc(struct iov_iter *i, maxsize = MAX_RW_COUNT; if (WARN_ON_ONCE((gup_flags & FOLL_BUF_MASK) == FOLL_SOURCE_BUF && - i->data_source == ITER_DEST)) + iov_iter_is_dest(i))) return -EIO; if (WARN_ON_ONCE((gup_flags & FOLL_BUF_MASK) == FOLL_DEST_BUF && - i->data_source == ITER_SOURCE)) + iov_iter_is_source(i))) return -EIO; if (likely(user_backed_iter(i))) { @@ -1775,7 +1775,7 @@ struct iovec *iovec_from_user(const struct iovec __user *uvec, return iov; } -ssize_t __import_iovec(int type, const struct iovec __user *uvec, +ssize_t __import_iovec(enum iter_dir direction, const struct iovec __user *uvec, unsigned nr_segs, unsigned fast_segs, struct iovec **iovp, struct iov_iter *i, bool compat) { @@ -1814,7 +1814,7 @@ ssize_t __import_iovec(int type, const struct iovec __user *uvec, total_len += len; } - iov_iter_init(i, type, iov, nr_segs, total_len); + iov_iter_init(i, direction, iov, nr_segs, total_len); if (iov == *iovp) *iovp = NULL; else @@ -1827,7 +1827,7 @@ ssize_t __import_iovec(int type, const struct iovec __user *uvec, * into the kernel, check that it is valid, and initialize a new * &struct iov_iter iterator to access it. * - * @type: One of %READ or %WRITE. + * @direction: One of %ITER_SOURCE or %ITER_DEST. * @uvec: Pointer to the userspace array. * @nr_segs: Number of elements in userspace array. * @fast_segs: Number of elements in @iov. 
@@ -1844,16 +1844,16 @@ ssize_t __import_iovec(int type, const struct iovec __user *uvec, * * Return: Negative error code on error, bytes imported on success */ -ssize_t import_iovec(int type, const struct iovec __user *uvec, +ssize_t import_iovec(enum iter_dir direction, const struct iovec __user *uvec, unsigned nr_segs, unsigned fast_segs, struct iovec **iovp, struct iov_iter *i) { - return __import_iovec(type, uvec, nr_segs, fast_segs, iovp, i, + return __import_iovec(direction, uvec, nr_segs, fast_segs, iovp, i, in_compat_syscall()); } EXPORT_SYMBOL(import_iovec); -int import_single_range(int rw, void __user *buf, size_t len, +int import_single_range(enum iter_dir direction, void __user *buf, size_t len, struct iovec *iov, struct iov_iter *i) { if (len > MAX_RW_COUNT) @@ -1863,7 +1863,7 @@ int import_single_range(int rw, void __user *buf, size_t len, iov->iov_base = buf; iov->iov_len = len; - iov_iter_init(i, rw, iov, 1, len); + iov_iter_init(i, direction, iov, 1, len); return 0; } EXPORT_SYMBOL(import_single_range); @@ -1905,3 +1905,364 @@ void iov_iter_restore(struct iov_iter *i, struct iov_iter_state *state) i->iov -= state->nr_segs - i->nr_segs; i->nr_segs = state->nr_segs; } + +/* + * Extract a list of contiguous pages from an ITER_PIPE iterator. This does + * not get references of its own on the pages, nor does it get a pin on them. + * If there's a partial page, it adds that first and will then allocate and add + * pages into the pipe to make up the buffer space to the amount required. + * + * The caller must hold the pipe locked and only transferring into a pipe is + * supported. + */ +static ssize_t iov_iter_extract_pipe_pages(struct iov_iter *i, + struct page ***pages, size_t maxsize, + unsigned int maxpages, + unsigned int gup_flags, + size_t *offset0) +{ + unsigned int nr, offset, chunk, j; + struct page **p; + size_t left; + + if (!sanity(i)) + return -EFAULT; + + offset = pipe_npages(i, &nr); + if (!nr) + return -EFAULT; + *offset0 = offset; + + maxpages = min_t(size_t, nr, maxpages); + maxpages = want_pages_array(pages, maxsize, offset, maxpages); + if (!maxpages) + return -ENOMEM; + p = *pages; + + left = maxsize; + for (j = 0; j < maxpages; j++) { + struct page *page = append_pipe(i, left, &offset); + if (!page) + break; + chunk = min_t(size_t, left, PAGE_SIZE - offset); + left -= chunk; + *p++ = page; + } + if (!j) + return -EFAULT; + return maxsize - left; +} + +/* + * Extract a list of contiguous pages from an ITER_XARRAY iterator. This does not + * get references on the pages, nor does it get a pin on them. + */ +static ssize_t iov_iter_extract_xarray_pages(struct iov_iter *i, + struct page ***pages, size_t maxsize, + unsigned int maxpages, + unsigned int gup_flags, + size_t *offset0) +{ + struct page *page, **p; + unsigned int nr = 0, offset; + loff_t pos = i->xarray_start + i->iov_offset; + pgoff_t index = pos >> PAGE_SHIFT; + XA_STATE(xas, i->xarray, index); + + offset = pos & ~PAGE_MASK; + *offset0 = offset; + + maxpages = want_pages_array(pages, maxsize, offset, maxpages); + if (!maxpages) + return -ENOMEM; + p = *pages; + + rcu_read_lock(); + for (page = xas_load(&xas); page; page = xas_next(&xas)) { + if (xas_retry(&xas, page)) + continue; + + /* Has the page moved or been split? 
*/ + if (unlikely(page != xas_reload(&xas))) { + xas_reset(&xas); + continue; + } + + p[nr++] = find_subpage(page, xas.xa_index); + if (nr == maxpages) + break; + } + rcu_read_unlock(); + + maxsize = min_t(size_t, nr * PAGE_SIZE - offset, maxsize); + i->iov_offset += maxsize; + i->count -= maxsize; + return maxsize; +} + +/* + * Extract a list of contiguous pages from an ITER_BVEC iterator. This does + * not get references on the pages, nor does it get a pin on them. + */ +static ssize_t iov_iter_extract_bvec_pages(struct iov_iter *i, + struct page ***pages, size_t maxsize, + unsigned int maxpages, + unsigned int gup_flags, + size_t *offset0) +{ + struct page **p, *page; + size_t skip = i->iov_offset, offset; + int k; + + maxsize = min(maxsize, i->bvec->bv_len - skip); + skip += i->bvec->bv_offset; + page = i->bvec->bv_page + skip / PAGE_SIZE; + offset = skip % PAGE_SIZE; + *offset0 = offset; + + maxpages = want_pages_array(pages, maxsize, offset, maxpages); + if (!maxpages) + return -ENOMEM; + p = *pages; + for (k = 0; k < maxpages; k++) + p[k] = page + k; + + maxsize = min_t(size_t, maxsize, maxpages * PAGE_SIZE - offset); + i->count -= maxsize; + i->iov_offset += maxsize; + if (i->iov_offset == i->bvec->bv_len) { + i->iov_offset = 0; + i->bvec++; + i->nr_segs--; + } + return maxsize; +} + +/* + * Get the first segment from an ITER_UBUF or ITER_IOVEC iterator. The + * iterator must not be empty. + */ +static unsigned long iov_iter_extract_first_user_segment(const struct iov_iter *i, + size_t *size) +{ + size_t skip; + long k; + + if (iter_is_ubuf(i)) + return (unsigned long)i->ubuf + i->iov_offset; + + for (k = 0, skip = i->iov_offset; k < i->nr_segs; k++, skip = 0) { + size_t len = i->iov[k].iov_len - skip; + + if (unlikely(!len)) + continue; + if (*size > len) + *size = len; + return (unsigned long)i->iov[k].iov_base + skip; + } + BUG(); // if it had been empty, we wouldn't get called +} + +/* + * Extract a list of contiguous pages from a user iterator and get references + * on them. This should only be used iff the iterator is user-backed + * (IOBUF/UBUF) and data is being transferred out of the buffer described by + * the iterator (ie. this is the source). + * + * The pages are returned with incremented refcounts that the caller must undo + * once the transfer is complete, but no additional pins are obtained. + * + * This is only safe to be used where background IO/DMA is not going to be + * modifying the buffer, and so won't cause a problem with CoW on fork. + */ +static ssize_t iov_iter_extract_user_pages_and_get(struct iov_iter *i, + struct page ***pages, + size_t maxsize, + unsigned int maxpages, + unsigned int gup_flags, + size_t *offset0) +{ + unsigned long addr; + size_t offset; + int res; + + if (WARN_ON_ONCE(!iov_iter_is_source(i))) + return -EFAULT; + + gup_flags |= FOLL_GET; + if (i->nofault) + gup_flags |= FOLL_NOFAULT; + + addr = iov_iter_extract_first_user_segment(i, &maxsize); + *offset0 = offset = addr % PAGE_SIZE; + addr &= PAGE_MASK; + maxpages = want_pages_array(pages, maxsize, offset, maxpages); + if (!maxpages) + return -ENOMEM; + res = get_user_pages_fast(addr, maxpages, gup_flags, *pages); + if (unlikely(res <= 0)) + return res; + maxsize = min_t(size_t, maxsize, res * PAGE_SIZE - offset); + iov_iter_advance(i, maxsize); + return maxsize; +} + +/* + * Extract a list of contiguous pages from a user iterator and get a pin on + * each of them. 
This should only be used iff the iterator is user-backed + * (IOBUF/UBUF) and data is being transferred into the buffer described by the + * iterator (ie. this is the destination). + * + * It does not get refs on the pages, but the pages must be unpinned by the + * caller once the transfer is complete. + * + * This is safe to be used where background IO/DMA *is* going to be modifying + * the buffer; using a pin rather than a ref makes sure that CoW happens + * correctly in the parent during fork. + */ +static ssize_t iov_iter_extract_user_pages_and_pin(struct iov_iter *i, + struct page ***pages, + size_t maxsize, + unsigned int maxpages, + unsigned int gup_flags, + size_t *offset0) +{ + unsigned long addr; + size_t offset; + int res; + + if (WARN_ON_ONCE(!iov_iter_is_dest(i))) + return -EFAULT; + + gup_flags |= FOLL_PIN | FOLL_WRITE; + if (i->nofault) + gup_flags |= FOLL_NOFAULT; + + addr = first_iovec_segment(i, &maxsize); + *offset0 = offset = addr % PAGE_SIZE; + addr &= PAGE_MASK; + maxpages = want_pages_array(pages, maxsize, offset, maxpages); + if (!maxpages) + return -ENOMEM; + res = pin_user_pages_fast(addr, maxpages, gup_flags, *pages); + if (unlikely(res <= 0)) + return res; + maxsize = min_t(size_t, maxsize, res * PAGE_SIZE - offset); + iov_iter_advance(i, maxsize); + return maxsize; +} + +static ssize_t iov_iter_extract_user_pages(struct iov_iter *i, + struct page ***pages, size_t maxsize, + unsigned int maxpages, + unsigned int gup_flags, + size_t *offset0) +{ + if (iov_iter_extract_mode(i, gup_flags) == FOLL_GET) + return iov_iter_extract_user_pages_and_get(i, pages, maxsize, + maxpages, gup_flags, + offset0); + else + return iov_iter_extract_user_pages_and_pin(i, pages, maxsize, + maxpages, gup_flags, + offset0); +} + +/** + * iov_iter_extract_pages - Extract a list of contiguous pages from an iterator + * @i: The iterator to extract from + * @pages: Where to return the list of pages + * @maxsize: The maximum amount of iterator to extract + * @maxpages: The maximum size of the list of pages + * @gup_flags: Direction indicator and additional flags + * @offset0: Where to return the starting offset into (*@pages)[0] + * + * Extract a list of contiguous pages from the current point of the iterator, + * advancing the iterator. The maximum number of pages and the maximum amount + * of page contents can be set. + * + * If *@pages is NULL, a page list will be allocated to the required size and + * *@pages will be set to its base. If *@pages is not NULL, it will be assumed + * that the caller allocated a page list at least @maxpages in size and this + * will be filled in. + * + * @gup_flags can be set to either FOLL_SOURCE_BUF or FOLL_DEST_BUF, indicating + * how the buffer is to be used, and can have FOLL_PCI_P2PDMA OR'd with that. + * + * The iov_iter_extract_mode() function can be used to query how cleanup should + * be performed. + * + * Extra refs or pins on the pages may be obtained as follows: + * + * (*) If the iterator is user-backed (ITER_IOVEC/ITER_UBUF) and data is to be + * transferred /OUT OF/ the buffer (@gup_flags |= FOLL_SOURCE_BUF), refs + * will be taken on the pages, but pins will not be added. This can be + * used for DMA from a page; it cannot be used for DMA to a page, as it + * may cause page-COW problems in fork. iov_iter_extract_mode() will + * return FOLL_GET. 
+ * + * (*) If the iterator is user-backed (ITER_IOVEC/ITER_UBUF) and data is to be + * transferred /INTO/ the described buffer (@gup_flags |= FOLL_DEST_BUF), + * pins will be added to the pages, but refs will not be taken. This must + * be used for DMA to a page. iov_iter_extract_mode() will return + * FOLL_PIN. + * + * (*) If the iterator is ITER_PIPE, this must describe a destination for the + * data. Additional pages may be allocated and added to the pipe (which + * will hold the refs), but neither refs nor pins will be obtained for the + * caller. The caller must hold the pipe lock. iov_iter_extract_mode() + * will return 0. + * + * (*) If the iterator is ITER_BVEC or ITER_XARRAY, the pages are merely + * listed; no extra refs or pins are obtained. iov_iter_extract_mode() + * will return 0. + * + * Note also: + * + * (*) Use with ITER_KVEC is not supported as that may refer to memory that + * doesn't have associated page structs. + * + * (*) Use with ITER_DISCARD is not supported as that has no content. + * + * On success, the function sets *@pages to the new pagelist, if allocated, and + * sets *offset0 to the offset into the first page. + * + * It may also return -ENOMEM and -EFAULT. + */ +ssize_t iov_iter_extract_pages(struct iov_iter *i, + struct page ***pages, + size_t maxsize, + unsigned int maxpages, + unsigned int gup_flags, + size_t *offset0) +{ + if (WARN_ON_ONCE((gup_flags & FOLL_BUF_MASK) == FOLL_SOURCE_BUF && + iov_iter_is_dest(i))) + return -EIO; + if (WARN_ON_ONCE((gup_flags & FOLL_BUF_MASK) == FOLL_DEST_BUF && + iov_iter_is_source(i))) + return -EIO; + + maxsize = min_t(size_t, min_t(size_t, maxsize, i->count), MAX_RW_COUNT); + if (!maxsize) + return 0; + + if (likely(user_backed_iter(i))) + return iov_iter_extract_user_pages(i, pages, maxsize, + maxpages, gup_flags, + offset0); + if (iov_iter_is_bvec(i)) + return iov_iter_extract_bvec_pages(i, pages, maxsize, + maxpages, gup_flags, + offset0); + if (iov_iter_is_pipe(i)) + return iov_iter_extract_pipe_pages(i, pages, maxsize, + maxpages, gup_flags, + offset0); + if (iov_iter_is_xarray(i)) + return iov_iter_extract_xarray_pages(i, pages, maxsize, + maxpages, gup_flags, + offset0); + return -EFAULT; +} +EXPORT_SYMBOL_GPL(iov_iter_extract_pages);
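Taken together with iov_iter_extract_mode() from the next patch, the intended calling pattern looks roughly like the sketch below, assuming a user-backed destination iterator (so the pages come back pinned); the sizes, the example_* name and the use of kvfree() for the allocated page list are illustrative assumptions:

#include <linux/uio.h>
#include <linux/mm.h>
#include <linux/slab.h>
#include <linux/sizes.h>

static ssize_t example_extract_and_release(struct iov_iter *iter)
{
	struct page **pages = NULL;	/* NULL: let the callee allocate */
	size_t offset0;
	ssize_t len;
	unsigned int i, npages;

	len = iov_iter_extract_pages(iter, &pages, SZ_1M, 16,
				     FOLL_DEST_BUF, &offset0);
	if (len <= 0)
		return len;
	npages = DIV_ROUND_UP(offset0 + len, PAGE_SIZE);

	/* ... transfer into pages[0..npages), starting at offset0 ... */

	/* User-backed + FOLL_DEST_BUF means the pages were pinned. */
	if (iov_iter_extract_mode(iter, FOLL_DEST_BUF) == FOLL_PIN) {
		for (i = 0; i < npages; i++)
			unpin_user_page(pages[i]);
	}
	kvfree(pages);
	return len;
}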
From patchwork Mon Jan 16 23:08:52 2023
Subject: [PATCH v6 07/34] iov_iter: Add a function to extract a page list from an iterator
From: David Howells
To: Al Viro
Cc: Christoph Hellwig, John Hubbard, Matthew Wilcox, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, dhowells@redhat.com, Jens Axboe, Jan Kara, Jeff Layton, Logan Gunthorpe, linux-block@vger.kernel.org, linux-kernel@vger.kernel.org
Date: Mon, 16 Jan 2023 23:08:52 +0000
Message-ID: <167391053207.2311931.16398133457201442907.stgit@warthog.procyon.org.uk>

Add a function, iov_iter_extract_pages(), to extract a list of pages from an iterator. The pages may be returned with a reference added or a pin added or neither, depending on the type of iterator and the direction of transfer. The caller should pass FOLL_SOURCE_BUF or FOLL_DEST_BUF as part of gup_flags to indicate how the iterator contents are to be used.

Add a second function, iov_iter_extract_mode(), to determine how the cleanup should be done.

There are three cases:

 (1) Transfer *into* an ITER_IOVEC or ITER_UBUF iterator. Extracted pages will have pins obtained on them (but not references) so that fork() doesn't CoW the pages incorrectly whilst the I/O is in progress. iov_iter_extract_mode() will return FOLL_PIN for this case. The caller should use something like unpin_user_page() to dispose of the page.

 (2) Transfer is *out of* an ITER_IOVEC or ITER_UBUF iterator. Extracted pages will have references obtained on them, but not pins. iov_iter_extract_mode() will return FOLL_GET. The caller should use something like put_page() for page disposal.

 (3) Any other sort of iterator. No refs or pins are obtained on the page, the assumption is made that the caller will manage page retention. iov_iter_extract_mode() will return 0. The pages don't need additional disposal.

Changes:
========
ver #6)
 - Add back the function to indicate the cleanup mode.
 - Drop the cleanup_mode return arg to iov_iter_extract_pages().
 - Pass FOLL_SOURCE/DEST_BUF in gup_flags. Check this against the iter data_source.

ver #4)
 - Use ITER_SOURCE/DEST instead of WRITE/READ.
 - Allow additional FOLL_* flags, such as FOLL_PCI_P2PDMA to be passed in.

ver #3)
 - Switch to using EXPORT_SYMBOL_GPL to prevent indirect 3rd-party access to get/pin_user_pages_fast()[1].
Signed-off-by: David Howells
cc: Al Viro
cc: Christoph Hellwig
cc: John Hubbard
cc: Matthew Wilcox
cc: linux-fsdevel@vger.kernel.org
cc: linux-mm@kvack.org
Link: https://lore.kernel.org/r/Y3zFzdWnWlEJ8X8/@infradead.org/ [1]
Link: https://lore.kernel.org/r/166722777971.2555743.12953624861046741424.stgit@warthog.procyon.org.uk/ # rfc
Link: https://lore.kernel.org/r/166732025748.3186319.8314014902727092626.stgit@warthog.procyon.org.uk/ # rfc
Link: https://lore.kernel.org/r/166869689451.3723671.18242195992447653092.stgit@warthog.procyon.org.uk/ # rfc
Link: https://lore.kernel.org/r/166920903885.1461876.692029808682876184.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/166997421646.9475.14837976344157464997.stgit@warthog.procyon.org.uk/ # v3
Link: https://lore.kernel.org/r/167305163883.1521586.10777155475378874823.stgit@warthog.procyon.org.uk/ # v4
Link: https://lore.kernel.org/r/167344728530.2425628.9613910866466387722.stgit@warthog.procyon.org.uk/ # v5
---
include/linux/uio.h | 28 ++++++++++++++++++++++++++++ 1 file changed, 28 insertions(+)
diff --git a/include/linux/uio.h b/include/linux/uio.h index 18b64068cc6d..38607c82e0cc 100644 --- a/include/linux/uio.h +++ b/include/linux/uio.h @@ -373,4 +373,32 @@ static inline void iov_iter_ubuf(struct iov_iter *i, enum iter_dir direction, }; } +ssize_t iov_iter_extract_pages(struct iov_iter *i, struct page ***pages, + size_t maxsize, unsigned int maxpages, + unsigned int gup_flags, size_t *offset0); + +/** + * iov_iter_extract_mode - Indicate how pages from the iterator will be retained + * @iter: The iterator + * @gup_flags: How the iterator is to be used (FOLL_SOURCE/DEST_BUF) + * + * Examine the iterator and the gup_flags and indicate by returning FOLL_PIN, + * FOLL_GET or 0 as to how, if at all, pages extracted from the iterator will + * be retained by the extraction function. + * + * FOLL_GET indicates that the pages will have a reference taken on them that + * the caller must put. This can be done for DMA/async DIO write from a page. + * + * FOLL_PIN indicates that the pages will have a pin placed in them that the + * caller must unpin. This must be done for DMA/async DIO read to a page to + * avoid CoW problems in fork. + * + * 0 indicates that no measures are taken and that it's up to the caller to + * retain the pages. + */ +#define iov_iter_extract_mode(iter, gup_flags) \ + (user_backed_iter(iter) ? \ + (gup_flags & FOLL_BUF_MASK) == FOLL_SOURCE_BUF ?
\ + FOLL_GET : FOLL_PIN : 0) + #endif
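The cleanup mode only needs to be queried once per transfer; below is a sketch of a hypothetical release helper following the semantics documented in the macro comment above (patch 08 below adds folio_put_unpin()/page_put_unpin() to encapsulate much the same dispatch inside the gup code):

static void example_release_pages(struct iov_iter *iter,
				  unsigned int gup_flags,
				  struct page **pages, unsigned int npages)
{
	unsigned int i;

	switch (iov_iter_extract_mode(iter, gup_flags)) {
	case FOLL_GET:		/* user-backed source buffer */
		for (i = 0; i < npages; i++)
			put_page(pages[i]);
		break;
	case FOLL_PIN:		/* user-backed destination buffer */
		for (i = 0; i < npages; i++)
			unpin_user_page(pages[i]);
		break;
	default:		/* 0: the iterator's owner retains the pages */
		break;
	}
}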
From patchwork Mon Jan 16 23:08:59 2023
Subject: [PATCH v6 08/34] mm: Provide a helper to drop a pin/ref on a page
From: David Howells
Date: Mon, 16 Jan 2023 23:08:59 +0000
Message-ID: <167391053934.2311931.17229969100836070492.stgit@warthog.procyon.org.uk>

Provide a helper in the get_user_pages code to drop a pin or a ref on a
page, based on being given FOLL_GET or FOLL_PIN in its flags argument, or
to do nothing if neither is set.

Signed-off-by: David Howells
---
 include/linux/mm.h |    3 +++
 mm/gup.c           |   22 ++++++++++++++++++++++
 2 files changed, 25 insertions(+)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 3af4ca8b1fe7..8e746a930945 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1367,6 +1367,9 @@ static inline bool is_cow_mapping(vm_flags_t flags)
 #define SECTION_IN_PAGE_FLAGS
 #endif
 
+void folio_put_unpin(struct folio *folio, unsigned int flags);
+void page_put_unpin(struct page *page, unsigned int flags);
+
 /*
  * The identification function is mainly used by the buddy allocator for
  * determining if two pages could be buddies. We are not really identifying
diff --git a/mm/gup.c b/mm/gup.c
index f45a3a5be53a..3ee4b4c7e0cb 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -191,6 +191,28 @@ static void gup_put_folio(struct folio *folio, int refs, unsigned int flags)
 	folio_put_refs(folio, refs);
 }
 
+/**
+ * folio_put_unpin - Unpin/put a folio as appropriate
+ * @folio: The folio to release
+ * @flags: gup flags indicating the mode of release (FOLL_*)
+ *
+ * Release a folio according to the flags.  If FOLL_GET is set, the folio has
+ * a ref dropped; if FOLL_PIN is set, it is unpinned; otherwise it is left
+ * unaltered.
+ */
+void folio_put_unpin(struct folio *folio, unsigned int flags)
+{
+	if (flags & (FOLL_GET | FOLL_PIN))
+		gup_put_folio(folio, 1, flags);
+}
+EXPORT_SYMBOL_GPL(folio_put_unpin);
+
+void page_put_unpin(struct page *page, unsigned int flags)
+{
+	folio_put_unpin(page_folio(page), flags);
+}
+EXPORT_SYMBOL_GPL(page_put_unpin);
+
 /**
  * try_grab_page() - elevate a page's refcount by a flag-dependent amount
  * @page: pointer to page to be grabbed
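
A minimal usage sketch (hypothetical code, not from the patch): a caller
that recorded the FOLL_* retention mode at extraction time can hand the
same flags straight back for release, relying on page_put_unpin() being a
no-op when neither flag is set.

	static void demo_release_pages(struct page **pages, unsigned int npages,
				       unsigned int cleanup_flags)
	{
		unsigned int i;

		/* Drops a ref (FOLL_GET), unpins (FOLL_PIN) or does nothing (0) */
		for (i = 0; i < npages; i++)
			page_put_unpin(pages[i], cleanup_flags);
	}
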
From patchwork Mon Jan 16 23:09:06 2023
Subject: [PATCH v6 09/34] bio: Rename BIO_NO_PAGE_REF to BIO_PAGE_REFFED and invert the meaning
From: David Howells
Date: Mon, 16 Jan 2023 23:09:06 +0000
Message-ID: <167391054631.2311931.7588488803802952158.stgit@warthog.procyon.org.uk>

Rename BIO_NO_PAGE_REF to BIO_PAGE_REFFED and invert the meaning.  In a
following patch I intend to add a BIO_PAGE_PINNED flag to indicate that the
pages need unpinning; this way both flags have the same logic.

Changes
=======
ver #5)
 - Split from the patch that uses iov_iter_extract_pages().

Signed-off-by: David Howells
cc: Al Viro
cc: Jens Axboe
cc: Jan Kara
cc: Christoph Hellwig
cc: Matthew Wilcox
cc: Logan Gunthorpe
cc: linux-block@vger.kernel.org
Link: https://lore.kernel.org/r/167305166150.1521586.10220949115402059720.stgit@warthog.procyon.org.uk/ # v4
Link: https://lore.kernel.org/r/167344730802.2425628.14034153595667416149.stgit@warthog.procyon.org.uk/ # v5
Reviewed-by: Christoph Hellwig
---
 block/bio.c               |    9 ++++++++-
 include/linux/bio.h       |    2 +-
 include/linux/blk_types.h |    2 +-
 3 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index 867cf4db87ea..5b6a76c3e620 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -243,6 +243,10 @@ static void bio_free(struct bio *bio)
  * Users of this function have their own bio allocation. Subsequently,
  * they must remember to pair any call to bio_init() with bio_uninit()
  * when IO has completed, or when the bio is released.
+ *
+ * We set the initial assumption that pages attached to the bio will be
+ * released with put_page() by setting BIO_PAGE_REFFED; if the pages
+ * should not be put, this flag should be cleared.
  */
 void bio_init(struct bio *bio, struct block_device *bdev, struct bio_vec *table,
 	      unsigned short max_vecs, blk_opf_t opf)
@@ -274,6 +278,7 @@ void bio_init(struct bio *bio, struct block_device *bdev, struct bio_vec *table,
 #ifdef CONFIG_BLK_DEV_INTEGRITY
 	bio->bi_integrity = NULL;
 #endif
+	bio_set_flag(bio, BIO_PAGE_REFFED);
 	bio->bi_vcnt = 0;
 
 	atomic_set(&bio->__bi_remaining, 1);
@@ -302,6 +307,7 @@ void bio_reset(struct bio *bio, struct block_device *bdev, blk_opf_t opf)
 {
 	bio_uninit(bio);
 	memset(bio, 0, BIO_RESET_BYTES);
+	bio_set_flag(bio, BIO_PAGE_REFFED);
 	atomic_set(&bio->__bi_remaining, 1);
 	bio->bi_bdev = bdev;
 	if (bio->bi_bdev)
@@ -812,6 +818,7 @@ EXPORT_SYMBOL(bio_put);
 static int __bio_clone(struct bio *bio, struct bio *bio_src, gfp_t gfp)
 {
 	bio_set_flag(bio, BIO_CLONED);
+	bio_clear_flag(bio, BIO_PAGE_REFFED);
 	bio->bi_ioprio = bio_src->bi_ioprio;
 	bio->bi_iter = bio_src->bi_iter;
@@ -1198,7 +1205,7 @@ void bio_iov_bvec_set(struct bio *bio, struct iov_iter *iter)
 	bio->bi_io_vec = (struct bio_vec *)iter->bvec;
 	bio->bi_iter.bi_bvec_done = iter->iov_offset;
 	bio->bi_iter.bi_size = size;
-	bio_set_flag(bio, BIO_NO_PAGE_REF);
+	bio_clear_flag(bio, BIO_PAGE_REFFED);
 	bio_set_flag(bio, BIO_CLONED);
 }
 
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 3f7ba7fe48ac..69b32c5532f6 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -496,7 +496,7 @@ void zero_fill_bio(struct bio *bio);
 
 static inline void bio_release_pages(struct bio *bio, bool mark_dirty)
 {
-	if (!bio_flagged(bio, BIO_NO_PAGE_REF))
+	if (bio_flagged(bio, BIO_PAGE_REFFED))
 		__bio_release_pages(bio, mark_dirty);
 }
 
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index 99be590f952f..86711fb0534a 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -318,7 +318,7 @@ struct bio {
  * bio flags
  */
 enum {
-	BIO_NO_PAGE_REF,	/* don't put release vec pages */
+	BIO_PAGE_REFFED,	/* Pages need refs putting (equivalent to FOLL_GET) */
 	BIO_CLONED,		/* doesn't own data */
 	BIO_BOUNCED,		/* bio is a bounce bio */
 	BIO_QUIET,		/* Make BIO Quiet */
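
To illustrate the inversion (a hypothetical sketch, not part of the
patch): a caller attaching pages that the block layer must not put now
clears the positively-named flag rather than setting its negation.

	static void demo_attach_borrowed_pages(struct bio *bio)
	{
		/* Previously: bio_set_flag(bio, BIO_NO_PAGE_REF); */
		bio_clear_flag(bio, BIO_PAGE_REFFED);	/* pages are merely borrowed */
	}
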
From patchwork Mon Jan 16 23:09:13 2023
Subject: [PATCH v6 10/34] mm, block: Make BIO_PAGE_REFFED/PINNED the same as FOLL_GET/PIN numerically
From: David Howells
Date: Mon, 16 Jan 2023 23:09:13 +0000
Message-ID: <167391055339.2311931.11902422289425837725.stgit@warthog.procyon.org.uk>

Make BIO_PAGE_REFFED the same as FOLL_GET and BIO_PAGE_PINNED the same as
FOLL_PIN numerically so that the BIO_* flags can be passed directly to
page_put_unpin().  Provide a build-time assertion to check this.

Signed-off-by: David Howells
cc: Al Viro
cc: Jens Axboe
cc: Jan Kara
cc: Christoph Hellwig
cc: Matthew Wilcox
cc: Logan Gunthorpe
cc: linux-block@vger.kernel.org
---
 block/bio.c               |    3 +++
 include/linux/blk_types.h |    1 +
 include/linux/mm.h        |   17 ++++++++++-------
 3 files changed, 14 insertions(+), 7 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index 5b6a76c3e620..d8c636cefcdd 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -1798,6 +1798,9 @@ static int __init init_bio(void)
 {
 	int i;
 
+	BUILD_BUG_ON((1 << BIO_PAGE_REFFED) != FOLL_GET);
+	BUILD_BUG_ON((1 << BIO_PAGE_PINNED) != FOLL_PIN);
+
 	bio_integrity_init();
 
 	for (i = 0; i < ARRAY_SIZE(bvec_slabs); i++) {
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index 86711fb0534a..42b40156c517 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -319,6 +319,7 @@ struct bio {
  */
 enum {
 	BIO_PAGE_REFFED,	/* Pages need refs putting (equivalent to FOLL_GET) */
+	BIO_PAGE_PINNED,	/* Pages need unpinning (equivalent to FOLL_PIN) */
 	BIO_CLONED,		/* doesn't own data */
 	BIO_BOUNCED,		/* bio is a bounce bio */
 	BIO_QUIET,		/* Make BIO Quiet */
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 8e746a930945..f14edb192394 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3074,12 +3074,13 @@ static inline vm_fault_t vmf_error(int err)
 struct page *follow_page(struct vm_area_struct *vma, unsigned long address,
 			 unsigned int foll_flags);
 
-#define FOLL_WRITE	0x01	/* check pte is writable */
-#define FOLL_TOUCH	0x02	/* mark page accessed */
-#define FOLL_GET	0x04	/* do get_page on page */
-#define FOLL_DUMP	0x08	/* give error on hole if it would be zero */
-#define FOLL_FORCE	0x10	/* get_user_pages read/write w/o permission */
-#define FOLL_NOWAIT	0x20	/* if a disk transfer is needed, start the IO
+#define FOLL_GET	0x01	/* do get_page on page (equivalent to BIO_PAGE_REFFED) */
+#define FOLL_PIN	0x02	/* pages must be released via unpin_user_page */
+#define FOLL_WRITE	0x04	/* check pte is writable */
+#define FOLL_TOUCH	0x08	/* mark page accessed */
+#define FOLL_DUMP	0x10	/* give error on hole if it would be zero */
+#define FOLL_FORCE	0x20	/* get_user_pages read/write w/o permission */
+#define FOLL_NOWAIT	0x40	/* if a disk transfer is needed, start the IO
 				 * and return without waiting upon it */
 #define FOLL_NOFAULT	0x80	/* do not fault in pages */
 #define FOLL_HWPOISON	0x100	/* check page is hwpoisoned */
@@ -3088,7 +3089,6 @@ struct page *follow_page(struct vm_area_struct *vma, unsigned long address,
 #define FOLL_ANON	0x8000	/* don't do file mappings */
 #define FOLL_LONGTERM	0x10000	/* mapping lifetime is indefinite: see below */
 #define FOLL_SPLIT_PMD	0x20000	/* split huge pmd before returning */
-#define FOLL_PIN	0x40000	/* pages must be released via unpin_user_page */
 #define FOLL_FAST_ONLY	0x80000	/* gup_fast: prevent fall-back to slow gup */
 #define FOLL_PCI_P2PDMA	0x100000 /* allow returning PCI P2PDMA pages */
 #define FOLL_INTERRUPTIBLE 0x200000 /* allow interrupts from generic signals */
@@ -3098,6 +3098,9 @@ struct page *follow_page(struct vm_area_struct *vma, unsigned long address,
 #define FOLL_BUF_MASK	FOLL_WRITE
 
 /*
+ * FOLL_GET must be the same bit as BIO_PAGE_REFFED and FOLL_PIN must be the
+ * same bit as BIO_PAGE_PINNED.
+ *
  * FOLL_PIN and FOLL_LONGTERM may be used in various combinations with each
  * other. Here is what they mean, and how to use them:
  *
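
The effect of the equivalence can be sketched as follows (a hypothetical
demo_* helper; it mirrors the bio_release_page() helper added later in
this series): the REFFED/PINNED bits of bio->bi_flags are directly a
valid FOLL_* release mode, so no translation step is needed.

	static void demo_release_bio_page(struct bio *bio, struct page *page)
	{
		/* 1 << BIO_PAGE_REFFED == FOLL_GET and
		 * 1 << BIO_PAGE_PINNED == FOLL_PIN, so the flag word can be
		 * masked and handed straight to page_put_unpin().
		 */
		page_put_unpin(page, bio->bi_flags & (FOLL_GET | FOLL_PIN));
	}
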
From patchwork Mon Jan 16 23:09:20 2023
Subject: [PATCH v6 11/34] iov_iter, block: Make bio structs pin pages rather than ref'ing if appropriate
From: David Howells
Date: Mon, 16 Jan 2023 23:09:20 +0000
Message-ID: <167391056047.2311931.6772604381276147664.stgit@warthog.procyon.org.uk>

Convert the block layer's bio code to use iov_iter_extract_pages() instead
of iov_iter_get_pages().  This will pin pages or leave them unaltered
rather than getting a ref on them as appropriate to the source iterator.

The pages need to be pinned for DIO-read rather than having refs taken on
them to prevent VM copy-on-write from malfunctioning during a concurrent
fork() (the result of the I/O would otherwise end up visible only to the
child process and not the parent).

To implement this:

 (1) If the BIO_PAGE_REFFED flag is set, this causes attached pages to be
     passed to put_page() during cleanup.

 (2) A BIO_PAGE_PINNED flag is provided.  If set, this causes attached
     pages to be passed to unpin_user_page() during cleanup.

 (3) BIO_PAGE_REFFED is set by default and BIO_PAGE_PINNED is cleared by
     default when the bio is (re-)initialised.

 (4) If iov_iter_extract_pages() indicates FOLL_GET, this causes
     BIO_PAGE_REFFED to be set and if FOLL_PIN is indicated, this causes
     BIO_PAGE_PINNED to be set.  If it returns neither FOLL_* flag, then
     both BIO_PAGE_* flags will be cleared.

     Mixing sets of pages with different cleanup modes is not supported.

 (5) Cloned bio structs have both flags cleared.

 (6) bio_release_pages() will do the release if either BIO_PAGE_* flag is
     set.

[!] Note that this is tested a bit with ext4, but nothing else.

Changes
=======
ver #5)
 - Transcribe the FOLL_* flags returned by iov_iter_extract_pages() to
   BIO_* flags and get rid of bi_cleanup_mode.
 - Replaced BIO_NO_PAGE_REF with BIO_PAGE_REFFED in the preceding patch.
Signed-off-by: David Howells
cc: Al Viro
cc: Jens Axboe
cc: Jan Kara
cc: Christoph Hellwig
cc: Matthew Wilcox
cc: Logan Gunthorpe
cc: linux-block@vger.kernel.org
Link: https://lore.kernel.org/r/167305166150.1521586.10220949115402059720.stgit@warthog.procyon.org.uk/ # v4
Link: https://lore.kernel.org/r/167344731521.2425628.5403113335062567245.stgit@warthog.procyon.org.uk/ # v5
---
 block/bio.c         |   34 +++++++++++++++++++---------------
 block/blk-map.c     |   22 +++++++++++-----------
 block/blk.h         |   25 +++++++++++++++++++++++++
 include/linux/bio.h |    3 ++-
 4 files changed, 57 insertions(+), 27 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index d8c636cefcdd..f9ee3625d65c 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -245,8 +245,9 @@ static void bio_free(struct bio *bio)
  * when IO has completed, or when the bio is released.
  *
  * We set the initial assumption that pages attached to the bio will be
- * released with put_page() by setting BIO_PAGE_REFFED; if the pages
- * should not be put, this flag should be cleared.
+ * released with put_page() by setting BIO_PAGE_REFFED, but this should be set
+ * to BIO_PAGE_PINNED if the page should be unpinned instead; if the pages
+ * should not be put or unpinned, these flags should be cleared.
  */
 void bio_init(struct bio *bio, struct block_device *bdev, struct bio_vec *table,
 	      unsigned short max_vecs, blk_opf_t opf)
@@ -819,6 +820,7 @@ static int __bio_clone(struct bio *bio, struct bio *bio_src, gfp_t gfp)
 {
 	bio_set_flag(bio, BIO_CLONED);
 	bio_clear_flag(bio, BIO_PAGE_REFFED);
+	bio_clear_flag(bio, BIO_PAGE_PINNED);
 	bio->bi_ioprio = bio_src->bi_ioprio;
 	bio->bi_iter = bio_src->bi_iter;
@@ -1183,7 +1185,7 @@ void __bio_release_pages(struct bio *bio, bool mark_dirty)
 	bio_for_each_segment_all(bvec, bio, iter_all) {
 		if (mark_dirty && !PageCompound(bvec->bv_page))
 			set_page_dirty_lock(bvec->bv_page);
-		put_page(bvec->bv_page);
+		bio_release_page(bio, bvec->bv_page);
 	}
 }
 EXPORT_SYMBOL_GPL(__bio_release_pages);
@@ -1220,7 +1222,7 @@ static int bio_iov_add_page(struct bio *bio, struct page *page,
 	}
 
 	if (same_page)
-		put_page(page);
+		bio_release_page(bio, page);
 	return 0;
 }
 
@@ -1234,7 +1236,7 @@ static int bio_iov_add_zone_append_page(struct bio *bio, struct page *page,
 			queue_max_zone_append_sectors(q), &same_page) != len)
 		return -EINVAL;
 	if (same_page)
-		put_page(page);
+		bio_release_page(bio, page);
 	return 0;
 }
 
@@ -1245,10 +1247,10 @@ static int bio_iov_add_zone_append_page(struct bio *bio, struct page *page,
  * @bio: bio to add pages to
  * @iter: iov iterator describing the region to be mapped
  *
- * Pins pages from *iter and appends them to @bio's bvec array.  The
- * pages will have to be released using put_page() when done.
- * For multi-segment *iter, this function only adds pages from the
- * next non-empty segment of the iov iterator.
+ * Extracts pages from *iter and appends them to @bio's bvec array.  The pages
+ * will have to be cleaned up in the way indicated by the BIO_PAGE_REFFED and
+ * BIO_PAGE_PINNED flags.  For a multi-segment *iter, this function only adds
+ * pages from the next non-empty segment of the iov iterator.
  *
  * The I/O direction is determined from the bio operation type.
  */
@@ -1284,12 +1286,14 @@ static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
 	 * result to ensure the bio's total size is correct. The remainder of
 	 * the iov data will be picked up in the next bio iteration.
	 */
-	size = iov_iter_get_pages(iter, pages,
-				  UINT_MAX - bio->bi_iter.bi_size,
-				  nr_pages, &offset, gup_flags);
+	size = iov_iter_extract_pages(iter, &pages,
+				      UINT_MAX - bio->bi_iter.bi_size,
+				      nr_pages, gup_flags, &offset);
 	if (unlikely(size <= 0))
 		return size ? size : -EFAULT;
 
+	bio_set_cleanup_mode(bio, iter, gup_flags);
+
 	nr_pages = DIV_ROUND_UP(offset + size, PAGE_SIZE);
 
 	trim = size & (bdev_logical_block_size(bio->bi_bdev) - 1);
@@ -1319,7 +1323,7 @@ static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
 		iov_iter_revert(iter, left);
 out:
 	while (i < nr_pages)
-		put_page(pages[i++]);
+		bio_release_page(bio, pages[i++]);
 	return ret;
 }
 
@@ -1502,8 +1506,8 @@ void bio_set_pages_dirty(struct bio *bio)
  * the BIO and re-dirty the pages in process context.
  *
  * It is expected that bio_check_pages_dirty() will wholly own the BIO from
- * here on.  It will run one put_page() against each page and will run one
- * bio_put() against the BIO.
+ * here on.  It will run one put_page() or unpin_user_page() against each
+ * page and will run one bio_put() against the BIO.
  */
 
 static void bio_dirty_fn(struct work_struct *work);
diff --git a/block/blk-map.c b/block/blk-map.c
index c30be529fb55..be769f889eca 100644
--- a/block/blk-map.c
+++ b/block/blk-map.c
@@ -285,24 +285,24 @@ static int bio_map_user_iov(struct request *rq, struct iov_iter *iter,
 		gup_flags |= FOLL_PCI_P2PDMA;
 
 	while (iov_iter_count(iter)) {
-		struct page **pages, *stack_pages[UIO_FASTIOV];
+		struct page *stack_pages[UIO_FASTIOV];
+		struct page **pages = stack_pages;
 		ssize_t bytes;
 		size_t offs;
 		int npages;
 
-		if (nr_vecs <= ARRAY_SIZE(stack_pages)) {
-			pages = stack_pages;
-			bytes = iov_iter_get_pages(iter, pages, LONG_MAX,
-						   nr_vecs, &offs, gup_flags);
-		} else {
-			bytes = iov_iter_get_pages_alloc(iter, &pages,
-						LONG_MAX, &offs, gup_flags);
-		}
+		if (nr_vecs > ARRAY_SIZE(stack_pages))
+			pages = NULL;
+
+		bytes = iov_iter_extract_pages(iter, &pages, LONG_MAX,
+					       nr_vecs, gup_flags, &offs);
 		if (unlikely(bytes <= 0)) {
 			ret = bytes ? bytes : -EFAULT;
 			goto out_unmap;
 		}
 
+		bio_set_cleanup_mode(bio, iter, gup_flags);
+
 		npages = DIV_ROUND_UP(offs + bytes, PAGE_SIZE);
 
 		if (unlikely(offs & queue_dma_alignment(rq->q)))
@@ -319,7 +319,7 @@ static int bio_map_user_iov(struct request *rq, struct iov_iter *iter,
 			if (!bio_add_hw_page(rq->q, bio, page, n, offs,
 					     max_sectors, &same_page)) {
 				if (same_page)
-					put_page(page);
+					bio_release_page(bio, page);
 				break;
 			}
 
@@ -331,7 +331,7 @@ static int bio_map_user_iov(struct request *rq, struct iov_iter *iter,
 		 * release the pages we didn't map into the bio, if any
 		 */
 		while (j < npages)
-			put_page(pages[j++]);
+			bio_release_page(bio, pages[j++]);
 		if (pages != stack_pages)
 			kvfree(pages);
 		/* couldn't stuff something into bio? */
diff --git a/block/blk.h b/block/blk.h
index 4c3b3325219a..29f12f758915 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -425,6 +425,31 @@ int bio_add_hw_page(struct request_queue *q, struct bio *bio,
 		struct page *page, unsigned int len, unsigned int offset,
 		unsigned int max_sectors, bool *same_page);
 
+/*
+ * Set the cleanup mode for a bio from an iterator and the GUP flags.
+ */
+static inline void bio_set_cleanup_mode(struct bio *bio, struct iov_iter *iter,
+					unsigned int gup_flags)
+{
+	unsigned int cleanup_mode;
+
+	bio_clear_flag(bio, BIO_PAGE_REFFED);
+	cleanup_mode = iov_iter_extract_mode(iter, gup_flags);
+	if (cleanup_mode & FOLL_GET)
+		bio_set_flag(bio, BIO_PAGE_REFFED);
+	if (cleanup_mode & FOLL_PIN)
+		bio_set_flag(bio, BIO_PAGE_PINNED);
+}
+
+/*
+ * Clean up a page appropriately, where the page may be pinned, may have a
+ * ref taken on it or neither.
+ */
+static inline void bio_release_page(struct bio *bio, struct page *page)
+{
+	page_put_unpin(page, bio->bi_flags & (FOLL_GET | FOLL_PIN));
+}
+
 struct request_queue *blk_alloc_queue(int node_id);
 
 int disk_scan_partitions(struct gendisk *disk, fmode_t mode, void *owner);
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 69b32c5532f6..856b28e41d24 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -496,7 +496,8 @@ void zero_fill_bio(struct bio *bio);
 
 static inline void bio_release_pages(struct bio *bio, bool mark_dirty)
 {
-	if (bio_flagged(bio, BIO_PAGE_REFFED))
+	if (bio_flagged(bio, BIO_PAGE_REFFED) ||
+	    bio_flagged(bio, BIO_PAGE_PINNED))
 		__bio_release_pages(bio, mark_dirty);
 }
 
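
Putting the pieces together, the pattern the error paths above follow can
be sketched like this (hypothetical demo_* helper, not from the patch):
record the cleanup mode on the bio once per extraction, then release every
page that was not attached through the mode-aware helper.

	static void demo_discard_extracted(struct bio *bio, struct iov_iter *iter,
					   struct page **pages, unsigned int npages,
					   unsigned int gup_flags)
	{
		unsigned int i;

		/* Record whether the pages were ref'd, pinned or neither... */
		bio_set_cleanup_mode(bio, iter, gup_flags);

		/* ...so that each one is released in the matching way. */
		for (i = 0; i < npages; i++)
			bio_release_page(bio, pages[i]);
	}
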
From patchwork Mon Jan 16 23:09:27 2023
Subject: [PATCH v6 12/34] bio: Fix bio_flagged() so that gcc can better optimise it
From: David Howells
Date: Mon, 16 Jan 2023 23:09:27 +0000
Message-ID: <167391056756.2311931.356007731815807265.stgit@warthog.procyon.org.uk>

Fix bio_flagged() so that multiple instances of it, such as:

	if (bio_flagged(bio, BIO_PAGE_REFFED) ||
	    bio_flagged(bio, BIO_PAGE_PINNED))

can be combined by the gcc optimiser into a single test in assembly
(arguably, this is a compiler optimisation issue[1]).

The missed optimisation stems from bio_flagged() comparing the result of
the bitwise-AND to zero.  This results in an out-of-line
bio_release_pages() being compiled to something like:

   <+0>:  mov    0x14(%rdi),%eax
   <+3>:  test   $0x1,%al
   <+5>:  jne    0xffffffff816dac53
   <+7>:  test   $0x2,%al
   <+9>:  je     0xffffffff816dac5c
   <+11>: movzbl %sil,%esi
   <+15>: jmp    0xffffffff816daba1 <__bio_release_pages>
   <+20>: jmp    0xffffffff81d0b800 <__x86_return_thunk>

However, the test is superfluous as the return type is bool.  Removing it
results in:

   <+0>:  testb  $0x3,0x14(%rdi)
   <+4>:  je     0xffffffff816e4af4
   <+6>:  movzbl %sil,%esi
   <+10>: jmp    0xffffffff816dab7c <__bio_release_pages>
   <+15>: jmp    0xffffffff81d0b7c0 <__x86_return_thunk>

instead.

Also, the MOVZBL instruction looks unnecessary[2] - I think it's just
're-booling' the mark_dirty parameter.
Fixes: b7c44ed9d2fc ("block: manipulate bio->bi_flags through helpers")
Signed-off-by: David Howells
cc: Jens Axboe
cc: linux-block@vger.kernel.org
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108370 [1]
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108371 [2]
Reviewed-by: Christoph Hellwig
---
 include/linux/bio.h |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/bio.h b/include/linux/bio.h
index 856b28e41d24..5e34bcfcfa2c 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -241,7 +241,7 @@ static inline void bio_cnt_set(struct bio *bio, unsigned int count)
 
 static inline bool bio_flagged(struct bio *bio, unsigned int bit)
 {
-	return (bio->bi_flags & (1U << bit)) != 0;
+	return bio->bi_flags & (1U << bit);
 }
 
 static inline void bio_set_flag(struct bio *bio, unsigned int bit)
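
The point can be reproduced outside the kernel; here is a hypothetical
user-space reduction (assumed to be built with gcc -O2 on x86-64; the
demo_* names are illustrative) showing why returning the masked value and
letting the implicit bool conversion do the != 0 lets the tests merge:

	#include <stdbool.h>

	struct demo { unsigned int flags; };

	static inline bool demo_flagged(const struct demo *d, unsigned int bit)
	{
		return d->flags & (1U << bit);	/* implicit bool conversion re-bools */
	}

	bool demo_either(const struct demo *d)
	{
		/* With the masked-value form, gcc can fold this into a single
		 * testb $0x3 against the flag word.
		 */
		return demo_flagged(d, 0) || demo_flagged(d, 1);
	}
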
From patchwork Mon Jan 16 23:09:34 2023
Subject: [PATCH v6 13/34] netfs: Add a function to extract a UBUF or IOVEC into a BVEC iterator
From: David Howells
Date: Mon, 16 Jan 2023 23:09:34 +0000
Message-ID: <167391057444.2311931.12321968641492694017.stgit@warthog.procyon.org.uk>

Add a function to extract the pages from a user-space supplied iterator
(UBUF- or IOVEC-type) into a BVEC-type iterator, retaining the pages by
getting a ref on them (if FOLL_SOURCE_BUF is indicated) or pinning them
(if FOLL_DEST_BUF is indicated) as we go.

This is useful in three situations:

 (1) A userspace thread may have a sibling that unmaps or remaps the
     process's VM during the operation, changing the assignment of the
     pages and potentially causing an error.  Retaining the pages keeps
     some pages around, even if this occurs; further, we find out at the
     point of extraction if EFAULT is going to be incurred.

 (2) Pages might get swapped out/discarded if not retained, so we want to
     retain them to avoid the reload causing a deadlock due to a DIO
     from/to an mmapped region on the same file.

 (3) The iterator may get passed to sendmsg() by the filesystem.  If a
     fault occurs, we may get a short write to a TCP stream that's then
     tricky to recover from.

We don't deal with other types of iterator here, leaving it to other
mechanisms to retain the pages (eg. PG_locked, PG_writeback and the pipe
lock).

Changes:
========
ver #6)
 - Pass in a gup_flags argument to allow FOLL_SOURCE_BUF and FOLL_DEST_BUF
   and other FOLL_* flags to be passed in.
 - Don't pass back the cleanup mode - iov_iter_extract_mode() can be used
   to determine that.

ver #3)
 - Switch to using EXPORT_SYMBOL_GPL to prevent indirect 3rd-party access
   to get/pin_user_pages_fast()[1].
Signed-off-by: David Howells
cc: Jeff Layton
cc: Steve French
cc: Shyam Prasad N
cc: Rohith Surabattula
cc: linux-cachefs@redhat.com
cc: linux-cifs@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/Y3zFzdWnWlEJ8X8/@infradead.org/ [1]
Link: https://lore.kernel.org/r/166697255265.61150.6289490555867717077.stgit@warthog.procyon.org.uk/ # rfc
Link: https://lore.kernel.org/r/166732026503.3186319.12020462741051772825.stgit@warthog.procyon.org.uk/ # rfc
Link: https://lore.kernel.org/r/166869690376.3723671.8813331570219190705.stgit@warthog.procyon.org.uk/ # rfc
Link: https://lore.kernel.org/r/166920904810.1461876.11603559311247187100.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/166997422579.9475.12101700945635692496.stgit@warthog.procyon.org.uk/ # v3
Link: https://lore.kernel.org/r/167305164634.1521586.12199658904363317567.stgit@warthog.procyon.org.uk/ # v4
Link: https://lore.kernel.org/r/167344729278.2425628.3277966637577509831.stgit@warthog.procyon.org.uk/ # v5
---
 fs/netfs/Makefile     |    1 +
 fs/netfs/iterator.c   |  102 +++++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/netfs.h |    2 +
 3 files changed, 105 insertions(+)
 create mode 100644 fs/netfs/iterator.c

diff --git a/fs/netfs/Makefile b/fs/netfs/Makefile
index f684c0cd1ec5..386d6fb92793 100644
--- a/fs/netfs/Makefile
+++ b/fs/netfs/Makefile
@@ -3,6 +3,7 @@
 netfs-y := \
 	buffered_read.o \
 	io.o \
+	iterator.o \
 	main.o \
 	objects.o
 
diff --git a/fs/netfs/iterator.c b/fs/netfs/iterator.c
new file mode 100644
index 000000000000..f7f26de1a247
--- /dev/null
+++ b/fs/netfs/iterator.c
@@ -0,0 +1,102 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/* Iterator helpers.
+ *
+ * Copyright (C) 2022 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells (dhowells@redhat.com)
+ */
+
+#include <linux/export.h>
+#include <linux/slab.h>
+#include <linux/uio.h>
+#include <linux/netfs.h>
+#include "internal.h"
+
+/**
+ * netfs_extract_user_iter - Extract the pages from a user iterator into a bvec
+ * @orig: The original iterator
+ * @orig_len: The amount of iterator to copy
+ * @new: The iterator to be set up
+ * @gup_flags: Direction indicator and additional flags
+ *
+ * Extract the page fragments from the given amount of the source iterator and
+ * build up a second iterator that refers to all of those bits.  This allows
+ * the original iterator to be disposed of.
+ *
+ * @gup_flags should indicate FOLL_SOURCE_BUF or FOLL_DEST_BUF plus any
+ * additional flags needed.
+ *
+ * On success, the number of elements in the bvec is returned and the original
+ * iterator will have been advanced by the amount extracted.
+ *
+ * The iov_iter_extract_mode() function should be used to query how cleanup
+ * should be performed.
+ */
+ssize_t netfs_extract_user_iter(struct iov_iter *orig, size_t orig_len,
+				struct iov_iter *new, unsigned int gup_flags)
+{
+	struct bio_vec *bv = NULL;
+	struct page **pages;
+	unsigned int cur_npages;
+	unsigned int max_pages;
+	unsigned int npages = 0;
+	unsigned int i;
+	ssize_t ret;
+	size_t count = orig_len, offset, len;
+	size_t bv_size, pg_size;
+
+	if (WARN_ON_ONCE(!iter_is_ubuf(orig) && !iter_is_iovec(orig)))
+		return -EIO;
+
+	max_pages = iov_iter_npages(orig, INT_MAX);
+	bv_size = array_size(max_pages, sizeof(*bv));
+	bv = kvmalloc(bv_size, GFP_KERNEL);
+	if (!bv)
+		return -ENOMEM;
+
+	/* Put the page list at the end of the bvec list storage.  bvec
+	 * elements are larger than page pointers, so as long as we work
+	 * 0->last, we should be fine.
+ */
+	pg_size = array_size(max_pages, sizeof(*pages));
+	pages = (void *)bv + bv_size - pg_size;
+
+	while (count && npages < max_pages) {
+		ret = iov_iter_extract_pages(orig, &pages, count,
+					     max_pages - npages, gup_flags,
+					     &offset);
+		if (ret < 0) {
+			pr_err("Couldn't get user pages (rc=%zd)\n", ret);
+			break;
+		}
+
+		if (ret > count) {
+			pr_err("get_pages rc=%zd more than %zu\n", ret, count);
+			break;
+		}
+
+		count -= ret;
+		ret += offset;
+		cur_npages = DIV_ROUND_UP(ret, PAGE_SIZE);
+
+		if (npages + cur_npages > max_pages) {
+			pr_err("Out of bvec array capacity (%u vs %u)\n",
+			       npages + cur_npages, max_pages);
+			break;
+		}
+
+		for (i = 0; i < cur_npages; i++) {
+			len = ret > PAGE_SIZE ? PAGE_SIZE : ret;
+			bv[npages + i].bv_page = *pages++;
+			bv[npages + i].bv_offset = offset;
+			bv[npages + i].bv_len = len - offset;
+			ret -= len;
+			offset = 0;
+		}
+
+		npages += cur_npages;
+	}
+
+	iov_iter_bvec(new, orig->data_source, bv, npages, orig_len - count);
+	return npages;
+}
+EXPORT_SYMBOL_GPL(netfs_extract_user_iter);

diff --git a/include/linux/netfs.h b/include/linux/netfs.h
index 4c76ddfb6a67..a45757dd382d 100644
--- a/include/linux/netfs.h
+++ b/include/linux/netfs.h
@@ -296,6 +296,8 @@ void netfs_get_subrequest(struct netfs_io_subrequest *subreq,
 void netfs_put_subrequest(struct netfs_io_subrequest *subreq, bool was_async,
			   enum netfs_sreq_ref_trace what);
 void netfs_stats_show(struct seq_file *);
+ssize_t netfs_extract_user_iter(struct iov_iter *orig, size_t orig_len,
+				struct iov_iter *new, unsigned int gup_flags);
 
 /**
  * netfs_inode - Get the netfs inode context from the inode
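A sketch of what a filesystem can then do with the result (illustrative
only; not part of the patch):

	/* The bvec iterator built by netfs_extract_user_iter() is stable
	 * against userspace unmapping or remapping the buffer, so it can
	 * be handed to sendmsg() without risking a mid-stream fault and
	 * a hard-to-recover short write on a TCP stream.
	 */
	struct msghdr msg = { .msg_iter = bvec_iter };
	int ret;

	ret = sock_sendmsg(sock, &msg);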
From patchwork Mon Jan 16 23:09:41 2023
Subject: [PATCH v6 14/34] netfs: Add a function to extract an iterator into a scatterlist
From: David Howells
To: Al Viro
Cc: Jeff Layton, Steve French, Shyam Prasad N, Rohith Surabattula,
    linux-cachefs@redhat.com, linux-cifs@vger.kernel.org,
    linux-fsdevel@vger.kernel.org, dhowells@redhat.com, Christoph Hellwig,
    Matthew Wilcox, Jens Axboe, Jan Kara, Logan Gunthorpe,
    linux-block@vger.kernel.org, linux-kernel@vger.kernel.org
Date: Mon, 16 Jan 2023 23:09:41 +0000
Message-ID: <167391058194.2311931.1725331547727885666.stgit@warthog.procyon.org.uk>
In-Reply-To: <167391047703.2311931.8115712773222260073.stgit@warthog.procyon.org.uk>
References: <167391047703.2311931.8115712773222260073.stgit@warthog.procyon.org.uk>
User-Agent: StGit/1.5
MIME-Version: 1.0

Provide a function for filling in a scatterlist from the list of pages
contained in an iterator.  The function is passed FOLL_SOURCE_BUF or
FOLL_DEST_BUF to indicate how the extracted pages are to be used.

If the iterator is UBUF- or IOVEC-type, the pages have a ref
(FOLL_SOURCE_BUF) or a pin (FOLL_DEST_BUF) taken on them.

If the iterator is BVEC-, KVEC- or XARRAY-type, no ref is taken on the
pages and it is left to the caller to manage their lifetime.  It cannot
be assumed that a ref can be validly taken, particularly in the case of
a KVEC iterator.

Changes:
========
ver #6)
 - Pass in a gup_flags argument to allow FOLL_SOURCE_BUF, FOLL_DEST_BUF
   and other FOLL_* flags to be passed in.
 - Don't pass back the cleanup mode - iov_iter_extract_mode() can be used
   to determine that.

ver #3)
 - Switch to using EXPORT_SYMBOL_GPL to prevent indirect 3rd-party access
   to get/pin_user_pages_fast() [1].
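As an illustration of the intended calling convention (a sketch only, not
part of the patch; the wrapper function and the array size are invented
for the example):

	/* Decant up to 16 sg elements' worth of an iterator.  The sgl
	 * storage mirrors the af_alg layout later in this series: one
	 * spare element is kept for chaining, and for UBUF/IOVEC sources
	 * the extracted page pointer list is decanted into the tail of
	 * the same storage.
	 */
	static ssize_t example_fill_sgtable(struct iov_iter *iter, size_t len)
	{
		struct scatterlist sgl[16 + 1];
		struct sg_table sgt = { .sgl = sgl, .nents = 0, .orig_nents = 0 };
		ssize_t n;

		sg_init_table(sgl, ARRAY_SIZE(sgl));
		n = netfs_extract_iter_to_sg(iter, len, &sgt, 16,
					     FOLL_DEST_BUF);
		if (n > 0)
			sg_mark_end(&sgt.sgl[sgt.nents - 1]); /* caller's job */
		return n;	/* bytes added; sgt.nents entries filled */
	}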
Signed-off-by: David Howells
cc: Jeff Layton
cc: Steve French
cc: Shyam Prasad N
cc: Rohith Surabattula
cc: linux-cachefs@redhat.com
cc: linux-cifs@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/Y3zFzdWnWlEJ8X8/@infradead.org/ [1]
Link: https://lore.kernel.org/r/166697255985.61150.16489950598033809487.stgit@warthog.procyon.org.uk/ # rfc
Link: https://lore.kernel.org/r/166732027275.3186319.5186488812166611598.stgit@warthog.procyon.org.uk/ # rfc
Link: https://lore.kernel.org/r/166869691313.3723671.10714823767342163891.stgit@warthog.procyon.org.uk/ # rfc
Link: https://lore.kernel.org/r/166920905749.1461876.12079195122363691498.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/166997423514.9475.11145024341505464337.stgit@warthog.procyon.org.uk/ # v3
Link: https://lore.kernel.org/r/167305165398.1521586.12353215176136705725.stgit@warthog.procyon.org.uk/ # v4
Link: https://lore.kernel.org/r/167344730041.2425628.14391053364759792950.stgit@warthog.procyon.org.uk/ # v5
---
 fs/netfs/iterator.c   |  269 +++++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/netfs.h |    4 +
 mm/vmalloc.c          |    1 
 3 files changed, 274 insertions(+)

diff --git a/fs/netfs/iterator.c b/fs/netfs/iterator.c
index f7f26de1a247..1d20ad2123b5 100644
--- a/fs/netfs/iterator.c
+++ b/fs/netfs/iterator.c
@@ -7,7 +7,9 @@
 #include
 #include
+#include
 #include
+#include
 #include
 #include "internal.h"
@@ -100,3 +102,270 @@ ssize_t netfs_extract_user_iter(struct iov_iter *orig, size_t orig_len,
 	return npages;
 }
 EXPORT_SYMBOL_GPL(netfs_extract_user_iter);
+
+/*
+ * Extract a list of up to sg_max pages from UBUF- or IOVEC-class iterators,
+ * pin or get refs on them as appropriate and add them to the scatterlist.
+ */
+static ssize_t netfs_extract_user_to_sg(struct iov_iter *iter,
+					ssize_t maxsize,
+					struct sg_table *sgtable,
+					unsigned int sg_max,
+					unsigned int gup_flags)
+{
+	struct scatterlist *sg = sgtable->sgl + sgtable->nents;
+	struct page **pages;
+	unsigned int npages;
+	ssize_t ret = 0, res;
+	size_t len, off;
+
+	/* We decant the page list into the tail of the scatterlist */
+	pages = (void *)sgtable->sgl + array_size(sg_max, sizeof(struct scatterlist));
+	pages -= sg_max;
+
+	do {
+		res = iov_iter_extract_pages(iter, &pages, maxsize, sg_max,
+					     gup_flags, &off);
+		if (res < 0)
+			goto failed;
+
+		len = res;
+		maxsize -= len;
+		ret += len;
+		npages = DIV_ROUND_UP(off + len, PAGE_SIZE);
+		sg_max -= npages;
+
+		for (; npages > 0; npages--) {
+			struct page *page = *pages;
+			size_t seg = min_t(size_t, PAGE_SIZE - off, len);
+
+			*pages++ = NULL;
+			sg_set_page(sg, page, seg, off);
+			sgtable->nents++;
+			sg++;
+			len -= seg;
+			off = 0;
+		}
+	} while (maxsize > 0 && sg_max > 0);
+
+	return ret;
+
+failed:
+	while (sgtable->nents > sgtable->orig_nents)
+		put_page(sg_page(&sgtable->sgl[--sgtable->nents]));
+	return res;
+}
+
+/*
+ * Extract up to sg_max pages from a BVEC-type iterator and add them to the
+ * scatterlist.  The pages are not pinned.
+ */
+static ssize_t netfs_extract_bvec_to_sg(struct iov_iter *iter,
+					ssize_t maxsize,
+					struct sg_table *sgtable,
+					unsigned int sg_max,
+					unsigned int gup_flags)
+{
+	const struct bio_vec *bv = iter->bvec;
+	struct scatterlist *sg = sgtable->sgl + sgtable->nents;
+	unsigned long start = iter->iov_offset;
+	unsigned int i;
+	ssize_t ret = 0;
+
+	for (i = 0; i < iter->nr_segs; i++) {
+		size_t off, len;
+
+		len = bv[i].bv_len;
+		if (start >= len) {
+			start -= len;
+			continue;
+		}
+
+		len = min_t(size_t, maxsize, len - start);
+		off = bv[i].bv_offset + start;
+
+		sg_set_page(sg, bv[i].bv_page, len, off);
+		sgtable->nents++;
+		sg++;
+		sg_max--;
+
+		ret += len;
+		maxsize -= len;
+		if (maxsize <= 0 || sg_max == 0)
+			break;
+		start = 0;
+	}
+
+	if (ret > 0)
+		iov_iter_advance(iter, ret);
+	return ret;
+}
+
+/*
+ * Extract up to sg_max pages from a KVEC-type iterator and add them to the
+ * scatterlist.  This can deal with vmalloc'd buffers as well as kmalloc'd or
+ * static buffers.  The pages are not pinned.
+ */
+static ssize_t netfs_extract_kvec_to_sg(struct iov_iter *iter,
+					ssize_t maxsize,
+					struct sg_table *sgtable,
+					unsigned int sg_max,
+					unsigned int gup_flags)
+{
+	const struct kvec *kv = iter->kvec;
+	struct scatterlist *sg = sgtable->sgl + sgtable->nents;
+	unsigned long start = iter->iov_offset;
+	unsigned int i;
+	ssize_t ret = 0;
+
+	for (i = 0; i < iter->nr_segs; i++) {
+		struct page *page;
+		unsigned long kaddr;
+		size_t off, len, seg;
+
+		len = kv[i].iov_len;
+		if (start >= len) {
+			start -= len;
+			continue;
+		}
+
+		kaddr = (unsigned long)kv[i].iov_base + start;
+		off = kaddr & ~PAGE_MASK;
+		len = min_t(size_t, maxsize, len - start);
+		kaddr &= PAGE_MASK;
+
+		maxsize -= len;
+		ret += len;
+		do {
+			seg = min_t(size_t, len, PAGE_SIZE - off);
+			if (is_vmalloc_or_module_addr((void *)kaddr))
+				page = vmalloc_to_page((void *)kaddr);
+			else
+				page = virt_to_page(kaddr);
+
+			sg_set_page(sg, page, seg, off);
+			sgtable->nents++;
+			sg++;
+			sg_max--;
+
+			len -= seg;
+			kaddr += PAGE_SIZE;
+			off = 0;
+		} while (len > 0 && sg_max > 0);
+
+		if (maxsize <= 0 || sg_max == 0)
+			break;
+		start = 0;
+	}
+
+	if (ret > 0)
+		iov_iter_advance(iter, ret);
+	return ret;
+}
+
+/*
+ * Extract up to sg_max folios from an XARRAY-type iterator and add them to
+ * the scatterlist.  The pages are not pinned.
+ */
+static ssize_t netfs_extract_xarray_to_sg(struct iov_iter *iter,
+					  ssize_t maxsize,
+					  struct sg_table *sgtable,
+					  unsigned int sg_max,
+					  unsigned int gup_flags)
+{
+	struct scatterlist *sg = sgtable->sgl + sgtable->nents;
+	struct xarray *xa = iter->xarray;
+	struct folio *folio;
+	loff_t start = iter->xarray_start + iter->iov_offset;
+	pgoff_t index = start / PAGE_SIZE;
+	ssize_t ret = 0;
+	size_t offset, len;
+	XA_STATE(xas, xa, index);
+
+	rcu_read_lock();
+
+	xas_for_each(&xas, folio, ULONG_MAX) {
+		if (xas_retry(&xas, folio))
+			continue;
+		if (WARN_ON(xa_is_value(folio)))
+			break;
+		if (WARN_ON(folio_test_hugetlb(folio)))
+			break;
+
+		offset = offset_in_folio(folio, start);
+		len = min_t(size_t, maxsize, folio_size(folio) - offset);
+
+		sg_set_page(sg, folio_page(folio, 0), len, offset);
+		sgtable->nents++;
+		sg++;
+		sg_max--;
+
+		maxsize -= len;
+		ret += len;
+		if (maxsize <= 0 || sg_max == 0)
+			break;
+	}
+
+	rcu_read_unlock();
+	if (ret > 0)
+		iov_iter_advance(iter, ret);
+	return ret;
+}
+
+/**
+ * netfs_extract_iter_to_sg - Extract pages from an iterator and add to an sglist
+ * @iter: The iterator to extract from
+ * @maxsize: The amount of iterator to copy
+ * @sgtable: The scatterlist table to fill in
+ * @sg_max: Maximum number of elements in @sgtable that may be filled
+ * @gup_flags: Direction indicator and additional flags
+ *
+ * Extract the page fragments from the given amount of the source iterator and
+ * add them to a scatterlist that refers to all of those bits, to a maximum
+ * addition of @sg_max elements.
+ *
+ * The pages referred to by UBUF- and IOVEC-type iterators are extracted and
+ * pinned; BVEC-, KVEC- and XARRAY-type are extracted but aren't pinned; PIPE-
+ * and DISCARD-type are not supported.
+ *
+ * No end mark is placed on the scatterlist; that's left to the caller.
+ *
+ * @gup_flags should indicate FOLL_SOURCE_BUF or FOLL_DEST_BUF plus any
+ * additional flags needed.
+ *
+ * If successful, @sgtable->nents is updated to include the number of elements
+ * added and the number of bytes added is returned.  @sgtable->orig_nents is
+ * left unaltered.
+ *
+ * The iov_iter_extract_mode() function should be used to query how cleanup
+ * should be performed.
+ */
+ssize_t netfs_extract_iter_to_sg(struct iov_iter *iter, size_t maxsize,
+				 struct sg_table *sgtable, unsigned int sg_max,
+				 unsigned int gup_flags)
+{
+	if (maxsize == 0)
+		return 0;
+
+	switch (iov_iter_type(iter)) {
+	case ITER_UBUF:
+	case ITER_IOVEC:
+		return netfs_extract_user_to_sg(iter, maxsize, sgtable, sg_max,
+						gup_flags);
+	case ITER_BVEC:
+		return netfs_extract_bvec_to_sg(iter, maxsize, sgtable, sg_max,
+						gup_flags);
+	case ITER_KVEC:
+		return netfs_extract_kvec_to_sg(iter, maxsize, sgtable, sg_max,
+						gup_flags);
+	case ITER_XARRAY:
+		return netfs_extract_xarray_to_sg(iter, maxsize, sgtable, sg_max,
+						  gup_flags);
+	default:
+		pr_err("netfs_extract_iter_to_sg(%u) unsupported\n",
+		       iov_iter_type(iter));
+		WARN_ON_ONCE(1);
+		return -EIO;
+	}
+}
+EXPORT_SYMBOL_GPL(netfs_extract_iter_to_sg);

diff --git a/include/linux/netfs.h b/include/linux/netfs.h
index a45757dd382d..2493df855f05 100644
--- a/include/linux/netfs.h
+++ b/include/linux/netfs.h
@@ -298,6 +298,10 @@ void netfs_put_subrequest(struct netfs_io_subrequest *subreq,
 void netfs_stats_show(struct seq_file *);
 ssize_t netfs_extract_user_iter(struct iov_iter *orig, size_t orig_len,
				struct iov_iter *new, unsigned int gup_flags);
+struct sg_table;
+ssize_t netfs_extract_iter_to_sg(struct iov_iter *iter, size_t len,
+				 struct sg_table *sgtable, unsigned int sg_max,
+				 unsigned int gup_flags);
 
 /**
  * netfs_inode - Get the netfs inode context from the inode

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index ca71de7c9d77..61f5bec0f2b6 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -656,6 +656,7 @@ int is_vmalloc_or_module_addr(const void *x)
 #endif
 	return is_vmalloc_addr(x);
 }
+EXPORT_SYMBOL_GPL(is_vmalloc_or_module_addr);
 
 /*
  * Walk a vmap address to the struct page it maps.  Huge vmap mappings will
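One property worth calling out with a sketch (illustrative only, not part
of the patch; buf, buf_len, sgt and sg_max are assumed to exist): for a
KVEC-type iterator nothing is ref'd or pinned, so there is no put step.

	struct kvec kv = { .iov_base = buf, .iov_len = buf_len };
	struct iov_iter iter;
	ssize_t n;

	iov_iter_kvec(&iter, ITER_SOURCE, &kv, 1, buf_len);
	n = netfs_extract_iter_to_sg(&iter, buf_len, &sgt, sg_max,
				     FOLL_SOURCE_BUF);
	/* No ref or pin was taken on the kernel pages, so
	 * iov_iter_extract_mode(&iter, FOLL_SOURCE_BUF) is 0 here; the
	 * buffer merely has to outlive the scatterlist.
	 */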
From patchwork Mon Jan 16 23:09:49 2023
Subject: [PATCH v6 15/34] af_alg: Pin pages rather than ref'ing if appropriate
From: David Howells
To: Al Viro
Cc: Herbert Xu, linux-crypto@vger.kernel.org, dhowells@redhat.com,
    Christoph Hellwig, Matthew Wilcox, Jens Axboe, Jan Kara, Jeff Layton,
    Logan Gunthorpe, linux-fsdevel@vger.kernel.org,
    linux-block@vger.kernel.org, linux-kernel@vger.kernel.org
Date: Mon, 16 Jan 2023 23:09:49 +0000
Message-ID: <167391058954.2311931.2012230616335750882.stgit@warthog.procyon.org.uk>
In-Reply-To: <167391047703.2311931.8115712773222260073.stgit@warthog.procyon.org.uk>
References: <167391047703.2311931.8115712773222260073.stgit@warthog.procyon.org.uk>
User-Agent: StGit/1.5
MIME-Version: 1.0

Convert AF_ALG to use iov_iter_extract_pages() instead of
iov_iter_get_pages().  This will pin pages or leave them unaltered rather
than getting a ref on them as appropriate to the iterator.

The pages need to be pinned for DIO-read rather than having refs taken on
them to prevent VM copy-on-write from malfunctioning during a concurrent
fork() (the result of the I/O would otherwise end up only visible to the
child process and not the parent).

Signed-off-by: David Howells
cc: Herbert Xu
cc: linux-crypto@vger.kernel.org
---
 crypto/af_alg.c         |    9 ++++++---
 include/crypto/if_alg.h |    1 +
 2 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/crypto/af_alg.c b/crypto/af_alg.c
index 7a68db157fae..c99e09fce71f 100644
--- a/crypto/af_alg.c
+++ b/crypto/af_alg.c
@@ -534,15 +534,18 @@ static const struct net_proto_family alg_family = {
 int af_alg_make_sg(struct af_alg_sgl *sgl, struct iov_iter *iter, int len,
		    unsigned int gup_flags)
 {
+	struct page **pages = sgl->pages;
 	size_t off;
 	ssize_t n;
 	int npages, i;
 
-	n = iov_iter_get_pages(iter, sgl->pages, len, ALG_MAX_PAGES, &off,
-			       gup_flags);
+	n = iov_iter_extract_pages(iter, &pages, len, ALG_MAX_PAGES,
+				   gup_flags, &off);
 	if (n < 0)
 		return n;
 
+	sgl->cleanup_mode = iov_iter_extract_mode(iter, gup_flags);
+
 	npages = DIV_ROUND_UP(off + n, PAGE_SIZE);
 	if (WARN_ON(npages == 0))
 		return -EINVAL;
@@ -576,7 +579,7 @@ void af_alg_free_sg(struct af_alg_sgl *sgl)
 	int i;
 
 	for (i = 0; i < sgl->npages; i++)
-		put_page(sgl->pages[i]);
+		page_put_unpin(sgl->pages[i], sgl->cleanup_mode);
 }
 EXPORT_SYMBOL_GPL(af_alg_free_sg);

diff --git a/include/crypto/if_alg.h b/include/crypto/if_alg.h
index 12058ab6cad9..95b3b7517d3f 100644
--- a/include/crypto/if_alg.h
+++ b/include/crypto/if_alg.h
@@ -61,6 +61,7 @@ struct af_alg_sgl {
 	struct scatterlist sg[ALG_MAX_PAGES + 1];
 	struct page *pages[ALG_MAX_PAGES];
 	unsigned int npages;
+	unsigned int cleanup_mode;
 };
 
 /* TX SGL entry */
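Condensed, the extract-then-release pattern the patch adopts looks like
this (a sketch of the change above, not a complete function; error
handling elided):

	struct page **pages = sgl->pages;

	/* Pins for FOLL_DEST_BUF (a read into userspace), refs for
	 * FOLL_SOURCE_BUF, per the iterator type.
	 */
	n = iov_iter_extract_pages(iter, &pages, len, ALG_MAX_PAGES,
				   gup_flags, &off);
	sgl->cleanup_mode = iov_iter_extract_mode(iter, gup_flags);

	/* ...and later, instead of an unconditional put_page(): */
	for (i = 0; i < sgl->npages; i++)
		page_put_unpin(sgl->pages[i], sgl->cleanup_mode);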
From patchwork Mon Jan 16 23:09:56 2023
Subject: [PATCH v6 16/34] af_alg: [RFC] Use netfs_extract_iter_to_sg() to create scatterlists
From: David Howells
To: Al Viro
Cc: Herbert Xu, linux-crypto@vger.kernel.org, dhowells@redhat.com,
    Christoph Hellwig, Matthew Wilcox, Jens Axboe, Jan Kara, Jeff Layton,
    Logan Gunthorpe, linux-fsdevel@vger.kernel.org,
    linux-block@vger.kernel.org, linux-kernel@vger.kernel.org
Date: Mon, 16 Jan 2023 23:09:56 +0000
Message-ID: <167391059663.2311931.12037449511418464282.stgit@warthog.procyon.org.uk>
In-Reply-To: <167391047703.2311931.8115712773222260073.stgit@warthog.procyon.org.uk>
References: <167391047703.2311931.8115712773222260073.stgit@warthog.procyon.org.uk>
User-Agent: StGit/1.5
MIME-Version: 1.0

Use netfs_extract_iter_to_sg() to decant the destination iterator into a
scatterlist in af_alg_get_rsgl().  af_alg_make_sg() can then be removed.

Note that if this fits, netfs_extract_iter_to_sg() should move to core
code.
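Condensed, the new call pattern in af_alg_get_rsgl() looks like this (a
sketch of the change below, with error handling elided):

	/* Point the embedded sg_table at the af_alg_sgl's own element
	 * array, fill it from the message iterator, then record how the
	 * pages must be released at af_alg_free_sg() time.
	 */
	sgl->sgt.sgl = sgl->sgl;
	sgl->sgt.nents = 0;
	sgl->sgt.orig_nents = 0;
	err = netfs_extract_iter_to_sg(&msg->msg_iter, seglen, &sgl->sgt,
				       ALG_MAX_PAGES, FOLL_DEST_BUF);
	sgl->cleanup_mode = iov_iter_extract_mode(&msg->msg_iter,
						  FOLL_DEST_BUF);

	/* ...and on teardown, pages are only dropped if a ref or pin
	 * was actually taken:
	 */
	if (sgl->cleanup_mode & (FOLL_PIN | FOLL_GET))
		for (i = 0; i < sgl->sgt.nents; i++)
			page_put_unpin(sg_page(&sgl->sgt.sgl[i]),
				       sgl->cleanup_mode);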
Signed-off-by: David Howells cc: Herbert Xu cc: linux-crypto@vger.kernel.org --- crypto/af_alg.c | 63 +++++++++++++---------------------------------- crypto/algif_hash.c | 21 +++++++++++----- include/crypto/if_alg.h | 7 +---- 3 files changed, 35 insertions(+), 56 deletions(-) diff --git a/crypto/af_alg.c b/crypto/af_alg.c index c99e09fce71f..c5fbe39366ff 100644 --- a/crypto/af_alg.c +++ b/crypto/af_alg.c @@ -22,6 +22,7 @@ #include #include #include +#include #include #include #include @@ -531,55 +532,22 @@ static const struct net_proto_family alg_family = { .owner = THIS_MODULE, }; -int af_alg_make_sg(struct af_alg_sgl *sgl, struct iov_iter *iter, int len, - unsigned int gup_flags) -{ - struct page **pages = sgl->pages; - size_t off; - ssize_t n; - int npages, i; - - n = iov_iter_extract_pages(iter, &pages, len, ALG_MAX_PAGES, - gup_flags, &off); - if (n < 0) - return n; - - sgl->cleanup_mode = iov_iter_extract_mode(iter, gup_flags); - - npages = DIV_ROUND_UP(off + n, PAGE_SIZE); - if (WARN_ON(npages == 0)) - return -EINVAL; - /* Add one extra for linking */ - sg_init_table(sgl->sg, npages + 1); - - for (i = 0, len = n; i < npages; i++) { - int plen = min_t(int, len, PAGE_SIZE - off); - - sg_set_page(sgl->sg + i, sgl->pages[i], plen, off); - - off = 0; - len -= plen; - } - sg_mark_end(sgl->sg + npages - 1); - sgl->npages = npages; - - return n; -} -EXPORT_SYMBOL_GPL(af_alg_make_sg); - static void af_alg_link_sg(struct af_alg_sgl *sgl_prev, struct af_alg_sgl *sgl_new) { - sg_unmark_end(sgl_prev->sg + sgl_prev->npages - 1); - sg_chain(sgl_prev->sg, sgl_prev->npages + 1, sgl_new->sg); + sg_unmark_end(sgl_prev->sgt.sgl + sgl_prev->sgt.nents - 1); + sg_chain(sgl_prev->sgt.sgl, sgl_prev->sgt.nents + 1, sgl_new->sgt.sgl); } void af_alg_free_sg(struct af_alg_sgl *sgl) { int i; - for (i = 0; i < sgl->npages; i++) - page_put_unpin(sgl->pages[i], sgl->cleanup_mode); + if (!(sgl->cleanup_mode & (FOLL_PIN | FOLL_GET))) + return; + + for (i = 0; i < sgl->sgt.nents; i++) + page_put_unpin(sg_page(&sgl->sgt.sgl[i]), sgl->cleanup_mode); } EXPORT_SYMBOL_GPL(af_alg_free_sg); @@ -1293,8 +1261,8 @@ int af_alg_get_rsgl(struct sock *sk, struct msghdr *msg, int flags, while (maxsize > len && msg_data_left(msg)) { struct af_alg_rsgl *rsgl; + ssize_t err; size_t seglen; - int err; /* limit the amount of readable buffers */ if (!af_alg_readable(sk)) @@ -1311,17 +1279,22 @@ int af_alg_get_rsgl(struct sock *sk, struct msghdr *msg, int flags, return -ENOMEM; } - rsgl->sgl.npages = 0; + rsgl->sgl.sgt.sgl = rsgl->sgl.sgl; + rsgl->sgl.sgt.nents = 0; + rsgl->sgl.sgt.orig_nents = 0; list_add_tail(&rsgl->list, &areq->rsgl_list); - /* make one iovec available as scatterlist */ - err = af_alg_make_sg(&rsgl->sgl, &msg->msg_iter, seglen, - FOLL_DEST_BUF); + err = netfs_extract_iter_to_sg(&msg->msg_iter, seglen, + &rsgl->sgl.sgt, ALG_MAX_PAGES, + FOLL_DEST_BUF); if (err < 0) { rsgl->sg_num_bytes = 0; return err; } + rsgl->sgl.cleanup_mode = iov_iter_extract_mode(&msg->msg_iter, + FOLL_DEST_BUF); + /* chain the new scatterlist with previous one */ if (areq->last_rsgl) af_alg_link_sg(&areq->last_rsgl->sgl, &rsgl->sgl); diff --git a/crypto/algif_hash.c b/crypto/algif_hash.c index fe3d2258145f..5aef6818a9ff 100644 --- a/crypto/algif_hash.c +++ b/crypto/algif_hash.c @@ -14,6 +14,7 @@ #include #include #include +#include #include struct hash_ctx { @@ -91,14 +92,22 @@ static int hash_sendmsg(struct socket *sock, struct msghdr *msg, if (len > limit) len = limit; - len = af_alg_make_sg(&ctx->sgl, &msg->msg_iter, len, - FOLL_SOURCE_BUF); + 
ctx->sgl.sgt.sgl = ctx->sgl.sgl; + ctx->sgl.sgt.nents = 0; + ctx->sgl.sgt.orig_nents = 0; + + len = netfs_extract_iter_to_sg(&msg->msg_iter, len, + &ctx->sgl.sgt, ALG_MAX_PAGES, + FOLL_SOURCE_BUF); if (len < 0) { err = copied ? 0 : len; goto unlock; } - ahash_request_set_crypt(&ctx->req, ctx->sgl.sg, NULL, len); + ctx->sgl.cleanup_mode = iov_iter_extract_mode(&msg->msg_iter, + FOLL_SOURCE_BUF); + + ahash_request_set_crypt(&ctx->req, ctx->sgl.sgt.sgl, NULL, len); err = crypto_wait_req(crypto_ahash_update(&ctx->req), &ctx->wait); @@ -142,8 +151,8 @@ static ssize_t hash_sendpage(struct socket *sock, struct page *page, flags |= MSG_MORE; lock_sock(sk); - sg_init_table(ctx->sgl.sg, 1); - sg_set_page(ctx->sgl.sg, page, size, offset); + sg_init_table(ctx->sgl.sgl, 1); + sg_set_page(ctx->sgl.sgl, page, size, offset); if (!(flags & MSG_MORE)) { err = hash_alloc_result(sk, ctx); @@ -152,7 +161,7 @@ static ssize_t hash_sendpage(struct socket *sock, struct page *page, } else if (!ctx->more) hash_free_result(sk, ctx); - ahash_request_set_crypt(&ctx->req, ctx->sgl.sg, ctx->result, size); + ahash_request_set_crypt(&ctx->req, ctx->sgl.sgl, ctx->result, size); if (!(flags & MSG_MORE)) { if (ctx->more) diff --git a/include/crypto/if_alg.h b/include/crypto/if_alg.h index 95b3b7517d3f..424a2071705d 100644 --- a/include/crypto/if_alg.h +++ b/include/crypto/if_alg.h @@ -58,9 +58,8 @@ struct af_alg_type { }; struct af_alg_sgl { - struct scatterlist sg[ALG_MAX_PAGES + 1]; - struct page *pages[ALG_MAX_PAGES]; - unsigned int npages; + struct sg_table sgt; + struct scatterlist sgl[ALG_MAX_PAGES + 1]; unsigned int cleanup_mode; }; @@ -166,8 +165,6 @@ int af_alg_release(struct socket *sock); void af_alg_release_parent(struct sock *sk); int af_alg_accept(struct sock *sk, struct socket *newsock, bool kern); -int af_alg_make_sg(struct af_alg_sgl *sgl, struct iov_iter *iter, int len, - unsigned int gup_flags); void af_alg_free_sg(struct af_alg_sgl *sgl); static inline struct alg_sock *alg_sk(struct sock *sk)
From patchwork Mon Jan 16 23:10:03 2023
Subject: [PATCH v6 17/34] scsi: [RFC] Use netfs_extract_iter_to_sg()
From: David Howells
To: Al Viro
Cc: "James E.J. Bottomley", "Martin K.
Petersen" , Christoph Hellwig , linux-scsi@vger.kernel.org, dhowells@redhat.com, Christoph Hellwig , Matthew Wilcox , Jens Axboe , Jan Kara , Jeff Layton , Logan Gunthorpe , linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org, linux-kernel@vger.kernel.org Date: Mon, 16 Jan 2023 23:10:03 +0000 Message-ID: <167391060380.2311931.5962669831677025433.stgit@warthog.procyon.org.uk> In-Reply-To: <167391047703.2311931.8115712773222260073.stgit@warthog.procyon.org.uk> References: <167391047703.2311931.8115712773222260073.stgit@warthog.procyon.org.uk> User-Agent: StGit/1.5 MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.3 X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1755223323100078356?= X-GMAIL-MSGID: =?utf-8?q?1755223323100078356?= Use netfs_extract_iter_to_sg() to build a scatterlist from an iterator. Note that if this fits, netfs_extract_iter_to_sg() should move to core code. Signed-off-by: David Howells cc: James E.J. Bottomley cc: Martin K. Petersen cc: Christoph Hellwig cc: linux-scsi@vger.kernel.org --- drivers/vhost/scsi.c | 78 +++++++++++++++----------------------------------- 1 file changed, 23 insertions(+), 55 deletions(-) diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c index 5d10837d19ec..af897cc4036d 100644 --- a/drivers/vhost/scsi.c +++ b/drivers/vhost/scsi.c @@ -34,6 +34,7 @@ #include #include #include +#include #include "vhost.h" @@ -75,6 +76,9 @@ struct vhost_scsi_cmd { u32 tvc_prot_sgl_count; /* Saved unpacked SCSI LUN for vhost_scsi_target_queue_cmd() */ u32 tvc_lun; + /* Cleanup modes for scatterlists */ + unsigned int tvc_cleanup_mode; + unsigned int tvc_prot_cleanup_mode; /* Pointer to the SGL formatted memory from virtio-scsi */ struct scatterlist *tvc_sgl; struct scatterlist *tvc_prot_sgl; @@ -339,11 +343,13 @@ static void vhost_scsi_release_cmd_res(struct se_cmd *se_cmd) if (tv_cmd->tvc_sgl_count) { for (i = 0; i < tv_cmd->tvc_sgl_count; i++) - put_page(sg_page(&tv_cmd->tvc_sgl[i])); + page_put_unpin(sg_page(&tv_cmd->tvc_sgl[i]), + tv_cmd->tvc_cleanup_mode); } if (tv_cmd->tvc_prot_sgl_count) { for (i = 0; i < tv_cmd->tvc_prot_sgl_count; i++) - put_page(sg_page(&tv_cmd->tvc_prot_sgl[i])); + page_put_unpin(sg_page(&tv_cmd->tvc_prot_sgl[i]), + tv_cmd->tvc_prot_cleanup_mode); } sbitmap_clear_bit(&svq->scsi_tags, se_cmd->map_tag); @@ -631,41 +637,6 @@ vhost_scsi_get_cmd(struct vhost_virtqueue *vq, struct vhost_scsi_tpg *tpg, return cmd; } -/* - * Map a user memory range into a scatterlist - * - * Returns the number of scatterlist entries used or -errno on error. - */ -static int -vhost_scsi_map_to_sgl(struct vhost_scsi_cmd *cmd, - struct iov_iter *iter, - struct scatterlist *sgl, - bool write) -{ - struct page **pages = cmd->tvc_upages; - struct scatterlist *sg = sgl; - ssize_t bytes; - size_t offset; - unsigned int npages = 0, gup_flags = 0; - - gup_flags |= write ? FOLL_SOURCE_BUF : FOLL_DEST_BUF; - - bytes = iov_iter_get_pages(iter, pages, LONG_MAX, - VHOST_SCSI_PREALLOC_UPAGES, &offset, - gup_flags); - /* No pages were pinned */ - if (bytes <= 0) - return bytes < 0 ? 
bytes : -EFAULT; - - while (bytes) { - unsigned n = min_t(unsigned, PAGE_SIZE - offset, bytes); - sg_set_page(sg++, pages[npages++], n, offset); - bytes -= n; - offset = 0; - } - return npages; -} - static int vhost_scsi_calc_sgls(struct iov_iter *iter, size_t bytes, int max_sgls) { @@ -689,24 +660,19 @@ vhost_scsi_calc_sgls(struct iov_iter *iter, size_t bytes, int max_sgls) static int vhost_scsi_iov_to_sgl(struct vhost_scsi_cmd *cmd, bool write, struct iov_iter *iter, - struct scatterlist *sg, int sg_count) + struct scatterlist *sg, int sg_count, + unsigned int *cleanup_mode) { - struct scatterlist *p = sg; - int ret; + struct sg_table sgt = { .sgl = sg }; + unsigned int gup_flags = write ? FOLL_SOURCE_BUF : FOLL_DEST_BUF; + ssize_t ret; - while (iov_iter_count(iter)) { - ret = vhost_scsi_map_to_sgl(cmd, iter, sg, write); - if (ret < 0) { - while (p < sg) { - struct page *page = sg_page(p++); - if (page) - put_page(page); - } - return ret; - } - sg += ret; - } - return 0; + ret = netfs_extract_iter_to_sg(iter, LONG_MAX, &sgt, sg_count, gup_flags); + if (ret > 0) + sg_mark_end(sg + sgt.nents - 1); + + *cleanup_mode = iov_iter_extract_mode(iter, gup_flags); + return ret; } static int @@ -730,7 +696,8 @@ vhost_scsi_mapal(struct vhost_scsi_cmd *cmd, ret = vhost_scsi_iov_to_sgl(cmd, write, prot_iter, cmd->tvc_prot_sgl, - cmd->tvc_prot_sgl_count); + cmd->tvc_prot_sgl_count, + &cmd->tvc_prot_cleanup_mode); if (ret < 0) { cmd->tvc_prot_sgl_count = 0; return ret; @@ -747,7 +714,8 @@ vhost_scsi_mapal(struct vhost_scsi_cmd *cmd, cmd->tvc_sgl, cmd->tvc_sgl_count); ret = vhost_scsi_iov_to_sgl(cmd, write, data_iter, - cmd->tvc_sgl, cmd->tvc_sgl_count, + cmd->tvc_sgl, cmd->tvc_sgl_count, + &cmd->tvc_cleanup_mode); if (ret < 0) { cmd->tvc_sgl_count = 0; return ret;
From patchwork Mon Jan 16 23:10:11 2023
Subject: [PATCH v6 18/34] dio: Pin pages rather than ref'ing if appropriate
From: David Howells
To: Al Viro
Cc: Jens Axboe, Jan Kara, Christoph Hellwig, Matthew Wilcox,
    Logan Gunthorpe, Jeff Layton, linux-fsdevel@vger.kernel.org,
    linux-block@vger.kernel.org, dhowells@redhat.com,
    linux-kernel@vger.kernel.org
Date: Mon, 16 Jan 2023 23:10:11 +0000
Message-ID: <167391061117.2311931.16807283804788007499.stgit@warthog.procyon.org.uk>
In-Reply-To: <167391047703.2311931.8115712773222260073.stgit@warthog.procyon.org.uk>
References: <167391047703.2311931.8115712773222260073.stgit@warthog.procyon.org.uk>
User-Agent: StGit/1.5
MIME-Version: 1.0

Convert the generic direct-I/O code to use iov_iter_extract_pages()
instead of iov_iter_get_pages().  This will pin pages or leave them
unaltered rather than getting a ref on them as appropriate to the
iterator.

The pages need to be pinned for DIO-read rather than having refs taken on
them to prevent VM copy-on-write from malfunctioning during a concurrent
fork() (the result of the I/O would otherwise end up only visible to the
child process and not the parent).

Signed-off-by: David Howells
cc: Al Viro
cc: Jens Axboe
cc: Jan Kara
cc: Christoph Hellwig
cc: Matthew Wilcox
cc: Logan Gunthorpe
cc: linux-fsdevel@vger.kernel.org
cc: linux-block@vger.kernel.org
---
 fs/direct-io.c |   57 ++++++++++++++++++++++++++++++++++++--------------------
 1 file changed, 37 insertions(+), 20 deletions(-)

diff --git a/fs/direct-io.c b/fs/direct-io.c index b1e26a706e31..b4d2c9f85a5b 100644 --- a/fs/direct-io.c +++ b/fs/direct-io.c @@ -142,9 +142,11 @@ struct dio { /* * pages[] (and any fields placed after it) are not zeroed out at - * allocation time. Don't add new fields after pages[] unless you - * wish that they not be zeroed. + * allocation time. Don't add new fields after pages[] unless you wish + * that they not be zeroed. Pages may have a ref taken, a pin emplaced + * or no retention measures. */ + unsigned int cleanup_mode; /* How pages should be cleaned up (0/FOLL_GET/PIN) */ union { struct page *pages[DIO_PAGES]; /* page buffer */ struct work_struct complete_work;/* deferred AIO completion */ @@ -167,12 +169,13 @@ static inline unsigned dio_pages_present(struct dio_submit *sdio) static inline int dio_refill_pages(struct dio *dio, struct dio_submit *sdio) { const enum req_op dio_op = dio->opf & REQ_OP_MASK; + unsigned int gup_flags = + op_is_write(dio_op) ? FOLL_SOURCE_BUF : FOLL_DEST_BUF; + struct page **pages = dio->pages; ssize_t ret; - ret = iov_iter_get_pages(sdio->iter, dio->pages, LONG_MAX, DIO_PAGES, - &sdio->from, - op_is_write(dio_op) ?
Signed-off-by: David Howells
cc: Al Viro
cc: Jens Axboe
cc: Jan Kara
cc: Christoph Hellwig
cc: Matthew Wilcox
cc: Logan Gunthorpe
cc: linux-fsdevel@vger.kernel.org
cc: linux-block@vger.kernel.org
---
 fs/direct-io.c | 57 ++++++++++++++++++++++++++++++++++++--------------------
 1 file changed, 37 insertions(+), 20 deletions(-)

diff --git a/fs/direct-io.c b/fs/direct-io.c
index b1e26a706e31..b4d2c9f85a5b 100644
--- a/fs/direct-io.c
+++ b/fs/direct-io.c
@@ -142,9 +142,11 @@ struct dio {
 	/*
 	 * pages[] (and any fields placed after it) are not zeroed out at
-	 * allocation time.  Don't add new fields after pages[] unless you
-	 * wish that they not be zeroed.
+	 * allocation time.  Don't add new fields after pages[] unless you wish
+	 * that they not be zeroed.  Pages may have a ref taken, a pin emplaced
+	 * or no retention measures.
 	 */
+	unsigned int cleanup_mode;	/* How pages should be cleaned up (0/FOLL_GET/PIN) */
 	union {
 		struct page *pages[DIO_PAGES];	/* page buffer */
 		struct work_struct complete_work;/* deferred AIO completion */
@@ -167,12 +169,13 @@ static inline unsigned dio_pages_present(struct dio_submit *sdio)
 static inline int dio_refill_pages(struct dio *dio, struct dio_submit *sdio)
 {
 	const enum req_op dio_op = dio->opf & REQ_OP_MASK;
+	unsigned int gup_flags =
+		op_is_write(dio_op) ? FOLL_SOURCE_BUF : FOLL_DEST_BUF;
+	struct page **pages = dio->pages;
 	ssize_t ret;

-	ret = iov_iter_get_pages(sdio->iter, dio->pages, LONG_MAX, DIO_PAGES,
-				 &sdio->from,
-				 op_is_write(dio_op) ?
-				 FOLL_SOURCE_BUF : FOLL_DEST_BUF);
+	ret = iov_iter_extract_pages(sdio->iter, &pages, LONG_MAX, DIO_PAGES,
+				     gup_flags, &sdio->from);

 	if (ret < 0 && sdio->blocks_available && dio_op == REQ_OP_WRITE) {
 		struct page *page = ZERO_PAGE(0);
@@ -183,7 +186,7 @@ static inline int dio_refill_pages(struct dio *dio, struct dio_submit *sdio)
 		 */
 		if (dio->page_errors == 0)
 			dio->page_errors = ret;
-		get_page(page);
+		dio->cleanup_mode = 0;
 		dio->pages[0] = page;
 		sdio->head = 0;
 		sdio->tail = 1;
@@ -197,6 +200,8 @@ static inline int dio_refill_pages(struct dio *dio, struct dio_submit *sdio)
 		sdio->head = 0;
 		sdio->tail = (ret + PAGE_SIZE - 1) / PAGE_SIZE;
 		sdio->to = ((ret - 1) & (PAGE_SIZE - 1)) + 1;
+		dio->cleanup_mode =
+			iov_iter_extract_mode(sdio->iter, gup_flags);
 		return 0;
 	}
 	return ret;
@@ -400,6 +405,10 @@ dio_bio_alloc(struct dio *dio, struct dio_submit *sdio,
 	 * we request a valid number of vectors.
 	 */
 	bio = bio_alloc(bdev, nr_vecs, dio->opf, GFP_KERNEL);
+	if (!(dio->cleanup_mode & FOLL_GET))
+		bio_clear_flag(bio, BIO_PAGE_REFFED);
+	if (dio->cleanup_mode & FOLL_PIN)
+		bio_set_flag(bio, BIO_PAGE_PINNED);
 	bio->bi_iter.bi_sector = first_sector;
 	if (dio->is_async)
 		bio->bi_end_io = dio_bio_end_aio;
@@ -443,13 +452,18 @@ static inline void dio_bio_submit(struct dio *dio, struct dio_submit *sdio)
 	sdio->logical_offset_in_bio = 0;
 }

+static void dio_cleanup_page(struct dio *dio, struct page *page)
+{
+	page_put_unpin(page, dio->cleanup_mode);
+}
+
 /*
  * Release any resources in case of a failure
  */
 static inline void dio_cleanup(struct dio *dio, struct dio_submit *sdio)
 {
 	while (sdio->head < sdio->tail)
-		put_page(dio->pages[sdio->head++]);
+		dio_cleanup_page(dio, dio->pages[sdio->head++]);
 }

 /*
@@ -704,7 +718,7 @@ static inline int dio_new_bio(struct dio *dio, struct dio_submit *sdio,
  *
  * Return zero on success.  Non-zero means the caller needs to start a new BIO.
  */
-static inline int dio_bio_add_page(struct dio_submit *sdio)
+static inline int dio_bio_add_page(struct dio *dio, struct dio_submit *sdio)
 {
 	int ret;

@@ -771,11 +785,11 @@ static inline int dio_send_cur_page(struct dio *dio, struct dio_submit *sdio,
 		goto out;
 	}

-	if (dio_bio_add_page(sdio) != 0) {
+	if (dio_bio_add_page(dio, sdio) != 0) {
 		dio_bio_submit(dio, sdio);
 		ret = dio_new_bio(dio, sdio, sdio->cur_page_block, map_bh);
 		if (ret == 0) {
-			ret = dio_bio_add_page(sdio);
+			ret = dio_bio_add_page(dio, sdio);
 			BUG_ON(ret != 0);
 		}
 	}
@@ -832,13 +846,16 @@ submit_page_section(struct dio *dio, struct dio_submit *sdio, struct page *page,
 	 */
 	if (sdio->cur_page) {
 		ret = dio_send_cur_page(dio, sdio, map_bh);
-		put_page(sdio->cur_page);
+		dio_cleanup_page(dio, sdio->cur_page);
 		sdio->cur_page = NULL;
 		if (ret)
 			return ret;
 	}

-	get_page(page);		/* It is in dio */
+	ret = try_grab_page(page, dio->cleanup_mode); /* It is in dio */
+	if (ret < 0)
+		return ret;
+
 	sdio->cur_page = page;
 	sdio->cur_page_offset = offset;
 	sdio->cur_page_len = len;
@@ -853,7 +870,7 @@ submit_page_section(struct dio *dio, struct dio_submit *sdio, struct page *page,
 		ret = dio_send_cur_page(dio, sdio, map_bh);
 		if (sdio->bio)
 			dio_bio_submit(dio, sdio);
-		put_page(sdio->cur_page);
+		dio_cleanup_page(dio, sdio->cur_page);
 		sdio->cur_page = NULL;
 	}
 	return ret;
@@ -954,7 +971,7 @@ static int do_direct_IO(struct dio *dio, struct dio_submit *sdio,

 			ret = get_more_blocks(dio, sdio, map_bh);
 			if (ret) {
-				put_page(page);
+				dio_cleanup_page(dio, page);
 				goto out;
 			}
 			if (!buffer_mapped(map_bh))
@@ -999,7 +1016,7 @@ static int do_direct_IO(struct dio *dio, struct dio_submit *sdio,

 				/* AKPM: eargh, -ENOTBLK is a hack */
 				if (dio_op == REQ_OP_WRITE) {
-					put_page(page);
+					dio_cleanup_page(dio, page);
 					return -ENOTBLK;
 				}

@@ -1012,7 +1029,7 @@ static int do_direct_IO(struct dio *dio, struct dio_submit *sdio,
 				if (sdio->block_in_file >=
 						i_size_aligned >> blkbits) {
 					/* We hit eof */
-					put_page(page);
+					dio_cleanup_page(dio, page);
 					goto out;
 				}
 				zero_user(page, from, 1 << blkbits);
@@ -1052,7 +1069,7 @@ static int do_direct_IO(struct dio *dio, struct dio_submit *sdio,
 						  sdio->next_block_for_io,
 						  map_bh);
 			if (ret) {
-				put_page(page);
+				dio_cleanup_page(dio, page);
 				goto out;
 			}
 			sdio->next_block_for_io += this_chunk_blocks;
@@ -1068,7 +1085,7 @@ static int do_direct_IO(struct dio *dio, struct dio_submit *sdio,
 		}

 		/* Drop the ref which was taken in get_user_pages() */
-		put_page(page);
+		dio_cleanup_page(dio, page);
 	}
 out:
 	return ret;
@@ -1288,7 +1305,7 @@ ssize_t __blockdev_direct_IO(struct kiocb *iocb, struct inode *inode,
 		ret2 = dio_send_cur_page(dio, &sdio, &map_bh);
 		if (retval == 0)
 			retval = ret2;
-		put_page(sdio.cur_page);
+		dio_cleanup_page(dio, sdio.cur_page);
 		sdio.cur_page = NULL;
 	}
 	if (sdio.bio)

From patchwork Mon Jan 16 23:10:18 2023
X-Patchwork-Submitter: David Howells
X-Patchwork-Id: 44371
Subject: [PATCH v6 19/34] fuse: Pin pages rather than ref'ing if appropriate
From: David Howells
To: Al Viro
Cc: Miklos Szeredi, Christoph Hellwig, Matthew Wilcox, Jens Axboe, Jan Kara,
    Jeff Layton, Logan Gunthorpe, dhowells@redhat.com,
    linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org,
    linux-kernel@vger.kernel.org
Date: Mon, 16 Jan 2023 23:10:18 +0000
Message-ID: <167391061826.2311931.4301280201217181104.stgit@warthog.procyon.org.uk>
In-Reply-To: <167391047703.2311931.8115712773222260073.stgit@warthog.procyon.org.uk>
References: <167391047703.2311931.8115712773222260073.stgit@warthog.procyon.org.uk>
User-Agent: StGit/1.5
MIME-Version: 1.0

Convert the fuse code to use iov_iter_extract_pages() instead of
iov_iter_get_pages().  This will pin pages or leave them unaltered rather
than getting a ref on them as appropriate to the iterator.

The pages need to be pinned for DIO-read rather than having refs taken on
them to prevent VM copy-on-write from malfunctioning during a concurrent
fork() (the result of the I/O would otherwise end up only visible to the
child process and not the parent).
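Where the code previously took a supplementary reference with get_page(), it
must now replicate whatever retention the extraction applied.  A sketch of
that pattern follows; it is illustrative only, the helper name is invented,
and try_grab_page()/page_put_unpin() are used as elsewhere in this series:

	/* Duplicate a page's retention in whatever mode it was obtained
	 * (0, FOLL_GET or FOLL_PIN) and later release it the same way. */
	static int example_retain_extra(struct page *page,
					unsigned int cleanup_mode)
	{
		int err;

		err = try_grab_page(page, cleanup_mode);
		if (err < 0)
			return err;

		/* ... hand the page to another holder here ... */

		page_put_unpin(page, cleanup_mode);
		return 0;
	}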
Signed-off-by: David Howells
cc: Miklos Szeredi
cc: Al Viro
cc: Christoph Hellwig
cc: linux-fsdevel@vger.kernel.org
---
 fs/fuse/dev.c    | 25 +++++++++++++++++++------
 fs/fuse/file.c   | 26 ++++++++++++++++++--------
 fs/fuse/fuse_i.h |  1 +
 3 files changed, 38 insertions(+), 14 deletions(-)

diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index e3d8443e24a6..107497e68726 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -641,6 +641,7 @@ static int unlock_request(struct fuse_req *req)

 struct fuse_copy_state {
 	int write;
+	unsigned int cleanup_mode;	/* Page cleanup mode (0/FOLL_GET/PIN) */
 	struct fuse_req *req;
 	struct iov_iter *iter;
 	struct pipe_buffer *pipebufs;
@@ -661,6 +662,11 @@ static void fuse_copy_init(struct fuse_copy_state *cs, int write,
 	cs->iter = iter;
 }

+static void fuse_release_copy_page(struct fuse_copy_state *cs, struct page *page)
+{
+	page_put_unpin(page, cs->cleanup_mode);
+}
+
 /* Unmap and put previous page of userspace buffer */
 static void fuse_copy_finish(struct fuse_copy_state *cs)
 {
@@ -675,7 +681,7 @@ static void fuse_copy_finish(struct fuse_copy_state *cs)
 			flush_dcache_page(cs->pg);
 			set_page_dirty_lock(cs->pg);
 		}
-		put_page(cs->pg);
+		fuse_release_copy_page(cs, cs->pg);
 	}
 	cs->pg = NULL;
 }
@@ -704,6 +710,7 @@ static int fuse_copy_fill(struct fuse_copy_state *cs)
 			BUG_ON(!cs->nr_segs);
 			cs->currbuf = buf;
+			cs->cleanup_mode = FOLL_GET;
 			cs->pg = buf->page;
 			cs->offset = buf->offset;
 			cs->len = buf->len;
@@ -722,6 +729,7 @@ static int fuse_copy_fill(struct fuse_copy_state *cs)
 			buf->len = 0;

 			cs->currbuf = buf;
+			cs->cleanup_mode = FOLL_GET;
 			cs->pg = page;
 			cs->offset = 0;
 			cs->len = PAGE_SIZE;
@@ -729,15 +737,18 @@ static int fuse_copy_fill(struct fuse_copy_state *cs)
 			cs->nr_segs++;
 		}
 	} else {
+		unsigned int gup_flags = cs->write ?
+			FOLL_SOURCE_BUF : FOLL_DEST_BUF;
+		struct page **pages = &cs->pg;
 		size_t off;
-		err = iov_iter_get_pages(cs->iter, &page, PAGE_SIZE, 1, &off,
-					 cs->write ?
-					 FOLL_SOURCE_BUF : FOLL_DEST_BUF);
+
+		err = iov_iter_extract_pages(cs->iter, &pages, PAGE_SIZE, 1,
+					     gup_flags, &off);
 		if (err < 0)
 			return err;
 		BUG_ON(!err);
 		cs->len = err;
 		cs->offset = off;
-		cs->pg = page;
+		cs->cleanup_mode = iov_iter_extract_mode(cs->iter, gup_flags);
 	}

 	return lock_request(cs->req);
@@ -899,10 +910,12 @@ static int fuse_ref_page(struct fuse_copy_state *cs, struct page *page,
 	if (cs->nr_segs >= cs->pipe->max_usage)
 		return -EIO;

-	get_page(page);
+	err = try_grab_page(page, cs->cleanup_mode);
+	if (err < 0)
+		return err;
 	err = unlock_request(cs->req);
 	if (err) {
-		put_page(page);
+		fuse_release_copy_page(cs, page);
 		return err;
 	}

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 68c196437306..c317300e757a 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -624,6 +624,11 @@ void fuse_read_args_fill(struct fuse_io_args *ia, struct file *file, loff_t pos,
 	args->out_args[0].size = count;
 }

+static void fuse_release_page(struct fuse_args_pages *ap, struct page *page)
+{
+	page_put_unpin(page, ap->cleanup_mode);
+}
+
 static void fuse_release_user_pages(struct fuse_args_pages *ap,
 				    bool should_dirty)
 {
@@ -632,7 +637,7 @@ static void fuse_release_user_pages(struct fuse_args_pages *ap,
 	for (i = 0; i < ap->num_pages; i++) {
 		if (should_dirty)
 			set_page_dirty_lock(ap->pages[i]);
-		put_page(ap->pages[i]);
+		fuse_release_page(ap, ap->pages[i]);
 	}
 }

@@ -920,7 +925,7 @@ static void fuse_readpages_end(struct fuse_mount *fm, struct fuse_args *args,
 		else
 			SetPageError(page);
 		unlock_page(page);
-		put_page(page);
+		fuse_release_page(ap, page);
 	}
 	if (ia->ff)
 		fuse_file_put(ia->ff, false, false);
@@ -1153,7 +1158,7 @@ static ssize_t fuse_send_write_pages(struct fuse_io_args *ia,
 		}
 		if (ia->write.page_locked && (i == ap->num_pages - 1))
 			unlock_page(page);
-		put_page(page);
+		fuse_release_page(ap, page);
 	}

 	return err;
@@ -1172,6 +1177,7 @@ static ssize_t fuse_fill_write_pages(struct fuse_io_args *ia,

 	ap->args.in_pages = true;
 	ap->descs[0].offset = offset;
+	ap->cleanup_mode = FOLL_GET;

 	do {
 		size_t tmp;
@@ -1200,7 +1206,7 @@ static ssize_t fuse_fill_write_pages(struct fuse_io_args *ia,

 		if (!tmp) {
 			unlock_page(page);
-			put_page(page);
+			fuse_release_page(ap, page);
 			goto again;
 		}

@@ -1393,9 +1399,12 @@ static int fuse_get_user_pages(struct fuse_args_pages *ap, struct iov_iter *ii,
 			       size_t *nbytesp, int write,
 			       unsigned int max_pages)
 {
+	unsigned int gup_flags = write ? FOLL_SOURCE_BUF : FOLL_DEST_BUF;
 	size_t nbytes = 0;  /* # bytes already packed in req */
 	ssize_t ret = 0;

+	ap->cleanup_mode = iov_iter_extract_mode(ii, gup_flags);
+
 	/* Special case for kernel I/O: can copy directly into the buffer */
 	if (iov_iter_is_kvec(ii)) {
 		unsigned long user_addr = fuse_get_user_addr(ii);
@@ -1412,12 +1421,13 @@ static int fuse_get_user_pages(struct fuse_args_pages *ap, struct iov_iter *ii,
 	}

 	while (nbytes < *nbytesp && ap->num_pages < max_pages) {
+		struct page **pages = &ap->pages[ap->num_pages];
 		unsigned npages;
 		size_t start;
-		ret = iov_iter_get_pages(ii, &ap->pages[ap->num_pages],
-					 *nbytesp - nbytes,
-					 max_pages - ap->num_pages,
-					 &start, write ?
-					 FOLL_SOURCE_BUF : FOLL_DEST_BUF);
+		ret = iov_iter_extract_pages(ii, &pages,
+					     *nbytesp - nbytes,
+					     max_pages - ap->num_pages,
+					     gup_flags, &start);
 		if (ret < 0)
 			break;

diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index c673faefdcb9..7b6be1dd7593 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -271,6 +271,7 @@ struct fuse_args_pages {
 	struct page **pages;
 	struct fuse_page_desc *descs;
 	unsigned int num_pages;
+	unsigned int cleanup_mode;
 };

 #define FUSE_ARGS(args) struct fuse_args args = {}

From patchwork Mon Jan 16 23:10:25 2023
X-Patchwork-Submitter: David Howells
X-Patchwork-Id: 44370
Subject: [PATCH v6 20/34] vfs: Make splice use iov_iter_extract_pages()
From: David Howells
To: Al Viro
Cc: Christoph Hellwig, Matthew Wilcox, Jens Axboe, Jan Kara, Jeff Layton,
    Logan Gunthorpe, dhowells@redhat.com, linux-fsdevel@vger.kernel.org,
    linux-block@vger.kernel.org, linux-kernel@vger.kernel.org
Date: Mon, 16 Jan 2023 23:10:25 +0000
Message-ID: <167391062544.2311931.15195962488932892568.stgit@warthog.procyon.org.uk>
In-Reply-To: <167391047703.2311931.8115712773222260073.stgit@warthog.procyon.org.uk>
References: <167391047703.2311931.8115712773222260073.stgit@warthog.procyon.org.uk>
User-Agent: StGit/1.5
MIME-Version: 1.0

Make splice's iter_to_pipe() use iov_iter_extract_pages().

Splice requests will be rejected if the cleanup mode is going to be anything
other than put_page(), since we're going to be attaching pages from the
iterator to a pipe and then returning to the caller, leaving the spliced
pages to their fates at some unknown time in the future.

Note that this will cause some requests to fail that could work before (if
there's any way to do it), such as splicing from an XARRAY-type iterator,
as extraction doesn't take refs or pins on non-user-backed iterators.
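A sketch of the guard this adds (illustrative only; the helper name is
invented, the check itself matches the one inserted into iter_to_pipe()):

	/* Refuse to splice from any iterator whose extracted pages would
	 * not carry refs, as the pipe may hold them indefinitely. */
	static int example_check_spliceable(struct iov_iter *from)
	{
		if (iov_iter_extract_mode(from, FOLL_SOURCE_BUF) != FOLL_GET)
			return -EIO;
		return 0;
	}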
Signed-off-by: David Howells
cc: Al Viro
cc: Christoph Hellwig
cc: Matthew Wilcox
cc: linux-fsdevel@vger.kernel.org
---
 fs/splice.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/fs/splice.c b/fs/splice.c
index 19c5b5adc548..c3433266ba1b 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -1159,14 +1159,18 @@ static int iter_to_pipe(struct iov_iter *from,
 	size_t total = 0;
 	int ret = 0;

+	/* For the moment, all pages attached to a pipe must have refs, not
+	 * pins. */
+	if (WARN_ON(iov_iter_extract_mode(from, FOLL_SOURCE_BUF) != FOLL_GET))
+		return -EIO;
+
 	while (iov_iter_count(from)) {
-		struct page *pages[16];
+		struct page *pages[16], **ppages = pages;
 		ssize_t left;
 		size_t start;
 		int i, n;

-		left = iov_iter_get_pages(from, pages, ~0UL, 16, &start,
-					  FOLL_SOURCE_BUF);
+		left = iov_iter_extract_pages(from, &ppages, ~0UL, 16,
+					      FOLL_SOURCE_BUF, &start);
 		if (left <= 0) {
 			ret = left;
 			break;

From patchwork Mon Jan 16 23:10:32 2023
X-Patchwork-Submitter: David Howells
X-Patchwork-Id: 44372
Subject: [PATCH v6 21/34] 9p: Pin pages rather than ref'ing if appropriate
From: David Howells
To: Al Viro
Cc: Dominique Martinet, Eric Van Hensbergen, Latchesar Ionkov,
    Christian Schoenebeck, v9fs-developer@lists.sourceforge.net,
    Christoph Hellwig, Matthew Wilcox, Jens Axboe, Jan Kara, Jeff Layton,
    Logan Gunthorpe, dhowells@redhat.com, linux-fsdevel@vger.kernel.org,
    linux-block@vger.kernel.org, linux-kernel@vger.kernel.org
Date: Mon, 16 Jan 2023 23:10:32 +0000
Message-ID: <167391063242.2311931.3275290816918213423.stgit@warthog.procyon.org.uk>
In-Reply-To: <167391047703.2311931.8115712773222260073.stgit@warthog.procyon.org.uk>
References: <167391047703.2311931.8115712773222260073.stgit@warthog.procyon.org.uk>
User-Agent: StGit/1.5
MIME-Version: 1.0

Convert the 9p filesystem to use iov_iter_extract_pages() instead of
iov_iter_get_pages().  This will pin pages or leave them unaltered rather
than getting a ref on them as appropriate to the iterator.

The pages need to be pinned for DIO-read rather than having refs taken on
them to prevent VM copy-on-write from malfunctioning during a concurrent
fork() (the result of the I/O would otherwise end up only visible to the
child process and not the parent).
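This also lets the special-casing of kernel buffers go away: for a KVEC-type
iterator, the extraction mode comes back as 0, no ref or pin is taken, and
page_put_unpin() with a mode of 0 releases nothing (as the dio conversion
earlier in this series already relies on).  A sketch of the check a transport
might make (illustrative only; the helper name is invented):

	/* True if the extracted pages will need releasing afterwards */
	static bool example_needs_release(struct iov_iter *data,
					  unsigned int gup_flags)
	{
		return iov_iter_extract_mode(data, gup_flags) != 0;
	}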
Signed-off-by: David Howells
cc: Dominique Martinet
cc: Eric Van Hensbergen
cc: Latchesar Ionkov
cc: Christian Schoenebeck
cc: v9fs-developer@lists.sourceforge.net
---
 net/9p/trans_common.c |  6 ++-
 net/9p/trans_common.h |  3 +-
 net/9p/trans_virtio.c | 89 ++++++++++++++-----------------------------------
 3 files changed, 31 insertions(+), 67 deletions(-)

diff --git a/net/9p/trans_common.c b/net/9p/trans_common.c
index c827f694551c..31d133412677 100644
--- a/net/9p/trans_common.c
+++ b/net/9p/trans_common.c
@@ -12,13 +12,15 @@
  * p9_release_pages - Release pages after the transaction.
  * @pages: array of pages to be put
  * @nr_pages: size of array
+ * @cleanup_mode: How to clean up the pages.
  */
-void p9_release_pages(struct page **pages, int nr_pages)
+void p9_release_pages(struct page **pages, int nr_pages,
+		      unsigned int cleanup_mode)
 {
 	int i;

 	for (i = 0; i < nr_pages; i++)
 		if (pages[i])
-			put_page(pages[i]);
+			page_put_unpin(pages[i], cleanup_mode);
 }
 EXPORT_SYMBOL(p9_release_pages);

diff --git a/net/9p/trans_common.h b/net/9p/trans_common.h
index 32134db6abf3..9b20eb4f2359 100644
--- a/net/9p/trans_common.h
+++ b/net/9p/trans_common.h
@@ -4,4 +4,5 @@
  * Author Venkateswararao Jujjuri
  */

-void p9_release_pages(struct page **pages, int nr_pages);
+void p9_release_pages(struct page **pages, int nr_pages,
+		      unsigned int cleanup_mode);

diff --git a/net/9p/trans_virtio.c b/net/9p/trans_virtio.c
index eb28b54fe5f6..561f7cbd79da 100644
--- a/net/9p/trans_virtio.c
+++ b/net/9p/trans_virtio.c
@@ -310,73 +310,34 @@ static int p9_get_mapped_pages(struct virtio_chan *chan,
 			       struct iov_iter *data,
 			       int count,
 			       size_t *offs,
-			       int *need_drop,
+			       int *cleanup_mode,
 			       unsigned int gup_flags)
 {
 	int nr_pages;
 	int err;
+	int n;

 	if (!iov_iter_count(data))
 		return 0;

-	if (!iov_iter_is_kvec(data)) {
-		int n;
-		/*
-		 * We allow only p9_max_pages pinned. We wait for the
-		 * Other zc request to finish here
-		 */
-		if (atomic_read(&vp_pinned) >= chan->p9_max_pages) {
-			err = wait_event_killable(vp_wq,
-						  (atomic_read(&vp_pinned) < chan->p9_max_pages));
-			if (err == -ERESTARTSYS)
-				return err;
-		}
-		n = iov_iter_get_pages_alloc(data, pages, count, offs,
-					     gup_flags);
-		if (n < 0)
-			return n;
-		*need_drop = 1;
-		nr_pages = DIV_ROUND_UP(n + *offs, PAGE_SIZE);
-		atomic_add(nr_pages, &vp_pinned);
-		return n;
-	} else {
-		/* kernel buffer, no need to pin pages */
-		int index;
-		size_t len;
-		void *p;
-
-		/* we'd already checked that it's non-empty */
-		while (1) {
-			len = iov_iter_single_seg_count(data);
-			if (likely(len)) {
-				p = data->kvec->iov_base + data->iov_offset;
-				break;
-			}
-			iov_iter_advance(data, 0);
-		}
-		if (len > count)
-			len = count;
-
-		nr_pages = DIV_ROUND_UP((unsigned long)p + len, PAGE_SIZE) -
-			(unsigned long)p / PAGE_SIZE;
-
-		*pages = kmalloc_array(nr_pages, sizeof(struct page *),
-				       GFP_NOFS);
-		if (!*pages)
-			return -ENOMEM;
-
-		*need_drop = 0;
-		p -= (*offs = offset_in_page(p));
-		for (index = 0; index < nr_pages; index++) {
-			if (is_vmalloc_addr(p))
-				(*pages)[index] = vmalloc_to_page(p);
-			else
-				(*pages)[index] = kmap_to_page(p);
-			p += PAGE_SIZE;
-		}
-		iov_iter_advance(data, len);
-		return len;
+	/*
+	 * We allow only p9_max_pages pinned.  We wait for the
+	 * Other zc request to finish here
+	 */
+	if (atomic_read(&vp_pinned) >= chan->p9_max_pages) {
+		err = wait_event_killable(vp_wq,
+					  (atomic_read(&vp_pinned) < chan->p9_max_pages));
+		if (err == -ERESTARTSYS)
+			return err;
 	}
+
+	n = iov_iter_extract_pages(data, pages, count, INT_MAX, gup_flags, offs);
+	if (n < 0)
+		return n;
+	*cleanup_mode = iov_iter_extract_mode(data, gup_flags);
+	nr_pages = DIV_ROUND_UP(n + *offs, PAGE_SIZE);
+	atomic_add(nr_pages, &vp_pinned);
+	return n;
 }

 static void handle_rerror(struct p9_req_t *req, int in_hdr_len,
@@ -431,7 +392,7 @@ p9_virtio_zc_request(struct p9_client *client, struct p9_req_t *req,
 	struct virtio_chan *chan = client->trans;
 	struct scatterlist *sgs[4];
 	size_t offs;
-	int need_drop = 0;
+	int cleanup_mode = 0;
 	int kicked = 0;

 	p9_debug(P9_DEBUG_TRANS, "virtio request\n");
@@ -439,7 +400,7 @@ p9_virtio_zc_request(struct p9_client *client, struct p9_req_t *req,
 	if (uodata) {
 		__le32 sz;
 		int n = p9_get_mapped_pages(chan, &out_pages, uodata,
-					    outlen, &offs, &need_drop,
+					    outlen, &offs, &cleanup_mode,
 					    FOLL_DEST_BUF);
 		if (n < 0) {
 			err = n;
@@ -459,7 +420,7 @@ p9_virtio_zc_request(struct p9_client *client, struct p9_req_t *req,
 		memcpy(&req->tc.sdata[0], &sz, sizeof(sz));
 	} else if (uidata) {
 		int n = p9_get_mapped_pages(chan, &in_pages, uidata,
-					    inlen, &offs, &need_drop,
+					    inlen, &offs, &cleanup_mode,
 					    FOLL_SOURCE_BUF);
 		if (n < 0) {
 			err = n;
@@ -546,14 +507,14 @@ p9_virtio_zc_request(struct p9_client *client, struct p9_req_t *req,
 	 * Non kernel buffers are pinned, unpin them
 	 */
err_out:
-	if (need_drop) {
+	if (cleanup_mode) {
 		if (in_pages) {
-			p9_release_pages(in_pages, in_nr_pages);
+			p9_release_pages(in_pages, in_nr_pages, cleanup_mode);
 			atomic_sub(in_nr_pages, &vp_pinned);
 		}
 		if (out_pages) {
-			p9_release_pages(out_pages, out_nr_pages);
+			p9_release_pages(out_pages, out_nr_pages, cleanup_mode);
 			atomic_sub(out_nr_pages, &vp_pinned);
 		}
 		/* wakeup anybody waiting for slots to pin pages */
 		wake_up(&vp_wq);

From patchwork Mon Jan 16 23:10:39 2023
X-Patchwork-Submitter: David Howells
X-Patchwork-Id: 44373
Subject: [PATCH v6 22/34] nfs: Pin pages rather than ref'ing if appropriate
From: David Howells
To: Al Viro
Cc: Trond Myklebust, Anna Schumaker, Jeff Layton, linux-nfs@vger.kernel.org,
    Christoph Hellwig, Matthew Wilcox, Jens Axboe, Jan Kara,
    Logan Gunthorpe, dhowells@redhat.com, linux-fsdevel@vger.kernel.org,
    linux-block@vger.kernel.org, linux-kernel@vger.kernel.org
Date: Mon, 16 Jan 2023 23:10:39 +0000
Message-ID: <167391063989.2311931.13252453380684759087.stgit@warthog.procyon.org.uk>
In-Reply-To: <167391047703.2311931.8115712773222260073.stgit@warthog.procyon.org.uk>
References: <167391047703.2311931.8115712773222260073.stgit@warthog.procyon.org.uk>
User-Agent: StGit/1.5
MIME-Version: 1.0

Convert the NFS direct I/O code to use iov_iter_extract_pages() instead of
iov_iter_get_pages().  This will pin pages or leave them unaltered rather
than getting a ref on them as appropriate to the iterator.

The pages need to be pinned for DIO-read rather than having refs taken on
them to prevent VM copy-on-write from malfunctioning during a concurrent
fork() (the result of the I/O would otherwise end up only visible to the
child process and not the parent).
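A sketch of the calling convention used here, assuming (as this conversion
implies) that iov_iter_extract_pages() allocates the page array itself when
it is passed a NULL array pointer, mirroring the old
iov_iter_get_pages_alloc(); the function name and its shape are invented:

	static ssize_t example_extract_alloc(struct iov_iter *iter,
					     size_t rsize)
	{
		struct page **pagevec = NULL;
		size_t pgbase;
		ssize_t result;
		unsigned int npages;

		result = iov_iter_extract_pages(iter, &pagevec, rsize,
						INT_MAX, FOLL_DEST_BUF,
						&pgbase);
		if (result < 0)
			return result;

		/* ... queue the pages for I/O here ... */

		npages = DIV_ROUND_UP(result + pgbase, PAGE_SIZE);
		nfs_direct_release_pages(pagevec, npages,
					 iov_iter_extract_mode(iter,
							       FOLL_DEST_BUF));
		kvfree(pagevec);	/* Array came from the extractor */
		return result;
	}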
Signed-off-by: David Howells
cc: Trond Myklebust
cc: Anna Schumaker
cc: Jeff Layton
cc: linux-nfs@vger.kernel.org
---
 fs/nfs/direct.c | 32 ++++++++++++++++++--------------
 1 file changed, 18 insertions(+), 14 deletions(-)

diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c
index 42af84685f20..4a3108db2cb6 100644
--- a/fs/nfs/direct.c
+++ b/fs/nfs/direct.c
@@ -142,11 +142,15 @@ int nfs_swap_rw(struct kiocb *iocb, struct iov_iter *iter)
 	return 0;
 }

-static void nfs_direct_release_pages(struct page **pages, unsigned int npages)
+static void nfs_direct_release_pages(struct page **pages, unsigned int npages,
+				     unsigned int cleanup_mode)
 {
 	unsigned int i;
-	for (i = 0; i < npages; i++)
-		put_page(pages[i]);
+
+	if (cleanup_mode) {
+		for (i = 0; i < npages; i++)
+			page_put_unpin(pages[i], cleanup_mode);
+	}
 }

 void nfs_init_cinfo_from_dreq(struct nfs_commit_info *cinfo,
@@ -327,17 +331,16 @@ static ssize_t nfs_direct_read_schedule_iovec(struct nfs_direct_req *dreq,

 	inode_dio_begin(inode);
 	while (iov_iter_count(iter)) {
-		struct page **pagevec;
+		struct page **pagevec = NULL;
 		size_t bytes;
 		size_t pgbase;
 		unsigned npages, i;

-		result = iov_iter_get_pages_alloc(iter, &pagevec,
-						  rsize, &pgbase,
-						  FOLL_DEST_BUF);
+		result = iov_iter_extract_pages(iter, &pagevec, rsize, INT_MAX,
+						FOLL_DEST_BUF, &pgbase);
 		if (result < 0)
 			break;
-
+		bytes = result;
 		npages = (result + pgbase + PAGE_SIZE - 1) / PAGE_SIZE;
 		for (i = 0; i < npages; i++) {
@@ -363,7 +366,8 @@ static ssize_t nfs_direct_read_schedule_iovec(struct nfs_direct_req *dreq,
 			pos += req_len;
 			dreq->bytes_left -= req_len;
 		}
-		nfs_direct_release_pages(pagevec, npages);
+		nfs_direct_release_pages(pagevec, npages,
+					 iov_iter_extract_mode(iter, FOLL_DEST_BUF));
 		kvfree(pagevec);
 		if (result < 0)
 			break;
@@ -787,14 +791,13 @@ static ssize_t nfs_direct_write_schedule_iovec(struct nfs_direct_req *dreq,

 	NFS_I(inode)->write_io += iov_iter_count(iter);
 	while (iov_iter_count(iter)) {
-		struct page **pagevec;
+		struct page **pagevec = NULL;
 		size_t bytes;
 		size_t pgbase;
 		unsigned npages, i;

-		result = iov_iter_get_pages_alloc(iter, &pagevec,
-						  wsize, &pgbase,
-						  FOLL_SOURCE_BUF);
+		result = iov_iter_extract_pages(iter, &pagevec, wsize, INT_MAX,
+						FOLL_SOURCE_BUF, &pgbase);
 		if (result < 0)
 			break;

@@ -831,7 +834,8 @@ static ssize_t nfs_direct_write_schedule_iovec(struct nfs_direct_req *dreq,
 			pos += req_len;
 			dreq->bytes_left -= req_len;
 		}
-		nfs_direct_release_pages(pagevec, npages);
+		nfs_direct_release_pages(pagevec, npages,
+					 iov_iter_extract_mode(iter, FOLL_SOURCE_BUF));
 		kvfree(pagevec);
 		if (result < 0)
 			break;

From patchwork Mon Jan 16 23:10:47 2023
X-Patchwork-Submitter: David Howells
X-Patchwork-Id: 44379
Subject: [PATCH v6 23/34] cifs: Implement splice_read to pass down ITER_BVEC not ITER_PIPE
From: David Howells
To: Al Viro
Cc: Steve French, Shyam Prasad N, Rohith Surabattula, Jeff Layton,
    linux-cifs@vger.kernel.org, Christoph Hellwig, Matthew Wilcox,
    Jens Axboe, Jan Kara, Logan Gunthorpe, dhowells@redhat.com,
    linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org,
    linux-kernel@vger.kernel.org
Date: Mon, 16 Jan 2023 23:10:47 +0000
Message-ID: <167391064717.2311931.7504820268968962092.stgit@warthog.procyon.org.uk>
In-Reply-To: <167391047703.2311931.8115712773222260073.stgit@warthog.procyon.org.uk>
References: <167391047703.2311931.8115712773222260073.stgit@warthog.procyon.org.uk>
User-Agent: StGit/1.5
MIME-Version: 1.0

Provide cifs_splice_read() to use a bvec rather than a pipe iterator, as
the latter cannot so easily be split and advanced, which is necessary to
pass an iterator down to the bottom levels.  Upstream cifs gets around this
problem by using iov_iter_get_pages() to prefill the pipe and then passing
the list of pages down.

This is done by the following steps (a sketch of the capacity calculation
in step (1) follows below):

 (1) Bulk-allocate a bunch of pages to carry as much of the requested
     amount of data as possible, but without overrunning the available
     slots in the pipe, and add them to an ITER_BVEC.

 (2) Synchronously call ->read_iter() to read into the buffer.

 (3) Discard any unused pages.

 (4) Load the remaining pages into the pipe in order and advance the head
     pointer.
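For step (1), the amount that can be staged is bounded by the pipe's free
slots.  A minimal sketch of that calculation, using the same pipe fields
and helpers as the code below (the function name is invented):

	static size_t example_pipe_space(struct pipe_inode_info *pipe,
					 size_t len)
	{
		/* Slots already occupied by buffered data */
		size_t used = pipe_occupancy(pipe->head, pipe->tail);
		/* Slots we may still fill, each holding one page */
		size_t npages = max_t(ssize_t, pipe->max_usage - used, 0);

		return min_t(size_t, len, npages * PAGE_SIZE);
	}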
Signed-off-by: David Howells
cc: Steve French
cc: Shyam Prasad N
cc: Rohith Surabattula
cc: Jeff Layton
cc: Al Viro
cc: linux-cifs@vger.kernel.org
Link: https://lore.kernel.org/r/166732028113.3186319.1793644937097301358.stgit@warthog.procyon.org.uk/ # rfc
---
 fs/cifs/cifsfs.c | 12 ++++---
 fs/cifs/cifsfs.h |  3 ++
 fs/cifs/file.c   | 92 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 fs/splice.c      |  1 +
 4 files changed, 102 insertions(+), 6 deletions(-)

diff --git a/fs/cifs/cifsfs.c b/fs/cifs/cifsfs.c
index 10e00c624922..3c57e8b11692 100644
--- a/fs/cifs/cifsfs.c
+++ b/fs/cifs/cifsfs.c
@@ -1358,7 +1358,7 @@ const struct file_operations cifs_file_ops = {
 	.fsync = cifs_fsync,
 	.flush = cifs_flush,
 	.mmap  = cifs_file_mmap,
-	.splice_read = generic_file_splice_read,
+	.splice_read = cifs_splice_read,
 	.splice_write = iter_file_splice_write,
 	.llseek = cifs_llseek,
 	.unlocked_ioctl	= cifs_ioctl,
@@ -1378,7 +1378,7 @@ const struct file_operations cifs_file_strict_ops = {
 	.fsync = cifs_strict_fsync,
 	.flush = cifs_flush,
 	.mmap = cifs_file_strict_mmap,
-	.splice_read = generic_file_splice_read,
+	.splice_read = cifs_splice_read,
 	.splice_write = iter_file_splice_write,
 	.llseek = cifs_llseek,
 	.unlocked_ioctl	= cifs_ioctl,
@@ -1398,7 +1398,7 @@ const struct file_operations cifs_file_direct_ops = {
 	.fsync = cifs_fsync,
 	.flush = cifs_flush,
 	.mmap = cifs_file_mmap,
-	.splice_read = generic_file_splice_read,
+	.splice_read = cifs_splice_read,
 	.splice_write = iter_file_splice_write,
 	.unlocked_ioctl  = cifs_ioctl,
 	.copy_file_range = cifs_copy_file_range,
@@ -1416,7 +1416,7 @@ const struct file_operations cifs_file_nobrl_ops = {
 	.fsync = cifs_fsync,
 	.flush = cifs_flush,
 	.mmap = cifs_file_mmap,
-	.splice_read = generic_file_splice_read,
+	.splice_read = cifs_splice_read,
 	.splice_write = iter_file_splice_write,
 	.llseek = cifs_llseek,
 	.unlocked_ioctl	= cifs_ioctl,
@@ -1434,7 +1434,7 @@ const struct file_operations cifs_file_strict_nobrl_ops = {
 	.fsync = cifs_strict_fsync,
 	.flush = cifs_flush,
 	.mmap = cifs_file_strict_mmap,
-	.splice_read = generic_file_splice_read,
+	.splice_read = cifs_splice_read,
 	.splice_write = iter_file_splice_write,
 	.llseek = cifs_llseek,
 	.unlocked_ioctl	= cifs_ioctl,
@@ -1452,7 +1452,7 @@ const struct file_operations cifs_file_direct_nobrl_ops = {
 	.fsync = cifs_fsync,
 	.flush = cifs_flush,
 	.mmap = cifs_file_mmap,
-	.splice_read = generic_file_splice_read,
+	.splice_read = cifs_splice_read,
 	.splice_write = iter_file_splice_write,
 	.unlocked_ioctl  = cifs_ioctl,
 	.copy_file_range = cifs_copy_file_range,

diff --git a/fs/cifs/cifsfs.h b/fs/cifs/cifsfs.h
index 63a0ac2b9355..25decebbc478 100644
--- a/fs/cifs/cifsfs.h
+++ b/fs/cifs/cifsfs.h
@@ -100,6 +100,9 @@ extern ssize_t cifs_strict_readv(struct kiocb *iocb, struct iov_iter *to);
 extern ssize_t cifs_user_writev(struct kiocb *iocb, struct iov_iter *from);
 extern ssize_t cifs_direct_writev(struct kiocb *iocb, struct iov_iter *from);
 extern ssize_t cifs_strict_writev(struct kiocb *iocb, struct iov_iter *from);
+extern ssize_t cifs_splice_read(struct file *in, loff_t *ppos,
+				struct pipe_inode_info *pipe, size_t len,
+				unsigned int flags);
 extern int cifs_flock(struct file *pfile, int cmd, struct file_lock *plock);
 extern int cifs_lock(struct file *, int, struct file_lock *);
 extern int cifs_fsync(struct file *, loff_t, loff_t, int);

diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index d100b9cb8682..f1297386a185 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -5273,3 +5273,95 @@ const struct address_space_operations cifs_addr_ops_smallbuf = {
 	.launder_folio = cifs_launder_folio,
 	.migrate_folio = filemap_migrate_folio,
 };
+
+/*
+ * Splice data from a file into a pipe.
+ */
+ssize_t cifs_splice_read(struct file *file, loff_t *ppos,
+			 struct pipe_inode_info *pipe, size_t len,
+			 unsigned int flags)
+{
+	LIST_HEAD(pages);
+	struct iov_iter to;
+	struct bio_vec *bv;
+	struct kiocb kiocb;
+	struct page *page;
+	unsigned int head;
+	ssize_t ret;
+	size_t used, npages, chunk, remain, reclaim;
+	int i;
+
+	/* Work out how much data we can actually add into the pipe */
+	used = pipe_occupancy(pipe->head, pipe->tail);
+	npages = max_t(ssize_t, pipe->max_usage - used, 0);
+	len = min_t(size_t, len, npages * PAGE_SIZE);
+	npages = DIV_ROUND_UP(len, PAGE_SIZE);
+
+	bv = kmalloc(array_size(npages, sizeof(bv[0])), GFP_KERNEL);
+	if (!bv)
+		return -ENOMEM;
+
+	npages = alloc_pages_bulk_list(GFP_USER, npages, &pages);
+	if (!npages) {
+		kfree(bv);
+		return -ENOMEM;
+	}
+
+	remain = len = min_t(size_t, len, npages * PAGE_SIZE);
+
+	for (i = 0; i < npages; i++) {
+		chunk = min_t(size_t, PAGE_SIZE, remain);
+		page = list_first_entry(&pages, struct page, lru);
+		list_del_init(&page->lru);
+		bv[i].bv_page = page;
+		bv[i].bv_offset = 0;
+		bv[i].bv_len = chunk;
+		remain -= chunk;
+	}
+
+	/* Do the I/O */
+	iov_iter_bvec(&to, READ, bv, npages, len);
+	init_sync_kiocb(&kiocb, file);
+	kiocb.ki_pos = *ppos;
+	ret = call_read_iter(file, &kiocb, &to);
+
+	reclaim = npages * PAGE_SIZE;
+	remain = 0;
+	if (ret > 0) {
+		reclaim -= ret;
+		remain = ret;
+		*ppos = kiocb.ki_pos;
+		file_accessed(file);
+	} else if (ret < 0) {
+		/*
+		 * callers of ->splice_read() expect -EAGAIN on
+		 * "can't put anything in there", rather than -EFAULT.
+		 */
+		if (ret == -EFAULT)
+			ret = -EAGAIN;
+	}
+
+	/* Free any pages that didn't get touched at all. */
+	for (; reclaim >= PAGE_SIZE; reclaim -= PAGE_SIZE)
+		__free_page(bv[--npages].bv_page);
+
+	/* Push the remaining pages into the pipe. */
+	head = pipe->head;
+	for (i = 0; i < npages; i++) {
+		struct pipe_buffer *buf = &pipe->bufs[head & (pipe->ring_size - 1)];
+
+		chunk = min_t(size_t, remain, PAGE_SIZE);
+		*buf = (struct pipe_buffer) {
+			.ops	= &default_pipe_buf_ops,
+			.page	= bv[i].bv_page,
+			.offset	= 0,
+			.len	= chunk,
+		};
+		head++;
+		remain -= chunk;
+	}
+	pipe->head = head;
+
+	kfree(bv);
+	return ret;
+}

diff --git a/fs/splice.c b/fs/splice.c
index c3433266ba1b..1245ffb64414 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -330,6 +330,7 @@ const struct pipe_buf_operations default_pipe_buf_ops = {
 	.try_steal	= generic_pipe_buf_try_steal,
 	.get		= generic_pipe_buf_get,
 };
+EXPORT_SYMBOL(default_pipe_buf_ops);

 /* Pipe buffer operations for a socket and similar. */
 const struct pipe_buf_operations nosteal_pipe_buf_ops = {

From patchwork Mon Jan 16 23:10:54 2023
X-Patchwork-Submitter: David Howells
X-Patchwork-Id: 44378
From patchwork Mon Jan 16 23:10:54 2023
Subject: [PATCH v6 24/34] cifs: Add a function to build an RDMA SGE list from an iterator
From: David Howells
To: Al Viro
Cc: Steve French, Shyam Prasad N, Rohith Surabattula, Tom Talpey,
    Jeff Layton, linux-cifs@vger.kernel.org, linux-fsdevel@vger.kernel.org,
    linux-rdma@vger.kernel.org, dhowells@redhat.com, Christoph Hellwig,
    Matthew Wilcox, Jens Axboe, Jan Kara, Logan Gunthorpe,
    linux-block@vger.kernel.org, linux-kernel@vger.kernel.org
Date: Mon, 16 Jan 2023 23:10:54 +0000
Message-ID: <167391065455.2311931.6594946160942957670.stgit@warthog.procyon.org.uk>
In-Reply-To: <167391047703.2311931.8115712773222260073.stgit@warthog.procyon.org.uk>
References: <167391047703.2311931.8115712773222260073.stgit@warthog.procyon.org.uk>

Add a function that appends elements to an RDMA SGE list, each describing a
page fragment extracted from a BVEC-, KVEC- or XARRAY-type iterator and
DMA-mapped, stopping when the maximum number of list elements is reached.

Nothing is done to make sure the pages remain present - that must be done by
the caller.

Signed-off-by: David Howells
cc: Steve French
cc: Shyam Prasad N
cc: Rohith Surabattula
cc: Tom Talpey
cc: Jeff Layton
cc: linux-cifs@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
cc: linux-rdma@vger.kernel.org
Link: https://lore.kernel.org/r/166697256704.61150.17388516338310645808.stgit@warthog.procyon.org.uk/ # rfc
Link: https://lore.kernel.org/r/166732028840.3186319.8512284239779728860.stgit@warthog.procyon.org.uk/ # rfc
---
 fs/cifs/smbdirect.c | 224 +++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 224 insertions(+)

diff --git a/fs/cifs/smbdirect.c b/fs/cifs/smbdirect.c
index 3e693ffd0662..78a76752fafd 100644
--- a/fs/cifs/smbdirect.c
+++ b/fs/cifs/smbdirect.c
@@ -44,6 +44,17 @@ static int smbd_post_send_page(struct smbd_connection *info,
 static void destroy_mr_list(struct smbd_connection *info);
 static int allocate_mr_list(struct smbd_connection *info);
 
+struct smb_extract_to_rdma {
+	struct ib_sge		*sge;
+	unsigned int		nr_sge;
+	unsigned int		max_sge;
+	struct ib_device	*device;
+	u32			local_dma_lkey;
+	enum dma_data_direction direction;
+};
+static ssize_t smb_extract_iter_to_rdma(struct iov_iter *iter, size_t len,
+					struct smb_extract_to_rdma *rdma);
+
 /* SMBD version number */
 #define SMBD_V1	0x0100
 
@@ -2480,3 +2491,216 @@ int smbd_deregister_mr(struct smbd_mr *smbdirect_mr)
 
 	return rc;
 }
+
+static bool smb_set_sge(struct smb_extract_to_rdma *rdma,
+			struct page *lowest_page, size_t off, size_t len)
+{
+	struct ib_sge *sge = &rdma->sge[rdma->nr_sge];
+	u64 addr;
+
+	addr = ib_dma_map_page(rdma->device, lowest_page,
+			       off, len, rdma->direction);
+	if (ib_dma_mapping_error(rdma->device, addr))
+		return false;
+
+	sge->addr   = addr;
+	sge->length = len;
+	sge->lkey   = rdma->local_dma_lkey;
+	rdma->nr_sge++;
+	return true;
+}
+
+/*
+ * Extract page fragments from a BVEC-class iterator and add them to an RDMA
+ * element list.  The pages are not pinned.
+ */
+static ssize_t smb_extract_bvec_to_rdma(struct iov_iter *iter,
+					struct smb_extract_to_rdma *rdma,
+					ssize_t maxsize)
+{
+	const struct bio_vec *bv = iter->bvec;
+	unsigned long start = iter->iov_offset;
+	unsigned int i, sge_max = rdma->max_sge;
+	ssize_t ret = 0;
+
+	for (i = 0; i < iter->nr_segs; i++) {
+		size_t off, len;
+
+		len = bv[i].bv_len;
+		if (start >= len) {
+			start -= len;
+			continue;
+		}
+
+		len = min_t(size_t, maxsize, len - start);
+		off = bv[i].bv_offset + start;
+
+		if (!smb_set_sge(rdma, bv[i].bv_page, off, len))
+			return -EIO;
+		sge_max--;
+
+		ret += len;
+		maxsize -= len;
+		if (maxsize <= 0 || sge_max == 0)
+			break;
+		start = 0;
+	}
+
+	return ret;
+}
+
+/*
+ * Extract fragments from a KVEC-class iterator and add them to an RDMA list.
+ * This can deal with vmalloc'd buffers as well as kmalloc'd or static buffers.
+ * The pages are not pinned.
+ */
+static ssize_t smb_extract_kvec_to_rdma(struct iov_iter *iter,
+					struct smb_extract_to_rdma *rdma,
+					ssize_t maxsize)
+{
+	const struct kvec *kv = iter->kvec;
+	unsigned long start = iter->iov_offset;
+	unsigned int i, sge_max = rdma->max_sge;
+	ssize_t ret = 0;
+
+	for (i = 0; i < iter->nr_segs; i++) {
+		struct page *page;
+		unsigned long kaddr;
+		size_t off, len, seg;
+
+		len = kv[i].iov_len;
+		if (start >= len) {
+			start -= len;
+			continue;
+		}
+
+		kaddr = (unsigned long)kv[i].iov_base + start;
+		off = kaddr & ~PAGE_MASK;
+		len = min_t(size_t, maxsize, len - start);
+		kaddr &= PAGE_MASK;
+
+		maxsize -= len;
+		ret += len;
+		do {
+			seg = min_t(size_t, len, PAGE_SIZE - off);
+
+			if (is_vmalloc_or_module_addr((void *)kaddr))
+				page = vmalloc_to_page((void *)kaddr);
+			else
+				page = virt_to_page(kaddr);
+
+			/* Map only this page's segment, not the whole
+			 * remaining run, which may cross the page boundary.
+			 */
+			if (!smb_set_sge(rdma, page, off, seg))
+				return -EIO;
+			sge_max--;
+
+			len -= seg;
+			kaddr += PAGE_SIZE;
+			off = 0;
+		} while (len > 0 && sge_max > 0);
+
+		if (maxsize <= 0 || sge_max == 0)
+			break;
+		start = 0;
+	}
+
+	return ret;
+}
+
+/*
+ * Extract folio fragments from an XARRAY-class iterator and add them to an
+ * RDMA list.  The folios are not pinned.
+ */
+static ssize_t smb_extract_xarray_to_rdma(struct iov_iter *iter,
+					  struct smb_extract_to_rdma *rdma,
+					  ssize_t maxsize)
+{
+	struct xarray *xa = iter->xarray;
+	struct folio *folio;
+	unsigned int sge_max = rdma->max_sge;
+	loff_t start = iter->xarray_start + iter->iov_offset;
+	pgoff_t index = start / PAGE_SIZE;
+	ssize_t ret = 0;
+	size_t off, len;
+	XA_STATE(xas, xa, index);
+
+	rcu_read_lock();
+
+	xas_for_each(&xas, folio, ULONG_MAX) {
+		if (xas_retry(&xas, folio))
+			continue;
+		if (WARN_ON(xa_is_value(folio)))
+			break;
+		if (WARN_ON(folio_test_hugetlb(folio)))
+			break;
+
+		off = offset_in_folio(folio, start);
+		len = min_t(size_t, maxsize, folio_size(folio) - off);
+
+		if (!smb_set_sge(rdma, folio_page(folio, 0), off, len)) {
+			rcu_read_unlock();
+			return -EIO;
+		}
+		sge_max--;
+
+		start += len;
+		maxsize -= len;
+		ret += len;
+		if (maxsize <= 0 || sge_max == 0)
+			break;
+	}
+
+	rcu_read_unlock();
+	return ret;
+}
+
+/*
+ * Extract page fragments from up to the given amount of the source iterator
+ * and build up an RDMA list that refers to all of those bits.  The RDMA list
+ * is appended to, up to the maximum number of elements set in the parameter
+ * block.
+ *
+ * The extracted page fragments are not pinned or ref'd in any way; if an
+ * IOVEC/UBUF-type iterator is to be used, it should be converted to a
+ * BVEC-type iterator and the pages pinned, ref'd or otherwise held in some
+ * way.
+ */
+static ssize_t smb_extract_iter_to_rdma(struct iov_iter *iter, size_t len,
+					struct smb_extract_to_rdma *rdma)
+{
+	ssize_t ret;
+	int before = rdma->nr_sge;
+
+	if (iov_iter_is_discard(iter) ||
+	    iov_iter_is_pipe(iter) ||
+	    user_backed_iter(iter)) {
+		WARN_ON_ONCE(1);
+		return -EIO;
+	}
+
+	switch (iov_iter_type(iter)) {
+	case ITER_BVEC:
+		ret = smb_extract_bvec_to_rdma(iter, rdma, len);
+		break;
+	case ITER_KVEC:
+		ret = smb_extract_kvec_to_rdma(iter, rdma, len);
+		break;
+	case ITER_XARRAY:
+		ret = smb_extract_xarray_to_rdma(iter, rdma, len);
+		break;
+	default:
+		BUG();
+	}
+
+	if (ret > 0) {
+		iov_iter_advance(iter, ret);
+	} else if (ret < 0) {
+		/* Unwind only the elements we filled (indices before..
+		 * nr_sge - 1), pairing the unmap with ib_dma_map_page().
+		 */
+		while (rdma->nr_sge > before) {
+			struct ib_sge *sge = &rdma->sge[--rdma->nr_sge];
+
+			ib_dma_unmap_page(rdma->device, sge->addr, sge->length,
+					  rdma->direction);
+			sge->addr = 0;
+		}
+	}
+
+	return ret;
+}
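[Editorial example] The extraction routines above are driven through the struct smb_extract_to_rdma parameter block. A hedged sketch of how a caller might fill that block and consume the result - this is not code from the patch; it assumes an established smbd_connection, and the function name and SGE array bound are invented for illustration:

	/* Schematic caller of smb_extract_iter_to_rdma() - illustrative only. */
	static int smbd_sketch_build_sge_list(struct smbd_connection *info,
					      struct iov_iter *source, size_t len)
	{
		struct ib_sge sges[8];		/* arbitrary bound for the example */
		struct smb_extract_to_rdma extract = {
			.sge		= sges,
			.nr_sge		= 0,
			.max_sge	= ARRAY_SIZE(sges),
			.device		= info->id->device,
			.local_dma_lkey	= info->pd->local_dma_lkey,
			.direction	= DMA_TO_DEVICE,
		};
		ssize_t added;

		added = smb_extract_iter_to_rdma(source, len, &extract);
		if (added < 0)
			return added;	/* mappings already undone on error */

		/*
		 * extract.sge[0 .. extract.nr_sge - 1] now holds DMA-mapped
		 * fragments that can be hung off an ib_send_wr.  The caller
		 * must keep the underlying pages resident until the work
		 * request completes, as the extraction takes no refs or pins.
		 */
		return 0;
	}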
From patchwork Mon Jan 16 23:11:02 2023
Subject: [PATCH v6 25/34] cifs: Add a function to hash the contents of an iterator
From: David Howells
To: Al Viro
Cc: Steve French, Shyam Prasad N, Rohith Surabattula, Jeff Layton,
    linux-cifs@vger.kernel.org, linux-fsdevel@vger.kernel.org,
    linux-crypto@vger.kernel.org, dhowells@redhat.com, Christoph Hellwig,
    Matthew Wilcox, Jens Axboe, Jan Kara, Logan Gunthorpe,
    linux-block@vger.kernel.org, linux-kernel@vger.kernel.org
Date: Mon, 16 Jan 2023 23:11:02 +0000
Message-ID: <167391066212.2311931.16097548940184155209.stgit@warthog.procyon.org.uk>
In-Reply-To: <167391047703.2311931.8115712773222260073.stgit@warthog.procyon.org.uk>
References: <167391047703.2311931.8115712773222260073.stgit@warthog.procyon.org.uk>

Add a function to push the contents of a BVEC-, KVEC- or XARRAY-type
iterator into a synchronous hash (shash) algorithm.

UBUF- and IOVEC-type iterators are not supported on the assumption that
either we're doing buffered I/O, in which case we won't see them, or we're
doing direct I/O, in which case the iterator will have been extracted into a
BVEC-type iterator higher up.

Signed-off-by: David Howells
cc: Steve French
cc: Shyam Prasad N
cc: Rohith Surabattula
cc: Jeff Layton
cc: linux-cifs@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
cc: linux-crypto@vger.kernel.org
Link: https://lore.kernel.org/r/166697257423.61150.12070648579830206483.stgit@warthog.procyon.org.uk/ # rfc
Link: https://lore.kernel.org/r/166732029577.3186319.17162612653237909961.stgit@warthog.procyon.org.uk/ # rfc
---
 fs/cifs/cifsencrypt.c | 144 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 144 insertions(+)

diff --git a/fs/cifs/cifsencrypt.c b/fs/cifs/cifsencrypt.c
index 5db73c0f792a..e13f26371540 100644
--- a/fs/cifs/cifsencrypt.c
+++ b/fs/cifs/cifsencrypt.c
@@ -24,6 +24,150 @@
 #include "../smbfs_common/arc4.h"
 #include
 
+/*
+ * Hash data from a BVEC-type iterator.
+ */
+static int cifs_shash_bvec(const struct iov_iter *iter, ssize_t maxsize,
+			   struct shash_desc *shash)
+{
+	const struct bio_vec *bv = iter->bvec;
+	unsigned long start = iter->iov_offset;
+	unsigned int i;
+	void *p;
+	int ret;
+
+	for (i = 0; i < iter->nr_segs; i++) {
+		size_t off, len;
+
+		len = bv[i].bv_len;
+		if (start >= len) {
+			start -= len;
+			continue;
+		}
+
+		len = min_t(size_t, maxsize, len - start);
+		off = bv[i].bv_offset + start;
+
+		p = kmap_local_page(bv[i].bv_page);
+		ret = crypto_shash_update(shash, p + off, len);
+		kunmap_local(p);
+		if (ret < 0)
+			return ret;
+
+		maxsize -= len;
+		if (maxsize <= 0)
+			break;
+		start = 0;
+	}
+
+	return 0;
+}
+
+/*
+ * Hash data from a KVEC-type iterator.
+ */
+static int cifs_shash_kvec(const struct iov_iter *iter, ssize_t maxsize,
+			   struct shash_desc *shash)
+{
+	const struct kvec *kv = iter->kvec;
+	unsigned long start = iter->iov_offset;
+	unsigned int i;
+	int ret;
+
+	for (i = 0; i < iter->nr_segs; i++) {
+		size_t len;
+
+		len = kv[i].iov_len;
+		if (start >= len) {
+			start -= len;
+			continue;
+		}
+
+		len = min_t(size_t, maxsize, len - start);
+		ret = crypto_shash_update(shash, kv[i].iov_base + start, len);
+		if (ret < 0)
+			return ret;
+		maxsize -= len;
+
+		if (maxsize <= 0)
+			break;
+		start = 0;
+	}
+
+	return 0;
+}
+
+/*
+ * Hash data from an XARRAY-type iterator.
+ */
+static ssize_t cifs_shash_xarray(const struct iov_iter *iter, ssize_t maxsize,
+				 struct shash_desc *shash)
+{
+	struct folio *folios[16], *folio;
+	unsigned int nr, i, j, npages;
+	loff_t start = iter->xarray_start + iter->iov_offset;
+	pgoff_t last, index = start / PAGE_SIZE;
+	ssize_t ret = 0;
+	size_t len, offset, foffset;
+	void *p;
+
+	if (maxsize == 0)
+		return 0;
+
+	last = (start + maxsize - 1) / PAGE_SIZE;
+	do {
+		nr = xa_extract(iter->xarray, (void **)folios, index, last,
+				ARRAY_SIZE(folios), XA_PRESENT);
+		if (nr == 0)
+			return -EIO;
+
+		for (i = 0; i < nr; i++) {
+			folio = folios[i];
+			npages = folio_nr_pages(folio);
+			foffset = start - folio_pos(folio);
+			offset = foffset % PAGE_SIZE;
+			for (j = foffset / PAGE_SIZE; j < npages; j++) {
+				len = min_t(size_t, maxsize, PAGE_SIZE - offset);
+				/* Hash from the in-page offset, not the page
+				 * start.
+				 */
+				p = kmap_local_page(folio_page(folio, j));
+				ret = crypto_shash_update(shash, p + offset, len);
+				kunmap_local(p);
+				if (ret < 0)
+					return ret;
+				maxsize -= len;
+				if (maxsize <= 0)
+					return 0;
+				start += len;
+				offset = 0;
+				index++;
+			}
+		}
+	} while (nr == ARRAY_SIZE(folios));
+	return 0;
+}
+
+/*
+ * Pass the data from an iterator into a hash.
+ */
+static int cifs_shash_iter(const struct iov_iter *iter, size_t maxsize,
+			   struct shash_desc *shash)
+{
+	if (maxsize == 0)
+		return 0;
+
+	switch (iov_iter_type(iter)) {
+	case ITER_BVEC:
+		return cifs_shash_bvec(iter, maxsize, shash);
+	case ITER_KVEC:
+		return cifs_shash_kvec(iter, maxsize, shash);
+	case ITER_XARRAY:
+		return cifs_shash_xarray(iter, maxsize, shash);
+	default:
+		pr_err("cifs_shash_iter(%u) unsupported\n", iov_iter_type(iter));
+		WARN_ON_ONCE(1);
+		return -EIO;
+	}
+}
+
 int __cifs_calc_signature(struct smb_rqst *rqst,
 		struct TCP_Server_Info *server, char *signature,
 		struct shash_desc *shash)
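[Editorial example] The three helpers above share one walking pattern: skip the first `start` bytes of the leading segments, then clamp each segment so that at most `maxsize` bytes in total reach the hash. The same pattern in a self-contained userspace analogue, with a stand-in FNV-1a hash in place of the kernel's crypto_shash API (all names here are illustrative, not part of the patch):

	#include <stdint.h>
	#include <stdio.h>
	#include <sys/types.h>
	#include <sys/uio.h>

	/* Stand-in for crypto_shash_update(): FNV-1a over a byte run. */
	static void fnv1a_update(uint64_t *h, const void *p, size_t len)
	{
		const unsigned char *c = p;

		while (len--)
			*h = (*h ^ *c++) * 0x100000001b3ULL;
	}

	/*
	 * Hash up to maxsize bytes of a segment array, skipping the first
	 * 'start' bytes - the same skip/clamp logic as cifs_shash_kvec().
	 */
	static uint64_t hash_vec(const struct iovec *kv, unsigned int nr_segs,
				 size_t start, ssize_t maxsize)
	{
		uint64_t h = 0xcbf29ce484222325ULL;
		unsigned int i;

		for (i = 0; i < nr_segs; i++) {
			size_t len = kv[i].iov_len;

			if (start >= len) {	/* segment wholly before the span */
				start -= len;
				continue;
			}
			len -= start;
			if ((ssize_t)len > maxsize)
				len = maxsize;
			fnv1a_update(&h, (char *)kv[i].iov_base + start, len);
			maxsize -= len;
			if (maxsize <= 0)
				break;
			start = 0;	/* later segments are used from offset 0 */
		}
		return h;
	}

	int main(void)
	{
		char a[] = "hello, ", b[] = "world";
		struct iovec kv[2] = {
			{ .iov_base = a, .iov_len = 7 },
			{ .iov_base = b, .iov_len = 5 },
		};

		/* Skip "hello, " and hash the next 5 bytes ("world"). */
		printf("%llx\n", (unsigned long long)hash_vec(kv, 2, 7, 5));
		return 0;
	}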
From patchwork Mon Jan 16 23:11:09 2023
Subject: [PATCH v6 26/34] cifs: Add some helper functions
From: David Howells
To: Al Viro
Cc: Steve French, Shyam Prasad N, Rohith Surabattula, Jeff Layton,
    linux-cifs@vger.kernel.org, dhowells@redhat.com, Christoph Hellwig,
    Matthew Wilcox, Jens Axboe, Jan Kara, Logan Gunthorpe,
    linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org,
    linux-kernel@vger.kernel.org
Date: Mon, 16 Jan 2023 23:11:09 +0000
Message-ID: <167391066959.2311931.17352170557719525141.stgit@warthog.procyon.org.uk>
In-Reply-To: <167391047703.2311931.8115712773222260073.stgit@warthog.procyon.org.uk>
References: <167391047703.2311931.8115712773222260073.stgit@warthog.procyon.org.uk>

Add some helper functions that manipulate folio state across a byte range -
completing writeback, flagging a write failure, or redirtying after a
temporary failure - by iterating over the folios held in an xarray, rather
than walking a page list.

Signed-off-by: David Howells
cc: Steve French
cc: Shyam Prasad N
cc: Rohith Surabattula
cc: Jeff Layton
cc: linux-cifs@vger.kernel.org
Link: https://lore.kernel.org/r/164928616583.457102.15157033997163988344.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/165211418840.3154751.3090684430628501879.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/165348878940.2106726.204291614267188735.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/165364825674.3334034.3356201708659748648.stgit@warthog.procyon.org.uk/ # v3
Link: https://lore.kernel.org/r/166126394799.708021.10637797063862600488.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/166697258147.61150.9940790486999562110.stgit@warthog.procyon.org.uk/ # rfc
Link: https://lore.kernel.org/r/166732030314.3186319.9209944805565413627.stgit@warthog.procyon.org.uk/ # rfc
---
 fs/cifs/cifsfs.h |  3 ++
 fs/cifs/file.c   | 93 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 96 insertions(+)

diff --git a/fs/cifs/cifsfs.h b/fs/cifs/cifsfs.h
index 25decebbc478..ea628da503c6 100644
--- a/fs/cifs/cifsfs.h
+++ b/fs/cifs/cifsfs.h
@@ -113,6 +113,9 @@ extern int cifs_file_strict_mmap(struct file *file, struct vm_area_struct *vma);
 extern const struct file_operations cifs_dir_ops;
 extern int cifs_dir_open(struct inode *inode, struct file *file);
 extern int cifs_readdir(struct file *file, struct dir_context *ctx);
+extern void cifs_pages_written_back(struct inode *inode, loff_t start, unsigned int len);
+extern void cifs_pages_write_failed(struct inode *inode, loff_t start, unsigned int len);
+extern void cifs_pages_write_redirty(struct inode *inode, loff_t start, unsigned int len);
 
 /* Functions related to dir entries */
 extern const struct dentry_operations cifs_dentry_ops;
diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index f1297386a185..2873f28bf388 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -36,6 +36,99 @@
 #include "cifs_ioctl.h"
 #include "cached_dir.h"
 
+/*
+ * Completion of write to server.
+ */
+void cifs_pages_written_back(struct inode *inode, loff_t start, unsigned int len)
+{
+	struct address_space *mapping = inode->i_mapping;
+	struct folio *folio;
+	pgoff_t end;
+
+	XA_STATE(xas, &mapping->i_pages, start / PAGE_SIZE);
+
+	if (!len)
+		return;
+
+	rcu_read_lock();
+
+	end = (start + len - 1) / PAGE_SIZE;
+	xas_for_each(&xas, folio, end) {
+		if (!folio_test_writeback(folio)) {
+			WARN_ONCE(1, "bad %x @%llx page %lx %lx\n",
+				  len, start, folio_index(folio), end);
+			continue;
+		}
+
+		folio_detach_private(folio);
+		folio_end_writeback(folio);
+	}
+
+	rcu_read_unlock();
+}
+
+/*
+ * Failure of write to server.
+ */
+void cifs_pages_write_failed(struct inode *inode, loff_t start, unsigned int len)
+{
+	struct address_space *mapping = inode->i_mapping;
+	struct folio *folio;
+	pgoff_t end;
+
+	XA_STATE(xas, &mapping->i_pages, start / PAGE_SIZE);
+
+	if (!len)
+		return;
+
+	rcu_read_lock();
+
+	end = (start + len - 1) / PAGE_SIZE;
+	xas_for_each(&xas, folio, end) {
+		if (!folio_test_writeback(folio)) {
+			WARN_ONCE(1, "bad %x @%llx page %lx %lx\n",
+				  len, start, folio_index(folio), end);
+			continue;
+		}
+
+		folio_set_error(folio);
+		folio_end_writeback(folio);
+	}
+
+	rcu_read_unlock();
+}
+
+/*
+ * Redirty pages after a temporary failure.
+ */
+void cifs_pages_write_redirty(struct inode *inode, loff_t start, unsigned int len)
+{
+	struct address_space *mapping = inode->i_mapping;
+	struct folio *folio;
+	pgoff_t end;
+
+	XA_STATE(xas, &mapping->i_pages, start / PAGE_SIZE);
+
+	if (!len)
+		return;
+
+	rcu_read_lock();
+
+	end = (start + len - 1) / PAGE_SIZE;
+	xas_for_each(&xas, folio, end) {
+		if (!folio_test_writeback(folio)) {
+			WARN_ONCE(1, "bad %x @%llx page %lx %lx\n",
+				  len, start, folio_index(folio), end);
+			continue;
+		}
+
+		filemap_dirty_folio(folio->mapping, folio);
+		folio_end_writeback(folio);
+	}
+
+	rcu_read_unlock();
+}
+
 /*
  * Mark as invalid, all open files on tree connections since they
  * were closed when session to server was lost.
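[Editorial example] Each helper converts the byte span [start, start + len) into an inclusive page-index range before walking the xarray; the "- 1" keeps the end index inclusive, which is what xas_for_each() expects. The arithmetic in isolation (userspace C, illustrative only):

	#include <stdio.h>

	#define PAGE_SIZE 4096ULL

	int main(void)
	{
		unsigned long long start = 5000, len = 10000;
		/* First and last page indices covered by [start, start + len) */
		unsigned long long first = start / PAGE_SIZE;
		unsigned long long last  = (start + len - 1) / PAGE_SIZE;

		/* Bytes 5000..14999 touch pages 1..3 inclusive */
		printf("pages %llu..%llu\n", first, last);
		return 0;
	}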
From patchwork Mon Jan 16 23:11:16 2023
Subject: [PATCH v6 27/34] cifs: Add a function to read into an iter from a socket
From: David Howells
To: Al Viro
Cc: Steve French, Shyam Prasad N, Rohith Surabattula, Jeff Layton,
    linux-cifs@vger.kernel.org, dhowells@redhat.com, Christoph Hellwig,
    Matthew Wilcox, Jens Axboe, Jan Kara, Logan Gunthorpe,
    linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org,
    linux-kernel@vger.kernel.org
Date: Mon, 16 Jan 2023 23:11:16 +0000
Message-ID: <167391067696.2311931.12784274342375267019.stgit@warthog.procyon.org.uk>
In-Reply-To: <167391047703.2311931.8115712773222260073.stgit@warthog.procyon.org.uk>
References: <167391047703.2311931.8115712773222260073.stgit@warthog.procyon.org.uk>

Add a helper function to read data from a socket into the given iterator.

Signed-off-by: David Howells
cc: Steve French
cc: Shyam Prasad N
cc: Rohith Surabattula
cc: Jeff Layton
cc: linux-cifs@vger.kernel.org
Link: https://lore.kernel.org/r/164928617874.457102.10021662143234315566.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/165211419563.3154751.18431990381145195050.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/165348879662.2106726.16881134187242702351.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/165364826398.3334034.12541600783145647319.stgit@warthog.procyon.org.uk/ # v3
Link: https://lore.kernel.org/r/166126395495.708021.12328677373159554478.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/166697258876.61150.3530237818849429372.stgit@warthog.procyon.org.uk/ # rfc
Link: https://lore.kernel.org/r/166732031039.3186319.10691316510079412635.stgit@warthog.procyon.org.uk/ # rfc
---
 fs/cifs/cifsproto.h |  3 +++
 fs/cifs/connect.c   | 16 ++++++++++++++++
 2 files changed, 19 insertions(+)

diff --git a/fs/cifs/cifsproto.h b/fs/cifs/cifsproto.h
index 1207b39686fb..cb7a3fe89278 100644
--- a/fs/cifs/cifsproto.h
+++ b/fs/cifs/cifsproto.h
@@ -244,6 +244,9 @@ extern int cifs_read_page_from_socket(struct TCP_Server_Info *server,
 					struct page *page,
 					unsigned int page_offset,
 					unsigned int to_read);
+int cifs_read_iter_from_socket(struct TCP_Server_Info *server,
+			       struct iov_iter *iter,
+			       unsigned int to_read);
 extern int cifs_setup_cifs_sb(struct cifs_sb_info *cifs_sb);
 void cifs_mount_put_conns(struct cifs_mount_ctx *mnt_ctx);
 int cifs_mount_get_session(struct cifs_mount_ctx *mnt_ctx);
diff --git a/fs/cifs/connect.c b/fs/cifs/connect.c
index d371259d6808..68d6d74c2f4e 100644
--- a/fs/cifs/connect.c
+++ b/fs/cifs/connect.c
@@ -765,6 +765,22 @@ cifs_read_page_from_socket(struct TCP_Server_Info *server, struct page *page,
 	return cifs_readv_from_socket(server, &smb_msg);
 }
 
+int
+cifs_read_iter_from_socket(struct TCP_Server_Info *server, struct iov_iter *iter,
+			   unsigned int to_read)
+{
+	struct msghdr smb_msg;
+	int ret;
+
+	smb_msg.msg_iter = *iter;
+	if (smb_msg.msg_iter.count > to_read)
+		smb_msg.msg_iter.count = to_read;
+	ret = cifs_readv_from_socket(server, &smb_msg);
+	if (ret > 0)
+		iov_iter_advance(iter, ret);
+	return ret;
+}
+
 static bool
 is_smb_response(struct TCP_Server_Info *server, unsigned char type)
 {
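[Editorial example] cifs_read_iter_from_socket() copies the caller's iterator into the msghdr, clamps the copy to to_read, and advances the caller's iterator only by the bytes actually received. The same clamp-then-advance idiom in plain userspace C (illustrative only; read(2) stands in for the socket read):

	#include <stdio.h>
	#include <unistd.h>

	/*
	 * Read up to to_read bytes into the window (*buf, *remaining), then
	 * shrink the caller's window by what was actually read - the same
	 * clamp/advance split as cifs_read_iter_from_socket().
	 */
	static ssize_t read_clamped(int fd, char **buf, size_t *remaining,
				    size_t to_read)
	{
		size_t want = to_read < *remaining ? to_read : *remaining;
		ssize_t got = read(fd, *buf, want);	/* stand-in for the socket read */

		if (got > 0) {		/* advance only by what arrived */
			*buf += got;
			*remaining -= got;
		}
		return got;
	}

	int main(void)
	{
		char data[128], *p = data;
		size_t remaining = sizeof(data);
		ssize_t n = read_clamped(STDIN_FILENO, &p, &remaining, 16);

		printf("read %zd, %zu bytes of buffer left\n", n, remaining);
		return 0;
	}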
From patchwork Mon Jan 16 23:11:24 2023
Subject: [PATCH v6 28/34] cifs: Change the I/O paths to use an iterator rather than a page list
From: David Howells
To: Al Viro
Cc: Steve French, Shyam Prasad N, Rohith Surabattula, Paulo Alcantara,
    Jeff Layton, linux-cifs@vger.kernel.org, dhowells@redhat.com,
    Christoph Hellwig, Matthew Wilcox, Jens Axboe, Jan Kara,
    Logan Gunthorpe, linux-fsdevel@vger.kernel.org,
    linux-block@vger.kernel.org, linux-kernel@vger.kernel.org
Date: Mon, 16 Jan 2023 23:11:24 +0000
Message-ID: <167391068424.2311931.16870243598435547161.stgit@warthog.procyon.org.uk>
In-Reply-To: <167391047703.2311931.8115712773222260073.stgit@warthog.procyon.org.uk>
References: <167391047703.2311931.8115712773222260073.stgit@warthog.procyon.org.uk>

Currently, the cifs I/O paths hand lists of pages from the VM interface
routines at the top all the way through the intervening layers to the
socket interface at the bottom.

This is a problem, however, for interfacing with netfslib, which passes an
iterator through to the ->issue_read() method (and will pass an iterator
through to the ->issue_write() method in future). Netfslib takes over
bounce buffering for direct I/O, async I/O and encrypted content, so cifs
doesn't need to do that. Netfslib also converts IOVEC-type iterators into
BVEC-type iterators if necessary.

Further, cifs needs converting to folios - and folios may come in a variety
of sizes, so a page list pointing to an array of heterogeneous pages may
cause problems in places such as where crypto is done.

Change the cifs I/O paths to hand iov_iter iterators all the way through
instead.

Notes:

 (1) Some old routines are #if'd out, to be removed in a follow-up patch,
     so as to keep the diff output easier to follow. I've removed functions
     that don't overlap with anything added.

 (2) struct smb_rqst loses rq_pages, rq_offset, rq_npages, rq_pagesz and
     rq_tailsz, which described the pages forming the buffer; instead
     there's an rq_iter describing the source buffer and an rq_buffer which
     is used to hold the buffer for encryption.

 (3) struct cifs_readdata and cifs_writedata are modified similarly to
     smb_rqst. The ->read_into_pages() and ->copy_into_pages() hooks are
     replaced with passing the iterator directly to the socket. The
     iterators are stored in these structs so that they are persistent and
     don't get deallocated when the function returns (unlike if they were
     stack variables).

 (4) Buffered writeback is overhauled, borrowing the code from the afs
     filesystem to gather up contiguous runs of folios. The XARRAY-type
     iterator is then used to refer directly to the pagecache and can be
     passed to the socket to transmit data directly from there.

     This includes:

	cifs_extend_writeback()
	cifs_write_back_from_locked_folio()
	cifs_writepages_region()
	cifs_writepages()

 (5) Pages are converted to folios.
 (6) Direct I/O uses netfs_extract_user_iter() to create a BVEC-type
     iterator from an IOVEC/UBUF-type source iterator.

 (7) smb2_get_aead_req() uses netfs_extract_iter_to_sg() to extract page
     fragments from the iterator into the scatterlists that the crypto
     layer prefers.

 (8) smb2_init_transform_rq() attaches pages to smb_rqst::rq_buffer, an
     xarray, to use as a bounce buffer for encryption. An XARRAY-type
     iterator can then be used to pass the bounce buffer to lower layers.

Signed-off-by: David Howells
cc: Steve French
cc: Shyam Prasad N
cc: Rohith Surabattula
cc: Paulo Alcantara
cc: Jeff Layton
cc: linux-cifs@vger.kernel.org
Link: https://lore.kernel.org/r/164311907995.2806745.400147335497304099.stgit@warthog.procyon.org.uk/ # rfc
Link: https://lore.kernel.org/r/164928620163.457102.11602306234438271112.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/165211420279.3154751.15923591172438186144.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/165348880385.2106726.3220789453472800240.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/165364827111.3334034.934805882842932881.stgit@warthog.procyon.org.uk/ # v3
Link: https://lore.kernel.org/r/166126396180.708021.271013668175370826.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/166697259595.61150.5982032408321852414.stgit@warthog.procyon.org.uk/ # rfc
Link: https://lore.kernel.org/r/166732031756.3186319.12528413619888902872.stgit@warthog.procyon.org.uk/ # rfc
---
 fs/cifs/Kconfig       |    1 
 fs/cifs/cifsencrypt.c |   28 -
 fs/cifs/cifsglob.h    |   66 +--
 fs/cifs/cifsproto.h   |    8 
 fs/cifs/cifssmb.c     |   13 -
 fs/cifs/file.c        | 1200 +++++++++++++++++++++++++++++++------------------
 fs/cifs/fscache.c     |   22 -
 fs/cifs/fscache.h     |   10 
 fs/cifs/misc.c        |  133 +----
 fs/cifs/smb2ops.c     |  371 +++++++--------
 fs/cifs/smb2pdu.c     |   45 +-
 fs/cifs/smbdirect.c   |  263 ++++-------
 fs/cifs/smbdirect.h   |    4 
 fs/cifs/transport.c   |   57 +-
 14 files changed, 1127 insertions(+), 1094 deletions(-)

diff --git a/fs/cifs/Kconfig b/fs/cifs/Kconfig
index 3b7e3b9e4fd2..1824e0a36f5a 100644
--- a/fs/cifs/Kconfig
+++ b/fs/cifs/Kconfig
@@ -18,6 +18,7 @@ config CIFS
 	select DNS_RESOLVER
 	select ASN1
 	select OID_REGISTRY
+	select NETFS_SUPPORT
 	help
 	  This is the client VFS module for the SMB3 family of NAS protocols,
 	  (including support for the most recent, most secure dialect SMB3.1.1)
diff --git a/fs/cifs/cifsencrypt.c b/fs/cifs/cifsencrypt.c
index e13f26371540..05fc6ec36c28 100644
--- a/fs/cifs/cifsencrypt.c
+++ b/fs/cifs/cifsencrypt.c
@@ -169,11 +169,11 @@ static int cifs_shash_iter(const struct iov_iter *iter, size_t maxsize,
 }
 
 int __cifs_calc_signature(struct smb_rqst *rqst,
-		struct TCP_Server_Info *server, char *signature,
-		struct shash_desc *shash)
+			  struct TCP_Server_Info *server, char *signature,
+			  struct shash_desc *shash)
 {
 	int i;
-	int rc;
+	ssize_t rc;
 	struct kvec *iov = rqst->rq_iov;
 	int n_vec = rqst->rq_nvec;
 
@@ -205,25 +205,9 @@ int __cifs_calc_signature(struct smb_rqst *rqst,
 		}
 	}
 
-	/* now hash over the rq_pages array */
-	for (i = 0; i < rqst->rq_npages; i++) {
-		void *kaddr;
-		unsigned int len, offset;
-
-		rqst_page_get_length(rqst, i, &len, &offset);
-
-		kaddr = (char *) kmap(rqst->rq_pages[i]) + offset;
-
-		rc = crypto_shash_update(shash, kaddr, len);
-		if (rc) {
-			cifs_dbg(VFS, "%s: Could not update with payload\n",
-				 __func__);
-			kunmap(rqst->rq_pages[i]);
-			return rc;
-		}
-
-		kunmap(rqst->rq_pages[i]);
-	}
+	rc = cifs_shash_iter(&rqst->rq_iter, iov_iter_count(&rqst->rq_iter), shash);
+	if (rc < 0)
+		return rc;
 
 	rc =
crypto_shash_final(shash, signature); if (rc) diff --git a/fs/cifs/cifsglob.h b/fs/cifs/cifsglob.h index cfdd5bf701a1..e4f8c0f68152 100644 --- a/fs/cifs/cifsglob.h +++ b/fs/cifs/cifsglob.h @@ -216,11 +216,8 @@ static inline void cifs_free_open_info(struct cifs_open_info_data *data) struct smb_rqst { struct kvec *rq_iov; /* array of kvecs */ unsigned int rq_nvec; /* number of kvecs in array */ - struct page **rq_pages; /* pointer to array of page ptrs */ - unsigned int rq_offset; /* the offset to the 1st page */ - unsigned int rq_npages; /* number pages in array */ - unsigned int rq_pagesz; /* page size to use */ - unsigned int rq_tailsz; /* length of last page */ + struct iov_iter rq_iter; /* Data iterator */ + struct xarray rq_buffer; /* Page buffer for encryption */ }; struct mid_q_entry; @@ -1426,10 +1423,11 @@ struct cifs_aio_ctx { struct cifsFileInfo *cfile; struct bio_vec *bv; loff_t pos; - unsigned int npages; + unsigned int nr_pinned_pages; ssize_t rc; unsigned int len; unsigned int total_len; + unsigned int bv_cleanup_mode; /* How to clean up ->bv[] */ bool should_dirty; /* * Indicates if this aio_ctx is for direct_io, @@ -1447,28 +1445,18 @@ struct cifs_readdata { struct address_space *mapping; struct cifs_aio_ctx *ctx; __u64 offset; + ssize_t got_bytes; unsigned int bytes; - unsigned int got_bytes; pid_t pid; int result; struct work_struct work; - int (*read_into_pages)(struct TCP_Server_Info *server, - struct cifs_readdata *rdata, - unsigned int len); - int (*copy_into_pages)(struct TCP_Server_Info *server, - struct cifs_readdata *rdata, - struct iov_iter *iter); + struct iov_iter iter; struct kvec iov[2]; struct TCP_Server_Info *server; #ifdef CONFIG_CIFS_SMB_DIRECT struct smbd_mr *mr; #endif - unsigned int pagesz; - unsigned int page_offset; - unsigned int tailsz; struct cifs_credits credits; - unsigned int nr_pages; - struct page **pages; }; /* asynchronous write support */ @@ -1480,6 +1468,8 @@ struct cifs_writedata { struct work_struct work; struct cifsFileInfo *cfile; struct cifs_aio_ctx *ctx; + struct iov_iter iter; + struct bio_vec *bv; __u64 offset; pid_t pid; unsigned int bytes; @@ -1488,12 +1478,7 @@ struct cifs_writedata { #ifdef CONFIG_CIFS_SMB_DIRECT struct smbd_mr *mr; #endif - unsigned int pagesz; - unsigned int page_offset; - unsigned int tailsz; struct cifs_credits credits; - unsigned int nr_pages; - struct page **pages; }; /* @@ -2153,9 +2138,9 @@ static inline void move_cifs_info_to_smb2(struct smb2_file_all_info *dst, const dst->FileNameLength = src->FileNameLength; } -static inline unsigned int cifs_get_num_sgs(const struct smb_rqst *rqst, - int num_rqst, - const u8 *sig) +static inline int cifs_get_num_sgs(const struct smb_rqst *rqst, + int num_rqst, + const u8 *sig) { unsigned int len, skip; unsigned int nents = 0; @@ -2169,6 +2154,20 @@ static inline unsigned int cifs_get_num_sgs(const struct smb_rqst *rqst, * rqst[1+].rq_iov[0+] data to be encrypted/decrypted */ for (i = 0; i < num_rqst; i++) { + /* We really don't want a mixture of pinned and unpinned pages + * in the sglist. It's hard to keep track of which is what. + * Instead, we convert to a BVEC-type iterator higher up. + */ + if (WARN_ON_ONCE(user_backed_iter(&rqst[i].rq_iter))) + return -EIO; + + /* We also don't want to have any extra refs or pins + * to clean up in the sglist. + */ + if (WARN_ON_ONCE(iov_iter_extract_mode(&rqst[i].rq_iter, + FOLL_DEST_BUF))) + return -EIO; + /* * The first rqst has a transform header where the * first 20 bytes are not part of the encrypted blob. 
@@ -2186,7 +2185,7 @@ static inline unsigned int cifs_get_num_sgs(const struct smb_rqst *rqst, nents++; } } - nents += rqst[i].rq_npages; + nents += iov_iter_npages(&rqst[i].rq_iter, INT_MAX); } nents += DIV_ROUND_UP(offset_in_page(sig) + SMB2_SIGNATURE_SIZE, PAGE_SIZE); return nents; @@ -2195,9 +2194,9 @@ static inline unsigned int cifs_get_num_sgs(const struct smb_rqst *rqst, /* We can not use the normal sg_set_buf() as we will sometimes pass a * stack object as buf. */ -static inline struct scatterlist *cifs_sg_set_buf(struct scatterlist *sg, - const void *buf, - unsigned int buflen) +static inline void cifs_sg_set_buf(struct sg_table *sgtable, + const void *buf, + unsigned int buflen) { unsigned long addr = (unsigned long)buf; unsigned int off = offset_in_page(addr); @@ -2207,16 +2206,17 @@ static inline struct scatterlist *cifs_sg_set_buf(struct scatterlist *sg, do { unsigned int len = min_t(unsigned int, buflen, PAGE_SIZE - off); - sg_set_page(sg++, vmalloc_to_page((void *)addr), len, off); + sg_set_page(&sgtable->sgl[sgtable->nents++], + vmalloc_to_page((void *)addr), len, off); off = 0; addr += PAGE_SIZE; buflen -= len; } while (buflen); } else { - sg_set_page(sg++, virt_to_page(addr), buflen, off); + sg_set_page(&sgtable->sgl[sgtable->nents++], + virt_to_page(addr), buflen, off); } - return sg; } #endif /* _CIFS_GLOB_H */ diff --git a/fs/cifs/cifsproto.h b/fs/cifs/cifsproto.h index cb7a3fe89278..2873f68a051c 100644 --- a/fs/cifs/cifsproto.h +++ b/fs/cifs/cifsproto.h @@ -584,10 +584,7 @@ int cifs_readv_receive(struct TCP_Server_Info *server, struct mid_q_entry *mid); int cifs_async_writev(struct cifs_writedata *wdata, void (*release)(struct kref *kref)); void cifs_writev_complete(struct work_struct *work); -struct cifs_writedata *cifs_writedata_alloc(unsigned int nr_pages, - work_func_t complete); -struct cifs_writedata *cifs_writedata_direct_alloc(struct page **pages, - work_func_t complete); +struct cifs_writedata *cifs_writedata_alloc(work_func_t complete); void cifs_writedata_release(struct kref *refcount); int cifs_query_mf_symlink(unsigned int xid, struct cifs_tcon *tcon, struct cifs_sb_info *cifs_sb, @@ -604,13 +601,10 @@ enum securityEnum cifs_select_sectype(struct TCP_Server_Info *, enum securityEnum); struct cifs_aio_ctx *cifs_aio_ctx_alloc(void); void cifs_aio_ctx_release(struct kref *refcount); -int setup_aio_ctx_iter(struct cifs_aio_ctx *ctx, struct iov_iter *iter, int rw); int cifs_alloc_hash(const char *name, struct shash_desc **sdesc); void cifs_free_hash(struct shash_desc **sdesc); -void rqst_page_get_length(const struct smb_rqst *rqst, unsigned int page, - unsigned int *len, unsigned int *offset); struct cifs_chan * cifs_ses_find_chan(struct cifs_ses *ses, struct TCP_Server_Info *server); int cifs_try_adding_channels(struct cifs_sb_info *cifs_sb, struct cifs_ses *ses); diff --git a/fs/cifs/cifssmb.c b/fs/cifs/cifssmb.c index 23f10e0d6e7e..878064370f46 100644 --- a/fs/cifs/cifssmb.c +++ b/fs/cifs/cifssmb.c @@ -24,6 +24,7 @@ #include #include #include "cifspdu.h" +#include "cifsfs.h" #include "cifsglob.h" #include "cifsacl.h" #include "cifsproto.h" @@ -1294,11 +1295,7 @@ cifs_readv_callback(struct mid_q_entry *mid) struct TCP_Server_Info *server = tcon->ses->server; struct smb_rqst rqst = { .rq_iov = rdata->iov, .rq_nvec = 2, - .rq_pages = rdata->pages, - .rq_offset = rdata->page_offset, - .rq_npages = rdata->nr_pages, - .rq_pagesz = rdata->pagesz, - .rq_tailsz = rdata->tailsz }; + .rq_iter = rdata->iter }; struct cifs_credits credits = { .value = 1, .instance = 0 
}; cifs_dbg(FYI, "%s: mid=%llu state=%d result=%d bytes=%u\n", @@ -1737,11 +1734,7 @@ cifs_async_writev(struct cifs_writedata *wdata, rqst.rq_iov = iov; rqst.rq_nvec = 2; - rqst.rq_pages = wdata->pages; - rqst.rq_offset = wdata->page_offset; - rqst.rq_npages = wdata->nr_pages; - rqst.rq_pagesz = wdata->pagesz; - rqst.rq_tailsz = wdata->tailsz; + rqst.rq_iter = wdata->iter; cifs_dbg(FYI, "async write at %llu %u bytes\n", wdata->offset, wdata->bytes); diff --git a/fs/cifs/file.c b/fs/cifs/file.c index 2873f28bf388..cfa8ad8a59c4 100644 --- a/fs/cifs/file.c +++ b/fs/cifs/file.c @@ -36,6 +36,32 @@ #include "cifs_ioctl.h" #include "cached_dir.h" +/* + * Remove the dirty flags from a span of pages. + */ +static void cifs_undirty_folios(struct inode *inode, loff_t start, unsigned int len) +{ + struct address_space *mapping = inode->i_mapping; + struct folio *folio; + pgoff_t end; + + XA_STATE(xas, &mapping->i_pages, start / PAGE_SIZE); + + rcu_read_lock(); + + end = (start + len - 1) / PAGE_SIZE; + xas_for_each_marked(&xas, folio, end, PAGECACHE_TAG_DIRTY) { + xas_pause(&xas); + rcu_read_unlock(); + folio_lock(folio); + folio_clear_dirty_for_io(folio); + folio_unlock(folio); + rcu_read_lock(); + } + + rcu_read_unlock(); +} + /* * Completion of write to server. */ @@ -2388,7 +2414,6 @@ cifs_writedata_release(struct kref *refcount) if (wdata->cfile) cifsFileInfo_put(wdata->cfile); - kvfree(wdata->pages); kfree(wdata); } @@ -2399,51 +2424,49 @@ cifs_writedata_release(struct kref *refcount) static void cifs_writev_requeue(struct cifs_writedata *wdata) { - int i, rc = 0; + int rc = 0; struct inode *inode = d_inode(wdata->cfile->dentry); struct TCP_Server_Info *server; - unsigned int rest_len; + unsigned int rest_len = wdata->bytes; + loff_t fpos = wdata->offset; server = tlink_tcon(wdata->cfile->tlink)->ses->server; - i = 0; - rest_len = wdata->bytes; do { struct cifs_writedata *wdata2; - unsigned int j, nr_pages, wsize, tailsz, cur_len; + unsigned int wsize, cur_len; wsize = server->ops->wp_retry_size(inode); if (wsize < rest_len) { - nr_pages = wsize / PAGE_SIZE; - if (!nr_pages) { - rc = -EOPNOTSUPP; + if (wsize < PAGE_SIZE) { + rc = -EOPNOTSUPP; break; } - cur_len = nr_pages * PAGE_SIZE; - tailsz = PAGE_SIZE; + cur_len = min(round_down(wsize, PAGE_SIZE), rest_len); } else { - nr_pages = DIV_ROUND_UP(rest_len, PAGE_SIZE); cur_len = rest_len; - tailsz = rest_len - (nr_pages - 1) * PAGE_SIZE; } - wdata2 = cifs_writedata_alloc(nr_pages, cifs_writev_complete); + wdata2 = cifs_writedata_alloc(cifs_writev_complete); if (!wdata2) { rc = -ENOMEM; break; } - for (j = 0; j < nr_pages; j++) { - wdata2->pages[j] = wdata->pages[i + j]; - lock_page(wdata2->pages[j]); - clear_page_dirty_for_io(wdata2->pages[j]); - } - wdata2->sync_mode = wdata->sync_mode; - wdata2->nr_pages = nr_pages; - wdata2->offset = page_offset(wdata2->pages[0]); - wdata2->pagesz = PAGE_SIZE; - wdata2->tailsz = tailsz; - wdata2->bytes = cur_len; + wdata2->offset = fpos; + wdata2->bytes = cur_len; + wdata2->iter = wdata->iter; + + iov_iter_advance(&wdata2->iter, fpos - wdata->offset); + iov_iter_truncate(&wdata2->iter, wdata2->bytes); + + if (iov_iter_is_xarray(&wdata2->iter)) + /* Check for pages having been redirtied and clean + * them. We can do this by walking the xarray. If + * it's not an xarray, then it's a DIO and we shouldn't + * be mucking around with the page bits.
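The requeue loop above re-slices the parent request's iterator instead of copying page pointers. The idiom, distilled into a standalone helper (illustrative name, not part of the patch):

/* Carve the child range [pos, pos + len) out of a parent iterator
 * whose data starts at file position parent_start: copy the iterator
 * by value, step past the front, then clip the tail. The parent's
 * own position is untouched, which is what lets wdata2 describe a
 * window of wdata without any reference counting.
 */
static void example_slice_iter(struct iov_iter *child,
                               const struct iov_iter *parent,
                               loff_t parent_start, loff_t pos, size_t len)
{
        *child = *parent;
        iov_iter_advance(child, pos - parent_start);
        iov_iter_truncate(child, len);
}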
+ */ + cifs_undirty_folios(inode, fpos, cur_len); rc = cifs_get_writable_file(CIFS_I(inode), FIND_WR_ANY, &wdata2->cfile); @@ -2458,33 +2481,22 @@ cifs_writev_requeue(struct cifs_writedata *wdata) cifs_writedata_release); } - for (j = 0; j < nr_pages; j++) { - unlock_page(wdata2->pages[j]); - if (rc != 0 && !is_retryable_error(rc)) { - SetPageError(wdata2->pages[j]); - end_page_writeback(wdata2->pages[j]); - put_page(wdata2->pages[j]); - } - } - kref_put(&wdata2->refcount, cifs_writedata_release); if (rc) { if (is_retryable_error(rc)) continue; - i += nr_pages; + fpos += cur_len; + rest_len -= cur_len; break; } + fpos += cur_len; rest_len -= cur_len; - i += nr_pages; - } while (i < wdata->nr_pages); + } while (rest_len > 0); - /* cleanup remaining pages from the original wdata */ - for (; i < wdata->nr_pages; i++) { - SetPageError(wdata->pages[i]); - end_page_writeback(wdata->pages[i]); - put_page(wdata->pages[i]); - } + /* Clean up remaining pages from the original wdata */ + if (iov_iter_is_xarray(&wdata->iter)) + cifs_pages_write_failed(inode, fpos, rest_len); if (rc != 0 && !is_retryable_error(rc)) mapping_set_error(inode->i_mapping, rc); @@ -2497,7 +2509,6 @@ cifs_writev_complete(struct work_struct *work) struct cifs_writedata *wdata = container_of(work, struct cifs_writedata, work); struct inode *inode = d_inode(wdata->cfile->dentry); - int i = 0; if (wdata->result == 0) { spin_lock(&inode->i_lock); @@ -2508,45 +2519,24 @@ cifs_writev_complete(struct work_struct *work) } else if (wdata->sync_mode == WB_SYNC_ALL && wdata->result == -EAGAIN) return cifs_writev_requeue(wdata); - for (i = 0; i < wdata->nr_pages; i++) { - struct page *page = wdata->pages[i]; + if (wdata->result == -EAGAIN) + cifs_pages_write_redirty(inode, wdata->offset, wdata->bytes); + else if (wdata->result < 0) + cifs_pages_write_failed(inode, wdata->offset, wdata->bytes); + else + cifs_pages_written_back(inode, wdata->offset, wdata->bytes); - if (wdata->result == -EAGAIN) - __set_page_dirty_nobuffers(page); - else if (wdata->result < 0) - SetPageError(page); - end_page_writeback(page); - cifs_readpage_to_fscache(inode, page); - put_page(page); - } if (wdata->result != -EAGAIN) mapping_set_error(inode->i_mapping, wdata->result); kref_put(&wdata->refcount, cifs_writedata_release); } -struct cifs_writedata * -cifs_writedata_alloc(unsigned int nr_pages, work_func_t complete) -{ - struct cifs_writedata *writedata = NULL; - struct page **pages = - kcalloc(nr_pages, sizeof(struct page *), GFP_NOFS); - if (pages) { - writedata = cifs_writedata_direct_alloc(pages, complete); - if (!writedata) - kvfree(pages); - } - - return writedata; -} - -struct cifs_writedata * -cifs_writedata_direct_alloc(struct page **pages, work_func_t complete) +struct cifs_writedata *cifs_writedata_alloc(work_func_t complete) { struct cifs_writedata *wdata; wdata = kzalloc(sizeof(*wdata), GFP_NOFS); if (wdata != NULL) { - wdata->pages = pages; kref_init(&wdata->refcount); INIT_LIST_HEAD(&wdata->list); init_completion(&wdata->done); @@ -2555,7 +2545,6 @@ cifs_writedata_direct_alloc(struct page **pages, work_func_t complete) return wdata; } - static int cifs_partialpagewrite(struct page *page, unsigned from, unsigned to) { struct address_space *mapping = page->mapping; @@ -2614,6 +2603,7 @@ static int cifs_partialpagewrite(struct page *page, unsigned from, unsigned to) return rc; } +#if 0 // TODO: Remove for iov_iter support static struct cifs_writedata * wdata_alloc_and_fillpages(pgoff_t tofind, struct address_space *mapping, pgoff_t end, pgoff_t 
*index, @@ -2919,6 +2909,374 @@ static int cifs_writepages(struct address_space *mapping, set_bit(CIFS_INO_MODIFIED_ATTR, &CIFS_I(inode)->flags); return rc; } +#endif + +/* + * Extend the region to be written back to include subsequent contiguously + * dirty pages if possible, but don't sleep while doing so. + */ +static void cifs_extend_writeback(struct address_space *mapping, + long *_count, + loff_t start, + int max_pages, + size_t max_len, + unsigned int *_len) +{ + struct folio_batch batch; + struct folio *folio; + unsigned int psize, nr_pages; + size_t len = *_len; + pgoff_t index = (start + len) / PAGE_SIZE; + bool stop = true; + unsigned int i; + + XA_STATE(xas, &mapping->i_pages, index); + folio_batch_init(&batch); + + do { + /* Firstly, we gather up a batch of contiguous dirty pages + * under the RCU read lock - but we can't clear the dirty flags + * there if any of those pages are mapped. + */ + rcu_read_lock(); + + xas_for_each(&xas, folio, ULONG_MAX) { + stop = true; + if (xas_retry(&xas, folio)) + continue; + if (xa_is_value(folio)) + break; + if (folio_index(folio) != index) + break; + if (!folio_try_get_rcu(folio)) { + xas_reset(&xas); + continue; + } + nr_pages = folio_nr_pages(folio); + if (nr_pages > max_pages) + break; + + /* Has the page moved or been split? */ + if (unlikely(folio != xas_reload(&xas))) { + folio_put(folio); + break; + } + + if (!folio_trylock(folio)) { + folio_put(folio); + break; + } + if (!folio_test_dirty(folio) || folio_test_writeback(folio)) { + folio_unlock(folio); + folio_put(folio); + break; + } + + max_pages -= nr_pages; + psize = folio_size(folio); + len += psize; + stop = false; + if (max_pages <= 0 || len >= max_len || *_count <= 0) + stop = true; + + index += nr_pages; + if (!folio_batch_add(&batch, folio)) + break; + if (stop) + break; + } + + if (!stop) + xas_pause(&xas); + rcu_read_unlock(); + + /* Now, if we obtained any pages, we can shift them to being + * writable and mark them for caching. + */ + if (!folio_batch_count(&batch)) + break; + + for (i = 0; i < folio_batch_count(&batch); i++) { + folio = batch.folios[i]; + /* The folio should be locked, dirty and not undergoing + * writeback from the loop above. + */ + if (!folio_clear_dirty_for_io(folio)) + WARN_ON(1); + if (folio_start_writeback(folio)) + WARN_ON(1); + + *_count -= folio_nr_pages(folio); + folio_unlock(folio); + } + + folio_batch_release(&batch); + cond_resched(); + } while (!stop); + + *_len = len; +} + +/* + * Write back the locked page and any subsequent non-locked dirty pages. + */ +static ssize_t cifs_write_back_from_locked_folio(struct address_space *mapping, + struct writeback_control *wbc, + struct folio *folio, + loff_t start, loff_t end) +{ + struct inode *inode = mapping->host; + struct TCP_Server_Info *server; + struct cifs_writedata *wdata; + struct cifs_sb_info *cifs_sb = CIFS_SB(inode->i_sb); + struct cifs_credits credits_on_stack; + struct cifs_credits *credits = &credits_on_stack; + struct cifsFileInfo *cfile = NULL; + unsigned int xid, wsize, len; + loff_t i_size = i_size_read(inode); + size_t max_len; + long count = wbc->nr_to_write; + int rc; + + /* The folio should be locked, dirty and not undergoing writeback. 
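The completion paths in this patch hand their outcome to cifs_pages_write_redirty(), cifs_pages_write_failed() and cifs_pages_written_back(), which are supplied elsewhere in this series rather than in the hunks shown here. In spirit each is an xarray walk over the affected span; a rough sketch of the redirty variant (this is my reading of the shape, not the series' actual code):

#include <linux/pagemap.h>
#include <linux/xarray.h>

/* Walk the folios covering [start, start + len), mark each dirty
 * again and end writeback so a later writepages pass retries them.
 */
static void example_redirty_span(struct address_space *mapping,
                                 loff_t start, unsigned int len)
{
        struct folio *folio;
        pgoff_t end = (start + len - 1) / PAGE_SIZE;
        XA_STATE(xas, &mapping->i_pages, start / PAGE_SIZE);

        rcu_read_lock();
        xas_for_each(&xas, folio, end) {
                if (xas_retry(&xas, folio))
                        continue;
                filemap_dirty_folio(mapping, folio);
                folio_end_writeback(folio);
        }
        rcu_read_unlock();
}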
*/ + if (folio_start_writeback(folio)) + WARN_ON(1); + + count -= folio_nr_pages(folio); + len = folio_size(folio); + + xid = get_xid(); + server = cifs_pick_channel(cifs_sb_master_tcon(cifs_sb)->ses); + + rc = cifs_get_writable_file(CIFS_I(inode), FIND_WR_ANY, &cfile); + if (rc) { + cifs_dbg(VFS, "No writable handle in writepages rc=%d\n", rc); + goto err_xid; + } + + rc = server->ops->wait_mtu_credits(server, cifs_sb->ctx->wsize, + &wsize, credits); + if (rc != 0) + goto err_close; + + wdata = cifs_writedata_alloc(cifs_writev_complete); + if (!wdata) { + rc = -ENOMEM; + goto err_uncredit; + } + + wdata->sync_mode = wbc->sync_mode; + wdata->offset = folio_pos(folio); + wdata->pid = cfile->pid; + wdata->credits = credits_on_stack; + wdata->cfile = cfile; + wdata->server = server; + cfile = NULL; + + /* Find all consecutive lockable dirty pages, stopping when we find a + * page that is not immediately lockable, is not dirty or is missing, + * or we reach the end of the range. + */ + if (start < i_size) { + /* Trim the write to the EOF; the extra data is ignored. Also + * put an upper limit on the size of a single storedata op. + */ + max_len = wsize; + max_len = min_t(unsigned long long, max_len, end - start + 1); + max_len = min_t(unsigned long long, max_len, i_size - start); + + if (len < max_len) { + int max_pages = INT_MAX; + +#ifdef CONFIG_CIFS_SMB_DIRECT + if (server->smbd_conn) + max_pages = server->smbd_conn->max_frmr_depth; +#endif + max_pages -= folio_nr_pages(folio); + + if (max_pages > 0) + cifs_extend_writeback(mapping, &count, start, + max_pages, max_len, &len); + } + len = min_t(loff_t, len, max_len); + } + + wdata->bytes = len; + + /* We now have a contiguous set of dirty pages, each with writeback + * set; the first page is still locked at this point, but all the rest + * have been unlocked. + */ + folio_unlock(folio); + + if (start < i_size) { + iov_iter_xarray(&wdata->iter, WRITE, &mapping->i_pages, start, len); + + rc = adjust_credits(wdata->server, &wdata->credits, wdata->bytes); + if (rc) + goto err_wdata; + + if (wdata->cfile->invalidHandle) + rc = -EAGAIN; + else + rc = wdata->server->ops->async_writev(wdata, + cifs_writedata_release); + if (rc >= 0) { + kref_put(&wdata->refcount, cifs_writedata_release); + goto err_close; + } + } else { + /* The dirty region was entirely beyond the EOF. 
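The credit handling above follows a fixed pattern across this patch: reserve up to wsize from the server, stash the reservation in the wdata, trim it to the actual I/O size, and hand it back on failure. Isolated below, using only calls that already appear in this patch, with the error unwinding trimmed (illustrative helper, not part of the patch):

static int example_credit_dance(struct TCP_Server_Info *server,
                                struct cifs_sb_info *cifs_sb,
                                struct cifs_writedata *wdata)
{
        struct cifs_credits credits_on_stack;
        struct cifs_credits *credits = &credits_on_stack;
        unsigned int wsize;
        int rc;

        /* Block until the server grants send credit, capped at the
         * mount's wsize; the grant comes back in credits_on_stack. */
        rc = server->ops->wait_mtu_credits(server, cifs_sb->ctx->wsize,
                                           &wsize, credits);
        if (rc)
                return rc;

        wdata->credits = credits_on_stack;

        /* Shrink the reservation to what this request really needs. */
        rc = adjust_credits(server, &wdata->credits, wdata->bytes);
        if (rc)
                /* Return the unused grant and wake any waiters. */
                add_credits_and_wake_if(server, &wdata->credits, 0);
        return rc;
}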
*/ + cifs_pages_written_back(inode, start, len); + rc = 0; + } + +err_wdata: + kref_put(&wdata->refcount, cifs_writedata_release); +err_uncredit: + add_credits_and_wake_if(server, credits, 0); +err_close: + if (cfile) + cifsFileInfo_put(cfile); +err_xid: + free_xid(xid); + if (rc == 0) { + wbc->nr_to_write = count; + } else if (is_retryable_error(rc)) { + cifs_pages_write_redirty(inode, start, len); + } else { + cifs_pages_write_failed(inode, start, len); + mapping_set_error(mapping, rc); + } + /* Indication to update ctime and mtime as close is deferred */ + set_bit(CIFS_INO_MODIFIED_ATTR, &CIFS_I(inode)->flags); + return rc; +} + +/* + * write a region of pages back to the server + */ +static int cifs_writepages_region(struct address_space *mapping, + struct writeback_control *wbc, + loff_t start, loff_t end, loff_t *_next) +{ + struct folio *folio; + struct page *head_page; + ssize_t ret; + int n, skips = 0; + + do { + pgoff_t index = start / PAGE_SIZE; + + n = find_get_pages_range_tag(mapping, &index, end / PAGE_SIZE, + PAGECACHE_TAG_DIRTY, 1, &head_page); + if (!n) + break; + + folio = page_folio(head_page); + start = folio_pos(folio); /* May regress with THPs */ + + /* At this point we hold neither the i_pages lock nor the + * page lock: the page may be truncated or invalidated + * (changing page->mapping to NULL), or even swizzled + * back from swapper_space to tmpfs file mapping + */ + if (wbc->sync_mode != WB_SYNC_NONE) { + ret = folio_lock_killable(folio); + if (ret < 0) { + folio_put(folio); + return ret; + } + } else { + if (!folio_trylock(folio)) { + folio_put(folio); + return 0; + } + } + + if (folio_mapping(folio) != mapping || + !folio_test_dirty(folio)) { + start += folio_size(folio); + folio_unlock(folio); + folio_put(folio); + continue; + } + + if (folio_test_writeback(folio) || + folio_test_fscache(folio)) { + folio_unlock(folio); + if (wbc->sync_mode != WB_SYNC_NONE) { + folio_wait_writeback(folio); +#ifdef CONFIG_CIFS_FSCACHE + folio_wait_fscache(folio); +#endif + } else { + start += folio_size(folio); + } + folio_put(folio); + if (wbc->sync_mode == WB_SYNC_NONE) { + if (skips >= 5 || need_resched()) + break; + skips++; + } + continue; + } + + if (!folio_clear_dirty_for_io(folio)) + /* We hold the page lock - it should've been dirty. */ + WARN_ON(1); + + ret = cifs_write_back_from_locked_folio(mapping, wbc, folio, start, end); + folio_put(folio); + if (ret < 0) + return ret; + + start += ret; + cond_resched(); + } while (wbc->nr_to_write > 0); + + *_next = start; + return 0; +} + +/* + * Write some of the pending data back to the server + */ +static int cifs_writepages(struct address_space *mapping, + struct writeback_control *wbc) +{ + loff_t start, next; + int ret; + + /* We have to be careful as we can end up racing with setattr() + * truncating the pagecache since the caller doesn't take a lock here + * to prevent it. 
+ */ + + if (wbc->range_cyclic) { + start = mapping->writeback_index * PAGE_SIZE; + ret = cifs_writepages_region(mapping, wbc, start, LLONG_MAX, &next); + if (ret == 0) { + mapping->writeback_index = next / PAGE_SIZE; + if (start > 0 && wbc->nr_to_write > 0) { + ret = cifs_writepages_region(mapping, wbc, 0, + start, &next); + if (ret == 0) + mapping->writeback_index = + next / PAGE_SIZE; + } + } + } else if (wbc->range_start == 0 && wbc->range_end == LLONG_MAX) { + ret = cifs_writepages_region(mapping, wbc, 0, LLONG_MAX, &next); + if (wbc->nr_to_write > 0 && ret == 0) + mapping->writeback_index = next / PAGE_SIZE; + } else { + ret = cifs_writepages_region(mapping, wbc, + wbc->range_start, wbc->range_end, &next); + } + + return ret; +} static int cifs_writepage_locked(struct page *page, struct writeback_control *wbc) @@ -2969,6 +3327,7 @@ static int cifs_write_end(struct file *file, struct address_space *mapping, struct inode *inode = mapping->host; struct cifsFileInfo *cfile = file->private_data; struct cifs_sb_info *cifs_sb = CIFS_SB(cfile->dentry->d_sb); + struct folio *folio = page_folio(page); __u32 pid; if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_RWPIDFORWARD) @@ -2979,14 +3338,14 @@ static int cifs_write_end(struct file *file, struct address_space *mapping, cifs_dbg(FYI, "write_end for page %p from pos %lld with %d bytes\n", page, pos, copied); - if (PageChecked(page)) { + if (folio_test_checked(folio)) { if (copied == len) - SetPageUptodate(page); - ClearPageChecked(page); - } else if (!PageUptodate(page) && copied == PAGE_SIZE) - SetPageUptodate(page); + folio_mark_uptodate(folio); + folio_clear_checked(folio); + } else if (!folio_test_uptodate(folio) && copied == PAGE_SIZE) + folio_mark_uptodate(folio); - if (!PageUptodate(page)) { + if (!folio_test_uptodate(folio)) { char *page_data; unsigned offset = pos & (PAGE_SIZE - 1); unsigned int xid; @@ -3146,6 +3505,7 @@ int cifs_flush(struct file *file, fl_owner_t id) return rc; } +#if 0 // TODO: Remove for iov_iter support static int cifs_write_allocate_pages(struct page **pages, unsigned long num_pages) { @@ -3186,17 +3546,15 @@ size_t get_numpages(const size_t wsize, const size_t len, size_t *cur_len) return num_pages; } +#endif static void cifs_uncached_writedata_release(struct kref *refcount) { - int i; struct cifs_writedata *wdata = container_of(refcount, struct cifs_writedata, refcount); kref_put(&wdata->ctx->refcount, cifs_aio_ctx_release); - for (i = 0; i < wdata->nr_pages; i++) - put_page(wdata->pages[i]); cifs_writedata_release(refcount); } @@ -3222,6 +3580,7 @@ cifs_uncached_writev_complete(struct work_struct *work) kref_put(&wdata->refcount, cifs_uncached_writedata_release); } +#if 0 // TODO: Remove for iov_iter support static int wdata_fill_from_iovec(struct cifs_writedata *wdata, struct iov_iter *from, size_t *len, unsigned long *num_pages) @@ -3263,6 +3622,7 @@ wdata_fill_from_iovec(struct cifs_writedata *wdata, struct iov_iter *from, *num_pages = i + 1; return 0; } +#endif static int cifs_resend_wdata(struct cifs_writedata *wdata, struct list_head *wdata_list, @@ -3334,23 +3694,57 @@ cifs_resend_wdata(struct cifs_writedata *wdata, struct list_head *wdata_list, return rc; } +/* + * Select span of a bvec iterator we're going to use. Limit it by both maximum + * size and maximum number of segments. 
+ */ +static size_t cifs_limit_bvec_subset(const struct iov_iter *iter, size_t max_size, + size_t max_segs, unsigned int *_nsegs) +{ + const struct bio_vec *bvecs = iter->bvec; + unsigned int nbv = iter->nr_segs, ix = 0, nsegs = 0; + size_t len, span = 0, n = iter->count; + size_t skip = iter->iov_offset; + + if (WARN_ON(!iov_iter_is_bvec(iter)) || n == 0) + return 0; + + while (n && ix < nbv && skip) { + len = bvecs[ix].bv_len; + if (skip < len) + break; + skip -= len; + n -= len; + ix++; + } + + while (n && ix < nbv) { + len = min3(n, bvecs[ix].bv_len - skip, max_size); + span += len; + nsegs++; + ix++; + if (span >= max_size || nsegs >= max_segs) + break; + skip = 0; + n -= len; + } + + *_nsegs = nsegs; + return span; +} + static int -cifs_write_from_iter(loff_t offset, size_t len, struct iov_iter *from, +cifs_write_from_iter(loff_t fpos, size_t len, struct iov_iter *from, struct cifsFileInfo *open_file, struct cifs_sb_info *cifs_sb, struct list_head *wdata_list, struct cifs_aio_ctx *ctx) { int rc = 0; - size_t cur_len; - unsigned long nr_pages, num_pages, i; + size_t cur_len, max_len; struct cifs_writedata *wdata; - struct iov_iter saved_from = *from; - loff_t saved_offset = offset; pid_t pid; struct TCP_Server_Info *server; - struct page **pagevec; - size_t start; - unsigned int xid; + unsigned int xid, max_segs = INT_MAX; if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_RWPIDFORWARD) pid = open_file->pid; @@ -3360,10 +3754,20 @@ cifs_write_from_iter(loff_t offset, size_t len, struct iov_iter *from, server = cifs_pick_channel(tlink_tcon(open_file->tlink)->ses); xid = get_xid(); +#ifdef CONFIG_CIFS_SMB_DIRECT + if (server->smbd_conn) + max_segs = server->smbd_conn->max_frmr_depth; +#endif + do { - unsigned int wsize; struct cifs_credits credits_on_stack; struct cifs_credits *credits = &credits_on_stack; + unsigned int wsize, nsegs = 0; + + if (signal_pending(current)) { + rc = -EINTR; + break; + } if (open_file->invalidHandle) { rc = cifs_reopen_file(open_file, false); @@ -3378,99 +3782,42 @@ cifs_write_from_iter(loff_t offset, size_t len, struct iov_iter *from, if (rc) break; - cur_len = min_t(const size_t, len, wsize); - - if (ctx->direct_io) { - ssize_t result; - - result = iov_iter_get_pages_alloc( - from, &pagevec, cur_len, &start, FOLL_SOURCE_BUF); - if (result < 0) { - cifs_dbg(VFS, - "direct_writev couldn't get user pages (rc=%zd) iter type %d iov_offset %zd count %zd\n", - result, iov_iter_type(from), - from->iov_offset, from->count); - dump_stack(); - - rc = result; - add_credits_and_wake_if(server, credits, 0); - break; - } - cur_len = (size_t)result; - - nr_pages = - (cur_len + start + PAGE_SIZE - 1) / PAGE_SIZE; - - wdata = cifs_writedata_direct_alloc(pagevec, - cifs_uncached_writev_complete); - if (!wdata) { - rc = -ENOMEM; - for (i = 0; i < nr_pages; i++) - put_page(pagevec[i]); - kvfree(pagevec); - add_credits_and_wake_if(server, credits, 0); - break; - } - - - wdata->page_offset = start; - wdata->tailsz = - nr_pages > 1 ? 
- cur_len - (PAGE_SIZE - start) - - (nr_pages - 2) * PAGE_SIZE : - cur_len; - } else { - nr_pages = get_numpages(wsize, len, &cur_len); - wdata = cifs_writedata_alloc(nr_pages, - cifs_uncached_writev_complete); - if (!wdata) { - rc = -ENOMEM; - add_credits_and_wake_if(server, credits, 0); - break; - } - - rc = cifs_write_allocate_pages(wdata->pages, nr_pages); - if (rc) { - kvfree(wdata->pages); - kfree(wdata); - add_credits_and_wake_if(server, credits, 0); - break; - } - - num_pages = nr_pages; - rc = wdata_fill_from_iovec( - wdata, from, &cur_len, &num_pages); - if (rc) { - for (i = 0; i < nr_pages; i++) - put_page(wdata->pages[i]); - kvfree(wdata->pages); - kfree(wdata); - add_credits_and_wake_if(server, credits, 0); - break; - } + max_len = min_t(const size_t, len, wsize); + if (!max_len) { + rc = -EAGAIN; + add_credits_and_wake_if(server, credits, 0); + break; + } - /* - * Bring nr_pages down to the number of pages we - * actually used, and free any pages that we didn't use. - */ - for ( ; nr_pages > num_pages; nr_pages--) - put_page(wdata->pages[nr_pages - 1]); + cur_len = cifs_limit_bvec_subset(from, max_len, max_segs, &nsegs); + cifs_dbg(FYI, "write_from_iter len=%zx/%zx nsegs=%u/%lu/%u\n", + cur_len, max_len, nsegs, from->nr_segs, max_segs); + if (cur_len == 0) { + rc = -EIO; + add_credits_and_wake_if(server, credits, 0); + break; + } - wdata->tailsz = cur_len - ((nr_pages - 1) * PAGE_SIZE); + wdata = cifs_writedata_alloc(cifs_uncached_writev_complete); + if (!wdata) { + rc = -ENOMEM; + add_credits_and_wake_if(server, credits, 0); + break; } wdata->sync_mode = WB_SYNC_ALL; - wdata->nr_pages = nr_pages; - wdata->offset = (__u64)offset; - wdata->cfile = cifsFileInfo_get(open_file); - wdata->server = server; - wdata->pid = pid; - wdata->bytes = cur_len; - wdata->pagesz = PAGE_SIZE; - wdata->credits = credits_on_stack; - wdata->ctx = ctx; + wdata->offset = (__u64)fpos; + wdata->cfile = cifsFileInfo_get(open_file); + wdata->server = server; + wdata->pid = pid; + wdata->bytes = cur_len; + wdata->credits = credits_on_stack; + wdata->iter = *from; + wdata->ctx = ctx; kref_get(&ctx->refcount); + iov_iter_truncate(&wdata->iter, cur_len); + rc = adjust_credits(server, &wdata->credits, wdata->bytes); if (!rc) { @@ -3485,16 +3832,14 @@ cifs_write_from_iter(loff_t offset, size_t len, struct iov_iter *from, add_credits_and_wake_if(server, &wdata->credits, 0); kref_put(&wdata->refcount, cifs_uncached_writedata_release); - if (rc == -EAGAIN) { - *from = saved_from; - iov_iter_advance(from, offset - saved_offset); + if (rc == -EAGAIN) continue; - } break; } list_add_tail(&wdata->list, wdata_list); - offset += cur_len; + iov_iter_advance(from, cur_len); + fpos += cur_len; len -= cur_len; } while (len > 0); @@ -3593,8 +3938,6 @@ static ssize_t __cifs_writev( struct cifs_tcon *tcon; struct cifs_sb_info *cifs_sb; struct cifs_aio_ctx *ctx; - struct iov_iter saved_from = *from; - size_t len = iov_iter_count(from); int rc; /* @@ -3628,23 +3971,56 @@ static ssize_t __cifs_writev( ctx->iocb = iocb; ctx->pos = iocb->ki_pos; + ctx->direct_io = direct; + ctx->nr_pinned_pages = 0; - if (direct) { - ctx->direct_io = true; - ctx->iter = *from; - ctx->len = len; - } else { - rc = setup_aio_ctx_iter(ctx, from, ITER_SOURCE); - if (rc) { + if (user_backed_iter(from)) { + /* + * Extract IOVEC/UBUF-type iterators to a BVEC-type iterator as + * they contain references to the calling process's virtual + * memory layout which won't be available in an async worker + * thread. 
This also takes a ref or a pin on every folio + * involved. + */ + rc = netfs_extract_user_iter(from, iov_iter_count(from), + &ctx->iter, FOLL_SOURCE_BUF); + if (rc < 0) { kref_put(&ctx->refcount, cifs_aio_ctx_release); return rc; } + + ctx->nr_pinned_pages = rc; + ctx->bv = (void *)ctx->iter.bvec; + ctx->bv_cleanup_mode = + iov_iter_extract_mode(&ctx->iter, FOLL_SOURCE_BUF); + } else if ((iov_iter_is_bvec(from) || iov_iter_is_kvec(from)) && + !is_sync_kiocb(iocb)) { + /* + * If the op is asynchronous, we need to copy the list attached + * to a BVEC/KVEC-type iterator, but we assume that the storage + * will be pinned by the caller; in any case, we may or may not + * be able to pin the pages, so we don't try. + */ + ctx->bv = (void *)dup_iter(&ctx->iter, from, GFP_KERNEL); + if (!ctx->bv) { + kref_put(&ctx->refcount, cifs_aio_ctx_release); + return -ENOMEM; + } + } else { + /* + * Otherwise, we just pass the iterator down as-is and rely on + * the caller to make sure the pages referred to by the + * iterator don't evaporate. + */ + ctx->iter = *from; } + ctx->len = iov_iter_count(&ctx->iter); + /* grab a lock here due to read response handlers can access ctx */ mutex_lock(&ctx->aio_mutex); - rc = cifs_write_from_iter(iocb->ki_pos, ctx->len, &saved_from, + rc = cifs_write_from_iter(iocb->ki_pos, ctx->len, &ctx->iter, cfile, cifs_sb, &ctx->list, ctx); /* @@ -3787,14 +4163,12 @@ cifs_strict_writev(struct kiocb *iocb, struct iov_iter *from) return written; } -static struct cifs_readdata * -cifs_readdata_direct_alloc(struct page **pages, work_func_t complete) +static struct cifs_readdata *cifs_readdata_alloc(work_func_t complete) { struct cifs_readdata *rdata; rdata = kzalloc(sizeof(*rdata), GFP_KERNEL); - if (rdata != NULL) { - rdata->pages = pages; + if (rdata) { kref_init(&rdata->refcount); INIT_LIST_HEAD(&rdata->list); init_completion(&rdata->done); @@ -3804,27 +4178,14 @@ cifs_readdata_direct_alloc(struct page **pages, work_func_t complete) return rdata; } -static struct cifs_readdata * -cifs_readdata_alloc(unsigned int nr_pages, work_func_t complete) -{ - struct page **pages = - kcalloc(nr_pages, sizeof(struct page *), GFP_KERNEL); - struct cifs_readdata *ret = NULL; - - if (pages) { - ret = cifs_readdata_direct_alloc(pages, complete); - if (!ret) - kfree(pages); - } - - return ret; -} - void cifs_readdata_release(struct kref *refcount) { struct cifs_readdata *rdata = container_of(refcount, struct cifs_readdata, refcount); + + if (rdata->ctx) + kref_put(&rdata->ctx->refcount, cifs_aio_ctx_release); #ifdef CONFIG_CIFS_SMB_DIRECT if (rdata->mr) { smbd_deregister_mr(rdata->mr); @@ -3834,85 +4195,9 @@ cifs_readdata_release(struct kref *refcount) if (rdata->cfile) cifsFileInfo_put(rdata->cfile); - kvfree(rdata->pages); kfree(rdata); } -static int -cifs_read_allocate_pages(struct cifs_readdata *rdata, unsigned int nr_pages) -{ - int rc = 0; - struct page *page; - unsigned int i; - - for (i = 0; i < nr_pages; i++) { - page = alloc_page(GFP_KERNEL|__GFP_HIGHMEM); - if (!page) { - rc = -ENOMEM; - break; - } - rdata->pages[i] = page; - } - - if (rc) { - unsigned int nr_page_failed = i; - - for (i = 0; i < nr_page_failed; i++) { - put_page(rdata->pages[i]); - rdata->pages[i] = NULL; - } - } - return rc; -} - -static void -cifs_uncached_readdata_release(struct kref *refcount) -{ - struct cifs_readdata *rdata = container_of(refcount, - struct cifs_readdata, refcount); - unsigned int i; - - kref_put(&rdata->ctx->refcount, cifs_aio_ctx_release); - for (i = 0; i < rdata->nr_pages; i++) { - 
put_page(rdata->pages[i]); - } - cifs_readdata_release(refcount); -} - -/** - * cifs_readdata_to_iov - copy data from pages in response to an iovec - * @rdata: the readdata response with list of pages holding data - * @iter: destination for our data - * - * This function copies data from a list of pages in a readdata response into - * an array of iovecs. It will first calculate where the data should go - * based on the info in the readdata and then copy the data into that spot. - */ -static int -cifs_readdata_to_iov(struct cifs_readdata *rdata, struct iov_iter *iter) -{ - size_t remaining = rdata->got_bytes; - unsigned int i; - - for (i = 0; i < rdata->nr_pages; i++) { - struct page *page = rdata->pages[i]; - size_t copy = min_t(size_t, remaining, PAGE_SIZE); - size_t written; - - if (unlikely(iov_iter_is_pipe(iter))) { - void *addr = kmap_atomic(page); - - written = copy_to_iter(addr, copy, iter); - kunmap_atomic(addr); - } else - written = copy_page_to_iter(page, 0, copy, iter); - remaining -= written; - if (written < copy && iov_iter_count(iter) > 0) - break; - } - return remaining ? -EFAULT : 0; -} - static void collect_uncached_read_data(struct cifs_aio_ctx *ctx); static void @@ -3924,9 +4209,11 @@ cifs_uncached_readv_complete(struct work_struct *work) complete(&rdata->done); collect_uncached_read_data(rdata->ctx); /* the below call can possibly free the last ref to aio ctx */ - kref_put(&rdata->refcount, cifs_uncached_readdata_release); + kref_put(&rdata->refcount, cifs_readdata_release); } +#if 0 // TODO: Remove for iov_iter support + static int uncached_fill_pages(struct TCP_Server_Info *server, struct cifs_readdata *rdata, struct iov_iter *iter, @@ -4000,6 +4287,7 @@ cifs_uncached_copy_into_pages(struct TCP_Server_Info *server, { return uncached_fill_pages(server, rdata, iter, iter->count); } +#endif static int cifs_resend_rdata(struct cifs_readdata *rdata, struct list_head *rdata_list, @@ -4069,37 +4357,36 @@ static int cifs_resend_rdata(struct cifs_readdata *rdata, } while (rc == -EAGAIN); fail: - kref_put(&rdata->refcount, cifs_uncached_readdata_release); + kref_put(&rdata->refcount, cifs_readdata_release); return rc; } static int -cifs_send_async_read(loff_t offset, size_t len, struct cifsFileInfo *open_file, +cifs_send_async_read(loff_t fpos, size_t len, struct cifsFileInfo *open_file, struct cifs_sb_info *cifs_sb, struct list_head *rdata_list, struct cifs_aio_ctx *ctx) { struct cifs_readdata *rdata; - unsigned int npages, rsize; + unsigned int rsize, nsegs, max_segs = INT_MAX; struct cifs_credits credits_on_stack; struct cifs_credits *credits = &credits_on_stack; - size_t cur_len; + size_t cur_len, max_len; int rc; pid_t pid; struct TCP_Server_Info *server; - struct page **pagevec; - size_t start; - struct iov_iter direct_iov = ctx->iter; server = cifs_pick_channel(tlink_tcon(open_file->tlink)->ses); +#ifdef CONFIG_CIFS_SMB_DIRECT + if (server->smbd_conn) + max_segs = server->smbd_conn->max_frmr_depth; +#endif + if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_RWPIDFORWARD) pid = open_file->pid; else pid = current->tgid; - if (ctx->direct_io) - iov_iter_advance(&direct_iov, offset - ctx->pos); - do { if (open_file->invalidHandle) { rc = cifs_reopen_file(open_file, true); @@ -4119,78 +4406,37 @@ cifs_send_async_read(loff_t offset, size_t len, struct cifsFileInfo *open_file, if (rc) break; - cur_len = min_t(const size_t, len, rsize); - - if (ctx->direct_io) { - ssize_t result; - - result = iov_iter_get_pages_alloc( - &direct_iov, &pagevec, - cur_len, &start, FOLL_DEST_BUF); - if 
(result < 0) { - cifs_dbg(VFS, - "Couldn't get user pages (rc=%zd) iter type %d iov_offset %zd count %zd\n", - result, iov_iter_type(&direct_iov), - direct_iov.iov_offset, - direct_iov.count); - dump_stack(); - - rc = result; - add_credits_and_wake_if(server, credits, 0); - break; - } - cur_len = (size_t)result; - - rdata = cifs_readdata_direct_alloc( - pagevec, cifs_uncached_readv_complete); - if (!rdata) { - add_credits_and_wake_if(server, credits, 0); - rc = -ENOMEM; - break; - } - - npages = (cur_len + start + PAGE_SIZE-1) / PAGE_SIZE; - rdata->page_offset = start; - rdata->tailsz = npages > 1 ? - cur_len-(PAGE_SIZE-start)-(npages-2)*PAGE_SIZE : - cur_len; - - } else { - - npages = DIV_ROUND_UP(cur_len, PAGE_SIZE); - /* allocate a readdata struct */ - rdata = cifs_readdata_alloc(npages, - cifs_uncached_readv_complete); - if (!rdata) { - add_credits_and_wake_if(server, credits, 0); - rc = -ENOMEM; - break; - } + max_len = min_t(size_t, len, rsize); - rc = cifs_read_allocate_pages(rdata, npages); - if (rc) { - kvfree(rdata->pages); - kfree(rdata); - add_credits_and_wake_if(server, credits, 0); - break; - } + cur_len = cifs_limit_bvec_subset(&ctx->iter, max_len, + max_segs, &nsegs); + cifs_dbg(FYI, "read-to-iter len=%zx/%zx nsegs=%u/%lu/%u\n", + cur_len, max_len, nsegs, ctx->iter.nr_segs, max_segs); + if (cur_len == 0) { + rc = -EIO; + add_credits_and_wake_if(server, credits, 0); + break; + } - rdata->tailsz = PAGE_SIZE; + rdata = cifs_readdata_alloc(cifs_uncached_readv_complete); + if (!rdata) { + add_credits_and_wake_if(server, credits, 0); + rc = -ENOMEM; + break; } - rdata->server = server; - rdata->cfile = cifsFileInfo_get(open_file); - rdata->nr_pages = npages; - rdata->offset = offset; - rdata->bytes = cur_len; - rdata->pid = pid; - rdata->pagesz = PAGE_SIZE; - rdata->read_into_pages = cifs_uncached_read_into_pages; - rdata->copy_into_pages = cifs_uncached_copy_into_pages; - rdata->credits = credits_on_stack; - rdata->ctx = ctx; + rdata->server = server; + rdata->cfile = cifsFileInfo_get(open_file); + rdata->offset = fpos; + rdata->bytes = cur_len; + rdata->pid = pid; + rdata->credits = credits_on_stack; + rdata->ctx = ctx; kref_get(&ctx->refcount); + rdata->iter = ctx->iter; + iov_iter_truncate(&rdata->iter, cur_len); + rc = adjust_credits(server, &rdata->credits, rdata->bytes); if (!rc) { @@ -4202,17 +4448,15 @@ cifs_send_async_read(loff_t offset, size_t len, struct cifsFileInfo *open_file, if (rc) { add_credits_and_wake_if(server, &rdata->credits, 0); - kref_put(&rdata->refcount, - cifs_uncached_readdata_release); - if (rc == -EAGAIN) { - iov_iter_revert(&direct_iov, cur_len); + kref_put(&rdata->refcount, cifs_readdata_release); + if (rc == -EAGAIN) continue; - } break; } list_add_tail(&rdata->list, rdata_list); - offset += cur_len; + iov_iter_advance(&ctx->iter, cur_len); + fpos += cur_len; len -= cur_len; } while (len > 0); @@ -4254,22 +4498,6 @@ collect_uncached_read_data(struct cifs_aio_ctx *ctx) list_del_init(&rdata->list); INIT_LIST_HEAD(&tmp_list); - /* - * Got a part of data and then reconnect has - * happened -- fill the buffer and continue - * reading. 
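cifs_limit_bvec_subset() (used here and in the write path) reaches into the iterator's raw state: the bvec array, nr_segs, iov_offset and count. Their meaning is easiest to see by building a BVEC iterator by hand. A standalone sketch (illustrative, not part of the patch):

#include <linux/bvec.h>
#include <linux/uio.h>

static void example_bvec_state(struct page *p0, struct page *p1)
{
        struct bio_vec bv[2] = {
                { .bv_page = p0, .bv_offset = 0, .bv_len = PAGE_SIZE },
                { .bv_page = p1, .bv_offset = 0, .bv_len = PAGE_SIZE },
        };
        struct iov_iter iter;

        iov_iter_bvec(&iter, ITER_SOURCE, bv, 2, 2 * PAGE_SIZE);
        iov_iter_advance(&iter, 512);

        /* Now iter.iov_offset == 512 (how far into the current
         * segment we are) and iov_iter_count(&iter) is what remains;
         * these are the "skip" and "n" that cifs_limit_bvec_subset()
         * starts from when it picks a span fitting both max_size and
         * max_segs (e.g. an RDMA adapter's max_frmr_depth).
         */
}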
- */ - if (got_bytes && got_bytes < rdata->bytes) { - rc = 0; - if (!ctx->direct_io) - rc = cifs_readdata_to_iov(rdata, to); - if (rc) { - kref_put(&rdata->refcount, - cifs_uncached_readdata_release); - continue; - } - } - if (ctx->direct_io) { /* * Re-use rdata as this is a @@ -4286,7 +4514,7 @@ collect_uncached_read_data(struct cifs_aio_ctx *ctx) &tmp_list, ctx); kref_put(&rdata->refcount, - cifs_uncached_readdata_release); + cifs_readdata_release); } list_splice(&tmp_list, &ctx->list); @@ -4294,8 +4522,6 @@ collect_uncached_read_data(struct cifs_aio_ctx *ctx) goto again; } else if (rdata->result) rc = rdata->result; - else if (!ctx->direct_io) - rc = cifs_readdata_to_iov(rdata, to); /* if there was a short read -- discard anything left */ if (rdata->got_bytes && rdata->got_bytes < rdata->bytes) @@ -4304,7 +4530,7 @@ collect_uncached_read_data(struct cifs_aio_ctx *ctx) ctx->total_len += rdata->got_bytes; } list_del_init(&rdata->list); - kref_put(&rdata->refcount, cifs_uncached_readdata_release); + kref_put(&rdata->refcount, cifs_readdata_release); } if (!ctx->direct_io) @@ -4364,26 +4590,55 @@ static ssize_t __cifs_readv( if (!ctx) return -ENOMEM; - ctx->cfile = cifsFileInfo_get(cfile); + ctx->pos = offset; + ctx->direct_io = direct; + ctx->len = len; + ctx->cfile = cifsFileInfo_get(cfile); + ctx->nr_pinned_pages = 0; if (!is_sync_kiocb(iocb)) ctx->iocb = iocb; - if (user_backed_iter(to)) - ctx->should_dirty = true; - - if (direct) { - ctx->pos = offset; - ctx->direct_io = true; - ctx->iter = *to; - ctx->len = len; - } else { - rc = setup_aio_ctx_iter(ctx, to, ITER_DEST); - if (rc) { + if (user_backed_iter(to)) { + /* + * Extract IOVEC/UBUF-type iterators to a BVEC-type iterator as + * they contain references to the calling process's virtual + * memory layout which won't be available in an async worker + * thread. This also takes a ref or a pin on every folio + * involved. + */ + rc = netfs_extract_user_iter(to, iov_iter_count(to), + &ctx->iter, FOLL_DEST_BUF); + if (rc < 0) { kref_put(&ctx->refcount, cifs_aio_ctx_release); return rc; } - len = ctx->len; + + ctx->nr_pinned_pages = rc; + ctx->bv = (void *)ctx->iter.bvec; + ctx->bv_cleanup_mode = + iov_iter_extract_mode(&ctx->iter, FOLL_DEST_BUF); + ctx->should_dirty = true; + } else if ((iov_iter_is_bvec(to) || iov_iter_is_kvec(to)) && + !is_sync_kiocb(iocb)) { + /* + * If the op is asynchronous, we need to copy the list attached + * to a BVEC/KVEC-type iterator, but we assume that the storage + * will be retained by the caller; in any case, we may or may + * not be able to pin the pages, so we don't try. + */ + ctx->bv = (void *)dup_iter(&ctx->iter, to, GFP_KERNEL); + if (!ctx->bv) { + kref_put(&ctx->refcount, cifs_aio_ctx_release); + return -ENOMEM; + } + } else { + /* + * Otherwise, we just pass the iterator down as-is and rely on + * the caller to make sure the pages referred to by the + * iterator don't evaporate. + */ + ctx->iter = *to; } if (direct) { @@ -4646,6 +4901,8 @@ int cifs_file_mmap(struct file *file, struct vm_area_struct *vma) return rc; } +#if 0 // TODO: Remove for iov_iter support + static void cifs_readv_complete(struct work_struct *work) { @@ -4776,19 +5033,74 @@ cifs_readpages_copy_into_pages(struct TCP_Server_Info *server, { return readpages_fill_pages(server, rdata, iter, iter->count); } +#endif + +/* + * Unlock a bunch of folios in the pagecache. 
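The branch above is one instance of a policy that both __cifs_readv() and __cifs_writev() now follow. Restated side by side for clarity (read side shown; the write side differs only in FOLL_SOURCE_BUF vs FOLL_DEST_BUF; netfs_extract_user_iter() is introduced elsewhere in this series):

static int example_prep_iter(struct cifs_aio_ctx *ctx, struct iov_iter *to,
                             struct kiocb *iocb)
{
        ssize_t rc;

        if (user_backed_iter(to)) {
                /* The submitter's page tables won't exist in the async
                 * worker: pin the user pages and build a BVEC over them. */
                rc = netfs_extract_user_iter(to, iov_iter_count(to),
                                             &ctx->iter, FOLL_DEST_BUF);
                if (rc < 0)
                        return rc;
                ctx->nr_pinned_pages = rc;
                ctx->bv = (void *)ctx->iter.bvec;
                ctx->should_dirty = true;
        } else if ((iov_iter_is_bvec(to) || iov_iter_is_kvec(to)) &&
                   !is_sync_kiocb(iocb)) {
                /* Async: the caller's segment array may go out of scope,
                 * so duplicate it; the pages themselves are assumed to
                 * be held stable by the caller. */
                ctx->bv = (void *)dup_iter(&ctx->iter, to, GFP_KERNEL);
                if (!ctx->bv)
                        return -ENOMEM;
        } else {
                /* Sync, or storage guaranteed stable: use as-is. */
                ctx->iter = *to;
        }
        return 0;
}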
+ */ +static void cifs_unlock_folios(struct address_space *mapping, pgoff_t first, pgoff_t last) +{ + struct folio *folio; + XA_STATE(xas, &mapping->i_pages, first); + + rcu_read_lock(); + xas_for_each(&xas, folio, last) { + folio_unlock(folio); + } + rcu_read_unlock(); +} + +static void cifs_readahead_complete(struct work_struct *work) +{ + struct cifs_readdata *rdata = container_of(work, + struct cifs_readdata, work); + struct folio *folio; + pgoff_t last; + bool good = rdata->result == 0 || (rdata->result == -EAGAIN && rdata->got_bytes); + + XA_STATE(xas, &rdata->mapping->i_pages, rdata->offset / PAGE_SIZE); + + if (good) + cifs_readahead_to_fscache(rdata->mapping->host, + rdata->offset, rdata->bytes); + + if (iov_iter_count(&rdata->iter) > 0) + iov_iter_zero(iov_iter_count(&rdata->iter), &rdata->iter); + + last = (rdata->offset + rdata->bytes - 1) / PAGE_SIZE; + + rcu_read_lock(); + xas_for_each(&xas, folio, last) { + if (good) { + flush_dcache_folio(folio); + folio_mark_uptodate(folio); + } + folio_unlock(folio); + } + rcu_read_unlock(); + + kref_put(&rdata->refcount, cifs_readdata_release); +} static void cifs_readahead(struct readahead_control *ractl) { - int rc; struct cifsFileInfo *open_file = ractl->file->private_data; struct cifs_sb_info *cifs_sb = CIFS_FILE_SB(ractl->file); struct TCP_Server_Info *server; - pid_t pid; - unsigned int xid, nr_pages, last_batch_size = 0, cache_nr_pages = 0; - pgoff_t next_cached = ULONG_MAX; + unsigned int xid, nr_pages, cache_nr_pages = 0; + unsigned int ra_pages; + pgoff_t next_cached = ULONG_MAX, ra_index; bool caching = fscache_cookie_enabled(cifs_inode_cookie(ractl->mapping->host)) && cifs_inode_cookie(ractl->mapping->host)->cache_priv; bool check_cache = caching; + pid_t pid; + int rc = 0; + + /* Note that readahead_count() lags behind our dequeuing of pages from + * the ractl, so we have to keep track for ourselves. + */ + ra_pages = readahead_count(ractl); + ra_index = readahead_index(ractl); xid = get_xid(); @@ -4797,22 +5109,21 @@ static void cifs_readahead(struct readahead_control *ractl) else pid = current->tgid; - rc = 0; server = cifs_pick_channel(tlink_tcon(open_file->tlink)->ses); cifs_dbg(FYI, "%s: file=%p mapping=%p num_pages=%u\n", - __func__, ractl->file, ractl->mapping, readahead_count(ractl)); + __func__, ractl->file, ractl->mapping, ra_pages); /* * Chop the readahead request up into rsize-sized read requests. */ - while ((nr_pages = readahead_count(ractl) - last_batch_size)) { - unsigned int i, got, rsize; - struct page *page; struct cifs_readdata *rdata; struct cifs_credits credits_on_stack; struct cifs_credits *credits = &credits_on_stack; - pgoff_t index = readahead_index(ractl) + last_batch_size; + while ((nr_pages = ra_pages)) { + unsigned int i, rsize; + struct folio *folio; + pgoff_t fsize; /* * Find out if we have anything cached in the range of
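The zero-fill in cifs_readahead_complete() above matters because every folio in the span is about to be marked uptodate: a short read must not leave stale pagecache contents visible to userspace. The idiom in isolation (a sketch, assuming - as here - that the receive path consumes the iterator as data arrives):

static void example_zero_tail(struct iov_iter *iter)
{
        size_t remain = iov_iter_count(iter);

        /* Whatever the server didn't supply becomes zeroes rather
         * than whatever happened to be in the pages before. */
        if (remain)
                iov_iter_zero(remain, iter);
}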
*/ - struct folio *folio = readahead_folio(ractl); - - last_batch_size = folio_nr_pages(folio); + folio = readahead_folio(ractl); + fsize = folio_nr_pages(folio); + ra_pages -= fsize; + ra_index += fsize; if (cifs_readpage_from_fscache(ractl->mapping->host, &folio->page) < 0) { /* @@ -4846,8 +5158,8 @@ static void cifs_readahead(struct readahead_control *ractl) caching = false; } folio_unlock(folio); - next_cached++; - cache_nr_pages--; + next_cached += fsize; + cache_nr_pages -= fsize; if (cache_nr_pages == 0) check_cache = true; continue; @@ -4872,8 +5184,9 @@ static void cifs_readahead(struct readahead_control *ractl) &rsize, credits); if (rc) break; - nr_pages = min_t(size_t, rsize / PAGE_SIZE, readahead_count(ractl)); - nr_pages = min_t(size_t, nr_pages, next_cached - index); + nr_pages = min_t(size_t, rsize / PAGE_SIZE, ra_pages); + if (next_cached != ULONG_MAX) + nr_pages = min_t(size_t, nr_pages, next_cached - ra_index); /* * Give up immediately if rsize is too small to read an entire @@ -4886,33 +5199,31 @@ static void cifs_readahead(struct readahead_control *ractl) break; } - rdata = cifs_readdata_alloc(nr_pages, cifs_readv_complete); + rdata = cifs_readdata_alloc(cifs_readahead_complete); if (!rdata) { /* best to give up if we're out of mem */ add_credits_and_wake_if(server, credits, 0); break; } - got = __readahead_batch(ractl, rdata->pages, nr_pages); - if (got != nr_pages) { - pr_warn("__readahead_batch() returned %u/%u\n", - got, nr_pages); - nr_pages = got; - } - - rdata->nr_pages = nr_pages; - rdata->bytes = readahead_batch_length(ractl); + rdata->offset = ra_index * PAGE_SIZE; + rdata->bytes = nr_pages * PAGE_SIZE; rdata->cfile = cifsFileInfo_get(open_file); rdata->server = server; rdata->mapping = ractl->mapping; - rdata->offset = readahead_pos(ractl); rdata->pid = pid; - rdata->pagesz = PAGE_SIZE; - rdata->tailsz = PAGE_SIZE; - rdata->read_into_pages = cifs_readpages_read_into_pages; - rdata->copy_into_pages = cifs_readpages_copy_into_pages; rdata->credits = credits_on_stack; + for (i = 0; i < nr_pages; i++) { + if (!readahead_folio(ractl)) + WARN_ON(1); + } + ra_pages -= nr_pages; + ra_index += nr_pages; + + iov_iter_xarray(&rdata->iter, READ, &rdata->mapping->i_pages, + rdata->offset, rdata->bytes); + rc = adjust_credits(server, &rdata->credits, rdata->bytes); if (!rc) { if (rdata->cfile->invalidHandle) @@ -4923,18 +5234,15 @@ static void cifs_readahead(struct readahead_control *ractl) if (rc) { add_credits_and_wake_if(server, &rdata->credits, 0); - for (i = 0; i < rdata->nr_pages; i++) { - page = rdata->pages[i]; - unlock_page(page); - put_page(page); - } + cifs_unlock_folios(rdata->mapping, + rdata->offset / PAGE_SIZE, + (rdata->offset + rdata->bytes - 1) / PAGE_SIZE); /* Fallback to the readpage in error/reconnect cases */ kref_put(&rdata->refcount, cifs_readdata_release); break; } kref_put(&rdata->refcount, cifs_readdata_release); - last_batch_size = nr_pages; } free_xid(xid); @@ -4976,10 +5284,6 @@ static int cifs_readpage_worker(struct file *file, struct page *page, flush_dcache_page(page); SetPageUptodate(page); - - /* send this page to the cache */ - cifs_readpage_to_fscache(file_inode(file), page); - rc = 0; io_error: diff --git a/fs/cifs/fscache.c b/fs/cifs/fscache.c index f6f3a6b75601..47c9f36c11fb 100644 --- a/fs/cifs/fscache.c +++ b/fs/cifs/fscache.c @@ -165,22 +165,16 @@ static int fscache_fallback_read_page(struct inode *inode, struct page *page) /* * Fallback page writing interface. 
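On the ra_pages/ra_index bookkeeping above: readahead_folio() removes a folio from the ractl, but readahead_count() is only updated by the deferred batch accounting on the following call, so it lags the dequeue. A sketch of consuming a batch by hand (illustrative helper):

#include <linux/pagemap.h>

static unsigned int example_consume(struct readahead_control *ractl,
                                    unsigned int n)
{
        unsigned int i;

        /* Each readahead_folio() hands back the next queued folio,
         * still locked; the caller must unlock it once the I/O that
         * fills it completes - which is why cifs_readahead_complete()
         * and cifs_unlock_folios() above walk the span unlocking.
         */
        for (i = 0; i < n; i++)
                if (!readahead_folio(ractl))
                        break;
        return i;
}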
*/ -static int fscache_fallback_write_page(struct inode *inode, struct page *page, - bool no_space_allocated_yet) +static int fscache_fallback_write_pages(struct inode *inode, loff_t start, size_t len, + bool no_space_allocated_yet) { struct netfs_cache_resources cres; struct fscache_cookie *cookie = cifs_inode_cookie(inode); struct iov_iter iter; - struct bio_vec bvec[1]; - loff_t start = page_offset(page); - size_t len = PAGE_SIZE; int ret; memset(&cres, 0, sizeof(cres)); - bvec[0].bv_page = page; - bvec[0].bv_offset = 0; - bvec[0].bv_len = PAGE_SIZE; - iov_iter_bvec(&iter, ITER_SOURCE, bvec, ARRAY_SIZE(bvec), PAGE_SIZE); + iov_iter_xarray(&iter, ITER_SOURCE, &inode->i_mapping->i_pages, start, len); ret = fscache_begin_write_operation(&cres, cookie); if (ret < 0) @@ -189,7 +183,7 @@ static int fscache_fallback_write_page(struct inode *inode, struct page *page, ret = cres.ops->prepare_write(&cres, &start, &len, i_size_read(inode), no_space_allocated_yet); if (ret == 0) - ret = fscache_write(&cres, page_offset(page), &iter, NULL, NULL); + ret = fscache_write(&cres, start, &iter, NULL, NULL); fscache_end_operation(&cres); return ret; } @@ -213,12 +207,12 @@ int __cifs_readpage_from_fscache(struct inode *inode, struct page *page) return 0; } -void __cifs_readpage_to_fscache(struct inode *inode, struct page *page) +void __cifs_readahead_to_fscache(struct inode *inode, loff_t pos, size_t len) { - cifs_dbg(FYI, "%s: (fsc: %p, p: %p, i: %p)\n", - __func__, cifs_inode_cookie(inode), page, inode); + cifs_dbg(FYI, "%s: (fsc: %p, p: %llx, l: %zx, i: %p)\n", + __func__, cifs_inode_cookie(inode), pos, len, inode); - fscache_fallback_write_page(inode, page, true); + fscache_fallback_write_pages(inode, pos, len, true); } /* diff --git a/fs/cifs/fscache.h b/fs/cifs/fscache.h index 67b601041f0a..173999610997 100644 --- a/fs/cifs/fscache.h +++ b/fs/cifs/fscache.h @@ -90,7 +90,7 @@ static inline int cifs_fscache_query_occupancy(struct inode *inode, } extern int __cifs_readpage_from_fscache(struct inode *pinode, struct page *ppage); -extern void __cifs_readpage_to_fscache(struct inode *pinode, struct page *ppage); +extern void __cifs_readahead_to_fscache(struct inode *pinode, loff_t pos, size_t len); static inline int cifs_readpage_from_fscache(struct inode *inode, @@ -101,11 +101,11 @@ static inline int cifs_readpage_from_fscache(struct inode *inode, return -ENOBUFS; } -static inline void cifs_readpage_to_fscache(struct inode *inode, - struct page *page) +static inline void cifs_readahead_to_fscache(struct inode *inode, + loff_t pos, size_t len) { if (cifs_inode_cookie(inode)) - __cifs_readpage_to_fscache(inode, page); + __cifs_readahead_to_fscache(inode, pos, len); } #else /* CONFIG_CIFS_FSCACHE */ @@ -141,7 +141,7 @@ cifs_readpage_from_fscache(struct inode *inode, struct page *page) } static inline -void cifs_readpage_to_fscache(struct inode *inode, struct page *page) {} +void cifs_readahead_to_fscache(struct inode *inode, loff_t pos, size_t len) {} #endif /* CONFIG_CIFS_FSCACHE */ diff --git a/fs/cifs/misc.c b/fs/cifs/misc.c index 9655cf359ab9..a54a59a8e196 100644 --- a/fs/cifs/misc.c +++ b/fs/cifs/misc.c @@ -966,16 +966,24 @@ cifs_aio_ctx_release(struct kref *refcount) /* * ctx->bv is only set if setup_aio_ctx_iter() was call successfuly - * which means that iov_iter_get_pages() was a success and thus that - * we have taken reference on pages. + * which means that iov_iter_extract_pages() was a success and thus + * that we may have references or pins on pages that we need to + * release. 
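On the cleanup rule that the misc.c hunk below implements: pages obtained via pin_user_pages() (FOLL_PIN) must be released with unpin_user_page(), while ordinary references taken with get_page() are dropped with put_page(); mixing the two corrupts the pin accounting. Distilled to a single page (illustrative helper):

#include <linux/mm.h>

static void example_release_page(struct page *page,
                                 unsigned int cleanup_mode, bool dirty)
{
        if (dirty)
                set_page_dirty(page);
        if (cleanup_mode & FOLL_PIN)
                unpin_user_page(page);  /* pairs with pin_user_pages() */
        if (cleanup_mode & FOLL_GET)
                put_page(page);         /* pairs with a plain page ref */
}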
*/ if (ctx->bv) { - unsigned i; - - for (i = 0; i < ctx->npages; i++) { - if (ctx->should_dirty) - set_page_dirty(ctx->bv[i].bv_page); - put_page(ctx->bv[i].bv_page); + if (ctx->should_dirty || ctx->bv_cleanup_mode) { + unsigned i; + + for (i = 0; i < ctx->nr_pinned_pages; i++) { + struct page *page = ctx->bv[i].bv_page; + + if (ctx->should_dirty) + set_page_dirty(page); + if (ctx->bv_cleanup_mode & FOLL_PIN) + unpin_user_page(page); + if (ctx->bv_cleanup_mode & FOLL_GET) + put_page(page); + } } kvfree(ctx->bv); } @@ -983,96 +991,6 @@ cifs_aio_ctx_release(struct kref *refcount) kfree(ctx); } -#define CIFS_AIO_KMALLOC_LIMIT (1024 * 1024) - -int -setup_aio_ctx_iter(struct cifs_aio_ctx *ctx, struct iov_iter *iter, int rw) -{ - ssize_t rc; - unsigned int cur_npages; - unsigned int npages = 0; - unsigned int i; - size_t len; - size_t count = iov_iter_count(iter); - unsigned int saved_len; - size_t start; - unsigned int max_pages = iov_iter_npages(iter, INT_MAX); - struct page **pages = NULL; - struct bio_vec *bv = NULL; - - if (iov_iter_is_kvec(iter)) { - memcpy(&ctx->iter, iter, sizeof(*iter)); - ctx->len = count; - iov_iter_advance(iter, count); - return 0; - } - - if (array_size(max_pages, sizeof(*bv)) <= CIFS_AIO_KMALLOC_LIMIT) - bv = kmalloc_array(max_pages, sizeof(*bv), GFP_KERNEL); - - if (!bv) { - bv = vmalloc(array_size(max_pages, sizeof(*bv))); - if (!bv) - return -ENOMEM; - } - - if (array_size(max_pages, sizeof(*pages)) <= CIFS_AIO_KMALLOC_LIMIT) - pages = kmalloc_array(max_pages, sizeof(*pages), GFP_KERNEL); - - if (!pages) { - pages = vmalloc(array_size(max_pages, sizeof(*pages))); - if (!pages) { - kvfree(bv); - return -ENOMEM; - } - } - - saved_len = count; - - while (count && npages < max_pages) { - rc = iov_iter_get_pages(iter, pages, count, max_pages, &start, - rw == WRITE ? FOLL_SOURCE_BUF : FOLL_DEST_BUF); - if (rc < 0) { - cifs_dbg(VFS, "Couldn't get user pages (rc=%zd)\n", rc); - break; - } - - if (rc > count) { - cifs_dbg(VFS, "get pages rc=%zd more than %zu\n", rc, - count); - break; - } - - count -= rc; - rc += start; - cur_npages = DIV_ROUND_UP(rc, PAGE_SIZE); - - if (npages + cur_npages > max_pages) { - cifs_dbg(VFS, "out of vec array capacity (%u vs %u)\n", - npages + cur_npages, max_pages); - break; - } - - for (i = 0; i < cur_npages; i++) { - len = rc > PAGE_SIZE ? PAGE_SIZE : rc; - bv[npages + i].bv_page = pages[i]; - bv[npages + i].bv_offset = start; - bv[npages + i].bv_len = len - start; - rc -= len; - start = 0; - } - - npages += cur_npages; - } - - kvfree(pages); - ctx->bv = bv; - ctx->len = saved_len - count; - ctx->npages = npages; - iov_iter_bvec(&ctx->iter, rw, ctx->bv, npages, ctx->len); - return 0; -} - /** * cifs_alloc_hash - allocate hash and hash context together * @name: The name of the crypto hash algo @@ -1130,25 +1048,6 @@ cifs_free_hash(struct shash_desc **sdesc) *sdesc = NULL; } -/** - * rqst_page_get_length - obtain the length and offset for a page in smb_rqst - * @rqst: The request descriptor - * @page: The index of the page to query - * @len: Where to store the length for this page: - * @offset: Where to store the offset for this page - */ -void rqst_page_get_length(const struct smb_rqst *rqst, unsigned int page, - unsigned int *len, unsigned int *offset) -{ - *len = rqst->rq_pagesz; - *offset = (page == 0) ? 
rqst->rq_offset : 0; - - if (rqst->rq_npages == 1 || page == rqst->rq_npages-1) - *len = rqst->rq_tailsz; - else if (page == 0) - *len = rqst->rq_pagesz - rqst->rq_offset; -} - void extract_unc_hostname(const char *unc, const char **h, size_t *len) { const char *end; diff --git a/fs/cifs/smb2ops.c b/fs/cifs/smb2ops.c index dc160de7a6de..387effcb905d 100644 --- a/fs/cifs/smb2ops.c +++ b/fs/cifs/smb2ops.c @@ -4226,7 +4226,7 @@ fill_transform_hdr(struct smb2_transform_hdr *tr_hdr, unsigned int orig_len, static void *smb2_aead_req_alloc(struct crypto_aead *tfm, const struct smb_rqst *rqst, int num_rqst, const u8 *sig, u8 **iv, - struct aead_request **req, struct scatterlist **sgl, + struct aead_request **req, struct sg_table *sgt, unsigned int *num_sgs) { unsigned int req_size = sizeof(**req) + crypto_aead_reqsize(tfm); @@ -4235,70 +4235,68 @@ static void *smb2_aead_req_alloc(struct crypto_aead *tfm, const struct smb_rqst u8 *p; *num_sgs = cifs_get_num_sgs(rqst, num_rqst, sig); + if (IS_ERR_VALUE((long)(int)*num_sgs)) + return ERR_PTR(*num_sgs); len = iv_size; len += crypto_aead_alignmask(tfm) & ~(crypto_tfm_ctx_alignment() - 1); len = ALIGN(len, crypto_tfm_ctx_alignment()); len += req_size; len = ALIGN(len, __alignof__(struct scatterlist)); - len += *num_sgs * sizeof(**sgl); + len += array_size(*num_sgs, sizeof(struct scatterlist)); - p = kmalloc(len, GFP_ATOMIC); + p = kvzalloc(len, GFP_NOFS); if (!p) - return NULL; + return ERR_PTR(-ENOMEM); *iv = (u8 *)PTR_ALIGN(p, crypto_aead_alignmask(tfm) + 1); *req = (struct aead_request *)PTR_ALIGN(*iv + iv_size, crypto_tfm_ctx_alignment()); - *sgl = (struct scatterlist *)PTR_ALIGN((u8 *)*req + req_size, - __alignof__(struct scatterlist)); + sgt->sgl = (struct scatterlist *)PTR_ALIGN((u8 *)*req + req_size, + __alignof__(struct scatterlist)); return p; } -static void *smb2_get_aead_req(struct crypto_aead *tfm, const struct smb_rqst *rqst, +static void *smb2_get_aead_req(struct crypto_aead *tfm, struct smb_rqst *rqst, int num_rqst, const u8 *sig, u8 **iv, struct aead_request **req, struct scatterlist **sgl) { - unsigned int off, len, skip; - struct scatterlist *sg; - unsigned int num_sgs; - unsigned long addr; - int i, j; + struct sg_table sgtable = {}; + unsigned int skip, num_sgs, i, j; + ssize_t rc; void *p; - p = smb2_aead_req_alloc(tfm, rqst, num_rqst, sig, iv, req, sgl, &num_sgs); - if (!p) - return NULL; + p = smb2_aead_req_alloc(tfm, rqst, num_rqst, sig, iv, req, &sgtable, &num_sgs); + if (IS_ERR(p)) + return ERR_CAST(p); - sg_init_table(*sgl, num_sgs); - sg = *sgl; + sg_init_marker(sgtable.sgl, num_sgs); - /* Assumes the first rqst has a transform header as the first iov. - * I.e. - * rqst[0].rq_iov[0] is transform header - * rqst[0].rq_iov[1+] data to be encrypted/decrypted - * rqst[1+].rq_iov[0+] data to be encrypted/decrypted - */ for (i = 0; i < num_rqst; i++) { - /* - * The first rqst has a transform header where the - * first 20 bytes are not part of the encrypted blob. - */ - for (j = 0; j < rqst[i].rq_nvec; j++) { - struct kvec *iov = &rqst[i].rq_iov[j]; + struct iov_iter *iter = &rqst[i].rq_iter; + size_t count = iov_iter_count(iter); + for (j = 0; j < rqst[i].rq_nvec; j++) { + /* + * The first rqst has a transform header where the + * first 20 bytes are not part of the encrypted blob + */ skip = (i == 0) && (j == 0) ? 
20 : 0; - addr = (unsigned long)iov->iov_base + skip; - len = iov->iov_len - skip; - sg = cifs_sg_set_buf(sg, (void *)addr, len); - } for (j = 0; j < rqst[i].rq_npages; j++) { - rqst_page_get_length(&rqst[i], j, &len, &off); - sg_set_page(sg++, rqst[i].rq_pages[j], len, off); + cifs_sg_set_buf(&sgtable, + rqst[i].rq_iov[j].iov_base + skip, + rqst[i].rq_iov[j].iov_len - skip); } + sgtable.orig_nents = sgtable.nents; + + rc = netfs_extract_iter_to_sg(iter, count, &sgtable, + num_sgs - sgtable.nents, + FOLL_DEST_BUF); + iov_iter_revert(iter, rc); + sgtable.orig_nents = sgtable.nents; } - cifs_sg_set_buf(sg, sig, SMB2_SIGNATURE_SIZE); + cifs_sg_set_buf(&sgtable, sig, SMB2_SIGNATURE_SIZE); + sg_mark_end(&sgtable.sgl[sgtable.nents - 1]); return p; } @@ -4386,8 +4384,8 @@ crypt_message(struct TCP_Server_Info *server, int num_rqst, } creq = smb2_get_aead_req(tfm, rqst, num_rqst, sign, &iv, &req, &sg); - if (unlikely(!creq)) - return -ENOMEM; + if (unlikely(IS_ERR(creq))) + return PTR_ERR(creq); if (!enc) { memcpy(sign, &tr_hdr->Signature, SMB2_SIGNATURE_SIZE); @@ -4419,18 +4417,31 @@ crypt_message(struct TCP_Server_Info *server, int num_rqst, return rc; } +/* + * Clear a read buffer, discarding the folios which have XA_MARK_0 set. + */ +static void cifs_clear_xarray_buffer(struct xarray *buffer) +{ + struct folio *folio; + + XA_STATE(xas, buffer, 0); + + rcu_read_lock(); + xas_for_each_marked(&xas, folio, ULONG_MAX, XA_MARK_0) { + folio_put(folio); + } + rcu_read_unlock(); + xa_destroy(buffer); +} + void smb3_free_compound_rqst(int num_rqst, struct smb_rqst *rqst) { - int i, j; + int i; - for (i = 0; i < num_rqst; i++) { - if (rqst[i].rq_pages) { - for (j = rqst[i].rq_npages - 1; j >= 0; j--) - put_page(rqst[i].rq_pages[j]); - kfree(rqst[i].rq_pages); - } - } + for (i = 0; i < num_rqst; i++) + if (!xa_empty(&rqst[i].rq_buffer)) + cifs_clear_xarray_buffer(&rqst[i].rq_buffer); } /* @@ -4450,9 +4461,8 @@ static int smb3_init_transform_rq(struct TCP_Server_Info *server, int num_rqst, struct smb_rqst *new_rq, struct smb_rqst *old_rq) { - struct page **pages; struct smb2_transform_hdr *tr_hdr = new_rq[0].rq_iov[0].iov_base; - unsigned int npages; + struct page *page; unsigned int orig_len = 0; int i, j; int rc = -ENOMEM; @@ -4460,45 +4470,42 @@ smb3_init_transform_rq(struct TCP_Server_Info *server, int num_rqst, for (i = 1; i < num_rqst; i++) { struct smb_rqst *old = &old_rq[i - 1]; struct smb_rqst *new = &new_rq[i]; + struct xarray *buffer = &new->rq_buffer; + size_t size = iov_iter_count(&old->rq_iter), seg, copied = 0; orig_len += smb_rqst_len(server, old); new->rq_iov = old->rq_iov; new->rq_nvec = old->rq_nvec; - npages = old->rq_npages; - if (!npages) - continue; - - pages = kmalloc_array(npages, sizeof(struct page *), - GFP_KERNEL); - if (!pages) - goto err_free; - - new->rq_pages = pages; - new->rq_npages = npages; - new->rq_offset = old->rq_offset; - new->rq_pagesz = old->rq_pagesz; - new->rq_tailsz = old->rq_tailsz; - - for (j = 0; j < npages; j++) { - pages[j] = alloc_page(GFP_KERNEL|__GFP_HIGHMEM); - if (!pages[j]) - goto err_free; - } + xa_init(buffer); - /* copy pages form the old */ - for (j = 0; j < npages; j++) { - char *dst, *src; - unsigned int offset, len; + if (size > 0) { + unsigned int npages = DIV_ROUND_UP(size, PAGE_SIZE); - rqst_page_get_length(new, j, &len, &offset); + for (j = 0; j < npages; j++) { + void *o; - dst = kmap_local_page(new->rq_pages[j]) + offset; - src = kmap_local_page(old->rq_pages[j]) + offset; + rc = -ENOMEM; + page = alloc_page(GFP_KERNEL|__GFP_HIGHMEM); + if
(!page) + goto err_free; + page->index = j; + o = xa_store(buffer, j, page, GFP_KERNEL); + if (xa_is_err(o)) { + rc = xa_err(o); + put_page(page); + goto err_free; + } - memcpy(dst, src, len); - kunmap(new->rq_pages[j]); - kunmap(old->rq_pages[j]); + seg = min_t(size_t, size - copied, PAGE_SIZE); + if (copy_page_from_iter(page, 0, seg, &old->rq_iter) != seg) { + rc = -EFAULT; + goto err_free; + } + copied += seg; + } + iov_iter_xarray(&new->rq_iter, ITER_SOURCE, + buffer, 0, size); } } @@ -4527,12 +4534,12 @@ smb3_is_transform_hdr(void *buf) static int decrypt_raw_data(struct TCP_Server_Info *server, char *buf, - unsigned int buf_data_size, struct page **pages, - unsigned int npages, unsigned int page_data_size, + unsigned int buf_data_size, struct iov_iter *iter, bool is_offloaded) { struct kvec iov[2]; struct smb_rqst rqst = {NULL}; + size_t iter_size = 0; int rc; iov[0].iov_base = buf; @@ -4542,10 +4549,10 @@ decrypt_raw_data(struct TCP_Server_Info *server, char *buf, rqst.rq_iov = iov; rqst.rq_nvec = 2; - rqst.rq_pages = pages; - rqst.rq_npages = npages; - rqst.rq_pagesz = PAGE_SIZE; - rqst.rq_tailsz = (page_data_size % PAGE_SIZE) ? : PAGE_SIZE; + if (iter) { + rqst.rq_iter = *iter; + iter_size = iov_iter_count(iter); + } rc = crypt_message(server, 1, &rqst, 0); cifs_dbg(FYI, "Decrypt message returned %d\n", rc); @@ -4556,73 +4563,37 @@ decrypt_raw_data(struct TCP_Server_Info *server, char *buf, memmove(buf, iov[1].iov_base, buf_data_size); if (!is_offloaded) - server->total_read = buf_data_size + page_data_size; + server->total_read = buf_data_size + iter_size; return rc; } static int -read_data_into_pages(struct TCP_Server_Info *server, struct page **pages, - unsigned int npages, unsigned int len) +cifs_copy_pages_to_iter(struct xarray *pages, unsigned int data_size, + unsigned int skip, struct iov_iter *iter) { - int i; - int length; + struct page *page; + unsigned long index; - for (i = 0; i < npages; i++) { - struct page *page = pages[i]; - size_t n; + xa_for_each(pages, index, page) { + size_t n, len = min_t(unsigned int, PAGE_SIZE - skip, data_size); - n = len; - if (len >= PAGE_SIZE) { - /* enough data to fill the page */ - n = PAGE_SIZE; - len -= n; - } else { - zero_user(page, len, PAGE_SIZE - len); - len = 0; + n = copy_page_to_iter(page, skip, len, iter); + if (n != len) { + cifs_dbg(VFS, "%s: something went wrong\n", __func__); + return -EIO; } - length = cifs_read_page_from_socket(server, page, 0, n); - if (length < 0) - return length; - server->total_read += length; + data_size -= n; + skip = 0; } return 0; } -static int -init_read_bvec(struct page **pages, unsigned int npages, unsigned int data_size, - unsigned int cur_off, struct bio_vec **page_vec) -{ - struct bio_vec *bvec; - int i; - - bvec = kcalloc(npages, sizeof(struct bio_vec), GFP_KERNEL); - if (!bvec) - return -ENOMEM; - - for (i = 0; i < npages; i++) { - bvec[i].bv_page = pages[i]; - bvec[i].bv_offset = (i == 0) ? 
cur_off : 0; - bvec[i].bv_len = min_t(unsigned int, PAGE_SIZE, data_size); - data_size -= bvec[i].bv_len; - } - - if (data_size != 0) { - cifs_dbg(VFS, "%s: something went wrong\n", __func__); - kfree(bvec); - return -EIO; - } - - *page_vec = bvec; - return 0; -} - static int handle_read_data(struct TCP_Server_Info *server, struct mid_q_entry *mid, - char *buf, unsigned int buf_len, struct page **pages, - unsigned int npages, unsigned int page_data_size, - bool is_offloaded) + char *buf, unsigned int buf_len, struct xarray *pages, + unsigned int pages_len, bool is_offloaded) { unsigned int data_offset; unsigned int data_len; @@ -4631,9 +4602,6 @@ handle_read_data(struct TCP_Server_Info *server, struct mid_q_entry *mid, unsigned int pad_len; struct cifs_readdata *rdata = mid->callback_data; struct smb2_hdr *shdr = (struct smb2_hdr *)buf; - struct bio_vec *bvec = NULL; - struct iov_iter iter; - struct kvec iov; int length; bool use_rdma_mr = false; @@ -4722,7 +4690,7 @@ handle_read_data(struct TCP_Server_Info *server, struct mid_q_entry *mid, return 0; } - if (data_len > page_data_size - pad_len) { + if (data_len > pages_len - pad_len) { /* data_len is corrupt -- discard frame */ rdata->result = -EIO; if (is_offloaded) @@ -4732,8 +4700,9 @@ handle_read_data(struct TCP_Server_Info *server, struct mid_q_entry *mid, return 0; } - rdata->result = init_read_bvec(pages, npages, page_data_size, - cur_off, &bvec); + /* Copy the data to the output I/O iterator. */ + rdata->result = cifs_copy_pages_to_iter(pages, pages_len, + cur_off, &rdata->iter); if (rdata->result != 0) { if (is_offloaded) mid->mid_state = MID_RESPONSE_MALFORMED; @@ -4741,14 +4710,16 @@ handle_read_data(struct TCP_Server_Info *server, struct mid_q_entry *mid, dequeue_mid(mid, rdata->result); return 0; } + rdata->got_bytes = pages_len; - iov_iter_bvec(&iter, ITER_SOURCE, bvec, npages, data_len); } else if (buf_len >= data_offset + data_len) { /* read response payload is in buf */ - WARN_ONCE(npages > 0, "read data can be either in buf or in pages"); - iov.iov_base = buf + data_offset; - iov.iov_len = data_len; - iov_iter_kvec(&iter, ITER_SOURCE, &iov, 1, data_len); + WARN_ONCE(pages && !xa_empty(pages), + "read data can be either in buf or in pages"); + length = copy_to_iter(buf + data_offset, data_len, &rdata->iter); + if (length < 0) + return length; + rdata->got_bytes = data_len; } else { /* read response payload cannot be in both buf and pages */ WARN_ONCE(1, "buf can not contain only a part of read data"); @@ -4760,26 +4731,18 @@ handle_read_data(struct TCP_Server_Info *server, struct mid_q_entry *mid, return 0; } - length = rdata->copy_into_pages(server, rdata, &iter); - - kfree(bvec); - - if (length < 0) - return length; - if (is_offloaded) mid->mid_state = MID_RESPONSE_RECEIVED; else dequeue_mid(mid, false); - return length; + return 0; } struct smb2_decrypt_work { struct work_struct decrypt; struct TCP_Server_Info *server; - struct page **ppages; + struct xarray buffer; char *buf; - unsigned int npages; unsigned int len; }; @@ -4788,11 +4751,13 @@ static void smb2_decrypt_offload(struct work_struct *work) { struct smb2_decrypt_work *dw = container_of(work, struct smb2_decrypt_work, decrypt); - int i, rc; + int rc; struct mid_q_entry *mid; + struct iov_iter iter; + iov_iter_xarray(&iter, READ, &dw->buffer, 0, dw->len); rc = decrypt_raw_data(dw->server, dw->buf, dw->server->vals->read_rsp_size, - dw->ppages, dw->npages, dw->len, true); + &iter, true); if (rc) { cifs_dbg(VFS, "error decrypting rc=%d\n", rc); goto free_pages; 
@@ -4806,7 +4771,7 @@ static void smb2_decrypt_offload(struct work_struct *work) mid->decrypted = true; rc = handle_read_data(dw->server, mid, dw->buf, dw->server->vals->read_rsp_size, - dw->ppages, dw->npages, dw->len, + &dw->buffer, dw->len, true); if (rc >= 0) { #ifdef CONFIG_CIFS_STATS2 @@ -4839,10 +4804,7 @@ static void smb2_decrypt_offload(struct work_struct *work) } free_pages: - for (i = dw->npages-1; i >= 0; i--) - put_page(dw->ppages[i]); - - kfree(dw->ppages); + cifs_clear_xarray_buffer(&dw->buffer); cifs_small_buf_release(dw->buf); kfree(dw); } @@ -4852,47 +4814,65 @@ static int receive_encrypted_read(struct TCP_Server_Info *server, struct mid_q_entry **mid, int *num_mids) { + struct page *page; char *buf = server->smallbuf; struct smb2_transform_hdr *tr_hdr = (struct smb2_transform_hdr *)buf; - unsigned int npages; - struct page **pages; - unsigned int len; + struct iov_iter iter; + unsigned int len, npages; unsigned int buflen = server->pdu_size; int rc; int i = 0; struct smb2_decrypt_work *dw; + dw = kzalloc(sizeof(struct smb2_decrypt_work), GFP_KERNEL); + if (!dw) + return -ENOMEM; + xa_init(&dw->buffer); + INIT_WORK(&dw->decrypt, smb2_decrypt_offload); + dw->server = server; + *num_mids = 1; len = min_t(unsigned int, buflen, server->vals->read_rsp_size + sizeof(struct smb2_transform_hdr)) - HEADER_SIZE(server) + 1; rc = cifs_read_from_socket(server, buf + HEADER_SIZE(server) - 1, len); if (rc < 0) - return rc; + goto free_dw; server->total_read += rc; len = le32_to_cpu(tr_hdr->OriginalMessageSize) - server->vals->read_rsp_size; + dw->len = len; npages = DIV_ROUND_UP(len, PAGE_SIZE); - pages = kmalloc_array(npages, sizeof(struct page *), GFP_KERNEL); - if (!pages) { - rc = -ENOMEM; - goto discard_data; - } - + rc = -ENOMEM; for (; i < npages; i++) { - pages[i] = alloc_page(GFP_KERNEL|__GFP_HIGHMEM); - if (!pages[i]) { - rc = -ENOMEM; + void *old; + + page = alloc_page(GFP_KERNEL|__GFP_HIGHMEM); + if (!page) + goto discard_data; + page->index = i; + old = xa_store(&dw->buffer, i, page, GFP_KERNEL); + if (xa_is_err(old)) { + rc = xa_err(old); + put_page(page); goto discard_data; } } - /* read read data into pages */ - rc = read_data_into_pages(server, pages, npages, len); - if (rc) - goto free_pages; + iov_iter_xarray(&iter, READ, &dw->buffer, 0, npages * PAGE_SIZE); + + /* Read the data into the buffer and clear excess bufferage. 
*/ + rc = cifs_read_iter_from_socket(server, &iter, dw->len); + if (rc < 0) + goto discard_data; + + server->total_read += rc; + if (rc < npages * PAGE_SIZE) + iov_iter_zero(npages * PAGE_SIZE - rc, &iter); + iov_iter_revert(&iter, npages * PAGE_SIZE); + iov_iter_truncate(&iter, dw->len); rc = cifs_discard_remaining_data(server); if (rc) @@ -4905,39 +4885,28 @@ receive_encrypted_read(struct TCP_Server_Info *server, struct mid_q_entry **mid, if ((server->min_offload) && (server->in_flight > 1) && (server->pdu_size >= server->min_offload)) { - dw = kmalloc(sizeof(struct smb2_decrypt_work), GFP_KERNEL); - if (dw == NULL) - goto non_offloaded_decrypt; - dw->buf = server->smallbuf; server->smallbuf = (char *)cifs_small_buf_get(); - INIT_WORK(&dw->decrypt, smb2_decrypt_offload); - - dw->npages = npages; - dw->server = server; - dw->ppages = pages; - dw->len = len; queue_work(decrypt_wq, &dw->decrypt); *num_mids = 0; /* worker thread takes care of finding mid */ return -1; } -non_offloaded_decrypt: rc = decrypt_raw_data(server, buf, server->vals->read_rsp_size, - pages, npages, len, false); + &iter, false); if (rc) goto free_pages; *mid = smb2_find_mid(server, buf); - if (*mid == NULL) + if (*mid == NULL) { cifs_dbg(FYI, "mid not found\n"); - else { + } else { cifs_dbg(FYI, "mid found\n"); (*mid)->decrypted = true; rc = handle_read_data(server, *mid, buf, server->vals->read_rsp_size, - pages, npages, len, false); + &dw->buffer, dw->len, false); if (rc >= 0) { if (server->ops->is_network_name_deleted) { server->ops->is_network_name_deleted(buf, @@ -4947,9 +4916,9 @@ receive_encrypted_read(struct TCP_Server_Info *server, struct mid_q_entry **mid, } free_pages: - for (i = i - 1; i >= 0; i--) - put_page(pages[i]); - kfree(pages); + cifs_clear_xarray_buffer(&dw->buffer); +free_dw: + kfree(dw); return rc; discard_data: cifs_discard_remaining_data(server); @@ -4987,7 +4956,7 @@ receive_encrypted_standard(struct TCP_Server_Info *server, server->total_read += length; buf_size = pdu_length - sizeof(struct smb2_transform_hdr); - length = decrypt_raw_data(server, buf, buf_size, NULL, 0, 0, false); + length = decrypt_raw_data(server, buf, buf_size, NULL, false); if (length) return length; @@ -5086,7 +5055,7 @@ smb3_handle_read_data(struct TCP_Server_Info *server, struct mid_q_entry *mid) char *buf = server->large_buf ? 
server->bigbuf : server->smallbuf; return handle_read_data(server, mid, buf, server->pdu_size, - NULL, 0, 0, false); + NULL, 0, false); } static int diff --git a/fs/cifs/smb2pdu.c b/fs/cifs/smb2pdu.c index a5695748a89b..66b76636660f 100644 --- a/fs/cifs/smb2pdu.c +++ b/fs/cifs/smb2pdu.c @@ -4096,10 +4096,8 @@ smb2_new_read_req(void **buf, unsigned int *total_len, struct smbd_buffer_descriptor_v1 *v1; bool need_invalidate = server->dialect == SMB30_PROT_ID; - rdata->mr = smbd_register_mr( - server->smbd_conn, rdata->pages, - rdata->nr_pages, rdata->page_offset, - rdata->tailsz, true, need_invalidate); + rdata->mr = smbd_register_mr(server->smbd_conn, &rdata->iter, + true, need_invalidate); if (!rdata->mr) return -EAGAIN; @@ -4157,11 +4155,7 @@ smb2_readv_callback(struct mid_q_entry *mid) struct cifs_credits credits = { .value = 0, .instance = 0 }; struct smb_rqst rqst = { .rq_iov = &rdata->iov[1], .rq_nvec = 1, - .rq_pages = rdata->pages, - .rq_offset = rdata->page_offset, - .rq_npages = rdata->nr_pages, - .rq_pagesz = rdata->pagesz, - .rq_tailsz = rdata->tailsz }; + .rq_iter = rdata->iter }; WARN_ONCE(rdata->server != mid->server, "rdata server %p != mid server %p", @@ -4179,6 +4173,7 @@ smb2_readv_callback(struct mid_q_entry *mid) if (server->sign && !mid->decrypted) { int rc; + iov_iter_truncate(&rqst.rq_iter, rdata->got_bytes); rc = smb2_verify_signature(&rqst, server); if (rc) cifs_tcon_dbg(VFS, "SMB signature verification returned error = %d\n", @@ -4504,7 +4499,7 @@ smb2_async_writev(struct cifs_writedata *wdata, req->VolatileFileId = wdata->cfile->fid.volatile_fid; req->WriteChannelInfoOffset = 0; req->WriteChannelInfoLength = 0; - req->Channel = 0; + req->Channel = SMB2_CHANNEL_NONE; req->Offset = cpu_to_le64(wdata->offset); req->DataOffset = cpu_to_le16( offsetof(struct smb2_write_req, Buffer)); @@ -4521,26 +4516,18 @@ smb2_async_writev(struct cifs_writedata *wdata, server->smbd_conn->rdma_readwrite_threshold) { struct smbd_buffer_descriptor_v1 *v1; + size_t data_size = iov_iter_count(&wdata->iter); bool need_invalidate = server->dialect == SMB30_PROT_ID; - wdata->mr = smbd_register_mr( - server->smbd_conn, wdata->pages, - wdata->nr_pages, wdata->page_offset, - wdata->tailsz, false, need_invalidate); + wdata->mr = smbd_register_mr(server->smbd_conn, &wdata->iter, + false, need_invalidate); if (!wdata->mr) { rc = -EAGAIN; goto async_writev_out; } req->Length = 0; req->DataOffset = 0; - if (wdata->nr_pages > 1) - req->RemainingBytes = - cpu_to_le32( - (wdata->nr_pages - 1) * wdata->pagesz - - wdata->page_offset + wdata->tailsz - ); - else - req->RemainingBytes = cpu_to_le32(wdata->tailsz); + req->RemainingBytes = cpu_to_le32(data_size); req->Channel = SMB2_CHANNEL_RDMA_V1_INVALIDATE; if (need_invalidate) req->Channel = SMB2_CHANNEL_RDMA_V1; @@ -4559,19 +4546,13 @@ smb2_async_writev(struct cifs_writedata *wdata, rqst.rq_iov = iov; rqst.rq_nvec = 1; - rqst.rq_pages = wdata->pages; - rqst.rq_offset = wdata->page_offset; - rqst.rq_npages = wdata->nr_pages; - rqst.rq_pagesz = wdata->pagesz; - rqst.rq_tailsz = wdata->tailsz; + rqst.rq_iter = wdata->iter; #ifdef CONFIG_CIFS_SMB_DIRECT - if (wdata->mr) { + if (wdata->mr) iov[0].iov_len += sizeof(struct smbd_buffer_descriptor_v1); - rqst.rq_npages = 0; - } #endif - cifs_dbg(FYI, "async write at %llu %u bytes\n", - wdata->offset, wdata->bytes); + cifs_dbg(FYI, "async write at %llu %u bytes iter=%zx\n", + wdata->offset, wdata->bytes, iov_iter_count(&rqst.rq_iter)); #ifdef CONFIG_CIFS_SMB_DIRECT /* For RDMA read, I/O size is in RemainingBytes 
not in Length */ diff --git a/fs/cifs/smbdirect.c b/fs/cifs/smbdirect.c index 78a76752fafd..8bd320f0156e 100644 --- a/fs/cifs/smbdirect.c +++ b/fs/cifs/smbdirect.c @@ -34,12 +34,6 @@ static int smbd_post_recv( struct smbd_response *response); static int smbd_post_send_empty(struct smbd_connection *info); -static int smbd_post_send_data( - struct smbd_connection *info, - struct kvec *iov, int n_vec, int remaining_data_length); -static int smbd_post_send_page(struct smbd_connection *info, - struct page *page, unsigned long offset, - size_t size, int remaining_data_length); static void destroy_mr_list(struct smbd_connection *info); static int allocate_mr_list(struct smbd_connection *info); @@ -986,24 +980,6 @@ static int smbd_post_send_sgl(struct smbd_connection *info, return rc; } -/* - * Send a page - * page: the page to send - * offset: offset in the page to send - * size: length in the page to send - * remaining_data_length: remaining data to send in this payload - */ -static int smbd_post_send_page(struct smbd_connection *info, struct page *page, - unsigned long offset, size_t size, int remaining_data_length) -{ - struct scatterlist sgl; - - sg_init_table(&sgl, 1); - sg_set_page(&sgl, page, size, offset); - - return smbd_post_send_sgl(info, &sgl, size, remaining_data_length); -} - /* * Send an empty message * Empty message is used to extend credits to peer to for keep live @@ -1015,35 +991,6 @@ static int smbd_post_send_empty(struct smbd_connection *info) return smbd_post_send_sgl(info, NULL, 0, 0); } -/* - * Send a data buffer - * iov: the iov array describing the data buffers - * n_vec: number of iov array - * remaining_data_length: remaining data to send following this packet - * in segmented SMBD packet - */ -static int smbd_post_send_data( - struct smbd_connection *info, struct kvec *iov, int n_vec, - int remaining_data_length) -{ - int i; - u32 data_length = 0; - struct scatterlist sgl[SMBDIRECT_MAX_SEND_SGE - 1]; - - if (n_vec > SMBDIRECT_MAX_SEND_SGE - 1) { - cifs_dbg(VFS, "Can't fit data to SGL, n_vec=%d\n", n_vec); - return -EINVAL; - } - - sg_init_table(sgl, n_vec); - for (i = 0; i < n_vec; i++) { - data_length += iov[i].iov_len; - sg_set_buf(&sgl[i], iov[i].iov_base, iov[i].iov_len); - } - - return smbd_post_send_sgl(info, sgl, data_length, remaining_data_length); -} - /* * Post a receive request to the transport * The remote peer can only send data when a receive request is posted @@ -1976,6 +1923,42 @@ int smbd_recv(struct smbd_connection *info, struct msghdr *msg) return rc; } +/* + * Send the contents of an iterator + * @iter: The iterator to send + * @_remaining_data_length: remaining data to send in this payload + */ +static int smbd_post_send_iter(struct smbd_connection *info, + struct iov_iter *iter, + int *_remaining_data_length) +{ + struct scatterlist sgl[SMBDIRECT_MAX_SEND_SGE - 1]; + unsigned int max_payload = info->max_send_size - sizeof(struct smbd_data_transfer); + unsigned int cleanup_mode; + ssize_t rc; + + do { + struct sg_table sgtable = { .sgl = sgl }; + size_t maxlen = min_t(size_t, *_remaining_data_length, max_payload); + + sg_init_table(sgtable.sgl, ARRAY_SIZE(sgl)); + rc = netfs_extract_iter_to_sg(iter, maxlen, + &sgtable, ARRAY_SIZE(sgl), + &cleanup_mode); + if (rc < 0) + break; + if (WARN_ON_ONCE(sgtable.nents == 0)) + return -EIO; + WARN_ON(cleanup_mode != 0); + + sg_mark_end(&sgl[sgtable.nents - 1]); + *_remaining_data_length -= rc; + rc = smbd_post_send_sgl(info, sgl, rc, *_remaining_data_length); + } while (rc == 0 && iov_iter_count(iter) > 0); 
+ + return rc; +} + /* * Send data to transport * Each rqst is transported as a SMBDirect payload @@ -1986,18 +1969,10 @@ int smbd_send(struct TCP_Server_Info *server, int num_rqst, struct smb_rqst *rqst_array) { struct smbd_connection *info = server->smbd_conn; - struct kvec vecs[SMBDIRECT_MAX_SEND_SGE - 1]; - int nvecs; - int size; - unsigned int buflen, remaining_data_length; - unsigned int offset, remaining_vec_data_length; - int start, i, j; - int max_iov_size = - info->max_send_size - sizeof(struct smbd_data_transfer); - struct kvec *iov; - int rc; struct smb_rqst *rqst; - int rqst_idx; + struct iov_iter iter; + unsigned int remaining_data_length, klen; + int rc, i, rqst_idx; if (info->transport_status != SMBD_CONNECTED) return -EAGAIN; @@ -2024,84 +1999,36 @@ int smbd_send(struct TCP_Server_Info *server, rqst_idx = 0; do { rqst = &rqst_array[rqst_idx]; - iov = rqst->rq_iov; cifs_dbg(FYI, "Sending smb (RDMA): idx=%d smb_len=%lu\n", - rqst_idx, smb_rqst_len(server, rqst)); - remaining_vec_data_length = 0; - for (i = 0; i < rqst->rq_nvec; i++) { - remaining_vec_data_length += iov[i].iov_len; - dump_smb(iov[i].iov_base, iov[i].iov_len); - } - - log_write(INFO, "rqst_idx=%d nvec=%d rqst->rq_npages=%d rq_pagesz=%d rq_tailsz=%d buflen=%lu\n", - rqst_idx, rqst->rq_nvec, - rqst->rq_npages, rqst->rq_pagesz, - rqst->rq_tailsz, smb_rqst_len(server, rqst)); - - start = 0; - offset = 0; - do { - buflen = 0; - i = start; - j = 0; - while (i < rqst->rq_nvec && - j < SMBDIRECT_MAX_SEND_SGE - 1 && - buflen < max_iov_size) { - - vecs[j].iov_base = iov[i].iov_base + offset; - if (buflen + iov[i].iov_len > max_iov_size) { - vecs[j].iov_len = - max_iov_size - iov[i].iov_len; - buflen = max_iov_size; - offset = vecs[j].iov_len; - } else { - vecs[j].iov_len = - iov[i].iov_len - offset; - buflen += vecs[j].iov_len; - offset = 0; - ++i; - } - ++j; - } + rqst_idx, smb_rqst_len(server, rqst)); + for (i = 0; i < rqst->rq_nvec; i++) + dump_smb(rqst->rq_iov[i].iov_base, rqst->rq_iov[i].iov_len); + + log_write(INFO, "RDMA-WR[%u] nvec=%d len=%u iter=%zu rqlen=%lu\n", + rqst_idx, rqst->rq_nvec, remaining_data_length, + iov_iter_count(&rqst->rq_iter), smb_rqst_len(server, rqst)); + + /* Send the metadata pages. */ + klen = 0; + for (i = 0; i < rqst->rq_nvec; i++) + klen += rqst->rq_iov[i].iov_len; + iov_iter_kvec(&iter, WRITE, rqst->rq_iov, rqst->rq_nvec, klen); + + rc = smbd_post_send_iter(info, &iter, &remaining_data_length); + if (rc < 0) + break; - remaining_vec_data_length -= buflen; - remaining_data_length -= buflen; - log_write(INFO, "sending %s iov[%d] from start=%d nvecs=%d remaining_data_length=%d\n", - remaining_vec_data_length > 0 ? 
- "partial" : "complete", - rqst->rq_nvec, start, j, - remaining_data_length); - - start = i; - rc = smbd_post_send_data(info, vecs, j, remaining_data_length); - if (rc) - goto done; - } while (remaining_vec_data_length > 0); - - /* now sending pages if there are any */ - for (i = 0; i < rqst->rq_npages; i++) { - rqst_page_get_length(rqst, i, &buflen, &offset); - nvecs = (buflen + max_iov_size - 1) / max_iov_size; - log_write(INFO, "sending pages buflen=%d nvecs=%d\n", - buflen, nvecs); - for (j = 0; j < nvecs; j++) { - size = min_t(unsigned int, max_iov_size, remaining_data_length); - remaining_data_length -= size; - log_write(INFO, "sending pages i=%d offset=%d size=%d remaining_data_length=%d\n", - i, j * max_iov_size + offset, size, - remaining_data_length); - rc = smbd_post_send_page( - info, rqst->rq_pages[i], - j*max_iov_size + offset, - size, remaining_data_length); - if (rc) - goto done; - } + if (iov_iter_count(&rqst->rq_iter) > 0) { + /* And then the data pages if there are any */ + rc = smbd_post_send_iter(info, &rqst->rq_iter, + &remaining_data_length); + if (rc < 0) + break; } + } while (++rqst_idx < num_rqst); -done: /* * As an optimization, we don't wait for individual I/O to finish * before sending the next one. @@ -2305,27 +2232,49 @@ static struct smbd_mr *get_mr(struct smbd_connection *info) goto again; } +/* + * Transcribe the pages from an iterator into an MR scatterlist. + * @iter: The iterator to transcribe + * @_remaining_data_length: remaining data to send in this payload + */ +static int smbd_iter_to_mr(struct smbd_connection *info, + struct iov_iter *iter, + struct scatterlist *sgl, + unsigned int num_pages) +{ + struct sg_table sgtable = { .sgl = sgl }; + unsigned int cleanup_mode; + int ret; + + sg_init_table(sgl, num_pages); + + ret = netfs_extract_iter_to_sg(iter, iov_iter_count(iter), + &sgtable, num_pages, &cleanup_mode); + WARN_ON(ret < 0); + return ret; +} + /* * Register memory for RDMA read/write - * pages[]: the list of pages to register memory with - * num_pages: the number of pages to register - * tailsz: if non-zero, the bytes to register in the last page + * iter: the buffer to register memory with * writing: true if this is a RDMA write (SMB read), false for RDMA read * need_invalidate: true if this MR needs to be locally invalidated after I/O * return value: the MR registered, NULL if failed. */ -struct smbd_mr *smbd_register_mr( - struct smbd_connection *info, struct page *pages[], int num_pages, - int offset, int tailsz, bool writing, bool need_invalidate) +struct smbd_mr *smbd_register_mr(struct smbd_connection *info, + struct iov_iter *iter, + bool writing, bool need_invalidate) { struct smbd_mr *smbdirect_mr; - int rc, i; + int rc, num_pages; enum dma_data_direction dir; struct ib_reg_wr *reg_wr; + num_pages = iov_iter_npages(iter, info->max_frmr_depth + 1); if (num_pages > info->max_frmr_depth) { log_rdma_mr(ERR, "num_pages=%d max_frmr_depth=%d\n", num_pages, info->max_frmr_depth); + WARN_ON_ONCE(1); return NULL; } @@ -2334,32 +2283,16 @@ struct smbd_mr *smbd_register_mr( log_rdma_mr(ERR, "get_mr returning NULL\n"); return NULL; } + + dir = writing ? 
DMA_FROM_DEVICE : DMA_TO_DEVICE; + smbdirect_mr->dir = dir; smbdirect_mr->need_invalidate = need_invalidate; smbdirect_mr->sgl_count = num_pages; - sg_init_table(smbdirect_mr->sgl, num_pages); - - log_rdma_mr(INFO, "num_pages=0x%x offset=0x%x tailsz=0x%x\n", - num_pages, offset, tailsz); - - if (num_pages == 1) { - sg_set_page(&smbdirect_mr->sgl[0], pages[0], tailsz, offset); - goto skip_multiple_pages; - } - /* We have at least two pages to register */ - sg_set_page( - &smbdirect_mr->sgl[0], pages[0], PAGE_SIZE - offset, offset); - i = 1; - while (i < num_pages - 1) { - sg_set_page(&smbdirect_mr->sgl[i], pages[i], PAGE_SIZE, 0); - i++; - } - sg_set_page(&smbdirect_mr->sgl[i], pages[i], - tailsz ? tailsz : PAGE_SIZE, 0); + log_rdma_mr(INFO, "num_pages=0x%x count=0x%zx\n", + num_pages, iov_iter_count(iter)); + smbd_iter_to_mr(info, iter, smbdirect_mr->sgl, num_pages); -skip_multiple_pages: - dir = writing ? DMA_FROM_DEVICE : DMA_TO_DEVICE; - smbdirect_mr->dir = dir; rc = ib_dma_map_sg(info->id->device, smbdirect_mr->sgl, num_pages, dir); if (!rc) { log_rdma_mr(ERR, "ib_dma_map_sg num_pages=%x dir=%x rc=%x\n", diff --git a/fs/cifs/smbdirect.h b/fs/cifs/smbdirect.h index 207ef979cd51..be2cf18b7fec 100644 --- a/fs/cifs/smbdirect.h +++ b/fs/cifs/smbdirect.h @@ -302,8 +302,8 @@ struct smbd_mr { /* Interfaces to register and deregister MR for RDMA read/write */ struct smbd_mr *smbd_register_mr( - struct smbd_connection *info, struct page *pages[], int num_pages, - int offset, int tailsz, bool writing, bool need_invalidate); + struct smbd_connection *info, struct iov_iter *iter, + bool writing, bool need_invalidate); int smbd_deregister_mr(struct smbd_mr *mr); #else diff --git a/fs/cifs/transport.c b/fs/cifs/transport.c index 3851d0aaa288..f39724093993 100644 --- a/fs/cifs/transport.c +++ b/fs/cifs/transport.c @@ -270,26 +270,7 @@ smb_rqst_len(struct TCP_Server_Info *server, struct smb_rqst *rqst) for (i = 0; i < nvec; i++) buflen += iov[i].iov_len; - /* - * Add in the page array if there is one. The caller needs to make - * sure rq_offset and rq_tailsz are set correctly. If a buffer of - * multiple pages ends at page boundary, rq_tailsz needs to be set to - * PAGE_SIZE. 
-	 */
-	if (rqst->rq_npages) {
-		if (rqst->rq_npages == 1)
-			buflen += rqst->rq_tailsz;
-		else {
-			/*
-			 * If there is more than one page, calculate the
-			 * buffer length based on rq_offset and rq_tailsz
-			 */
-			buflen += rqst->rq_pagesz * (rqst->rq_npages - 1) -
-					rqst->rq_offset;
-			buflen += rqst->rq_tailsz;
-		}
-	}
-
+	buflen += iov_iter_count(&rqst->rq_iter);
 	return buflen;
 }
 
@@ -376,23 +357,15 @@ __smb_send_rqst(struct TCP_Server_Info *server, int num_rqst,
 
 		total_len += sent;
 
-		/* now walk the page array and send each page in it */
-		for (i = 0; i < rqst[j].rq_npages; i++) {
-			struct bio_vec bvec;
-
-			bvec.bv_page = rqst[j].rq_pages[i];
-			rqst_page_get_length(&rqst[j], i, &bvec.bv_len,
-					     &bvec.bv_offset);
-
-			iov_iter_bvec(&smb_msg.msg_iter, ITER_SOURCE,
-				      &bvec, 1, bvec.bv_len);
+		if (iov_iter_count(&rqst[j].rq_iter) > 0) {
+			smb_msg.msg_iter = rqst[j].rq_iter;
 			rc = smb_send_kvec(server, &smb_msg, &sent);
 			if (rc < 0)
 				break;
-
 			total_len += sent;
 		}
-	}
+
+}
 
 unmask:
 	sigprocmask(SIG_SETMASK, &oldmask, NULL);
@@ -1640,11 +1613,11 @@ int
 cifs_discard_remaining_data(struct TCP_Server_Info *server)
 {
 	unsigned int rfclen = server->pdu_size;
-	int remaining = rfclen + HEADER_PREAMBLE_SIZE(server) -
+	size_t remaining = rfclen + HEADER_PREAMBLE_SIZE(server) -
 		server->total_read;
 
 	while (remaining > 0) {
-		int length;
+		ssize_t length;
 
 		length = cifs_discard_from_socket(server,
 				min_t(size_t, remaining,
@@ -1790,10 +1763,18 @@ cifs_readv_receive(struct TCP_Server_Info *server, struct mid_q_entry *mid)
 		return cifs_readv_discard(server, mid);
 	}
 
-	length = rdata->read_into_pages(server, rdata, data_len);
-	if (length < 0)
-		return length;
-
+#ifdef CONFIG_CIFS_SMB_DIRECT
+	if (rdata->mr)
+		length = data_len; /* An RDMA read is already done. */
+	else
+#endif
+	{
+		length = cifs_read_iter_from_socket(server, &rdata->iter,
+						    data_len);
+		iov_iter_revert(&rdata->iter, data_len);
+	}
+	if (length > 0)
+		rdata->got_bytes += length;
 
 	server->total_read += length;
 	cifs_dbg(FYI, "total_read=%u buflen=%u remaining=%u\n",

From patchwork Mon Jan 16 23:11:32 2023
Subject: [PATCH v6 29/34] cifs: Build the RDMA SGE list directly from an iterator
From: David Howells
To: Al Viro
Cc: Steve French, Shyam Prasad N, Rohith Surabattula, Tom Talpey,
    Jeff Layton, linux-cifs@vger.kernel.org, linux-rdma@vger.kernel.org,
    dhowells@redhat.com, Christoph Hellwig, Matthew Wilcox, Jens Axboe,
    Jan Kara, Jeff Layton, Logan Gunthorpe, linux-fsdevel@vger.kernel.org,
    linux-block@vger.kernel.org, linux-kernel@vger.kernel.org
Date: Mon, 16 Jan 2023 23:11:32 +0000
Message-ID: <167391069208.2311931.17037009522123506578.stgit@warthog.procyon.org.uk>
In-Reply-To: <167391047703.2311931.8115712773222260073.stgit@warthog.procyon.org.uk>
References: <167391047703.2311931.8115712773222260073.stgit@warthog.procyon.org.uk>

In the depths of the cifs RDMA code, extract part of an iov iterator
directly into an SGE list without going through an intermediate
scatterlist.

Note that this doesn't support extraction from an IOBUF- or UBUF-type
iterator (ie. user-supplied buffer). The assumption is that the higher
layers will extract those to a BVEC-type iterator first and do whatever
is required to stop the pages from going away.
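To make the shape of that concrete, here is a minimal sketch (an
illustration only, not code from this patch; the helper name is
invented) of the per-segment step that smb_extract_bvec_to_rdma
performs for the send direction: DMA-map one pinned BVEC segment and
write it straight into an ib_sge, with no scatterlist in between:

	#include <linux/bvec.h>
	#include <rdma/ib_verbs.h>

	/*
	 * Map one pinned BVEC segment directly into an SGE entry.  Assumes
	 * the caller holds a reference on the page and supplies the device
	 * and local_dma_lkey from the established RDMA connection.
	 */
	static int sge_from_bvec(struct ib_device *dev, u32 lkey,
				 const struct bio_vec *bv, struct ib_sge *sge)
	{
		u64 addr = ib_dma_map_page(dev, bv->bv_page, bv->bv_offset,
					   bv->bv_len, DMA_TO_DEVICE);

		if (ib_dma_mapping_error(dev, addr))
			return -EIO;

		sge->addr   = addr;
		sge->length = bv->bv_len;
		sge->lkey   = lkey;
		return 0;
	}

Because the caller guarantees that a BVEC-type iterator's pages stay
resident, no page refcounting or scatterlist bookkeeping is needed at
this level; teardown is just ib_dma_unmap_page() on each mapped entry.
This is why UBUF/IOBUF iterators have to be converted to BVEC by the
higher layers first.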
Signed-off-by: David Howells
cc: Steve French
cc: Shyam Prasad N
cc: Rohith Surabattula
cc: Tom Talpey
cc: Jeff Layton
cc: linux-cifs@vger.kernel.org
cc: linux-rdma@vger.kernel.org
Link: https://lore.kernel.org/r/166697260361.61150.5064013393408112197.stgit@warthog.procyon.org.uk/ # rfc
Link: https://lore.kernel.org/r/166732032518.3186319.1859601819981624629.stgit@warthog.procyon.org.uk/ # rfc
---
 fs/cifs/smbdirect.c | 111 ++++++++++++++++++---------------------------------
 1 file changed, 39 insertions(+), 72 deletions(-)

diff --git a/fs/cifs/smbdirect.c b/fs/cifs/smbdirect.c
index 8bd320f0156e..4691b5a8e1ff 100644
--- a/fs/cifs/smbdirect.c
+++ b/fs/cifs/smbdirect.c
@@ -828,16 +828,16 @@ static int smbd_post_send(struct smbd_connection *info,
 	return rc;
 }
 
-static int smbd_post_send_sgl(struct smbd_connection *info,
-	struct scatterlist *sgl, int data_length, int remaining_data_length)
+static int smbd_post_send_iter(struct smbd_connection *info,
+			       struct iov_iter *iter,
+			       int *_remaining_data_length)
 {
-	int num_sgs;
 	int i, rc;
 	int header_length;
+	int data_length;
 	struct smbd_request *request;
 	struct smbd_data_transfer *packet;
 	int new_credits;
-	struct scatterlist *sg;
 
 wait_credit:
 	/* Wait for send credits.
A SMBD packet needs one credit */ @@ -881,6 +881,30 @@ static int smbd_post_send_sgl(struct smbd_connection *info, } request->info = info; + memset(request->sge, 0, sizeof(request->sge)); + + /* Fill in the data payload to find out how much data we can add */ + if (iter) { + struct smb_extract_to_rdma extract = { + .nr_sge = 1, + .max_sge = SMBDIRECT_MAX_SEND_SGE, + .sge = request->sge, + .device = info->id->device, + .local_dma_lkey = info->pd->local_dma_lkey, + .direction = DMA_TO_DEVICE, + }; + + rc = smb_extract_iter_to_rdma(iter, *_remaining_data_length, + &extract); + if (rc < 0) + goto err_dma; + data_length = rc; + request->num_sge = extract.nr_sge; + *_remaining_data_length -= data_length; + } else { + data_length = 0; + request->num_sge = 1; + } /* Fill in the packet header */ packet = smbd_request_payload(request); @@ -902,7 +926,7 @@ static int smbd_post_send_sgl(struct smbd_connection *info, else packet->data_offset = cpu_to_le32(24); packet->data_length = cpu_to_le32(data_length); - packet->remaining_data_length = cpu_to_le32(remaining_data_length); + packet->remaining_data_length = cpu_to_le32(*_remaining_data_length); packet->padding = 0; log_outgoing(INFO, "credits_requested=%d credits_granted=%d data_offset=%d data_length=%d remaining_data_length=%d\n", @@ -918,7 +942,6 @@ static int smbd_post_send_sgl(struct smbd_connection *info, if (!data_length) header_length = offsetof(struct smbd_data_transfer, padding); - request->num_sge = 1; request->sge[0].addr = ib_dma_map_single(info->id->device, (void *)packet, header_length, @@ -932,23 +955,6 @@ static int smbd_post_send_sgl(struct smbd_connection *info, request->sge[0].length = header_length; request->sge[0].lkey = info->pd->local_dma_lkey; - /* Fill in the packet data payload */ - num_sgs = sgl ? 
sg_nents(sgl) : 0; - for_each_sg(sgl, sg, num_sgs, i) { - request->sge[i+1].addr = - ib_dma_map_page(info->id->device, sg_page(sg), - sg->offset, sg->length, DMA_TO_DEVICE); - if (ib_dma_mapping_error( - info->id->device, request->sge[i+1].addr)) { - rc = -EIO; - request->sge[i+1].addr = 0; - goto err_dma; - } - request->sge[i+1].length = sg->length; - request->sge[i+1].lkey = info->pd->local_dma_lkey; - request->num_sge++; - } - rc = smbd_post_send(info, request); if (!rc) return 0; @@ -987,8 +993,10 @@ static int smbd_post_send_sgl(struct smbd_connection *info, */ static int smbd_post_send_empty(struct smbd_connection *info) { + int remaining_data_length = 0; + info->count_send_empty++; - return smbd_post_send_sgl(info, NULL, 0, 0); + return smbd_post_send_iter(info, NULL, &remaining_data_length); } /* @@ -1923,42 +1931,6 @@ int smbd_recv(struct smbd_connection *info, struct msghdr *msg) return rc; } -/* - * Send the contents of an iterator - * @iter: The iterator to send - * @_remaining_data_length: remaining data to send in this payload - */ -static int smbd_post_send_iter(struct smbd_connection *info, - struct iov_iter *iter, - int *_remaining_data_length) -{ - struct scatterlist sgl[SMBDIRECT_MAX_SEND_SGE - 1]; - unsigned int max_payload = info->max_send_size - sizeof(struct smbd_data_transfer); - unsigned int cleanup_mode; - ssize_t rc; - - do { - struct sg_table sgtable = { .sgl = sgl }; - size_t maxlen = min_t(size_t, *_remaining_data_length, max_payload); - - sg_init_table(sgtable.sgl, ARRAY_SIZE(sgl)); - rc = netfs_extract_iter_to_sg(iter, maxlen, - &sgtable, ARRAY_SIZE(sgl), - &cleanup_mode); - if (rc < 0) - break; - if (WARN_ON_ONCE(sgtable.nents == 0)) - return -EIO; - WARN_ON(cleanup_mode != 0); - - sg_mark_end(&sgl[sgtable.nents - 1]); - *_remaining_data_length -= rc; - rc = smbd_post_send_sgl(info, sgl, rc, *_remaining_data_length); - } while (rc == 0 && iov_iter_count(iter) > 0); - - return rc; -} - /* * Send data to transport * Each rqst is transported as a SMBDirect payload @@ -2240,16 +2212,17 @@ static struct smbd_mr *get_mr(struct smbd_connection *info) static int smbd_iter_to_mr(struct smbd_connection *info, struct iov_iter *iter, struct scatterlist *sgl, - unsigned int num_pages) + unsigned int num_pages, + bool writing) { struct sg_table sgtable = { .sgl = sgl }; - unsigned int cleanup_mode; int ret; sg_init_table(sgl, num_pages); ret = netfs_extract_iter_to_sg(iter, iov_iter_count(iter), - &sgtable, num_pages, &cleanup_mode); + &sgtable, num_pages, + writing ? 
				      FOLL_SOURCE_BUF : FOLL_DEST_BUF);
 	WARN_ON(ret < 0);
 	return ret;
 }
 
@@ -2291,7 +2264,7 @@ struct smbd_mr *smbd_register_mr(struct smbd_connection *info,
 
 	log_rdma_mr(INFO, "num_pages=0x%x count=0x%zx\n",
 		    num_pages, iov_iter_count(iter));
-	smbd_iter_to_mr(info, iter, smbdirect_mr->sgl, num_pages);
+	smbd_iter_to_mr(info, iter, smbdirect_mr->sgl, num_pages, writing);
 
 	rc = ib_dma_map_sg(info->id->device, smbdirect_mr->sgl, num_pages, dir);
 	if (!rc) {
@@ -2602,13 +2575,6 @@ static ssize_t smb_extract_iter_to_rdma(struct iov_iter *iter, size_t len,
 	ssize_t ret;
 	int before = rdma->nr_sge;
 
-	if (iov_iter_is_discard(iter) ||
-	    iov_iter_is_pipe(iter) ||
-	    user_backed_iter(iter)) {
-		WARN_ON_ONCE(1);
-		return -EIO;
-	}
-
 	switch (iov_iter_type(iter)) {
 	case ITER_BVEC:
 		ret = smb_extract_bvec_to_rdma(iter, rdma, len);
@@ -2620,7 +2586,8 @@ static ssize_t smb_extract_iter_to_rdma(struct iov_iter *iter, size_t len,
 		ret = smb_extract_xarray_to_rdma(iter, rdma, len);
 		break;
 	default:
-		BUG();
+		WARN_ON_ONCE(1);
+		return -EIO;
 	}
 
 	if (ret > 0) {

From patchwork Mon Jan 16 23:11:39 2023
Subject: [PATCH v6 30/34] cifs: Remove unused code
From: David Howells
To: Al Viro
Cc: Steve French, Shyam Prasad N, Rohith Surabattula, Jeff Layton,
    linux-cifs@vger.kernel.org, dhowells@redhat.com, Christoph Hellwig,
    Matthew Wilcox, Jens Axboe, Jan Kara, Jeff Layton, Logan Gunthorpe,
    linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org,
    linux-kernel@vger.kernel.org
Date: Mon, 16 Jan 2023 23:11:39 +0000
Message-ID: <167391069962.2311931.2392376351847891810.stgit@warthog.procyon.org.uk>
In-Reply-To: <167391047703.2311931.8115712773222260073.stgit@warthog.procyon.org.uk>
References: <167391047703.2311931.8115712773222260073.stgit@warthog.procyon.org.uk>

Remove a bunch of functions that are no longer used and are commented
out after the conversion to use iterators throughout the I/O path.

Signed-off-by: David Howells
cc: Steve French
cc: Shyam Prasad N
cc: Rohith Surabattula
cc: Jeff Layton
cc: linux-cifs@vger.kernel.org
Link: https://lore.kernel.org/r/164928621823.457102.8777804402615654773.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/165211421039.3154751.15199634443157779005.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/165348881165.2106726.2993852968344861224.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/165364827876.3334034.9331465096417303889.stgit@warthog.procyon.org.uk/ # v3
Link: https://lore.kernel.org/r/166126396915.708021.2010212654244139442.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/166697261080.61150.17513116912567922274.stgit@warthog.procyon.org.uk/ # rfc
Link: https://lore.kernel.org/r/166732033255.3186319.5527423437137895940.stgit@warthog.procyon.org.uk/ # rfc
---
 fs/cifs/file.c | 606 --------------------------------------------------------
 1 file changed, 606 deletions(-)

diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index cfa8ad8a59c4..6baf591f63a3 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -2603,314 +2603,6 @@ static int cifs_partialpagewrite(struct page *page, unsigned from, unsigned to)
 	return rc;
 }
 
-#if 0 // TODO: Remove for iov_iter support
-static struct cifs_writedata *
-wdata_alloc_and_fillpages(pgoff_t tofind, struct address_space *mapping,
-			  pgoff_t end, pgoff_t *index,
-			  unsigned int *found_pages)
-{
-	struct cifs_writedata *wdata;
-
-	wdata = cifs_writedata_alloc((unsigned int)tofind,
-				     cifs_writev_complete);
-	if (!wdata)
-		return NULL;
-
-	*found_pages = find_get_pages_range_tag(mapping, index, end,
-				PAGECACHE_TAG_DIRTY, tofind, wdata->pages);
-	return wdata;
-}
-
-static unsigned int
-wdata_prepare_pages(struct cifs_writedata *wdata, unsigned int found_pages,
-		    struct address_space *mapping,
-		    struct writeback_control *wbc,
-		    pgoff_t end, pgoff_t *index, pgoff_t *next, bool *done)
-{
-	unsigned int nr_pages = 0, i;
-	struct page *page;
-
-	for (i = 0; i < found_pages; i++) {
-		page = wdata->pages[i];
-		/*
-		 * At this point we
hold neither the i_pages lock nor the - * page lock: the page may be truncated or invalidated - * (changing page->mapping to NULL), or even swizzled - * back from swapper_space to tmpfs file mapping - */ - - if (nr_pages == 0) - lock_page(page); - else if (!trylock_page(page)) - break; - - if (unlikely(page->mapping != mapping)) { - unlock_page(page); - break; - } - - if (!wbc->range_cyclic && page->index > end) { - *done = true; - unlock_page(page); - break; - } - - if (*next && (page->index != *next)) { - /* Not next consecutive page */ - unlock_page(page); - break; - } - - if (wbc->sync_mode != WB_SYNC_NONE) - wait_on_page_writeback(page); - - if (PageWriteback(page) || - !clear_page_dirty_for_io(page)) { - unlock_page(page); - break; - } - - /* - * This actually clears the dirty bit in the radix tree. - * See cifs_writepage() for more commentary. - */ - set_page_writeback(page); - if (page_offset(page) >= i_size_read(mapping->host)) { - *done = true; - unlock_page(page); - end_page_writeback(page); - break; - } - - wdata->pages[i] = page; - *next = page->index + 1; - ++nr_pages; - } - - /* reset index to refind any pages skipped */ - if (nr_pages == 0) - *index = wdata->pages[0]->index + 1; - - /* put any pages we aren't going to use */ - for (i = nr_pages; i < found_pages; i++) { - put_page(wdata->pages[i]); - wdata->pages[i] = NULL; - } - - return nr_pages; -} - -static int -wdata_send_pages(struct cifs_writedata *wdata, unsigned int nr_pages, - struct address_space *mapping, struct writeback_control *wbc) -{ - int rc; - - wdata->sync_mode = wbc->sync_mode; - wdata->nr_pages = nr_pages; - wdata->offset = page_offset(wdata->pages[0]); - wdata->pagesz = PAGE_SIZE; - wdata->tailsz = min(i_size_read(mapping->host) - - page_offset(wdata->pages[nr_pages - 1]), - (loff_t)PAGE_SIZE); - wdata->bytes = ((nr_pages - 1) * PAGE_SIZE) + wdata->tailsz; - wdata->pid = wdata->cfile->pid; - - rc = adjust_credits(wdata->server, &wdata->credits, wdata->bytes); - if (rc) - return rc; - - if (wdata->cfile->invalidHandle) - rc = -EAGAIN; - else - rc = wdata->server->ops->async_writev(wdata, - cifs_writedata_release); - - return rc; -} - -static int -cifs_writepage_locked(struct page *page, struct writeback_control *wbc); - -static int cifs_write_one_page(struct page *page, struct writeback_control *wbc, - void *data) -{ - struct address_space *mapping = data; - int ret; - - ret = cifs_writepage_locked(page, wbc); - unlock_page(page); - mapping_set_error(mapping, ret); - return ret; -} - -static int cifs_writepages(struct address_space *mapping, - struct writeback_control *wbc) -{ - struct inode *inode = mapping->host; - struct cifs_sb_info *cifs_sb = CIFS_SB(inode->i_sb); - struct TCP_Server_Info *server; - bool done = false, scanned = false, range_whole = false; - pgoff_t end, index; - struct cifs_writedata *wdata; - struct cifsFileInfo *cfile = NULL; - int rc = 0; - int saved_rc = 0; - unsigned int xid; - - /* - * If wsize is smaller than the page cache size, default to writing - * one page at a time. 
- */ - if (cifs_sb->ctx->wsize < PAGE_SIZE) - return write_cache_pages(mapping, wbc, cifs_write_one_page, - mapping); - - xid = get_xid(); - if (wbc->range_cyclic) { - index = mapping->writeback_index; /* Start from prev offset */ - end = -1; - } else { - index = wbc->range_start >> PAGE_SHIFT; - end = wbc->range_end >> PAGE_SHIFT; - if (wbc->range_start == 0 && wbc->range_end == LLONG_MAX) - range_whole = true; - scanned = true; - } - server = cifs_pick_channel(cifs_sb_master_tcon(cifs_sb)->ses); - -retry: - while (!done && index <= end) { - unsigned int i, nr_pages, found_pages, wsize; - pgoff_t next = 0, tofind, saved_index = index; - struct cifs_credits credits_on_stack; - struct cifs_credits *credits = &credits_on_stack; - int get_file_rc = 0; - - if (cfile) - cifsFileInfo_put(cfile); - - rc = cifs_get_writable_file(CIFS_I(inode), FIND_WR_ANY, &cfile); - - /* in case of an error store it to return later */ - if (rc) - get_file_rc = rc; - - rc = server->ops->wait_mtu_credits(server, cifs_sb->ctx->wsize, - &wsize, credits); - if (rc != 0) { - done = true; - break; - } - - tofind = min((wsize / PAGE_SIZE) - 1, end - index) + 1; - - wdata = wdata_alloc_and_fillpages(tofind, mapping, end, &index, - &found_pages); - if (!wdata) { - rc = -ENOMEM; - done = true; - add_credits_and_wake_if(server, credits, 0); - break; - } - - if (found_pages == 0) { - kref_put(&wdata->refcount, cifs_writedata_release); - add_credits_and_wake_if(server, credits, 0); - break; - } - - nr_pages = wdata_prepare_pages(wdata, found_pages, mapping, wbc, - end, &index, &next, &done); - - /* nothing to write? */ - if (nr_pages == 0) { - kref_put(&wdata->refcount, cifs_writedata_release); - add_credits_and_wake_if(server, credits, 0); - continue; - } - - wdata->credits = credits_on_stack; - wdata->cfile = cfile; - wdata->server = server; - cfile = NULL; - - if (!wdata->cfile) { - cifs_dbg(VFS, "No writable handle in writepages rc=%d\n", - get_file_rc); - if (is_retryable_error(get_file_rc)) - rc = get_file_rc; - else - rc = -EBADF; - } else - rc = wdata_send_pages(wdata, nr_pages, mapping, wbc); - - for (i = 0; i < nr_pages; ++i) - unlock_page(wdata->pages[i]); - - /* send failure -- clean up the mess */ - if (rc != 0) { - add_credits_and_wake_if(server, &wdata->credits, 0); - for (i = 0; i < nr_pages; ++i) { - if (is_retryable_error(rc)) - redirty_page_for_writepage(wbc, - wdata->pages[i]); - else - SetPageError(wdata->pages[i]); - end_page_writeback(wdata->pages[i]); - put_page(wdata->pages[i]); - } - if (!is_retryable_error(rc)) - mapping_set_error(mapping, rc); - } - kref_put(&wdata->refcount, cifs_writedata_release); - - if (wbc->sync_mode == WB_SYNC_ALL && rc == -EAGAIN) { - index = saved_index; - continue; - } - - /* Return immediately if we received a signal during writing */ - if (is_interrupt_error(rc)) { - done = true; - break; - } - - if (rc != 0 && saved_rc == 0) - saved_rc = rc; - - wbc->nr_to_write -= nr_pages; - if (wbc->nr_to_write <= 0) - done = true; - - index = next; - } - - if (!scanned && !done) { - /* - * We hit the last page and there is more work to be done: wrap - * back to the start of the file - */ - scanned = true; - index = 0; - goto retry; - } - - if (saved_rc != 0) - rc = saved_rc; - - if (wbc->range_cyclic || (range_whole && wbc->nr_to_write > 0)) - mapping->writeback_index = index; - - if (cfile) - cifsFileInfo_put(cfile); - free_xid(xid); - /* Indication to update ctime and mtime as close is deferred */ - set_bit(CIFS_INO_MODIFIED_ATTR, &CIFS_I(inode)->flags); - return rc; -} -#endif - 
/* * Extend the region to be written back to include subsequent contiguously * dirty pages if possible, but don't sleep while doing so. @@ -3505,49 +3197,6 @@ int cifs_flush(struct file *file, fl_owner_t id) return rc; } -#if 0 // TODO: Remove for iov_iter support -static int -cifs_write_allocate_pages(struct page **pages, unsigned long num_pages) -{ - int rc = 0; - unsigned long i; - - for (i = 0; i < num_pages; i++) { - pages[i] = alloc_page(GFP_KERNEL|__GFP_HIGHMEM); - if (!pages[i]) { - /* - * save number of pages we have already allocated and - * return with ENOMEM error - */ - num_pages = i; - rc = -ENOMEM; - break; - } - } - - if (rc) { - for (i = 0; i < num_pages; i++) - put_page(pages[i]); - } - return rc; -} - -static inline -size_t get_numpages(const size_t wsize, const size_t len, size_t *cur_len) -{ - size_t num_pages; - size_t clen; - - clen = min_t(const size_t, len, wsize); - num_pages = DIV_ROUND_UP(clen, PAGE_SIZE); - - if (cur_len) - *cur_len = clen; - - return num_pages; -} -#endif - static void cifs_uncached_writedata_release(struct kref *refcount) { @@ -3580,50 +3229,6 @@ cifs_uncached_writev_complete(struct work_struct *work) kref_put(&wdata->refcount, cifs_uncached_writedata_release); } -#if 0 // TODO: Remove for iov_iter support -static int -wdata_fill_from_iovec(struct cifs_writedata *wdata, struct iov_iter *from, - size_t *len, unsigned long *num_pages) -{ - size_t save_len, copied, bytes, cur_len = *len; - unsigned long i, nr_pages = *num_pages; - - save_len = cur_len; - for (i = 0; i < nr_pages; i++) { - bytes = min_t(const size_t, cur_len, PAGE_SIZE); - copied = copy_page_from_iter(wdata->pages[i], 0, bytes, from); - cur_len -= copied; - /* - * If we didn't copy as much as we expected, then that - * may mean we trod into an unmapped area. Stop copying - * at that point. On the next pass through the big - * loop, we'll likely end up getting a zero-length - * write and bailing out of it. - */ - if (copied < bytes) - break; - } - cur_len = save_len - cur_len; - *len = cur_len; - - /* - * If we have no data to send, then that probably means that - * the copy above failed altogether. That's most likely because - * the address in the iovec was bogus. Return -EFAULT and let - * the caller free anything we allocated and bail out. - */ - if (!cur_len) - return -EFAULT; - - /* - * i + 1 now represents the number of pages we actually used in - * the copy phase above. 
- */ - *num_pages = i + 1; - return 0; -} -#endif - static int cifs_resend_wdata(struct cifs_writedata *wdata, struct list_head *wdata_list, struct cifs_aio_ctx *ctx) @@ -4212,83 +3817,6 @@ cifs_uncached_readv_complete(struct work_struct *work) kref_put(&rdata->refcount, cifs_readdata_release); } -#if 0 // TODO: Remove for iov_iter support - -static int -uncached_fill_pages(struct TCP_Server_Info *server, - struct cifs_readdata *rdata, struct iov_iter *iter, - unsigned int len) -{ - int result = 0; - unsigned int i; - unsigned int nr_pages = rdata->nr_pages; - unsigned int page_offset = rdata->page_offset; - - rdata->got_bytes = 0; - rdata->tailsz = PAGE_SIZE; - for (i = 0; i < nr_pages; i++) { - struct page *page = rdata->pages[i]; - size_t n; - unsigned int segment_size = rdata->pagesz; - - if (i == 0) - segment_size -= page_offset; - else - page_offset = 0; - - - if (len <= 0) { - /* no need to hold page hostage */ - rdata->pages[i] = NULL; - rdata->nr_pages--; - put_page(page); - continue; - } - - n = len; - if (len >= segment_size) - /* enough data to fill the page */ - n = segment_size; - else - rdata->tailsz = len; - len -= n; - - if (iter) - result = copy_page_from_iter( - page, page_offset, n, iter); -#ifdef CONFIG_CIFS_SMB_DIRECT - else if (rdata->mr) - result = n; -#endif - else - result = cifs_read_page_from_socket( - server, page, page_offset, n); - if (result < 0) - break; - - rdata->got_bytes += result; - } - - return rdata->got_bytes > 0 && result != -ECONNABORTED ? - rdata->got_bytes : result; -} - -static int -cifs_uncached_read_into_pages(struct TCP_Server_Info *server, - struct cifs_readdata *rdata, unsigned int len) -{ - return uncached_fill_pages(server, rdata, NULL, len); -} - -static int -cifs_uncached_copy_into_pages(struct TCP_Server_Info *server, - struct cifs_readdata *rdata, - struct iov_iter *iter) -{ - return uncached_fill_pages(server, rdata, iter, iter->count); -} -#endif - static int cifs_resend_rdata(struct cifs_readdata *rdata, struct list_head *rdata_list, struct cifs_aio_ctx *ctx) @@ -4901,140 +4429,6 @@ int cifs_file_mmap(struct file *file, struct vm_area_struct *vma) return rc; } -#if 0 // TODO: Remove for iov_iter support - -static void -cifs_readv_complete(struct work_struct *work) -{ - unsigned int i, got_bytes; - struct cifs_readdata *rdata = container_of(work, - struct cifs_readdata, work); - - got_bytes = rdata->got_bytes; - for (i = 0; i < rdata->nr_pages; i++) { - struct page *page = rdata->pages[i]; - - if (rdata->result == 0 || - (rdata->result == -EAGAIN && got_bytes)) { - flush_dcache_page(page); - SetPageUptodate(page); - } else - SetPageError(page); - - if (rdata->result == 0 || - (rdata->result == -EAGAIN && got_bytes)) - cifs_readpage_to_fscache(rdata->mapping->host, page); - - unlock_page(page); - - got_bytes -= min_t(unsigned int, PAGE_SIZE, got_bytes); - - put_page(page); - rdata->pages[i] = NULL; - } - kref_put(&rdata->refcount, cifs_readdata_release); -} - -static int -readpages_fill_pages(struct TCP_Server_Info *server, - struct cifs_readdata *rdata, struct iov_iter *iter, - unsigned int len) -{ - int result = 0; - unsigned int i; - u64 eof; - pgoff_t eof_index; - unsigned int nr_pages = rdata->nr_pages; - unsigned int page_offset = rdata->page_offset; - - /* determine the eof that the server (probably) has */ - eof = CIFS_I(rdata->mapping->host)->server_eof; - eof_index = eof ? 
(eof - 1) >> PAGE_SHIFT : 0; - cifs_dbg(FYI, "eof=%llu eof_index=%lu\n", eof, eof_index); - - rdata->got_bytes = 0; - rdata->tailsz = PAGE_SIZE; - for (i = 0; i < nr_pages; i++) { - struct page *page = rdata->pages[i]; - unsigned int to_read = rdata->pagesz; - size_t n; - - if (i == 0) - to_read -= page_offset; - else - page_offset = 0; - - n = to_read; - - if (len >= to_read) { - len -= to_read; - } else if (len > 0) { - /* enough for partial page, fill and zero the rest */ - zero_user(page, len + page_offset, to_read - len); - n = rdata->tailsz = len; - len = 0; - } else if (page->index > eof_index) { - /* - * The VFS will not try to do readahead past the - * i_size, but it's possible that we have outstanding - * writes with gaps in the middle and the i_size hasn't - * caught up yet. Populate those with zeroed out pages - * to prevent the VFS from repeatedly attempting to - * fill them until the writes are flushed. - */ - zero_user(page, 0, PAGE_SIZE); - flush_dcache_page(page); - SetPageUptodate(page); - unlock_page(page); - put_page(page); - rdata->pages[i] = NULL; - rdata->nr_pages--; - continue; - } else { - /* no need to hold page hostage */ - unlock_page(page); - put_page(page); - rdata->pages[i] = NULL; - rdata->nr_pages--; - continue; - } - - if (iter) - result = copy_page_from_iter( - page, page_offset, n, iter); -#ifdef CONFIG_CIFS_SMB_DIRECT - else if (rdata->mr) - result = n; -#endif - else - result = cifs_read_page_from_socket( - server, page, page_offset, n); - if (result < 0) - break; - - rdata->got_bytes += result; - } - - return rdata->got_bytes > 0 && result != -ECONNABORTED ? - rdata->got_bytes : result; -} - -static int -cifs_readpages_read_into_pages(struct TCP_Server_Info *server, - struct cifs_readdata *rdata, unsigned int len) -{ - return readpages_fill_pages(server, rdata, NULL, len); -} - -static int -cifs_readpages_copy_into_pages(struct TCP_Server_Info *server, - struct cifs_readdata *rdata, - struct iov_iter *iter) -{ - return readpages_fill_pages(server, rdata, iter, iter->count); -} -#endif - /* * Unlock a bunch of folios in the pagecache. 
 */

From patchwork Mon Jan 16 23:11:47 2023
X-Patchwork-Submitter: David Howells
X-Patchwork-Id: 44380
Subject: [PATCH v6 31/34] cifs: Fix problem with encrypted RDMA data read
From: David Howells
To: Al Viro
Cc: Steve French, Tom Talpey, Long Li, Namjae Jeon, Stefan Metzmacher,
    linux-cifs@vger.kernel.org, dhowells@redhat.com, Christoph Hellwig,
    Matthew Wilcox, Jens Axboe, Jan Kara, Jeff Layton, Logan Gunthorpe,
    linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org,
    linux-kernel@vger.kernel.org
Date: Mon, 16 Jan 2023 23:11:47 +0000
Message-ID: <167391070712.2311931.8909671251130425914.stgit@warthog.procyon.org.uk>

When the cifs client is talking to the ksmbd server by RDMA and the ksmbd
server has "smb3 encryption = yes" in its config file, the normal PDU
stream is encrypted, but the directly-delivered data isn't in the stream
(and isn't encrypted); it is instead delivered by DDP/RDMA packets (at
least with iWARP).

Currently, the direct delivery fails with:

    buf can not contain only a part of read data
    WARNING: CPU: 0 PID: 4619 at fs/cifs/smb2ops.c:4731 handle_read_data+0x393/0x405
    ...
    RIP: 0010:handle_read_data+0x393/0x405
    ...
     smb3_handle_read_data+0x30/0x37
     receive_encrypted_standard+0x141/0x224
     cifs_demultiplex_thread+0x21a/0x63b
     kthread+0xe7/0xef
     ret_from_fork+0x22/0x30

The problem apparently stems from the fact that the client is trying to
manage the decryption, but the data isn't in the smallbuf, the bigbuf or
the page array.

This can be fixed simply by inserting an extra case into
handle_read_data() that checks to see if use_rdma_mr is true, and if it
is, just setting rdata->got_bytes to the length of data delivered and
allowing normal continuation.

This can be seen in an iWARP packet trace. With the upstream code, it does
a DDP/RDMA packet, which produces the warning above and then retries,
retrieving the data inline, spread across several SMBDirect messages that
get glued together into a single PDU. With the patch applied, only the
DDP/RDMA packet is seen.

Note that this doesn't happen if the server isn't told to encrypt the
traffic, and it does also happen with softRoCE.

Signed-off-by: David Howells
cc: Steve French
cc: Tom Talpey
cc: Long Li
cc: Namjae Jeon
cc: Stefan Metzmacher
cc: linux-cifs@vger.kernel.org
Link: https://lore.kernel.org/r/166855224228.1998592.2212551359609792175.stgit@warthog.procyon.org.uk/ # v1
---
 fs/cifs/smb2ops.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/fs/cifs/smb2ops.c b/fs/cifs/smb2ops.c
index 387effcb905d..fabb1e135faa 100644
--- a/fs/cifs/smb2ops.c
+++ b/fs/cifs/smb2ops.c
@@ -4720,6 +4720,9 @@ handle_read_data(struct TCP_Server_Info *server, struct mid_q_entry *mid,
 		if (length < 0)
 			return length;
 		rdata->got_bytes = data_len;
+	} else if (use_rdma_mr) {
+		/* The data was delivered directly by RDMA. */
+		rdata->got_bytes = data_len;
 	} else {
 		/* read response payload cannot be in both buf and pages */
 		WARN_ONCE(1, "buf can not contain only a part of read data");
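For illustration, the dispatch that the hunk above extends can be sketched
as follows. This is a simplified rendering, not the real handle_read_data(),
and copy_payload_from_buf() is a hypothetical stand-in for the inline-copy
path:

static int place_read_payload(struct cifs_readdata *rdata, bool use_rdma_mr,
			      unsigned int data_offset, unsigned int data_len,
			      unsigned int buflen)
{
	if (data_offset + data_len <= buflen) {
		/* Payload arrived inline in the reply buffer: copy it out. */
		return copy_payload_from_buf(rdata, data_offset, data_len);
	} else if (use_rdma_mr) {
		/*
		 * The RDMA hardware already placed the payload directly into
		 * the registered buffers; just account for the bytes.
		 */
		rdata->got_bytes = data_len;
		return 0;
	}
	/* The payload may not straddle the reply buffer and the page array. */
	WARN_ONCE(1, "buf can not contain only a part of read data");
	return -EIO;
}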
From patchwork Mon Jan 16 23:11:54 2023
X-Patchwork-Submitter: David Howells
X-Patchwork-Id: 44385
Subject: [PATCH v6 32/34] cifs: DIO to/from KVEC-type iterators should now work
From: David Howells
To: Al Viro
Cc: Steve French, Shyam Prasad N, Rohith Surabattula, Tom Talpey,
    Jeff Layton, linux-cifs@vger.kernel.org, dhowells@redhat.com,
    Christoph Hellwig, Matthew Wilcox, Jens Axboe, Jan Kara,
    Logan Gunthorpe, linux-fsdevel@vger.kernel.org,
    linux-block@vger.kernel.org, linux-kernel@vger.kernel.org
Date: Mon, 16 Jan 2023 23:11:54 +0000
Message-ID: <167391071464.2311931.14722270915404689054.stgit@warthog.procyon.org.uk>

DIO to/from KVEC-type iterators should now work, as the iterator is passed
down to the socket in non-RDMA/non-crypto mode, and in RDMA or crypto mode
care is taken to handle vmap/vmalloc correctly and not to take page refs
when building a scatterlist.

Signed-off-by: David Howells
cc: Steve French
cc: Shyam Prasad N
cc: Rohith Surabattula
cc: Tom Talpey
cc: Jeff Layton
cc: linux-cifs@vger.kernel.org
---
 fs/cifs/file.c | 20 --------------------
 1 file changed, 20 deletions(-)

diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index 6baf591f63a3..7f1e01cee83d 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -3545,16 +3545,6 @@ static ssize_t __cifs_writev(
 	struct cifs_aio_ctx *ctx;
 	int rc;
 
-	/*
-	 * iov_iter_get_pages_alloc doesn't work with ITER_KVEC.
-	 * In this case, fall back to non-direct write function.
-	 * this could be improved by getting pages directly in ITER_KVEC
-	 */
-	if (direct && iov_iter_is_kvec(from)) {
-		cifs_dbg(FYI, "use non-direct cifs_writev for kvec I/O\n");
-		direct = false;
-	}
-
 	rc = generic_write_checks(iocb, from);
 	if (rc <= 0)
 		return rc;
@@ -4090,16 +4080,6 @@ static ssize_t __cifs_readv(
 	loff_t offset = iocb->ki_pos;
 	struct cifs_aio_ctx *ctx;
 
-	/*
-	 * iov_iter_get_pages_alloc() doesn't work with ITER_KVEC,
-	 * fall back to data copy read path
-	 * this could be improved by getting pages directly in ITER_KVEC
-	 */
-	if (direct && iov_iter_is_kvec(to)) {
-		cifs_dbg(FYI, "use non-direct cifs_user_readv for kvec I/O\n");
-		direct = false;
-	}
-
 	len = iov_iter_count(to);
 	if (!len)
 		return 0;
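As an illustrative sketch (not part of the patch): with the fallbacks above
removed, an in-kernel caller can drive cifs direct I/O from a kvec without
being silently downgraded to the buffered path. Error handling is elided,
and the call into cifs_direct_writev() assumes the caller sits inside
fs/cifs/ where that function is visible:

static ssize_t kvec_direct_write(struct file *file, void *buf, size_t len,
				 loff_t pos)
{
	struct kvec kv = { .iov_base = buf, .iov_len = len };
	struct iov_iter iter;
	struct kiocb iocb;

	init_sync_kiocb(&iocb, file);
	iocb.ki_pos = pos;
	iov_iter_kvec(&iter, ITER_SOURCE, &kv, 1, len);

	/* Previously this would have hit the "use non-direct" fallback. */
	return cifs_direct_writev(&iocb, &iter);
}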
From patchwork Mon Jan 16 23:12:02 2023
X-Patchwork-Submitter: David Howells
X-Patchwork-Id: 44384
Subject: [PATCH v6 33/34] net: [RFC][WIP] Mark each skb_frags as to how they should be cleaned up
From: David Howells
To: Al Viro
Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , netdev@vger.kernel.org, dhowells@redhat.com, Christoph Hellwig , Matthew Wilcox , Jens Axboe , Jan Kara , Jeff Layton , Logan Gunthorpe , linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org, linux-kernel@vger.kernel.org Date: Mon, 16 Jan 2023 23:12:02 +0000 Message-ID: <167391072201.2311931.4013360052592980054.stgit@warthog.procyon.org.uk> In-Reply-To: <167391047703.2311931.8115712773222260073.stgit@warthog.procyon.org.uk> References: <167391047703.2311931.8115712773222260073.stgit@warthog.procyon.org.uk> User-Agent: StGit/1.5 MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.7 X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_NONE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1755225067528933198?= X-GMAIL-MSGID: =?utf-8?q?1755225067528933198?= [!] NOTE: This patch is mostly for illustrative/discussion purposes and makes an incomplete change and the networking code may not compile thereafter. There are a couple of problems with pasting foreign pages into sk_buffs with zerocopy that are analogous to the problems with direct I/O: (1) Pages derived from kernel buffers, such as KVEC iterators should not have refs taken on them. Rather, the caller should do whatever it needs to to retain the memory. (2) Pages derived from userspace buffers must not have refs taken on them if they're going to be written to (analogous to direct I/O read) as this may cause a malfunction of the VM CoW mechanism with a concurrent fork. Rather, they should have pins taken on them (FOLL_PIN). This will affect zerocopy-recvmsg where that is exists (eg. TLS, I think, though that might be decrypt-offload). This is further complicated by the possibility of a sk_buff containing data from mixed sources - for example a network filesystem might generate a message consisting of some metadata from a kernel buffer (which should not be pinned) and some data from userspace (which should have a ref taken). To this end, each page fragment attached to a sk_buff needs labelling with the appropriate cleanup to be applied. Do this by: (1) Replace struct bio_vec as the basis of skb_frag_t with a new struct skb_frag. This has an offset and a length, as before, plus a 'page_and_mode' member that contains the cleanup mode in the bottom two bits and the page pointer in the remaining bits. (FOLL_GET and FOLL_PIN got renumbered to bits 0 and 1 in an earlier patch). (2) The cleanup mode can be one of FOLL_GET (put a ref on the page), FOLL_PIN (unpin the page) or 0 (do nothing). (3) skb_frag_page() is used to access the page pointer as before. (4) __skb_frag_set_page() and skb_frag_set_page() acquire an extra argument to indicate the cleanup mode. (5) The cleanup mode is set to FOLL_GET on everything for the moment. (6) __skb_frag_ref() will call try_grab_page(), passing the cleanup mode to indicate whether an extra ref, an extra pin or nothing is required. [!] NOTE: If the cleanup mode was 0, this skbuff will also not pin the page and the caller needs to be aware of that. (7) __skb_frag_unref() will call page_put_unpin() to do the appropriate cleanup, based on the mode. 
Signed-off-by: David Howells cc: "David S. Miller" cc: Eric Dumazet cc: Jakub Kicinski cc: Paolo Abeni cc: netdev@vger.kernel.org --- drivers/net/tun.c | 2 - include/linux/skbuff.h | 124 ++++++++++++++++++++++++++++++------------------ io_uring/net.c | 2 - net/bpf/test_run.c | 2 - net/core/datagram.c | 3 + net/core/gro.c | 2 - net/core/skbuff.c | 16 +++--- net/ipv4/ip_output.c | 2 - net/ipv4/tcp.c | 4 +- net/ipv6/esp6.c | 5 +- net/ipv6/ip6_output.c | 2 - net/packet/af_packet.c | 2 - net/xfrm/xfrm_ipcomp.c | 2 - 13 files changed, 101 insertions(+), 67 deletions(-) diff --git a/drivers/net/tun.c b/drivers/net/tun.c index a7d17c680f4a..6c467c5163b2 100644 --- a/drivers/net/tun.c +++ b/drivers/net/tun.c @@ -1496,7 +1496,7 @@ static struct sk_buff *tun_napi_alloc_frags(struct tun_file *tfile, } page = virt_to_head_page(frag); skb_fill_page_desc(skb, i - 1, page, - frag - page_address(page), fragsz); + frag - page_address(page), fragsz, FOLL_GET); } return skb; diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 4c8492401a10..a1a77909509b 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -357,7 +357,51 @@ extern int sysctl_max_skb_frags; */ #define GSO_BY_FRAGS 0xFFFF -typedef struct bio_vec skb_frag_t; +struct skb_frag { + unsigned long page_and_mode; /* page pointer | cleanup_mode (0/FOLL_GET/PIN) */ + unsigned int len; + unsigned int offset; +}; +typedef struct skb_frag skb_frag_t; + +/** + * skb_frag_cleanup() - Returns the cleanup mode for an skb fragment + * @frag: skb fragment + * + * Returns the cleanup mode associated with @frag. It will be FOLL_GET, + * FOLL_PUT or 0. + */ +static inline unsigned int skb_frag_cleanup(const skb_frag_t *frag) +{ + return frag->page_and_mode & 3; +} + +/** + * skb_frag_page() - Returns the page in an skb fragment + * @frag: skb fragment + * + * Returns the &struct page associated with @frag. + */ +static inline struct page *skb_frag_page(const skb_frag_t *frag) +{ + return (struct page *)(frag->page_and_mode & ~3); +} + +/** + * __skb_frag_set_page() - Sets the page in an skb fragment + * @frag: skb fragment + * @page: The page to set + * @cleanup_mode: The cleanup mode to set (0, FOLL_GET, FOLL_PIN) + * + * Sets the fragment @frag to contain @page with the specified method of + * cleaning it up. 
+ */ +static inline void __skb_frag_set_page(skb_frag_t *frag, struct page *page, + unsigned int cleanup_mode) +{ + cleanup_mode &= FOLL_GET | FOLL_PIN; + frag->page_and_mode = (unsigned long)page | cleanup_mode; +} /** * skb_frag_size() - Returns the size of a skb fragment @@ -365,7 +409,7 @@ typedef struct bio_vec skb_frag_t; */ static inline unsigned int skb_frag_size(const skb_frag_t *frag) { - return frag->bv_len; + return frag->len; } /** @@ -375,7 +419,7 @@ static inline unsigned int skb_frag_size(const skb_frag_t *frag) */ static inline void skb_frag_size_set(skb_frag_t *frag, unsigned int size) { - frag->bv_len = size; + frag->len = size; } /** @@ -385,7 +429,7 @@ static inline void skb_frag_size_set(skb_frag_t *frag, unsigned int size) */ static inline void skb_frag_size_add(skb_frag_t *frag, int delta) { - frag->bv_len += delta; + frag->len += delta; } /** @@ -395,7 +439,7 @@ static inline void skb_frag_size_add(skb_frag_t *frag, int delta) */ static inline void skb_frag_size_sub(skb_frag_t *frag, int delta) { - frag->bv_len -= delta; + frag->len -= delta; } /** @@ -2388,7 +2432,8 @@ static inline unsigned int skb_pagelen(const struct sk_buff *skb) static inline void __skb_fill_page_desc_noacc(struct skb_shared_info *shinfo, int i, struct page *page, - int off, int size) + int off, int size, + unsigned int cleanup_mode) { skb_frag_t *frag = &shinfo->frags[i]; @@ -2397,9 +2442,9 @@ static inline void __skb_fill_page_desc_noacc(struct skb_shared_info *shinfo, * that not all callers have unique ownership of the page but rely * on page_is_pfmemalloc doing the right thing(tm). */ - frag->bv_page = page; - frag->bv_offset = off; + __skb_frag_set_page(frag, page, cleanup_mode); skb_frag_size_set(frag, size); + frag->offset = off; } /** @@ -2421,6 +2466,7 @@ static inline void skb_len_add(struct sk_buff *skb, int delta) * @page: the page to use for this fragment * @off: the offset to the data with @page * @size: the length of the data + * @cleanup_mode: The cleanup mode to set (0, FOLL_GET, FOLL_PIN) * * Initialises the @i'th fragment of @skb to point to &size bytes at * offset @off within @page. @@ -2428,9 +2474,11 @@ static inline void skb_len_add(struct sk_buff *skb, int delta) * Does not take any additional reference on the fragment. */ static inline void __skb_fill_page_desc(struct sk_buff *skb, int i, - struct page *page, int off, int size) + struct page *page, int off, int size, + unsigned int cleanup_mode) { - __skb_fill_page_desc_noacc(skb_shinfo(skb), i, page, off, size); + __skb_fill_page_desc_noacc(skb_shinfo(skb), i, page, off, size, + cleanup_mode); page = compound_head(page); if (page_is_pfmemalloc(page)) skb->pfmemalloc = true; @@ -2443,6 +2491,7 @@ static inline void __skb_fill_page_desc(struct sk_buff *skb, int i, * @page: the page to use for this fragment * @off: the offset to the data with @page * @size: the length of the data + * @cleanup_mode: The cleanup mode to set (0, FOLL_GET, FOLL_PIN) * * As per __skb_fill_page_desc() -- initialises the @i'th fragment of * @skb to point to @size bytes at offset @off within @page. In @@ -2451,9 +2500,10 @@ static inline void __skb_fill_page_desc(struct sk_buff *skb, int i, * Does not take any additional reference on the fragment. 
*/ static inline void skb_fill_page_desc(struct sk_buff *skb, int i, - struct page *page, int off, int size) + struct page *page, int off, int size, + unsigned int cleanup_mode) { - __skb_fill_page_desc(skb, i, page, off, size); + __skb_fill_page_desc(skb, i, page, off, size, cleanup_mode); skb_shinfo(skb)->nr_frags = i + 1; } @@ -2464,17 +2514,18 @@ static inline void skb_fill_page_desc(struct sk_buff *skb, int i, * @page: the page to use for this fragment * @off: the offset to the data with @page * @size: the length of the data + * @cleanup_mode: The cleanup mode to set (0, FOLL_GET, FOLL_PIN) * * Variant of skb_fill_page_desc() which does not deal with * pfmemalloc, if page is not owned by us. */ static inline void skb_fill_page_desc_noacc(struct sk_buff *skb, int i, struct page *page, int off, - int size) + int size, unsigned int cleanup_mode) { struct skb_shared_info *shinfo = skb_shinfo(skb); - __skb_fill_page_desc_noacc(shinfo, i, page, off, size); + __skb_fill_page_desc_noacc(shinfo, i, page, off, size, cleanup_mode); shinfo->nr_frags = i + 1; } @@ -3301,7 +3352,7 @@ static inline void skb_propagate_pfmemalloc(const struct page *page, */ static inline unsigned int skb_frag_off(const skb_frag_t *frag) { - return frag->bv_offset; + return frag->offset; } /** @@ -3311,7 +3362,7 @@ static inline unsigned int skb_frag_off(const skb_frag_t *frag) */ static inline void skb_frag_off_add(skb_frag_t *frag, int delta) { - frag->bv_offset += delta; + frag->offset += delta; } /** @@ -3321,7 +3372,7 @@ static inline void skb_frag_off_add(skb_frag_t *frag, int delta) */ static inline void skb_frag_off_set(skb_frag_t *frag, unsigned int offset) { - frag->bv_offset = offset; + frag->offset = offset; } /** @@ -3332,18 +3383,7 @@ static inline void skb_frag_off_set(skb_frag_t *frag, unsigned int offset) static inline void skb_frag_off_copy(skb_frag_t *fragto, const skb_frag_t *fragfrom) { - fragto->bv_offset = fragfrom->bv_offset; -} - -/** - * skb_frag_page - retrieve the page referred to by a paged fragment - * @frag: the paged fragment - * - * Returns the &struct page associated with @frag. - */ -static inline struct page *skb_frag_page(const skb_frag_t *frag) -{ - return frag->bv_page; + fragto->offset = fragfrom->offset; } /** @@ -3354,7 +3394,9 @@ static inline struct page *skb_frag_page(const skb_frag_t *frag) */ static inline void __skb_frag_ref(skb_frag_t *frag) { - get_page(skb_frag_page(frag)); + struct page *page = skb_frag_page(frag); + + try_grab_page(page, skb_frag_cleanup(frag)); } /** @@ -3385,7 +3427,7 @@ static inline void __skb_frag_unref(skb_frag_t *frag, bool recycle) if (recycle && page_pool_return_skb_page(page)) return; #endif - put_page(page); + page_put_unpin(page, skb_frag_cleanup(frag)); } /** @@ -3439,19 +3481,7 @@ static inline void *skb_frag_address_safe(const skb_frag_t *frag) static inline void skb_frag_page_copy(skb_frag_t *fragto, const skb_frag_t *fragfrom) { - fragto->bv_page = fragfrom->bv_page; -} - -/** - * __skb_frag_set_page - sets the page contained in a paged fragment - * @frag: the paged fragment - * @page: the page to set - * - * Sets the fragment @frag to contain @page. 
- */ -static inline void __skb_frag_set_page(skb_frag_t *frag, struct page *page) -{ - frag->bv_page = page; + fragto->page_and_mode = fragfrom->page_and_mode; } /** @@ -3459,13 +3489,15 @@ static inline void __skb_frag_set_page(skb_frag_t *frag, struct page *page) * @skb: the buffer * @f: the fragment offset * @page: the page to set + * @cleanup_mode: The cleanup mode to set (0, FOLL_GET, FOLL_PIN) * * Sets the @f'th fragment of @skb to contain @page. */ static inline void skb_frag_set_page(struct sk_buff *skb, int f, - struct page *page) + struct page *page, + unsigned int cleanup_mode) { - __skb_frag_set_page(&skb_shinfo(skb)->frags[f], page); + __skb_frag_set_page(&skb_shinfo(skb)->frags[f], page, cleanup_mode); } bool skb_page_frag_refill(unsigned int sz, struct page_frag *pfrag, gfp_t prio); diff --git a/io_uring/net.c b/io_uring/net.c index fbc34a7c2743..1d3e24404d75 100644 --- a/io_uring/net.c +++ b/io_uring/net.c @@ -1043,7 +1043,7 @@ static int io_sg_from_iter(struct sock *sk, struct sk_buff *skb, copied += v.bv_len; truesize += PAGE_ALIGN(v.bv_len + v.bv_offset); __skb_fill_page_desc_noacc(shinfo, frag++, v.bv_page, - v.bv_offset, v.bv_len); + v.bv_offset, v.bv_len, FOLL_GET); bvec_iter_advance_single(from->bvec, &bi, v.bv_len); } if (bi.bi_size) diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c index 2723623429ac..9ed2de52e1be 100644 --- a/net/bpf/test_run.c +++ b/net/bpf/test_run.c @@ -1370,7 +1370,7 @@ int bpf_prog_test_run_xdp(struct bpf_prog *prog, const union bpf_attr *kattr, } frag = &sinfo->frags[sinfo->nr_frags++]; - __skb_frag_set_page(frag, page); + __skb_frag_set_page(frag, page, FOLL_GET); data_len = min_t(u32, kattr->test.data_size_in - size, PAGE_SIZE); diff --git a/net/core/datagram.c b/net/core/datagram.c index 9f0914b781ad..122bfb144d32 100644 --- a/net/core/datagram.c +++ b/net/core/datagram.c @@ -678,7 +678,8 @@ int __zerocopy_sg_from_iter(struct msghdr *msg, struct sock *sk, page_ref_sub(last_head, refs); refs = 0; } - skb_fill_page_desc_noacc(skb, frag++, head, start, size); + skb_fill_page_desc_noacc(skb, frag++, head, start, size, + FOLL_GET); } if (refs) page_ref_sub(last_head, refs); diff --git a/net/core/gro.c b/net/core/gro.c index fd8c6a7e8d3e..dfbf2279ce5c 100644 --- a/net/core/gro.c +++ b/net/core/gro.c @@ -228,7 +228,7 @@ int skb_gro_receive(struct sk_buff *p, struct sk_buff *skb) pinfo->nr_frags = nr_frags + 1 + skbinfo->nr_frags; - __skb_frag_set_page(frag, page); + __skb_frag_set_page(frag, page, FOLL_GET); skb_frag_off_set(frag, first_offset); skb_frag_size_set(frag, first_size); diff --git a/net/core/skbuff.c b/net/core/skbuff.c index 4a0eb5593275..a6a21a27ebb4 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -765,7 +765,7 @@ EXPORT_SYMBOL(__napi_alloc_skb); void skb_add_rx_frag(struct sk_buff *skb, int i, struct page *page, int off, int size, unsigned int truesize) { - skb_fill_page_desc(skb, i, page, off, size); + skb_fill_page_desc(skb, i, page, off, size, FOLL_GET); skb->len += size; skb->data_len += size; skb->truesize += truesize; @@ -1666,10 +1666,10 @@ int skb_copy_ubufs(struct sk_buff *skb, gfp_t gfp_mask) /* skb frags point to kernel buffers */ for (i = 0; i < new_frags - 1; i++) { - __skb_fill_page_desc(skb, i, head, 0, PAGE_SIZE); + __skb_fill_page_desc(skb, i, head, 0, PAGE_SIZE, FOLL_GET); head = (struct page *)page_private(head); } - __skb_fill_page_desc(skb, new_frags - 1, head, 0, d_off); + __skb_fill_page_desc(skb, new_frags - 1, head, 0, d_off, FOLL_GET); skb_shinfo(skb)->nr_frags = new_frags; release: @@ 
-3389,7 +3389,7 @@ skb_zerocopy(struct sk_buff *to, struct sk_buff *from, int len, int hlen) if (plen) { page = virt_to_head_page(from->head); offset = from->data - (unsigned char *)page_address(page); - __skb_fill_page_desc(to, 0, page, offset, plen); + __skb_fill_page_desc(to, 0, page, offset, plen, FOLL_GET); get_page(page); j = 1; len -= plen; @@ -4040,7 +4040,7 @@ int skb_append_pagefrags(struct sk_buff *skb, struct page *page, } else if (i < MAX_SKB_FRAGS) { skb_zcopy_downgrade_managed(skb); get_page(page); - skb_fill_page_desc_noacc(skb, i, page, offset, size); + skb_fill_page_desc_noacc(skb, i, page, offset, size, FOLL_GET); } else { return -EMSGSIZE; } @@ -4077,7 +4077,7 @@ static inline skb_frag_t skb_head_frag_to_page_desc(struct sk_buff *frag_skb) struct page *page; page = virt_to_head_page(frag_skb->head); - __skb_frag_set_page(&head_frag, page); + __skb_frag_set_page(&head_frag, page, FOLL_GET); skb_frag_off_set(&head_frag, frag_skb->data - (unsigned char *)page_address(page)); skb_frag_size_set(&head_frag, skb_headlen(frag_skb)); @@ -5521,7 +5521,7 @@ bool skb_try_coalesce(struct sk_buff *to, struct sk_buff *from, offset = from->data - (unsigned char *)page_address(page); skb_fill_page_desc(to, to_shinfo->nr_frags, - page, offset, skb_headlen(from)); + page, offset, skb_headlen(from), FOLL_GET); *fragstolen = true; } else { if (to_shinfo->nr_frags + @@ -6221,7 +6221,7 @@ struct sk_buff *alloc_skb_with_frags(unsigned long header_len, fill_page: chunk = min_t(unsigned long, data_len, PAGE_SIZE << order); - skb_fill_page_desc(skb, i, page, 0, chunk); + skb_fill_page_desc(skb, i, page, 0, chunk, FOLL_GET); data_len -= chunk; npages -= 1 << order; } diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c index 922c87ef1ab5..43ea2e7aeeea 100644 --- a/net/ipv4/ip_output.c +++ b/net/ipv4/ip_output.c @@ -1221,7 +1221,7 @@ static int __ip_append_data(struct sock *sk, goto error; __skb_fill_page_desc(skb, i, pfrag->page, - pfrag->offset, 0); + pfrag->offset, 0, FOLL_GET); skb_shinfo(skb)->nr_frags = ++i; get_page(pfrag->page); } diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index c567d5e8053e..2cb88e67e152 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -1016,7 +1016,7 @@ static struct sk_buff *tcp_build_frag(struct sock *sk, int size_goal, int flags, skb_frag_size_add(&skb_shinfo(skb)->frags[i - 1], copy); } else { get_page(page); - skb_fill_page_desc_noacc(skb, i, page, offset, copy); + skb_fill_page_desc_noacc(skb, i, page, offset, copy, FOLL_GET); } if (!(flags & MSG_NO_SHARED_FRAGS)) @@ -1385,7 +1385,7 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size) skb_frag_size_add(&skb_shinfo(skb)->frags[i - 1], copy); } else { skb_fill_page_desc(skb, i, pfrag->page, - pfrag->offset, copy); + pfrag->offset, copy, FOLL_GET); page_ref_inc(pfrag->page); } pfrag->offset += copy; diff --git a/net/ipv6/esp6.c b/net/ipv6/esp6.c index 14ed868680c6..13e9d36e132e 100644 --- a/net/ipv6/esp6.c +++ b/net/ipv6/esp6.c @@ -529,7 +529,7 @@ int esp6_output_head(struct xfrm_state *x, struct sk_buff *skb, struct esp_info nfrags = skb_shinfo(skb)->nr_frags; __skb_fill_page_desc(skb, nfrags, page, pfrag->offset, - tailen); + tailen, FOLL_GET); skb_shinfo(skb)->nr_frags = ++nfrags; pfrag->offset = pfrag->offset + allocsize; @@ -635,7 +635,8 @@ int esp6_output_tail(struct xfrm_state *x, struct sk_buff *skb, struct esp_info page = pfrag->page; get_page(page); /* replace page frags in skb with new page */ - __skb_fill_page_desc(skb, 0, page, pfrag->offset, skb->data_len); + 
__skb_fill_page_desc(skb, 0, page, pfrag->offset, skb->data_len,
+			     FOLL_GET);
 	pfrag->offset = pfrag->offset + allocsize;
 	spin_unlock_bh(&x->lock);
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 60fd91bb5171..117fb2bdad02 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -1780,7 +1780,7 @@ static int __ip6_append_data(struct sock *sk,
 				goto error;
 
 			__skb_fill_page_desc(skb, i, pfrag->page,
-					     pfrag->offset, 0);
+					     pfrag->offset, 0, FOLL_GET);
 			skb_shinfo(skb)->nr_frags = ++i;
 			get_page(pfrag->page);
 		}
diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index b5ab98ca2511..15c9f17ce7d8 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -2630,7 +2630,7 @@ static int tpacket_fill_skb(struct packet_sock *po, struct sk_buff *skb,
 		data += len;
 		flush_dcache_page(page);
 		get_page(page);
-		skb_fill_page_desc(skb, nr_frags, page, offset, len);
+		skb_fill_page_desc(skb, nr_frags, page, offset, len, FOLL_GET);
 		to_write -= len;
 		offset = 0;
 		len_max = PAGE_SIZE;
diff --git a/net/xfrm/xfrm_ipcomp.c b/net/xfrm/xfrm_ipcomp.c
index 80143360bf09..8e9574e00cd0 100644
--- a/net/xfrm/xfrm_ipcomp.c
+++ b/net/xfrm/xfrm_ipcomp.c
@@ -74,7 +74,7 @@ static int ipcomp_decompress(struct xfrm_state *x, struct sk_buff *skb)
 		if (!page)
 			return -ENOMEM;
 
-		__skb_frag_set_page(frag, page);
+		__skb_frag_set_page(frag, page, FOLL_GET);
 
 		len = PAGE_SIZE;
 		if (dlen < len)

From patchwork Mon Jan 16 23:12:10 2023
X-Patchwork-Submitter: David Howells
X-Patchwork-Id: 44383
Subject: [PATCH v6 34/34] net: [RFC][WIP] Make __zerocopy_sg_from_iter() correctly pin or leave pages unref'd
From: David Howells
To: Al Viro
Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , netdev@vger.kernel.org, dhowells@redhat.com, Christoph Hellwig , Matthew Wilcox , Jens Axboe , Jan Kara , Jeff Layton , Logan Gunthorpe , linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org, linux-kernel@vger.kernel.org Date: Mon, 16 Jan 2023 23:12:10 +0000 Message-ID: <167391073019.2311931.11127613443740355536.stgit@warthog.procyon.org.uk> In-Reply-To: <167391047703.2311931.8115712773222260073.stgit@warthog.procyon.org.uk> References: <167391047703.2311931.8115712773222260073.stgit@warthog.procyon.org.uk> User-Agent: StGit/1.5 MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.4 X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1755225000046306934?= X-GMAIL-MSGID: =?utf-8?q?1755225000046306934?= Make __zerocopy_sg_from_iter() call iov_iter_extract_pages() to get pages that have been ref'd, pinned or left alone as appropriate. As this is only used for source buffers, pinning isn't an option, but being unref'd is. The way __zerocopy_sg_from_iter() merges fragments is also altered, such that fragments must also match their cleanup modes to be merged. An extra helper and wrapper, folio_put_unpin_sub() and page_put_unpin_sub() are added to allow multiple refs to be put/unpinned. Signed-off-by: David Howells cc: "David S. Miller" cc: Eric Dumazet cc: Jakub Kicinski cc: Paolo Abeni cc: netdev@vger.kernel.org --- include/linux/mm.h | 2 ++ mm/gup.c | 25 +++++++++++++++++++++++++ net/core/datagram.c | 23 +++++++++++++---------- 3 files changed, 40 insertions(+), 10 deletions(-) diff --git a/include/linux/mm.h b/include/linux/mm.h index f14edb192394..e3923b89c75e 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1368,7 +1368,9 @@ static inline bool is_cow_mapping(vm_flags_t flags) #endif void folio_put_unpin(struct folio *folio, unsigned int flags); +void folio_put_unpin_sub(struct folio *folio, unsigned int flags, unsigned int refs); void page_put_unpin(struct page *page, unsigned int flags); +void page_put_unpin_sub(struct page *page, unsigned int flags, unsigned int refs); /* * The identification function is mainly used by the buddy allocator for diff --git a/mm/gup.c b/mm/gup.c index 3ee4b4c7e0cb..49dd27ba6c13 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -213,6 +213,31 @@ void page_put_unpin(struct page *page, unsigned int flags) } EXPORT_SYMBOL_GPL(page_put_unpin); +/** + * folio_put_unpin_sub - Unpin/put a folio as appropriate + * @folio: The folio to release + * @flags: gup flags indicating the mode of release (FOLL_*) + * @refs: Number of refs/pins to drop + * + * Release a folio according to the flags. If FOLL_GET is set, the folio has a + * ref dropped; if FOLL_PIN is set, it is unpinned; otherwise it is left + * unaltered. 
+ */ +void folio_put_unpin_sub(struct folio *folio, unsigned int flags, + unsigned int refs) +{ + if (flags & (FOLL_GET | FOLL_PIN)) + gup_put_folio(folio, refs, flags); +} +EXPORT_SYMBOL_GPL(folio_put_unpin_sub); + +void page_put_unpin_sub(struct page *page, unsigned int flags, + unsigned int refs) +{ + folio_put_unpin_sub(page_folio(page), flags, refs); +} +EXPORT_SYMBOL_GPL(page_put_unpin_sub); + /** * try_grab_page() - elevate a page's refcount by a flag-dependent amount * @page: pointer to page to be grabbed diff --git a/net/core/datagram.c b/net/core/datagram.c index 122bfb144d32..63ea1f8817e0 100644 --- a/net/core/datagram.c +++ b/net/core/datagram.c @@ -614,6 +614,7 @@ int __zerocopy_sg_from_iter(struct msghdr *msg, struct sock *sk, struct sk_buff *skb, struct iov_iter *from, size_t length) { + unsigned int cleanup_mode = iov_iter_extract_mode(from, FOLL_SOURCE_BUF); int frag; if (msg && msg->msg_ubuf && msg->sg_from_iter) @@ -622,7 +623,7 @@ int __zerocopy_sg_from_iter(struct msghdr *msg, struct sock *sk, frag = skb_shinfo(skb)->nr_frags; while (length && iov_iter_count(from)) { - struct page *pages[MAX_SKB_FRAGS]; + struct page *pages[MAX_SKB_FRAGS], **ppages = pages; struct page *last_head = NULL; size_t start; ssize_t copied; @@ -632,9 +633,9 @@ int __zerocopy_sg_from_iter(struct msghdr *msg, struct sock *sk, if (frag == MAX_SKB_FRAGS) return -EMSGSIZE; - copied = iov_iter_get_pages(from, pages, length, - MAX_SKB_FRAGS - frag, &start, - FOLL_SOURCE_BUF); + copied = iov_iter_extract_pages(from, &ppages, length, + MAX_SKB_FRAGS - frag, + FOLL_SOURCE_BUF, &start); if (copied < 0) return -EFAULT; @@ -662,12 +663,14 @@ int __zerocopy_sg_from_iter(struct msghdr *msg, struct sock *sk, skb_frag_t *last = &skb_shinfo(skb)->frags[frag - 1]; if (head == skb_frag_page(last) && + cleanup_mode == skb_frag_cleanup(last) && start == skb_frag_off(last) + skb_frag_size(last)) { skb_frag_size_add(last, size); /* We combined this page, we need to release - * a reference. Since compound pages refcount - * is shared among many pages, batch the refcount - * adjustments to limit false sharing. + * a reference or a pin. Since compound pages + * refcount is shared among many pages, batch + * the refcount adjustments to limit false + * sharing. */ last_head = head; refs++; @@ -675,14 +678,14 @@ int __zerocopy_sg_from_iter(struct msghdr *msg, struct sock *sk, } } if (refs) { - page_ref_sub(last_head, refs); + page_put_unpin_sub(last_head, cleanup_mode, refs); refs = 0; } skb_fill_page_desc_noacc(skb, frag++, head, start, size, - FOLL_GET); + cleanup_mode); } if (refs) - page_ref_sub(last_head, refs); + page_put_unpin_sub(last_head, cleanup_mode, refs); } return 0; }
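To make the merge-and-batch behaviour described above concrete, here is a
reduced sketch of the per-page step inside the extraction loop, assuming
the frag accessors from the previous patch. Compound-page sub-page
handling, truesize accounting and error paths are elided:

static void fill_frag_batched(struct sk_buff *skb, int *frag,
			      struct page *head, size_t start, size_t size,
			      unsigned int cleanup_mode,
			      struct page **last_head, unsigned int *refs)
{
	if (*frag) {
		skb_frag_t *last = &skb_shinfo(skb)->frags[*frag - 1];

		/* Fragments only coalesce when the page, the offset AND the
		 * cleanup mode all line up.
		 */
		if (head == skb_frag_page(last) &&
		    cleanup_mode == skb_frag_cleanup(last) &&
		    start == skb_frag_off(last) + skb_frag_size(last)) {
			/* Merged into the previous fragment: defer one release. */
			skb_frag_size_add(last, size);
			*last_head = head;
			(*refs)++;
			return;
		}
	}
	if (*refs) {
		/* Flush the deferred releases in one batched call; this is a
		 * no-op if the pages were neither ref'd nor pinned.
		 */
		page_put_unpin_sub(*last_head, cleanup_mode, *refs);
		*refs = 0;
	}
	skb_fill_page_desc_noacc(skb, (*frag)++, head, start, size,
				 cleanup_mode);
}

In the real function a final page_put_unpin_sub() flush also runs once the
loop over the extracted pages completes.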