Message ID: 20231221132400.1601991-16-dhowells@redhat.com
State: New
Headers:
From: David Howells <dhowells@redhat.com>
To: Jeff Layton <jlayton@kernel.org>, Steve French <smfrench@gmail.com>
Cc: David Howells <dhowells@redhat.com>, Matthew Wilcox <willy@infradead.org>, Marc Dionne <marc.dionne@auristor.com>, Paulo Alcantara <pc@manguebit.com>, Shyam Prasad N <sprasad@microsoft.com>, Tom Talpey <tom@talpey.com>, Dominique Martinet <asmadeus@codewreck.org>, Eric Van Hensbergen <ericvh@kernel.org>, Ilya Dryomov <idryomov@gmail.com>, Christian Brauner <christian@brauner.io>, linux-cachefs@redhat.com, linux-afs@lists.infradead.org, linux-cifs@vger.kernel.org, linux-nfs@vger.kernel.org, ceph-devel@vger.kernel.org, v9fs@lists.linux.dev, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [PATCH v5 15/40] netfs: Add support for DIO buffering
Date: Thu, 21 Dec 2023 13:23:10 +0000
Message-ID: <20231221132400.1601991-16-dhowells@redhat.com>
In-Reply-To: <20231221132400.1601991-1-dhowells@redhat.com>
References: <20231221132400.1601991-1-dhowells@redhat.com>
Series: netfs, afs, 9p: Delegate high-level I/O to netfslib
Commit Message
David Howells
Dec. 21, 2023, 1:23 p.m. UTC
Add a bvec array pointer and an iterator to netfs_io_request for either
holding a copy of a DIO iterator or a list of all the bits of buffer
pointed to by a DIO iterator.

There are two problems: Firstly, if an iovec-class iov_iter is passed to
->read_iter() or ->write_iter(), this cannot be passed directly to
kernel_sendmsg() or kernel_recvmsg() as that may cause locking recursion if
a fault is generated, so we need to keep track of the pages involved
separately.

Secondly, if the I/O is asynchronous, we must copy the iov_iter describing
the buffer before returning to the caller as it may be immediately
deallocated.

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
cc: linux-cachefs@redhat.com
cc: linux-fsdevel@vger.kernel.org
cc: linux-mm@kvack.org
---
 fs/netfs/objects.c    | 10 ++++++++++
 include/linux/netfs.h |  4 ++++
 2 files changed, 14 insertions(+)
Comments
On Thu, Dec 21, 2023 at 01:23:10PM +0000, David Howells wrote:
> Add a bvec array pointer and an iterator to netfs_io_request for either
> holding a copy of a DIO iterator or a list of all the bits of buffer
> pointed to by a DIO iterator.
>
> There are two problems: Firstly, if an iovec-class iov_iter is passed to
> ->read_iter() or ->write_iter(), this cannot be passed directly to
> kernel_sendmsg() or kernel_recvmsg() as that may cause locking recursion if
> a fault is generated, so we need to keep track of the pages involved
> separately.
>
> Secondly, if the I/O is asynchronous, we must copy the iov_iter describing
> the buffer before returning to the caller as it may be immediately
> deallocated.
>
> Signed-off-by: David Howells <dhowells@redhat.com>
> Reviewed-by: Jeff Layton <jlayton@kernel.org>
> cc: linux-cachefs@redhat.com
> cc: linux-fsdevel@vger.kernel.org
> cc: linux-mm@kvack.org
> ---
>  fs/netfs/objects.c    | 10 ++++++++++
>  include/linux/netfs.h |  4 ++++
>  2 files changed, 14 insertions(+)
>
> diff --git a/fs/netfs/objects.c b/fs/netfs/objects.c
> index 1bd20bdad983..4df5e5eeada6 100644
> --- a/fs/netfs/objects.c
> +++ b/fs/netfs/objects.c
> @@ -76,6 +76,7 @@ static void netfs_free_request(struct work_struct *work)
>  {
>  	struct netfs_io_request *rreq =
>  		container_of(work, struct netfs_io_request, work);
> +	unsigned int i;
>
>  	trace_netfs_rreq(rreq, netfs_rreq_trace_free);
>  	netfs_proc_del_rreq(rreq);
> @@ -84,6 +85,15 @@ static void netfs_free_request(struct work_struct *work)
>  	rreq->netfs_ops->free_request(rreq);
>  	if (rreq->cache_resources.ops)
>  		rreq->cache_resources.ops->end_operation(&rreq->cache_resources);
> +	if (rreq->direct_bv) {
> +		for (i = 0; i < rreq->direct_bv_count; i++) {
> +			if (rreq->direct_bv[i].bv_page) {
> +				if (rreq->direct_bv_unpin)
> +					unpin_user_page(rreq->direct_bv[i].bv_page);
> +			}
> +		}
> +		kvfree(rreq->direct_bv);
> +	}
>  	kfree_rcu(rreq, rcu);
>  	netfs_stat_d(&netfs_n_rh_rreq);
>  }
> diff --git a/include/linux/netfs.h b/include/linux/netfs.h
> index 3da962e977f5..bbb33ccbf719 100644
> --- a/include/linux/netfs.h
> +++ b/include/linux/netfs.h
> @@ -190,6 +190,9 @@ struct netfs_io_request {
>  	struct iov_iter		iter;		/* Unencrypted-side iterator */
>  	struct iov_iter		io_iter;	/* I/O (Encrypted-side) iterator */
>  	void			*netfs_priv;	/* Private data for the netfs */
> +	struct bio_vec		*direct_bv	/* DIO buffer list (when handling iovec-iter) */
> +		__counted_by(direct_bv_count);

This will break the build with versions of clang that have support for
counted_by (as it has been reverted in main but reapplication to main is
being actively worked on) because while annotating pointers with this
attribute is a goal of the counted_by attribute, it is not ready yet.
Please consider removing this and adding a TODO to annotate it when
support is available.

  In file included from /builds/linux/fs/ceph/crypto.c:14:
  In file included from /builds/linux/fs/ceph/super.h:20:
  /builds/linux/include/linux/netfs.h:259:19: error: 'counted_by' only applies to C99 flexible array members
    259 |         struct bio_vec          *direct_bv      /* DIO buffer list (when handling iovec-iter) */
        |                                  ^~~~~~~~~
  1 error generated.

Some additional context from another recent error like this:

https://lore.kernel.org/CAKwvOdkvGTGiWzqEFq=kzqvxSYP5vUj3g9Z-=MZSQROzzSa_dg@mail.gmail.com/

Cheers,
Nathan

> +	unsigned int		direct_bv_count; /* Number of elements in direct_bv[] */
>  	unsigned int		debug_id;
>  	atomic_t		nr_outstanding;	/* Number of ops in progress */
>  	atomic_t		nr_copy_ops;	/* Number of copy-to-cache ops in progress */
> @@ -197,6 +200,7 @@ struct netfs_io_request {
>  	size_t			len;		/* Length of the request */
>  	short			error;		/* 0 or error that occurred */
>  	enum netfs_io_origin	origin;		/* Origin of the request */
> +	bool			direct_bv_unpin; /* T if direct_bv[] must be unpinned */
>  	loff_t			i_size;		/* Size of the file */
>  	loff_t			start;		/* Start position */
>  	pgoff_t			no_unlock_folio; /* Don't unlock this folio after read */
> This will break the build with versions of clang that have support for
> counted_by (as it has been reverted in main but reapplication to main is
> being actively worked on) because while annotating pointers with this
> attribute is a goal of the counted_by attribute, it is not ready yet.
> Please consider removing this and adding a TODO to annotate it when
> support is available.

It's really unpleasant that we keep getting new attributes that we
seemingly are encouraged to use and get sent patches for it. And then we
learn a little later that that stuff isn't ready yet. It's annoying. I
know it isn't your fault but it would be wise to be a little more
careful. IOW, unless both clang and gcc do support that thing
appropriately don't send patches to various subsystems for this.

In any case, this is now fixed. I pulled an updated version from David.
On Thu, Dec 28, 2023 at 11:47:45AM +0100, Christian Brauner wrote:
> > This will break the build with versions of clang that have support for
> > counted_by (as it has been reverted in main but reapplication to main is
> > being actively worked on) because while annotating pointers with this
> > attribute is a goal of the counted_by attribute, it is not ready yet.
> > Please consider removing this and adding a TODO to annotate it when
> > support is available.
>
> It's really unpleasant that we keep getting new attributes that we
> seemingly are encouraged to use and get sent patches for it. And then we
> learn a little later that that stuff isn't ready yet. It's annoying. I

I will assume the "get sent patches for it" is referring to the patches
that Kees has been sending out to add this attribute to flexible array
members. In his defense, that part of the attribute is very nearly ready
(it is only the pointer annotations that are not ready; as far as I am
aware, that part has not been worked on at all). In fact, it was merged
in clang's main branch for some time, and the only reason it was backed
out was that adoption in the kernel pointed out bugs in the original
implementation that were harder to fix than initially thought. In other
words, only because we started adding this attribute to the kernel were
we able to realize that the initial implementation in clang needed to be
improved; otherwise this feature might have shipped completely broken in
clang 18.1.0 because it had not been stress tested yet. Now we can get
it right.

However, I do not necessarily disagree that it is annoying for
maintainers who are not following this saga but are just receiving
patches to add these annotations, because it adds additional things to
check for. Perhaps there should be some guidance added to the
__counted_by definition or Documentation around how it is expected to be
used, so that there is clear advice for both developers and maintainers?

> know it isn't your fault but it would be wise to be a little more
> careful. IOW, unless both clang and gcc do support that thing
> appropriately don't send patches to various subsystems for this.

I will assume this was not necessarily directed at me, because I have
not sent any patches for __counted_by.

> In any case, this is now fixed. I pulled an updated version from David.

Thanks a lot.

Cheers,
Nathan
```diff
diff --git a/fs/netfs/objects.c b/fs/netfs/objects.c
index 1bd20bdad983..4df5e5eeada6 100644
--- a/fs/netfs/objects.c
+++ b/fs/netfs/objects.c
@@ -76,6 +76,7 @@ static void netfs_free_request(struct work_struct *work)
 {
 	struct netfs_io_request *rreq =
 		container_of(work, struct netfs_io_request, work);
+	unsigned int i;
 
 	trace_netfs_rreq(rreq, netfs_rreq_trace_free);
 	netfs_proc_del_rreq(rreq);
@@ -84,6 +85,15 @@ static void netfs_free_request(struct work_struct *work)
 	rreq->netfs_ops->free_request(rreq);
 	if (rreq->cache_resources.ops)
 		rreq->cache_resources.ops->end_operation(&rreq->cache_resources);
+	if (rreq->direct_bv) {
+		for (i = 0; i < rreq->direct_bv_count; i++) {
+			if (rreq->direct_bv[i].bv_page) {
+				if (rreq->direct_bv_unpin)
+					unpin_user_page(rreq->direct_bv[i].bv_page);
+			}
+		}
+		kvfree(rreq->direct_bv);
+	}
 	kfree_rcu(rreq, rcu);
 	netfs_stat_d(&netfs_n_rh_rreq);
 }
diff --git a/include/linux/netfs.h b/include/linux/netfs.h
index 3da962e977f5..bbb33ccbf719 100644
--- a/include/linux/netfs.h
+++ b/include/linux/netfs.h
@@ -190,6 +190,9 @@ struct netfs_io_request {
 	struct iov_iter		iter;		/* Unencrypted-side iterator */
 	struct iov_iter		io_iter;	/* I/O (Encrypted-side) iterator */
 	void			*netfs_priv;	/* Private data for the netfs */
+	struct bio_vec		*direct_bv	/* DIO buffer list (when handling iovec-iter) */
+		__counted_by(direct_bv_count);
+	unsigned int		direct_bv_count; /* Number of elements in direct_bv[] */
 	unsigned int		debug_id;
 	atomic_t		nr_outstanding;	/* Number of ops in progress */
 	atomic_t		nr_copy_ops;	/* Number of copy-to-cache ops in progress */
@@ -197,6 +200,7 @@ struct netfs_io_request {
 	size_t			len;		/* Length of the request */
 	short			error;		/* 0 or error that occurred */
 	enum netfs_io_origin	origin;		/* Origin of the request */
+	bool			direct_bv_unpin; /* T if direct_bv[] must be unpinned */
 	loff_t			i_size;		/* Size of the file */
 	loff_t			start;		/* Start position */
 	pgoff_t			no_unlock_folio; /* Don't unlock this folio after read */
```