[5/5] docs: fuse: improve FUSE consistency explanation
Commit Message
Signed-off-by: Jiachen Zhang <zhangjiachen.jaycee@bytedance.com>
---
Documentation/filesystems/fuse-io.rst | 32 +++++++++++++++++++++++++--
1 file changed, 30 insertions(+), 2 deletions(-)
Comments
Hi--
On 7/10/23 21:34, Jiachen Zhang wrote:
> Signed-off-by: Jiachen Zhang <zhangjiachen.jaycee@bytedance.com>
> ---
> Documentation/filesystems/fuse-io.rst | 32 +++++++++++++++++++++++++--
> 1 file changed, 30 insertions(+), 2 deletions(-)
>
> diff --git a/Documentation/filesystems/fuse-io.rst b/Documentation/filesystems/fuse-io.rst
> index 255a368fe534..cdd292dd2e9c 100644
> --- a/Documentation/filesystems/fuse-io.rst
> +++ b/Documentation/filesystems/fuse-io.rst
> @@ -24,7 +31,8 @@ after any writes to the file. All mmap modes are supported.
> The cached mode has two sub modes controlling how writes are handled. The
> write-through mode is the default and is supported on all kernels. The
> writeback-cache mode may be selected by the FUSE_WRITEBACK_CACHE flag in the
> -FUSE_INIT reply.
> +FUSE_INIT reply. In either modes, if the FOPEN_KEEP_CACHE flag is not set in

either mode,

> +the FUSE_OPEN, cached pages of the file will be invalidated immediatedly.

immediately.
>
> In write-through mode each write is immediately sent to userspace as one or more
> WRITE requests, as well as updating any cached pages (and caching previously
> @@ -38,7 +46,27 @@ reclaim on memory pressure) or explicitly (invoked by close(2), fsync(2) and
> when the last ref to the file is being released on munmap(2)). This mode
> assumes that all changes to the filesystem go through the FUSE kernel module
> (size and atime/ctime/mtime attributes are kept up-to-date by the kernel), so
> -it's generally not suitable for network filesystems. If a partial page is
> +it's generally not suitable for network filesystems (you can consider the
> +writeback-cache-v2 mode mentioned latter for them). If a partial page is

later

> written, then the page needs to be first read from userspace. This means, that
> even for files opened for O_WRONLY it is possible that READ requests will be
> generated by the kernel.
On 2023/7/11 12:42, Randy Dunlap wrote:
Thanks, Randy. I will fix them in the next version.
Jiachen
@@ -10,6 +10,10 @@ Fuse supports the following I/O modes:
- cached
+ write-through
+ writeback-cache
+ + writeback-cache-v2
+
+Direct-io Mode
+==============
The direct-io mode can be selected with the FOPEN_DIRECT_IO flag in the
FUSE_OPEN reply.
@@ -17,6 +21,9 @@ FUSE_OPEN reply.
In direct-io mode the page cache is completely bypassed for reads and writes.
No read-ahead takes place. Shared mmap is disabled.
+Cached Modes and Cache Coherence
+================================
+
In cached mode reads may be satisfied from the page cache, and data may be
read-ahead by the kernel to fill the cache. The cache is always kept consistent
after any writes to the file. All mmap modes are supported.
@@ -24,7 +31,8 @@ after any writes to the file. All mmap modes are supported.
The cached mode has two sub modes controlling how writes are handled. The
write-through mode is the default and is supported on all kernels. The
writeback-cache mode may be selected by the FUSE_WRITEBACK_CACHE flag in the
-FUSE_INIT reply.
+FUSE_INIT reply. In either modes, if the FOPEN_KEEP_CACHE flag is not set in
+the FUSE_OPEN, cached pages of the file will be invalidated immediatedly.
In write-through mode each write is immediately sent to userspace as one or more
WRITE requests, as well as updating any cached pages (and caching previously
@@ -38,7 +46,27 @@ reclaim on memory pressure) or explicitly (invoked by close(2), fsync(2) and
when the last ref to the file is being released on munmap(2)). This mode
assumes that all changes to the filesystem go through the FUSE kernel module
(size and atime/ctime/mtime attributes are kept up-to-date by the kernel), so
-it's generally not suitable for network filesystems. If a partial page is
+it's generally not suitable for network filesystems (you can consider the
+writeback-cache-v2 mode mentioned latter for them). If a partial page is
written, then the page needs to be first read from userspace. This means, that
even for files opened for O_WRONLY it is possible that READ requests will be
generated by the kernel.
+
+Writeback-cache-v2 mode (enabled by the FUSE_WRITEBACK_CACHE_V2 flag) retains
+the dirty page management logic of the writeback-cache mode, which provides
+great write performance. Furthermore, the v2 mode improves cache coherence for
+multiple FUSE mounts scenarios, especially for network filesystems. The kernel
+a/c/mtime and size attributes are allowed to be updated from the filesystem
+either on timeout or when they have been explicitly invalidated. Meanwhile, if
+ever updated by kernel locally, the attributes will not be propagated to the
+filesystem. In other words, the filesystem rather than kernel is considered the
+official source for generating these attributes.
+
+By combining the writeback-cache-v2 mode with the appropriate open flags
+(FOPEN_KEEP_CACHE and FOPEN_INVAL_ATTR for keeping page cache and invalidating
+attributes on FUSE_OPEN respectively), filesystems are able to implement the
+close-to-open (CTO) consistency semantics, which is widely supported by NFS
+client implementations. This allows for maintaining the writeback manner of
+dirty pages while ensuring cache coherence of attributes and file data if the
+operations among different FUSE mounts on a file are properly serialized by
+users using the open-after-close manner.
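
As a rough illustration of the flag handling that the patched text describes, the C sketch below maps a FUSE_INIT reply flag word and a FUSE_OPEN reply flag word to an I/O mode, and reports whether cached pages would be dropped at open. The enum, struct, and function names are hypothetical and exist only for this example. The flag values are copied from include/uapi/linux/fuse.h as an assumption so that the block builds without kernel headers. The v2 flags proposed in this series are left out because the thread does not give their values. This models the documented behaviour only; it is not kernel code.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Assumption: values mirror include/uapi/linux/fuse.h. Real code should
 * include that header instead of redefining the flags.
 */
#define FOPEN_DIRECT_IO      (1u << 0)   /* set in the FUSE_OPEN reply */
#define FOPEN_KEEP_CACHE     (1u << 1)   /* set in the FUSE_OPEN reply */
#define FUSE_WRITEBACK_CACHE (1u << 16)  /* set in the FUSE_INIT reply */

/* Hypothetical names, used only in this sketch. */
enum io_mode {
    IO_DIRECT,          /* page cache bypassed for reads and writes */
    IO_WRITE_THROUGH,   /* cached, default sub-mode */
    IO_WRITEBACK_CACHE  /* cached, dirty pages written back later */
};

struct open_result {
    enum io_mode mode;
    bool invalidate_cache;  /* cached pages dropped at open time */
};

/*
 * Classify one open according to the rules stated in the documentation:
 * FOPEN_DIRECT_IO selects direct-io, otherwise the cached sub-mode follows
 * the FUSE_INIT flag. The text states the FOPEN_KEEP_CACHE rule for the
 * cached sub-modes; applying it unconditionally here is a simplification.
 */
static struct open_result classify_open(uint32_t init_flags, uint32_t open_flags)
{
    struct open_result r;

    if (open_flags & FOPEN_DIRECT_IO)
        r.mode = IO_DIRECT;
    else if (init_flags & FUSE_WRITEBACK_CACHE)
        r.mode = IO_WRITEBACK_CACHE;
    else
        r.mode = IO_WRITE_THROUGH;

    r.invalidate_cache = !(open_flags & FOPEN_KEEP_CACHE);
    return r;
}
```

For example, classify_open(FUSE_WRITEBACK_CACHE, FOPEN_KEEP_CACHE) yields IO_WRITEBACK_CACHE with invalidate_cache false, while classify_open(0, 0) yields IO_WRITE_THROUGH with invalidate_cache true, matching the "invalidated unless FOPEN_KEEP_CACHE is set" sentence added by the patch.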