[v5,04/11] blksnap: header file of the module interface

Message ID 20230612135228.10702-5-sergei.shtepa@veeam.com
State New
Series blksnap - block devices snapshots module

Commit Message

Sergei Shtepa June 12, 2023, 1:52 p.m. UTC
  The header file contains a set of declarations, structures and control
requests (ioctl) that allow the module to be managed from user space.

Co-developed-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Tested-by: Donald Buczek <buczek@molgen.mpg.de>
Signed-off-by: Sergei Shtepa <sergei.shtepa@veeam.com>
---
 MAINTAINERS                  |   1 +
 include/uapi/linux/blksnap.h | 421 +++++++++++++++++++++++++++++++++++
 2 files changed, 422 insertions(+)
 create mode 100644 include/uapi/linux/blksnap.h
  

Comments

Dave Chinner June 13, 2023, 10:25 p.m. UTC | #1
On Mon, Jun 12, 2023 at 03:52:21PM +0200, Sergei Shtepa wrote:
> The header file contains a set of declarations, structures and control
> requests (ioctl) that allows to manage the module from the user space.
> 
> Co-developed-by: Christoph Hellwig <hch@infradead.org>
> Signed-off-by: Christoph Hellwig <hch@infradead.org>
> Tested-by: Donald Buczek <buczek@molgen.mpg.de>
> Signed-off-by: Sergei Shtepa <sergei.shtepa@veeam.com>
> ---
>  MAINTAINERS                  |   1 +
>  include/uapi/linux/blksnap.h | 421 +++++++++++++++++++++++++++++++++++
>  2 files changed, 422 insertions(+)
>  create mode 100644 include/uapi/linux/blksnap.h


.....

> +/**
> + * struct blksnap_snapshot_append_storage - Argument for the
> + *	&IOCTL_BLKSNAP_SNAPSHOT_APPEND_STORAGE control.
> + *
> + * @id:
> + *	Snapshot ID.
> + * @bdev_path:
> + *	Device path string buffer.
> + * @bdev_path_size:
> + *	Device path string buffer size.
> + * @count:
> + *	Size of @ranges in the number of &struct blksnap_sectors.
> + * @ranges:
> + *	Pointer to the array of &struct blksnap_sectors.
> + */
> +struct blksnap_snapshot_append_storage {
> +	struct blksnap_uuid id;
> +	__u64 bdev_path;
> +	__u32 bdev_path_size;
> +	__u32 count;
> +	__u64 ranges;
> +};
> +
> +/**
> + * define IOCTL_BLKSNAP_SNAPSHOT_APPEND_STORAGE - Append storage to the
> + *	difference storage of the snapshot.
> + *
> + * The snapshot difference storage can be set either before or after creating
> + * the snapshot images. This allows to dynamically expand the difference
> + * storage while holding the snapshot.
> + *
> + * Return: 0 if succeeded, negative errno otherwise.
> + */
> +#define IOCTL_BLKSNAP_SNAPSHOT_APPEND_STORAGE					\
> +	_IOW(BLKSNAP, blksnap_ioctl_snapshot_append_storage,			\
> +	     struct blksnap_snapshot_append_storage)

That's an API I'm extremely uncomfortable with. We've learnt the
lesson *many times* that userspace physical mappings of underlying
file storage are unreliable.

i.e.  This is reliant on userspace telling the kernel the physical
mapping of the filesystem file to block device LBA space and then
providing a guarantee (somehow) that the mapping will always remain
unchanged. i.e. It's reliant on passing FIEMAP data from the
filesystem to userspace and then back into the kernel without it
becoming stale and somehow providing a guarantee that nothing (not
even the filesystem doing internal garbage collection) will change
it.

It is reliant on userspace detecting shared blocks in files and
avoiding them; it's reliant on userspace never being able to read,
write or modify that file; it's reliant on the -filesystem- never
modifying the layout of that file; it's even reliant on an internal
filesystem state that has to be locked down before the block mapping
can be delegated to a third party for IO control.
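
To make the concern concrete, here is a minimal sketch of how userspace
typically obtains such a physical mapping, using the existing
FS_IOC_FIEMAP ioctl. The helper name dump_extents() and the MAX_EXTENTS
limit are assumptions made for illustration; they are not part of
blksnap. Whatever this returns is only a point-in-time view: nothing
stops the filesystem from moving or sharing the extents afterwards.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fiemap.h>
#include <linux/fs.h>

#define MAX_EXTENTS 32	/* illustrative limit, not from the patch */

/*
 * Print up to MAX_EXTENTS extents of @path.
 * Returns the number of extents reported, or -1 on any error.
 */
static int dump_extents(const char *path)
{
	struct fiemap *fm;
	int fd, ret = -1;

	assert(path != NULL);

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;

	fm = calloc(1, sizeof(*fm) +
		       MAX_EXTENTS * sizeof(struct fiemap_extent));
	if (!fm)
		goto out_close;

	fm->fm_start = 0;
	fm->fm_length = FIEMAP_MAX_OFFSET;	/* map the whole file */
	fm->fm_flags = FIEMAP_FLAG_SYNC;	/* flush delalloc first */
	fm->fm_extent_count = MAX_EXTENTS;

	if (ioctl(fd, FS_IOC_FIEMAP, fm) < 0)
		goto out_free;

	for (unsigned int i = 0; i < fm->fm_mapped_extents; i++) {
		const struct fiemap_extent *fe = &fm->fm_extents[i];

		/* Shared extents are exactly what userspace must avoid. */
		printf("%u: logical=%llu physical=%llu length=%llu%s\n", i,
		       (unsigned long long)fe->fe_logical,
		       (unsigned long long)fe->fe_physical,
		       (unsigned long long)fe->fe_length,
		       (fe->fe_flags & FIEMAP_EXTENT_SHARED) ? " shared" : "");
	}
	ret = (int)fm->fm_mapped_extents;

out_free:
	free(fm);
out_close:
	close(fd);
	return ret;
}
```

A caller would pass the path of the preallocated difference storage
file. The byte offsets printed here would then have to be converted to
sectors and handed back to the kernel, which is the round trip being
objected to.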

Further, we can't allow userspace to have any read access to the
snapshot file even after it is no longer in use by the blksnap
driver.  The contents of the file will span multiple security
contexts, contain sensitive data, etc and so its contents must
never be exposed to userspace. We cannot rely on userspace to delete
it safely after use and hence we have to protect its contents
from exposure to userspace, too.

We already have a mechanism that provides all these guarantees to a
third party kernel subsystem: swap files.

We already have a trusted path in the kernel to allow internal block
mapping of a swap file to be retrieved by the mm subsystem. We also
have an inode flag that protects such files against access and
modification from anything other than internal kernel IO paths. We
also allow them to be allocated as unwritten extents using
fallocate() and they are never converted to written whilst in use as
a swapfile. Hence the contents of them cannot be exposed to
userspace even if the swapfile flag is removed and owner/permission
changes are made to the file after it is released by the kernel.

Swap files are an intrinsically safe mechanism for delegating fixed
file mappings to kernel subsystems that have requirements for
secure, trusted storage that userspace cannot tamper with.

I note that the code behind the
IOCTL_BLKSNAP_SNAPSHOT_APPEND_STORAGE ends up in
diff_storage_add_range(), which allocates an extent structure for
each range and links it into a linked list for later use.

This is effectively the same structure that the mm swapfile code
uses. It provides a swap_info_struct and a struct file to the
filesystem via aops->swap_activate. The filesystem then iterates the
extent list for the file and calls add_swap_extent() for each
physical range in the file. The mm code then allocates a new extent
structure for the range and links it into the extent rbtree in the
swap_info_struct. This is the mapping it uses later on in the IO
path.

Adding a similar, more generic mapping operation that allows a
private structure and a callback to be provided would allow the
filesystem to provide this callback directly to subsystems like
blksnap. Essentially diff_storage_add_range() becomes the iterator
callback for blksnap. This makes the whole "userspace provides the
mapping" problem go away and we can use the swapfile mechanisms to
provide all the other guarantees the kernel needs to ensure it can
trust the contents and mappings of the blksnap snapshot files....
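
The shape of such a callback-based mapping operation can be sketched in
plain C. This is a userspace model only, not kernel code: the names
map_extent_fn, fs_map_file_extents() and struct range are hypothetical,
and diff_storage_add_range() here is a stand-in that merely mirrors the
behaviour described above (allocate an extent, link it into a list).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* One physical range in sectors, like struct blksnap_sectors. */
struct range {
	uint64_t offset;
	uint64_t count;
};

/* List node the consumer allocates per range. */
struct extent {
	struct range r;
	struct extent *next;
};

/* Hypothetical consumer state: extents in the order they were added. */
struct diff_storage {
	struct extent *head;
	struct extent *tail;
	uint64_t capacity;	/* total sectors collected so far */
};

/* Hypothetical callback type invoked once per physical extent. */
typedef int (*map_extent_fn)(void *priv, uint64_t offset, uint64_t count);

/* Stand-in for diff_storage_add_range(): allocate and link one extent. */
static int diff_storage_add_range(void *priv, uint64_t offset, uint64_t count)
{
	struct diff_storage *ds = priv;
	struct extent *e;

	assert(ds != NULL);
	e = malloc(sizeof(*e));
	if (!e)
		return -ENOMEM;

	e->r.offset = offset;
	e->r.count = count;
	e->next = NULL;
	if (ds->tail)
		ds->tail->next = e;
	else
		ds->head = e;
	ds->tail = e;
	ds->capacity += count;
	return 0;
}

/*
 * Stand-in for the filesystem side: walk the file's extent list and hand
 * each range to the callback, stopping at the first error.
 */
static int fs_map_file_extents(const struct range *map, size_t n,
			       map_extent_fn fn, void *priv)
{
	for (size_t i = 0; i < n; i++) {
		int ret = fn(priv, map[i].offset, map[i].count);

		if (ret)
			return ret;
	}
	return 0;
}

/* Release every extent and reset the state. */
static void diff_storage_free(struct diff_storage *ds)
{
	struct extent *e = ds->head;

	while (e) {
		struct extent *next = e->next;

		free(e);
		e = next;
	}
	ds->head = ds->tail = NULL;
	ds->capacity = 0;
}
```

The point of the model is that the mapping never leaves the trusted
side: the filesystem produces the ranges and the consumer only sees them
through the callback and its private pointer.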

Thoughts?

-Dave.
  
Christoph Hellwig June 14, 2023, 6:26 a.m. UTC | #2
On Wed, Jun 14, 2023 at 08:25:15AM +1000, Dave Chinner wrote:
> > + * Return: 0 if succeeded, negative errno otherwise.
> > + */
> > +#define IOCTL_BLKSNAP_SNAPSHOT_APPEND_STORAGE					\
> > +	_IOW(BLKSNAP, blksnap_ioctl_snapshot_append_storage,			\
> > +	     struct blksnap_snapshot_append_storage)
> 
> That's an API I'm extremely uncomfortable with. We've learnt the
> lesson *many times* that userspace physical mappings of underlying
> file storage are unreliable.
> 
> i.e.  This is reliant on userspace telling the kernel the physical
> mapping of the filesystem file to block device LBA space and then
> providing a guarantee (somehow) that the mapping will always remain
> unchanged. i.e. It's reliant on passing FIEMAP data from the
> filesystem to userspace and then back into the kernel without it
> becoming stale and somehow providing a guarantee that nothing (not
> even the filesystem doing internal garbage collection) will change
> it.

Hmm, I never thought of this API as used on files that somewhere
had a logical to physical mapping applied to them.

Sergey, is that the intended use case?  If so we really should
be going through the file system using direct I/O.
  
Sergei Shtepa June 14, 2023, 9:26 a.m. UTC | #3
On 6/14/23 08:26, Christoph Hellwig wrote:
> On Wed, Jun 14, 2023 at 08:25:15AM +1000, Dave Chinner wrote:
>>> + * Return: 0 if succeeded, negative errno otherwise.
>>> + */
>>> +#define IOCTL_BLKSNAP_SNAPSHOT_APPEND_STORAGE					\
>>> +	_IOW(BLKSNAP, blksnap_ioctl_snapshot_append_storage,			\
>>> +	     struct blksnap_snapshot_append_storage)
>> That's an API I'm extremely uncomfortable with. We've learnt the
>> lesson *many times* that userspace physical mappings of underlying
>> file storage are unreliable.
>>
>> i.e.  This is reliant on userspace telling the kernel the physical
>> mapping of the filesystem file to block device LBA space and then
>> providing a guarantee (somehow) that the mapping will always remain
>> unchanged. i.e. It's reliant on passing FIEMAP data from the
>> filesystem to userspace and then back into the kernel without it
>> becoming stale and somehow providing a guarantee that nothing (not
>> even the filesystem doing internal garbage collection) will change
>> it.
> Hmm, I never thought of this API as used on files that somewhere
> had a logical to physical mapping applied to them.
> 
> Sergey, is that the intended use case?  If so we really should
> be going through the file system using direct I/O.
> 

Hi!
Thank you, Dave, for such a detailed comment. 
Yes, everything is really as you described.

This code worked quite successfully for the veeamsnap module, on the
basis of which blksnap was created. Indeed, such an allocation of an
area on a block device using a file does not look safe.

We've already discussed this with Donald Buczek <buczek@molgen.mpg.de>.
Link: https://github.com/veeam/blksnap/issues/57#issuecomment-1576569075
And I have planned work on moving to a more secure ioctl in the future.
Link: https://github.com/veeam/blksnap/issues/61

Now, thanks to Dave, it becomes clear to me how to solve this problem best.
swapfile is a good example of how to do it right.

Fixing this vulnerability will entail transferring the algorithm for
allocating the difference storage from the user-space to the blksnap code.
The changes are quite significant. The UAPI will be changed.

So I agree that the blksnap module is not good enough for upstream yet.
  
Christoph Hellwig June 14, 2023, 2:07 p.m. UTC | #4
On Wed, Jun 14, 2023 at 11:26:20AM +0200, Sergei Shtepa wrote:
> This code worked quite successfully for the veeamsnap module, on the
> basis of which blksnap was created. Indeed, such an allocation of an
> area on a block device using a file does not look safe.
> 
> We've already discussed this with Donald Buczek <buczek@molgen.mpg.de>.
> Link: https://github.com/veeam/blksnap/issues/57#issuecomment-1576569075
> And I have planned work on moving to a more secure ioctl in the future.
> Link: https://github.com/veeam/blksnap/issues/61
> 
> Now, thanks to Dave, it becomes clear to me how to solve this problem best.
> swapfile is a good example of how to do it right.

I don't actually think swapfile is a very good idea, in fact the Linux
swap code in general is not a very good place to look for inspirations
:)

IFF the usage is always to have a whole file for the diff storage the
overall API is very simple - just pass a fd to the kernel for the area,
and then use in-kernel direct I/O on it.  Now if that file should also
be able to reside on the same file system that the snapshot is taken
of things get a little more complicated, because writes to it also need
to automatically set the BIO_REFFED flag.  I have some ideas for that
and will share some draft code with you.
  
Sergei Shtepa June 14, 2023, 4:43 p.m. UTC | #5
On 6/14/23 16:07, Christoph Hellwig wrote:
> I don't actually think swapfile is a very good idea, in fact the Linux
> swap code in general is not a very good place to look for inspirations
> :)

Perhaps. I haven't looked at the code yet. But I like the idea of
protecting the file from any access from the user-space, as it is
implemented for swapfile.

> 
> IFF the usage is always to have a whole file for the diff storage the
> over all API is very simple - just pass a fd to the kernel for the area,
> and then use in-kernel direct I/O on it.  Now if that file should also
> be able to reside on the same file system that the snapshot is taken
> of things get a little more complicated, because writes to it also need
> to automatically set the BIO_REFFED flag.

There is definitely no requirement to place the difference storage file
on the same block device for which the snapshot is being created. The
file can be created on any block device.

Still, the variant when a whole partition is allocated for the difference
storage can also be useful.

> I have some ideas for that and will share some draft code with you.

I'll be glad.
  
Dave Chinner June 15, 2023, 12:08 a.m. UTC | #6
On Wed, Jun 14, 2023 at 07:07:16AM -0700, Christoph Hellwig wrote:
> On Wed, Jun 14, 2023 at 11:26:20AM +0200, Sergei Shtepa wrote:
> > This code worked quite successfully for the veeamsnap module, on the
> > basis of which blksnap was created. Indeed, such an allocation of an
> > area on a block device using a file does not look safe.
> > 
> > We've already discussed this with Donald Buczek <buczek@molgen.mpg.de>.
> > Link: https://github.com/veeam/blksnap/issues/57#issuecomment-1576569075
> > And I have planned work on moving to a more secure ioctl in the future.
> > Link: https://github.com/veeam/blksnap/issues/61
> > 
> > Now, thanks to Dave, it becomes clear to me how to solve this problem best.
> > swapfile is a good example of how to do it right.
> 
> I don't actually think swapfile is a very good idea, in fact the Linux
> swap code in general is not a very good place to look for inspirations
> :)

Yeah, the swapfile implementation isn't very nice, I was really just
using it as an example of how we can implement the requirements of
block mapping delegation in a safe manner to a kernel subsystem.

I think the important part is the swapfile inode flag, because that
is what keeps userspace from being able to screw with the file while
the kernel is using it and allows us to do read/write IO to
unwritten extents without converting them to written...

> IFF the usage is always to have a whole file for the diff storage the
> over all API is very simple - just pass a fd to the kernel for the area,
> and then use in-kernel direct I/O on it.

Yeah, I was thinking a fd is a better choice for the UAPI as it
frees up the kernel implementation, and it doesn't need us to pass a
separate bdev identifier in the ioctl. It also means we can pass a
regular file or a block device and the kernel code doesn't need to
care that they are different.
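
For comparison, a rough sketch of what an fd-based argument could look
like next to the structure posted in this patch. The second structure,
its field names and the fill helper are purely hypothetical; no such
UAPI has been posted in this thread.

```c
#include <assert.h>
#include <string.h>
#include <linux/types.h>

#define UUID_SIZE 16

struct blksnap_uuid {
	__u8 b[UUID_SIZE];
};

/* As posted in v5: device path plus an array of sector ranges. */
struct blksnap_snapshot_append_storage {
	struct blksnap_uuid id;
	__u64 bdev_path;
	__u32 bdev_path_size;
	__u32 count;
	__u64 ranges;
};

/*
 * Hypothetical fd-based variant: the kernel resolves the storage from an
 * open file descriptor (regular file or block device), so no path and no
 * userspace-provided mapping are needed.
 */
struct blksnap_snapshot_append_storage_fd {
	struct blksnap_uuid id;
	__s32 fd;
	__u32 reserved;		/* assumed padding, kept zero */
};

/* Zero the argument, then set the snapshot ID and the descriptor. */
static void fill_append_storage_fd(struct blksnap_snapshot_append_storage_fd *arg,
				   const struct blksnap_uuid *id, int fd)
{
	assert(arg != NULL && id != NULL);
	memset(arg, 0, sizeof(*arg));
	arg->id = *id;
	arg->fd = fd;
}
```

With a layout like this, the same argument covers both a whole file and
a whole partition, since either is just an open descriptor.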

If you think direct IO is a better idea, then I have no objection to
that - I haven't looked into the implementation that deeply at this
point. I wanted to get an understanding of how all the pieces went
together first, so all I've read is the documentation and looked at
the UAPI.

I made a leap from that: the documentation keeps talking about using
files on the filesystem for the difference storage, but the only UAPI
for telling the kernel about storage regions it can use is this
physical bdev LBA mapping ioctl. Hence if file storage is being
used....

> Now if that file should also
> be able to reside on the same file system that the snapshot is taken
> of things get a little more complicated, because writes to it also need
> to automatically set the BIO_REFFED flag.  I have some ideas for that
> and will share some draft code with you.

Cool, I look forward to the updates; I know of a couple of
applications that could make use of this functionality right
away....

Cheers,

Dave.
  
Thomas Weißschuh July 17, 2023, 6:57 p.m. UTC | #7
On 2023-06-12 15:52:21+0200, Sergei Shtepa wrote:

> [..]

> diff --git a/include/uapi/linux/blksnap.h b/include/uapi/linux/blksnap.h
> new file mode 100644
> index 000000000000..2fe3f2a43bc5
> --- /dev/null
> +++ b/include/uapi/linux/blksnap.h
> @@ -0,0 +1,421 @@

> [..]

> +/**
> + * struct blksnap_snapshotinfo - Result for the command
> + *	&blkfilter_ctl_blksnap.blkfilter_ctl_blksnap_snapshotinfo.
> + *
> + * @error_code:
> + *	Zero if there were no errors while holding the snapshot.
> + *	The error code -ENOSPC means that while holding the snapshot, a snapshot
> + *	overflow situation has occurred. Other error codes mean other reasons
> + *	for failure.
> + *	The error code is reset when the device is added to a new snapshot.
> + * @image:
> + *	If the snapshot was taken, it stores the block device name of the
> + *	image, or empty string otherwise.
> + */
> +struct blksnap_snapshotinfo {
> +	__s32 error_code;
> +	__u8 image[IMAGE_DISK_NAME_LEN];

Nitpick:

Seems a bit weird to have a signed error code that is always negative.
Couldn't this be an unsigned number or directly return the error from
the ioctl() itself?

> +};
> +
> +/**
> + * DOC: Interface for managing snapshots
> + *
> + * Control commands that are transmitted through the blksnap module interface.
> + */
> +enum blksnap_ioctl {
> +	blksnap_ioctl_version,
> +	blksnap_ioctl_snapshot_create,
> +	blksnap_ioctl_snapshot_destroy,
> +	blksnap_ioctl_snapshot_append_storage,
> +	blksnap_ioctl_snapshot_take,
> +	blksnap_ioctl_snapshot_collect,
> +	blksnap_ioctl_snapshot_wait_event,
> +};
> +
> +/**
> + * struct blksnap_version - Module version.
> + *
> + * @major:
> + *	Version major part.
> + * @minor:
> + *	Version minor part.
> + * @revision:
> + *	Revision number.
> + * @build:
> + *	Build number. Should be zero.
> + */
> +struct blksnap_version {
> +	__u16 major;
> +	__u16 minor;
> +	__u16 revision;
> +	__u16 build;
> +};
> +
> +/**
> + * define IOCTL_BLKSNAP_VERSION - Get module version.
> + *
> + * The version may increase when the API changes. But linking the user space
> + * behavior to the version code does not seem to be a good idea.
> + * To ensure backward compatibility, API changes should be made by adding new
> + * ioctl without changing the behavior of existing ones. The version should be
> + * used for logs.
> + *
> + * Return: 0 if succeeded, negative errno otherwise.
> + */
> +#define IOCTL_BLKSNAP_VERSION							\
> +	_IOW(BLKSNAP, blksnap_ioctl_version, struct blksnap_version)

Shouldn't this be _IOR()?

  "_IOW means userland is writing and kernel is reading. _IOR
  means userland is reading and kernel is writing."

The other ioctl definitions seem to need a review, too.
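
A small check of the direction bits illustrates the difference. The
macros _IOR, _IOW, _IOC_DIR and _IOC_READ come from <linux/ioctl.h>; the
names VERSION_AS_POSTED, VERSION_AS_SUGGESTED and ioctl_returns_data()
are made up for this example, and the command number 0 stands for
blksnap_ioctl_version, the first enumerator.

```c
#include <assert.h>
#include <linux/ioctl.h>
#include <linux/types.h>

/* Copied from the patch under review. */
struct blksnap_version {
	__u16 major;
	__u16 minor;
	__u16 revision;
	__u16 build;
};

#define BLKSNAP 'V'

/* As posted: claims userspace writes the argument to the kernel. */
#define VERSION_AS_POSTED						\
	_IOW(BLKSNAP, 0, struct blksnap_version)

/* As suggested: the kernel fills the argument, userspace reads it. */
#define VERSION_AS_SUGGESTED						\
	_IOR(BLKSNAP, 0, struct blksnap_version)

/* Returns 1 if the request number says data flows back to userspace. */
static int ioctl_returns_data(unsigned int cmd)
{
	return (_IOC_DIR(cmd) & _IOC_READ) != 0;
}
```

Only the direction bits differ between the two encodings; the type,
number and size fields are identical. Tools that decode request numbers
would still report the wrong direction for the posted definition.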
  
Sergei Shtepa July 18, 2023, 9:53 a.m. UTC | #8
Hi!
Thanks for the review.

On 7/17/23 20:57, Thomas Weißschuh wrote:
> On 2023-06-12 15:52:21+0200, Sergei Shtepa wrote:
> 
>> [..]
>> diff --git a/include/uapi/linux/blksnap.h b/include/uapi/linux/blksnap.h
>> new file mode 100644
>> index 000000000000..2fe3f2a43bc5
>> --- /dev/null
>> +++ b/include/uapi/linux/blksnap.h
>> @@ -0,0 +1,421 @@
>> [..]
>> +/**
>> + * struct blksnap_snapshotinfo - Result for the command
>> + *	&blkfilter_ctl_blksnap.blkfilter_ctl_blksnap_snapshotinfo.
>> + *
>> + * @error_code:
>> + *	Zero if there were no errors while holding the snapshot.
>> + *	The error code -ENOSPC means that while holding the snapshot, a snapshot
>> + *	overflow situation has occurred. Other error codes mean other reasons
>> + *	for failure.
>> + *	The error code is reset when the device is added to a new snapshot.
>> + * @image:
>> + *	If the snapshot was taken, it stores the block device name of the
>> + *	image, or empty string otherwise.
>> + */
>> +struct blksnap_snapshotinfo {
>> +	__s32 error_code;
>> +	__u8 image[IMAGE_DISK_NAME_LEN];
> Nitpick:
> 
> Seems a bit weird to have a signed error code that is always negative.
> Couldn't this be an unsigned number or directly return the error from
> the ioctl() itself?

Yes, it's a good idea to pass the error code as an unsigned value.
And this positive value can be passed in case of successful execution
of ioctl(), but I would not like to put different error signs in one value.

> 
>> +};
>> +
>> +/**
>> + * DOC: Interface for managing snapshots
>> + *
>> + * Control commands that are transmitted through the blksnap module interface.
>> + */
>> +enum blksnap_ioctl {
>> +	blksnap_ioctl_version,
>> +	blksnap_ioctl_snapshot_create,
>> +	blksnap_ioctl_snapshot_destroy,
>> +	blksnap_ioctl_snapshot_append_storage,
>> +	blksnap_ioctl_snapshot_take,
>> +	blksnap_ioctl_snapshot_collect,
>> +	blksnap_ioctl_snapshot_wait_event,
>> +};
>> +
>> +/**
>> + * struct blksnap_version - Module version.
>> + *
>> + * @major:
>> + *	Version major part.
>> + * @minor:
>> + *	Version minor part.
>> + * @revision:
>> + *	Revision number.
>> + * @build:
>> + *	Build number. Should be zero.
>> + */
>> +struct blksnap_version {
>> +	__u16 major;
>> +	__u16 minor;
>> +	__u16 revision;
>> +	__u16 build;
>> +};
>> +
>> +/**
>> + * define IOCTL_BLKSNAP_VERSION - Get module version.
>> + *
>> + * The version may increase when the API changes. But linking the user space
>> + * behavior to the version code does not seem to be a good idea.
>> + * To ensure backward compatibility, API changes should be made by adding new
>> + * ioctl without changing the behavior of existing ones. The version should be
>> + * used for logs.
>> + *
>> + * Return: 0 if succeeded, negative errno otherwise.
>> + */
>> +#define IOCTL_BLKSNAP_VERSION							\
>> +	_IOW(BLKSNAP, blksnap_ioctl_version, struct blksnap_version)
> Shouldn't this be _IOR()?
> 
>   "_IOW means userland is writing and kernel is reading. _IOR
>   means userland is reading and kernel is writing."
> 
> The other ioctl definitions seem to need a review, too.
> 

Yeah. I need to fix the _IOR/_IOW direction in all the ioctls.
Thanks!
  
Christoph Hellwig July 20, 2023, 6:16 a.m. UTC | #9
On Tue, Jul 18, 2023 at 11:53:54AM +0200, Sergei Shtepa wrote:
> > Seems a bit weird to have a signed error code that is always negative.
> > Couldn't this be an unsigned number or directly return the error from
> > the ioctl() itself?
> 
> Yes, it's a good idea to pass the error code as an unsigned value.
> And this positive value can be passed in case of successful execution
> of ioctl(), but I would not like to put different error signs in one value.

Linux tends to use negative error values in basically all interfaces.
I think it will be less confusing to stick to that.
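
With the negative-errno convention kept, a userspace consumer of
struct blksnap_snapshotinfo could classify the result roughly as below.
The structure is copied from the patch; enum snapshot_state and
classify_snapshot() are hypothetical names used only for this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <linux/types.h>

#define IMAGE_DISK_NAME_LEN 32

/* Copied from the patch under review. */
struct blksnap_snapshotinfo {
	__s32 error_code;
	__u8 image[IMAGE_DISK_NAME_LEN];
};

/* Hypothetical classification for the example. */
enum snapshot_state {
	SNAPSHOT_OK,		/* error_code == 0 */
	SNAPSHOT_OVERFLOW,	/* error_code == -ENOSPC */
	SNAPSHOT_FAILED,	/* any other error */
};

/* Map the signed, negative error code to a state. */
static enum snapshot_state
classify_snapshot(const struct blksnap_snapshotinfo *info)
{
	assert(info != NULL);

	if (info->error_code == 0)
		return SNAPSHOT_OK;
	if (info->error_code == -ENOSPC)
		return SNAPSHOT_OVERFLOW;
	return SNAPSHOT_FAILED;
}
```

The comparison against -ENOSPC reads the same way as checks on other
kernel interfaces, which is the consistency argument made above.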
  

Patch

diff --git a/MAINTAINERS b/MAINTAINERS
index c7dabe785cf1..76b14ad604dc 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3594,6 +3594,7 @@  M:	Sergei Shtepa <sergei.shtepa@veeam.com>
 L:	linux-block@vger.kernel.org
 S:	Supported
 F:	Documentation/block/blksnap.rst
+F:	include/uapi/linux/blksnap.h
 
 BLOCK LAYER
 M:	Jens Axboe <axboe@kernel.dk>
diff --git a/include/uapi/linux/blksnap.h b/include/uapi/linux/blksnap.h
new file mode 100644
index 000000000000..2fe3f2a43bc5
--- /dev/null
+++ b/include/uapi/linux/blksnap.h
@@ -0,0 +1,421 @@ 
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+/* Copyright (C) 2023 Veeam Software Group GmbH */
+#ifndef _UAPI_LINUX_BLKSNAP_H
+#define _UAPI_LINUX_BLKSNAP_H
+
+#include <linux/types.h>
+
+#define BLKSNAP_CTL "blksnap-control"
+#define BLKSNAP_IMAGE_NAME "blksnap-image"
+#define BLKSNAP 'V'
+
+/**
+ * DOC: Block device filter interface.
+ *
+ * Control commands that are transmitted through the block device filter
+ * interface.
+ */
+
+/**
+ * enum blkfilter_ctl_blksnap - List of commands for BLKFILTER_CTL ioctl
+ *
+ * @blkfilter_ctl_blksnap_cbtinfo:
+ *	Get CBT information.
+ *	The result of executing the command is a &struct blksnap_cbtinfo.
+ *	Return 0 if succeeded, negative errno otherwise.
+ * @blkfilter_ctl_blksnap_cbtmap:
+ *	Read the CBT map.
+ *	The option passes the &struct blksnap_cbtmap.
+ *	The size of the table can be quite large. Thus, the table is read in
+ *	a loop, in each cycle of which the next offset is set to
+ *	&blksnap_tracker_read_cbt_bitmap.offset.
+ *	Return a count of bytes read if succeeded, negative errno otherwise.
+ * @blkfilter_ctl_blksnap_cbtdirty:
+ *	Set dirty blocks in the CBT map.
+ *	The option passes the &struct blksnap_cbtdirty.
+ *	There are cases when some blocks need to be marked as changed.
+ *	This ioctl allows to do this.
+ *	Return 0 if succeeded, negative errno otherwise.
+ * @blkfilter_ctl_blksnap_snapshotadd:
+ *	Add device to snapshot.
+ *	The option passes the &struct blksnap_snapshotadd.
+ *	Return 0 if succeeded, negative errno otherwise.
+ * @blkfilter_ctl_blksnap_snapshotinfo:
+ *	Get information about snapshot.
+ *	The result of executing the command is a &struct blksnap_snapshotinfo.
+ *	Return 0 if succeeded, negative errno otherwise.
+ */
+enum blkfilter_ctl_blksnap {
+	blkfilter_ctl_blksnap_cbtinfo,
+	blkfilter_ctl_blksnap_cbtmap,
+	blkfilter_ctl_blksnap_cbtdirty,
+	blkfilter_ctl_blksnap_snapshotadd,
+	blkfilter_ctl_blksnap_snapshotinfo,
+};
+
+#ifndef UUID_SIZE
+#define UUID_SIZE 16
+#endif
+
+/**
+ * struct blksnap_uuid - Unique 16-byte identifier.
+ *
+ * @b:
+ *	An array of 16 bytes.
+ */
+struct blksnap_uuid {
+	__u8 b[UUID_SIZE];
+};
+
+/**
+ * struct blksnap_cbtinfo - Result for the command
+ *	&blkfilter_ctl_blksnap.blkfilter_ctl_blksnap_cbtinfo.
+ *
+ * @device_capacity:
+ *	Device capacity in bytes.
+ * @block_size:
+ *	Block size in bytes.
+ * @block_count:
+ *	Number of blocks.
+ * @generation_id:
+ *	Unique identifier of change tracking generation.
+ * @changes_number:
+ *	Current changes number.
+ */
+struct blksnap_cbtinfo {
+	__u64 device_capacity;
+	__u32 block_size;
+	__u32 block_count;
+	struct blksnap_uuid generation_id;
+	__u8 changes_number;
+};
+
+/**
+ * struct blksnap_cbtmap - Option for the command
+ *	&blkfilter_ctl_blksnap.blkfilter_ctl_blksnap_cbtmap.
+ *
+ * @offset:
+ *	Offset from the beginning of the CBT bitmap in bytes.
+ * @length:
+ *	Size of @buffer in bytes.
+ * @buffer:
+ *	Pointer to the buffer for output.
+ */
+struct blksnap_cbtmap {
+	__u32 offset;
+	__u32 length;
+	__u64 buffer;
+};
+
+/**
+ * struct blksnap_sectors - Description of the block device region.
+ *
+ * @offset:
+ *	Offset from the beginning of the disk in sectors.
+ * @count:
+ *	Count of sectors.
+ */
+struct blksnap_sectors {
+	__u64 offset;
+	__u64 count;
+};
+
+/**
+ * struct blksnap_cbtdirty - Option for the command
+ *	&blkfilter_ctl_blksnap.blkfilter_ctl_blksnap_cbtdirty.
+ *
+ * @count:
+ *	Count of elements in the @dirty_sectors.
+ * @dirty_sectors:
+ *	Pointer to the array of &struct blksnap_sectors.
+ */
+struct blksnap_cbtdirty {
+	__u32 count;
+	__u64 dirty_sectors;
+};
+
+/**
+ * struct blksnap_snapshotadd - Option for the command
+ *	&blkfilter_ctl_blksnap.blkfilter_ctl_blksnap_snapshotadd.
+ *
+ * @id:
+ *	ID of the snapshot to which the block device should be added.
+ */
+struct blksnap_snapshotadd {
+	struct blksnap_uuid id;
+};
+
+#define IMAGE_DISK_NAME_LEN 32
+
+/**
+ * struct blksnap_snapshotinfo - Result for the command
+ *	&blkfilter_ctl_blksnap.blkfilter_ctl_blksnap_snapshotinfo.
+ *
+ * @error_code:
+ *	Zero if there were no errors while holding the snapshot.
+ *	The error code -ENOSPC means that while holding the snapshot, a snapshot
+ *	overflow situation has occurred. Other error codes mean other reasons
+ *	for failure.
+ *	The error code is reset when the device is added to a new snapshot.
+ * @image:
+ *	If the snapshot was taken, it stores the block device name of the
+ *	image, or empty string otherwise.
+ */
+struct blksnap_snapshotinfo {
+	__s32 error_code;
+	__u8 image[IMAGE_DISK_NAME_LEN];
+};
+
+/**
+ * DOC: Interface for managing snapshots
+ *
+ * Control commands that are transmitted through the blksnap module interface.
+ */
+enum blksnap_ioctl {
+	blksnap_ioctl_version,
+	blksnap_ioctl_snapshot_create,
+	blksnap_ioctl_snapshot_destroy,
+	blksnap_ioctl_snapshot_append_storage,
+	blksnap_ioctl_snapshot_take,
+	blksnap_ioctl_snapshot_collect,
+	blksnap_ioctl_snapshot_wait_event,
+};
+
+/**
+ * struct blksnap_version - Module version.
+ *
+ * @major:
+ *	Version major part.
+ * @minor:
+ *	Version minor part.
+ * @revision:
+ *	Revision number.
+ * @build:
+ *	Build number. Should be zero.
+ */
+struct blksnap_version {
+	__u16 major;
+	__u16 minor;
+	__u16 revision;
+	__u16 build;
+};
+
+/**
+ * define IOCTL_BLKSNAP_VERSION - Get module version.
+ *
+ * The version may increase when the API changes. But linking the user space
+ * behavior to the version code does not seem to be a good idea.
+ * To ensure backward compatibility, API changes should be made by adding new
+ * ioctl without changing the behavior of existing ones. The version should be
+ * used for logs.
+ *
+ * Return: 0 if succeeded, negative errno otherwise.
+ */
+#define IOCTL_BLKSNAP_VERSION							\
+	_IOW(BLKSNAP, blksnap_ioctl_version, struct blksnap_version)
+
+
+/**
+ * define IOCTL_BLKSNAP_SNAPSHOT_CREATE - Create snapshot.
+ *
+ * Creates a snapshot structure in the memory and allocates an identifier for
+ * it. Further interaction with the snapshot is possible by this identifier.
+ * A snapshot is created for several block devices at once.
+ * Several snapshots can be created at the same time, but with the condition
+ * that one block device can only be included in one snapshot.
+ *
+ * Return: 0 if succeeded, negative errno otherwise.
+ */
+#define IOCTL_BLKSNAP_SNAPSHOT_CREATE						\
+	_IOW(BLKSNAP, blksnap_ioctl_snapshot_create,				\
+	     struct blksnap_uuid)
+
+
+/**
+ * define IOCTL_BLKSNAP_SNAPSHOT_DESTROY - Release and destroy the snapshot.
+ *
+ * Destroys snapshot with &blksnap_snapshot_destroy.id. This leads to the
+ * deletion of all block device images of the snapshot. The difference storage
+ * is being released. But the change tracker keeps tracking.
+ *
+ * Return: 0 if succeeded, negative errno otherwise.
+ */
+#define IOCTL_BLKSNAP_SNAPSHOT_DESTROY						\
+	_IOR(BLKSNAP, blksnap_ioctl_snapshot_destroy,				\
+	     struct blksnap_uuid)
+
+/**
+ * struct blksnap_snapshot_append_storage - Argument for the
+ *	&IOCTL_BLKSNAP_SNAPSHOT_APPEND_STORAGE control.
+ *
+ * @id:
+ *	Snapshot ID.
+ * @bdev_path:
+ *	Device path string buffer.
+ * @bdev_path_size:
+ *	Device path string buffer size.
+ * @count:
+ *	Size of @ranges in the number of &struct blksnap_sectors.
+ * @ranges:
+ *	Pointer to the array of &struct blksnap_sectors.
+ */
+struct blksnap_snapshot_append_storage {
+	struct blksnap_uuid id;
+	__u64 bdev_path;
+	__u32 bdev_path_size;
+	__u32 count;
+	__u64 ranges;
+};
+
+/**
+ * define IOCTL_BLKSNAP_SNAPSHOT_APPEND_STORAGE - Append storage to the
+ *	difference storage of the snapshot.
+ *
+ * The snapshot difference storage can be set either before or after creating
+ * the snapshot images. This allows to dynamically expand the difference
+ * storage while holding the snapshot.
+ *
+ * Return: 0 if succeeded, negative errno otherwise.
+ */
+#define IOCTL_BLKSNAP_SNAPSHOT_APPEND_STORAGE					\
+	_IOW(BLKSNAP, blksnap_ioctl_snapshot_append_storage,			\
+	     struct blksnap_snapshot_append_storage)
+
+/**
+ * define IOCTL_BLKSNAP_SNAPSHOT_TAKE - Take snapshot.
+ *
+ * Creates snapshot images of block devices and switches change trackers tables.
+ * The snapshot must be created before this call, and the areas of block
+ * devices should be added to the difference storage.
+ *
+ * Return: 0 if succeeded, negative errno otherwise.
+ */
+#define IOCTL_BLKSNAP_SNAPSHOT_TAKE						\
+	_IOR(BLKSNAP, blksnap_ioctl_snapshot_take,				\
+	     struct blksnap_uuid)
+
+/**
+ * struct blksnap_snapshot_collect - Argument for the
+ *	&IOCTL_BLKSNAP_SNAPSHOT_COLLECT control.
+ *
+ * @count:
+ *	Size of &blksnap_snapshot_collect.ids in number of 16-byte UUIDs.
+ * @ids:
+ *	Pointer to the array of &struct blksnap_uuid for output.
+ */
+struct blksnap_snapshot_collect {
+	__u32 count;
+	__u64 ids;
+};
+
+/**
+ * define IOCTL_BLKSNAP_SNAPSHOT_COLLECT - Get collection of created snapshots.
+ *
+ * Multiple snapshots can be created at the same time. This allows one system
+ * to create backups of different data on independent schedules.
+ *
+ * If &blksnap_snapshot_collect.count is less than required to store the
+ * &blksnap_snapshot_collect.ids, the array is not filled, and the ioctl
+ * returns the required count in &blksnap_snapshot_collect.count.
+ *
+ * So, it is recommended to call the ioctl twice. The first call passes a null
+ * pointer in &blksnap_snapshot_collect.ids and a zero value in
+ * &blksnap_snapshot_collect.count. It sets the required array size in
+ * &blksnap_snapshot_collect.count. The second call passes a pointer in
+ * &blksnap_snapshot_collect.ids to an array of the required size and
+ * retrieves the collection of active snapshots.
+ *
+ * Return: 0 if succeeded, -ENODATA if there is not enough space in the array
+ * to store the collection of active snapshots, or negative errno otherwise.
+ */
+#define IOCTL_BLKSNAP_SNAPSHOT_COLLECT						\
+	_IOW(BLKSNAP, blksnap_ioctl_snapshot_collect,				\
+	     struct blksnap_snapshot_collect)
+
+/**
+ * enum blksnap_event_codes - Variants of event codes.
+ *
+ * @blksnap_event_code_low_free_space:
+ *	Low free space in the difference storage.
+ *	The module generates this event when the free space in the difference
+ *	storage drops to the specified limit.
+ * @blksnap_event_code_corrupted:
+ *	Snapshot image is corrupted event.
+ *	If a chunk could not be allocated when trying to save data to the
+ *	difference storage, this event is generated. However, this does not mean
+ *	that the backup process was interrupted with an error. If the snapshot
+ *	image has been read to the end by this time, the backup process is
+ *	considered successful.
+ */
+enum blksnap_event_codes {
+	blksnap_event_code_low_free_space,
+	blksnap_event_code_corrupted,
+};
+
+/**
+ * struct blksnap_snapshot_event - Argument for the
+ *	&IOCTL_BLKSNAP_SNAPSHOT_WAIT_EVENT control.
+ *
+ * @id:
+ *	Snapshot ID.
+ * @timeout_ms:
+ *	Timeout for waiting in milliseconds.
+ * @code:
+ *	Code of the received event, see &enum blksnap_event_codes.
+ * @time_label:
+ *	Timestamp of the received event.
+ * @data:
+ *	The received event body.
+ */
+struct blksnap_snapshot_event {
+	struct blksnap_uuid id;
+	__u32 timeout_ms;
+	__u32 code;
+	__s64 time_label;
+	__u8 data[4096 - 32];
+};
+
+/**
+ * define IOCTL_BLKSNAP_SNAPSHOT_WAIT_EVENT - Wait and get the event from the
+ *	snapshot.
+ *
+ * While holding the snapshot, the kernel module can transmit information about
+ * changes in its state in the form of events to the user level.
+ * It is very important to receive these events as quickly as possible, so the
+ * user's thread waits in interruptible sleep.
+ *
+ * Return: 0 if succeeded, negative errno otherwise.
+ */
+#define IOCTL_BLKSNAP_SNAPSHOT_WAIT_EVENT					\
+	_IOW(BLKSNAP, blksnap_ioctl_snapshot_wait_event,			\
+	     struct blksnap_snapshot_event)
+
+/**
+ * struct blksnap_event_low_free_space - Data for the
+ *	&blksnap_event_code_low_free_space event.
+ *
+ * @requested_nr_sect:
+ *	The required number of sectors.
+ */
+struct blksnap_event_low_free_space {
+	__u64 requested_nr_sect;
+};
+
+/**
+ * struct blksnap_event_corrupted - Data for the
+ *	&blksnap_event_code_corrupted event.
+ *
+ * @dev_id_mj:
+ *	Major part of original device ID.
+ * @dev_id_mn:
+ *	Minor part of original device ID.
+ * @err_code:
+ *	Error code.
+ */
+struct blksnap_event_corrupted {
+	__u32 dev_id_mj;
+	__u32 dev_id_mn;
+	__s32 err_code;
+};
+
+#endif /* _UAPI_LINUX_BLKSNAP_H */
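
The two-call pattern recommended for IOCTL_BLKSNAP_SNAPSHOT_COLLECT above could look roughly like the sketch below from user space. It is not part of the patch. The `sketch_*` types are local mirrors of `struct blksnap_uuid` (assumed to be 16 bytes, per the @count description) and `struct blksnap_snapshot_collect`. The `sketch_collect_fn` hook is a hypothetical stand-in for the real `ioctl()` call, and `sketch_fake_collect` is a made-up emulation of the kernel side so the logic can be exercised without the module. The sketch also assumes the size query may report -ENODATA while still updating `count`, which the comment does not state explicitly.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Local mirrors of the UAPI types, for illustration only. */
struct sketch_uuid {
	uint8_t b[16];
};

struct sketch_collect {
	uint32_t count;
	uint64_t ids;
};

static_assert(sizeof(struct sketch_uuid) == 16, "UUID must be 16 bytes");

/* Hypothetical hook that issues the COLLECT request: 0 or negative errno. */
typedef int (*sketch_collect_fn)(struct sketch_collect *arg);

/*
 * Emulated kernel side (assumption): two snapshots exist. If the array is
 * missing or too small, only the required count is reported.
 */
int sketch_fake_collect(struct sketch_collect *arg)
{
	const uint32_t have = 2;
	struct sketch_uuid *dst = (struct sketch_uuid *)(uintptr_t)arg->ids;
	uint32_t i;

	if (!dst || arg->count < have) {
		arg->count = have;
		return -ENODATA;
	}
	for (i = 0; i < have; i++)
		memset(&dst[i], (int)(i + 1), sizeof(dst[i]));
	arg->count = have;
	return 0;
}

/* Query the required size first, then fetch the IDs. Caller frees *out. */
int sketch_collect_ids(sketch_collect_fn do_collect,
		       struct sketch_uuid **out, uint32_t *out_count)
{
	struct sketch_collect arg = { .count = 0, .ids = 0 };
	struct sketch_uuid *ids;
	int ret;

	*out = NULL;
	*out_count = 0;

	/* First call: null pointer and zero count, to learn the size. */
	ret = do_collect(&arg);
	if (ret && ret != -ENODATA)
		return ret;
	if (arg.count == 0)
		return 0;

	ids = calloc(arg.count, sizeof(*ids));
	if (!ids)
		return -ENOMEM;

	/* Second call: pass an array of the reported size. */
	arg.ids = (uint64_t)(uintptr_t)ids;
	ret = do_collect(&arg);
	if (ret) {
		free(ids);
		return ret;
	}

	*out = ids;
	*out_count = arg.count;
	return 0;
}
```

Calling `sketch_collect_ids(sketch_fake_collect, &ids, &n)` exercises both calls against the emulation. With the real module, a snapshot created between the two calls could make the second call fail with -ENODATA, so a caller would probably retry the whole sequence in that case.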