[v5,00/11] blksnap - block devices snapshots module

Message ID 20230612135228.10702-1-sergei.shtepa@veeam.com
Series blksnap - block devices snapshots module

Message

Sergei Shtepa June 12, 2023, 1:52 p.m. UTC
  Hi all.

I am happy to offer an improved version of the Block Devices Snapshots
Module. It makes it possible to create non-persistent snapshots of any block device.
The main purpose of such snapshots is to provide backups of block devices.
See more in Documentation/block/blksnap.rst.

The Block Device Filtering Mechanism is added to the block layer. It
makes it possible to attach block device filters to the block layer and
detach them. Filters can extend the functionality of the block layer.
See more in Documentation/block/blkfilter.rst.
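As a rough user-space illustration of the idea (not the kernel API from the patch), a filter can be modelled as a table of callbacks attached to a device: the submit hook sees every I/O unit first and decides whether it continues to the original device. All names below (`struct toy_filter_ops`, `toy_attach`, `toy_submit_bio`, `TOY_PASS`, ...) are hypothetical; the one-filter-per-device limit mirrors the v1 note that only one filter can be attached.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical verdicts a filter can return for an I/O unit. */
enum toy_verdict { TOY_PASS, TOY_HANDLED };

struct toy_bio {
	unsigned long long sector; /* start sector of the request */
	int is_write;
};

/* Hypothetical filter callbacks; not the patch's struct blkfilter_operations. */
struct toy_filter_ops {
	const char *name;
	/* Called before the request reaches the underlying device. */
	enum toy_verdict (*submit)(struct toy_bio *bio, void *priv);
};

struct toy_bdev {
	const struct toy_filter_ops *filter; /* at most one filter */
	void *filter_priv;
	unsigned long passed; /* requests that reached the "device" */
};

/* Attach a filter; a second attach is refused while one is present. */
static int toy_attach(struct toy_bdev *bdev, const struct toy_filter_ops *ops,
		      void *priv)
{
	assert(ops != NULL && ops->submit != NULL);
	if (bdev->filter)
		return -EBUSY;
	bdev->filter = ops;
	bdev->filter_priv = priv;
	return 0;
}

static void toy_detach(struct toy_bdev *bdev)
{
	bdev->filter = NULL;
	bdev->filter_priv = NULL;
}

/* Block-layer entry point: give the attached filter a chance to intercept. */
static void toy_submit_bio(struct toy_bdev *bdev, struct toy_bio *bio)
{
	if (bdev->filter &&
	    bdev->filter->submit(bio, bdev->filter_priv) == TOY_HANDLED)
		return;
	bdev->passed++; /* the original device would process the bio here */
}

/* Example filter: count writes and let everything pass through. */
static enum toy_verdict count_writes(struct toy_bio *bio, void *priv)
{
	unsigned long *writes = priv;

	if (bio->is_write)
		(*writes)++;
	return TOY_PASS;
}

static const struct toy_filter_ops write_counter = {
	.name = "write_counter",
	.submit = count_writes,
};
```

A snapshot filter such as blksnap would do real work in the submit hook (for example, copy-on-write) instead of just counting.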

The tool, library and tests for working with blksnap can be found on GitHub.
Link: https://github.com/veeam/blksnap/tree/stable-v2.0

There are only a few changes in this version of the patch. Experience
from using the out-of-tree version of the blksnap module on real servers
was taken into account.

v5 changes:
- Rebase for "kernel/git/axboe/linux-block.git" branch "for-6.5/block".
  Link: https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git/log/?h=for-6.5/block

v4 changes:
- Structures for describing the state of chunks are allocated dynamically.
  This reduces memory consumption, since the struct chunk is allocated only
  for those blocks for which the snapshot image state differs from the
  original block device.
- The algorithm for calculating the chunk size depending on the size of the
  block device has been changed. For large block devices, a larger number
  of smaller chunks can now be allocated.
- For block devices, a 'filter' file has been added to /sys/block/<device>.
  It displays the name of the filter that is attached to the block device.
- Added the missing protection against re-adding a block device to a
  snapshot.
- Fixed a bug in the algorithm for allocating the next bio for a chunk.
  The problem occurred on large disks, where a chunk consists of at least
  two bios.
- The ownership mechanism of the diff_area structure has been changed.
  This fixes the premature release of the diff_area structure when the
  snapshot is destroyed.
- Documentation corrected.
- The code now passes the Sparse static analyzer.
- The UAPI uses the __u64 type instead of pointers.
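The cover letter does not spell out the new chunk-size algorithm, so the following is only a plausible sketch under stated assumptions: the chunk size is a power of two no smaller than a hypothetical `min_shift`, and it is doubled until the device is covered by at most a hypothetical `max_count` chunks. The second helper counts how many bios are needed to copy one chunk when a single bio carries at most `max_bio_bytes` (also an assumption), which is the multi-bio case mentioned in the bug fix above.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SHIFT 9 /* 512-byte sectors */

/*
 * Hypothetical: return the smallest chunk shift >= min_shift for which
 * the device (dev_sectors long) is covered by at most max_count chunks.
 */
static unsigned int chunk_shift_for(uint64_t dev_sectors, unsigned int min_shift,
				    uint64_t max_count)
{
	unsigned int shift = min_shift;

	assert(max_count > 0 && min_shift >= SECTOR_SHIFT);
	for (;;) {
		assert(shift < 63); /* guard against overflow in this sketch */
		uint64_t chunk_sectors = 1ULL << (shift - SECTOR_SHIFT);
		uint64_t count = (dev_sectors + chunk_sectors - 1) / chunk_sectors;

		if (count <= max_count)
			return shift;
		shift++; /* double the chunk size and try again */
	}
}

/* Bios needed to copy one chunk when one bio holds at most max_bio_bytes. */
static uint64_t bios_per_chunk(unsigned int chunk_shift, uint64_t max_bio_bytes)
{
	uint64_t chunk_bytes = 1ULL << chunk_shift;

	assert(max_bio_bytes > 0);
	return (chunk_bytes + max_bio_bytes - 1) / max_bio_bytes;
}
```

For example, with a 256 KiB minimum chunk (`min_shift` 18) and at most 2^20 chunks, a 1 TiB device would get 1 MiB chunks, and with a 512 KiB per-bio limit each such chunk needs two bios.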

v3 changes:
- New block device I/O controls BLKFILTER_ATTACH and BLKFILTER_DETACH allow
  filters to be attached and detached.
- New block device I/O control BLKFILTER_CTL allows sending commands to the
  filter attached to a block device.
- The copy-on-write algorithm for processing I/O units has been optimized
  and has become asynchronous.
- The snapshot image reading algorithm has been optimized and has become
  asynchronous.
- Optimized the finite state machine for processing chunks.
- Fixed a tracking block size calculation bug.
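The copy-on-write and snapshot-image read paths listed above can be illustrated with a deliberately simplified, synchronous user-space model (the patch's versions are asynchronous and work on real bios). Everything here (`struct toy_snapshot`, `toy_write_orig`, `toy_read_image`, the tiny chunk size) is hypothetical: the first write to a chunk after the snapshot is taken saves the old contents to the difference storage, and reads from the snapshot image prefer that saved copy.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define CHUNK_SIZE 8 /* bytes per chunk in this toy model */
#define NR_CHUNKS 4

struct toy_snapshot {
	unsigned char orig[NR_CHUNKS][CHUNK_SIZE]; /* original device */
	unsigned char diff[NR_CHUNKS][CHUNK_SIZE]; /* difference storage */
	bool copied[NR_CHUNKS]; /* chunk already preserved in diff storage? */
};

/* Write to the original device, preserving old data on the first write. */
static void toy_write_orig(struct toy_snapshot *s, unsigned int chunk,
			   const unsigned char *data)
{
	assert(chunk < NR_CHUNKS);
	if (!s->copied[chunk]) {
		/* First write since the snapshot: save the old contents. */
		memcpy(s->diff[chunk], s->orig[chunk], CHUNK_SIZE);
		s->copied[chunk] = true;
	}
	memcpy(s->orig[chunk], data, CHUNK_SIZE);
}

/* Read from the snapshot image: saved copy if present, else the original. */
static void toy_read_image(const struct toy_snapshot *s, unsigned int chunk,
			   unsigned char *out)
{
	assert(chunk < NR_CHUNKS);
	memcpy(out, s->copied[chunk] ? s->diff[chunk] : s->orig[chunk],
	       CHUNK_SIZE);
}
```

Only the first write to a chunk costs an extra copy; later writes to the same chunk go straight to the original device, while the snapshot image keeps returning the preserved data.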

v2 changes:
- Added documentation for Block Device Filtering Mechanism.
- Added documentation for Block Devices Snapshots Module (blksnap).
- The MAINTAINERS file has been updated.
- Optimized queue code for snapshot images.
- Fixed comments, log messages and code for better readability.

v1 changes:
- Forgotten "static" declarations have been added.
- The text of the comments has been corrected.
- Only one filter can be attached, since there are no others in
  upstream.
- No additional locks are taken when attaching or detaching a filter.
- blksnap.h moved to include/uapi/.
- #pragma once and commented-out code removed.
- uuid_t removed from user API.
- Removed default values for module parameters from the configuration file.
- The debugging code for tracking memory leaks has been removed.
- Simplified Makefile.
- Optimized work with large memory buffers, CBT tables are now in virtual
  memory.
- The allocation code of minor numbers has been optimized.
- The implementation of the snapshot image block device has been
  simplified, now it is a bio-based block device.
- Removed initialization of global variables with null values.
- Only one bio is used to copy one chunk.
- Checked on ppc64le.

Thanks to the following people for their help in preparing the v4 patch:
- Christoph Hellwig <hch@infradead.org> for his significant contribution
  to the project.
- Fabio Fantoni <fantonifabio@tiscali.it> for his participation in the
  project, useful advice and faith in the success of the project.
- Donald Buczek <buczek@molgen.mpg.de> for researching the module and
  user-space tool. His fresh look revealed a number of flaws.
- Bagas Sanjaya <bagasdotme@gmail.com> for comments on the documentation.

Sergei Shtepa (11):
  documentation: Block Device Filtering Mechanism
  block: Block Device Filtering Mechanism
  documentation: Block Devices Snapshots Module
  blksnap: header file of the module interface
  blksnap: module management interface functions
  blksnap: handling and tracking I/O units
  blksnap: minimum data storage unit of the original block device
  blksnap: difference storage
  blksnap: event queue from the difference storage
  blksnap: snapshot and snapshot image block device
  blksnap: Kconfig and Makefile

 Documentation/block/blkfilter.rst    |  64 ++++
 Documentation/block/blksnap.rst      | 345 +++++++++++++++++
 Documentation/block/index.rst        |   2 +
 MAINTAINERS                          |  17 +
 block/Makefile                       |   3 +-
 block/bdev.c                         |   1 +
 block/blk-core.c                     |  27 ++
 block/blk-filter.c                   | 213 ++++++++++
 block/blk.h                          |  11 +
 block/genhd.c                        |  10 +
 block/ioctl.c                        |   7 +
 block/partitions/core.c              |  10 +
 drivers/block/Kconfig                |   2 +
 drivers/block/Makefile               |   2 +
 drivers/block/blksnap/Kconfig        |  12 +
 drivers/block/blksnap/Makefile       |  15 +
 drivers/block/blksnap/cbt_map.c      | 227 +++++++++++
 drivers/block/blksnap/cbt_map.h      |  90 +++++
 drivers/block/blksnap/chunk.c        | 454 ++++++++++++++++++++++
 drivers/block/blksnap/chunk.h        | 114 ++++++
 drivers/block/blksnap/diff_area.c    | 554 +++++++++++++++++++++++++++
 drivers/block/blksnap/diff_area.h    | 144 +++++++
 drivers/block/blksnap/diff_buffer.c  | 127 ++++++
 drivers/block/blksnap/diff_buffer.h  |  37 ++
 drivers/block/blksnap/diff_storage.c | 316 +++++++++++++++
 drivers/block/blksnap/diff_storage.h | 111 ++++++
 drivers/block/blksnap/event_queue.c  |  87 +++++
 drivers/block/blksnap/event_queue.h  |  65 ++++
 drivers/block/blksnap/main.c         | 483 +++++++++++++++++++++++
 drivers/block/blksnap/params.h       |  16 +
 drivers/block/blksnap/snapimage.c    | 124 ++++++
 drivers/block/blksnap/snapimage.h    |  10 +
 drivers/block/blksnap/snapshot.c     | 443 +++++++++++++++++++++
 drivers/block/blksnap/snapshot.h     |  68 ++++
 drivers/block/blksnap/tracker.c      | 339 ++++++++++++++++
 drivers/block/blksnap/tracker.h      |  75 ++++
 include/linux/blk-filter.h           |  51 +++
 include/linux/blk_types.h            |   2 +
 include/linux/blkdev.h               |   1 +
 include/uapi/linux/blk-filter.h      |  35 ++
 include/uapi/linux/blksnap.h         | 421 ++++++++++++++++++++
 include/uapi/linux/fs.h              |   3 +
 42 files changed, 5137 insertions(+), 1 deletion(-)
 create mode 100644 Documentation/block/blkfilter.rst
 create mode 100644 Documentation/block/blksnap.rst
 create mode 100644 block/blk-filter.c
 create mode 100644 drivers/block/blksnap/Kconfig
 create mode 100644 drivers/block/blksnap/Makefile
 create mode 100644 drivers/block/blksnap/cbt_map.c
 create mode 100644 drivers/block/blksnap/cbt_map.h
 create mode 100644 drivers/block/blksnap/chunk.c
 create mode 100644 drivers/block/blksnap/chunk.h
 create mode 100644 drivers/block/blksnap/diff_area.c
 create mode 100644 drivers/block/blksnap/diff_area.h
 create mode 100644 drivers/block/blksnap/diff_buffer.c
 create mode 100644 drivers/block/blksnap/diff_buffer.h
 create mode 100644 drivers/block/blksnap/diff_storage.c
 create mode 100644 drivers/block/blksnap/diff_storage.h
 create mode 100644 drivers/block/blksnap/event_queue.c
 create mode 100644 drivers/block/blksnap/event_queue.h
 create mode 100644 drivers/block/blksnap/main.c
 create mode 100644 drivers/block/blksnap/params.h
 create mode 100644 drivers/block/blksnap/snapimage.c
 create mode 100644 drivers/block/blksnap/snapimage.h
 create mode 100644 drivers/block/blksnap/snapshot.c
 create mode 100644 drivers/block/blksnap/snapshot.h
 create mode 100644 drivers/block/blksnap/tracker.c
 create mode 100644 drivers/block/blksnap/tracker.h
 create mode 100644 include/linux/blk-filter.h
 create mode 100644 include/uapi/linux/blk-filter.h
 create mode 100644 include/uapi/linux/blksnap.h
  

Comments

Christoph Hellwig June 12, 2023, 2:32 p.m. UTC | #1
I'm of course a little biased by having spent a lot of my own time
on this, but this version now looks ready to merge to me:

Acked-by: Christoph Hellwig <hch@lst.de>

But as Jens just merged my series to rework the open flags, we'll also
need to fold this in:

diff --git a/drivers/block/blksnap/diff_area.c b/drivers/block/blksnap/diff_area.c
index 169fa003b6d66d..0848c947591508 100644
--- a/drivers/block/blksnap/diff_area.c
+++ b/drivers/block/blksnap/diff_area.c
@@ -128,7 +128,7 @@ void diff_area_free(struct kref *kref)
 	xa_destroy(&diff_area->chunk_map);
 
 	if (diff_area->orig_bdev) {
-		blkdev_put(diff_area->orig_bdev, FMODE_READ | FMODE_WRITE);
+		blkdev_put(diff_area->orig_bdev, NULL);
 		diff_area->orig_bdev = NULL;
 	}
 
@@ -214,7 +214,8 @@ struct diff_area *diff_area_new(dev_t dev_id, struct diff_storage *diff_storage)
 
 	pr_debug("Open device [%u:%u]\n", MAJOR(dev_id), MINOR(dev_id));
 
-	bdev = blkdev_get_by_dev(dev_id, FMODE_READ | FMODE_WRITE, NULL, NULL);
+	bdev = blkdev_get_by_dev(dev_id, BLK_OPEN_READ | BLK_OPEN_WRITE, NULL,
+				 NULL);
 	if (IS_ERR(bdev)) {
 		int err = PTR_ERR(bdev);
 
@@ -224,7 +225,7 @@ struct diff_area *diff_area_new(dev_t dev_id, struct diff_storage *diff_storage)
 
 	diff_area = kzalloc(sizeof(struct diff_area), GFP_KERNEL);
 	if (!diff_area) {
-		blkdev_put(bdev, FMODE_READ | FMODE_WRITE);
+		blkdev_put(bdev, NULL);
 		return ERR_PTR(-ENOMEM);
 	}
 
diff --git a/drivers/block/blksnap/diff_storage.c b/drivers/block/blksnap/diff_storage.c
index 1787fa6931a816..f3814474b9804a 100644
--- a/drivers/block/blksnap/diff_storage.c
+++ b/drivers/block/blksnap/diff_storage.c
@@ -123,7 +123,7 @@ void diff_storage_free(struct kref *kref)
 	}
 
 	while ((storage_bdev = first_storage_bdev(diff_storage))) {
-		blkdev_put(storage_bdev->bdev, FMODE_READ | FMODE_WRITE);
+		blkdev_put(storage_bdev->bdev, NULL);
 		list_del(&storage_bdev->link);
 		kfree(storage_bdev);
 	}
@@ -138,7 +138,7 @@ static struct block_device *diff_storage_add_storage_bdev(
 	struct storage_bdev *storage_bdev, *existing_bdev = NULL;
 	struct block_device *bdev;
 
-	bdev = blkdev_get_by_path(bdev_path, FMODE_READ | FMODE_WRITE,
+	bdev = blkdev_get_by_path(bdev_path, BLK_OPEN_READ | BLK_OPEN_WRITE,
 				  NULL, NULL);
 	if (IS_ERR(bdev)) {
 		pr_err("Failed to open device. errno=%ld\n", PTR_ERR(bdev));
@@ -153,14 +153,14 @@ static struct block_device *diff_storage_add_storage_bdev(
 	spin_unlock(&diff_storage->lock);
 
 	if (existing_bdev->bdev == bdev) {
-		blkdev_put(bdev, FMODE_READ | FMODE_WRITE);
+		blkdev_put(bdev, NULL);
 		return existing_bdev->bdev;
 	}
 
 	storage_bdev = kzalloc(sizeof(struct storage_bdev) +
 			       strlen(bdev_path) + 1, GFP_KERNEL);
 	if (!storage_bdev) {
-		blkdev_put(bdev, FMODE_READ | FMODE_WRITE);
+		blkdev_put(bdev, NULL);
 		return ERR_PTR(-ENOMEM);
 	}
  
Eric Biggers June 12, 2023, 4:19 p.m. UTC | #2
On Mon, Jun 12, 2023 at 03:52:17PM +0200, Sergei Shtepa wrote:
> Hi all.
> 
> I am happy to offer a improved version of the Block Devices Snapshots
> Module. It allows to create non-persistent snapshots of any block devices.
> The main purpose of such snapshots is to provide backups of block devices.
> See more in Documentation/block/blksnap.rst.

How does blksnap interact with blk-crypto?

I.e., what happens if a bio with a ->bi_crypt_context set is submitted to a
block device that has blksnap active?

If you are unfamiliar with blk-crypto, please read
Documentation/block/inline-encryption.rst

It looks like blksnap hooks into the block layer directly, via the new
"blkfilter" mechanism.  I'm concerned that it might ignore ->bi_crypt_context
and write data to the disk in plaintext, when it is supposed to be encrypted.

- Eric
  
Christoph Hellwig June 13, 2023, 5:50 a.m. UTC | #3
On Mon, Jun 12, 2023 at 09:19:11AM -0700, Eric Biggers wrote:
> On Mon, Jun 12, 2023 at 03:52:17PM +0200, Sergei Shtepa wrote:
> > Hi all.
> > 
> > I am happy to offer a improved version of the Block Devices Snapshots
> > Module. It allows to create non-persistent snapshots of any block devices.
> > The main purpose of such snapshots is to provide backups of block devices.
> > See more in Documentation/block/blksnap.rst.
> 
> How does blksnap interact with blk-crypto?
> 
> I.e., what happens if a bio with a ->bi_crypt_context set is submitted to a
> block device that has blksnap active?
> 
> If you are unfamiliar with blk-crypto, please read
> Documentation/block/inline-encryption.rst
> 
> It looks like blksnap hooks into the block layer directly, via the new
> "blkfilter" mechanism.  I'm concerned that it might ignore ->bi_crypt_context
> and write data to the disk in plaintext, when it is supposed to be encrypted.

Yeah.  Same for integrity.  I guess for now the best would be to
not allow attaching a filter to block devices that have encryption or
integrity enabled and then look into that as a separate project fully
reviewed by the respective experts.
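The attach-time restriction suggested here can be sketched as a simple policy check. This is a user-space model only; `struct toy_disk`, its capability flags, and `toy_filter_attach` are hypothetical. In kernel code such a check would presumably consult the device's inline-encryption and integrity profiles, but that mapping is an assumption, and whether a device-level check alone covers every path (for example, the blk-crypto fallback) is not something this sketch settles.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of a block device's capabilities. */
struct toy_disk_caps {
	bool inline_crypto; /* device advertises inline encryption */
	bool integrity;     /* device has an integrity profile */
};

struct toy_disk {
	struct toy_disk_caps caps;
	const char *filter_name; /* NULL when no filter is attached */
};

/*
 * Refuse to attach a filter to devices with encryption or integrity
 * support, as suggested in the review; otherwise allow a single filter.
 */
static int toy_filter_attach(struct toy_disk *disk, const char *name)
{
	assert(name != NULL);
	if (disk->caps.inline_crypto || disk->caps.integrity)
		return -EOPNOTSUPP;
	if (disk->filter_name)
		return -EBUSY;
	disk->filter_name = name;
	return 0;
}
```

Failing the attach up front means no snapshot is ever created on such a device, so nothing can be written to the difference storage in the wrong form.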
  
Sergei Shtepa June 13, 2023, 10:12 a.m. UTC | #4
On 6/12/23 18:19, Eric Biggers wrote:
> On Mon, Jun 12, 2023 at 03:52:17PM +0200, Sergei Shtepa wrote:
> > Hi all.
> >
> > I am happy to offer a improved version of the Block Devices Snapshots
> > Module. It allows to create non-persistent snapshots of any block devices.
> > The main purpose of such snapshots is to provide backups of block devices.
> > See more in Documentation/block/blksnap.rst.
> 
> How does blksnap interact with blk-crypto?
> 
> I.e., what happens if a bio with a ->bi_crypt_context set is submitted to a
> block device that has blksnap active?
> 
> If you are unfamiliar with blk-crypto, please read
> Documentation/block/inline-encryption.rst

Thank you, this is an important point, and you are right.
The current version of blksnap can cause blk-crypto to malfunction while
a snapshot is held. When bios from the file system are handled, their
->bi_crypt_context is preserved. But the bios that serve the snapshot
are submitted without a crypto context. I think that the snapshot will be
unreadable.

But I don't see any obstacles to making blksnap compatible with
blk-crypto. If DM implements support for blk-crypto, the same principle
can be applied to blksnap. I think that integrating blksnap with
blk-crypto could be one of the next stages of development.

dm-crypt should work properly.

It is noteworthy that in 7 years of using the out-of-tree module to take
snapshots, I have not encountered such problems.
But incompatibility with blk-crypto is possible and may already be a pain
for some users. I will ask our support team for this information.

> 
> It looks like blksnap hooks into the block layer directly, via the new
> "blkfilter" mechanism. I'm concerned that it might ignore ->bi_crypt_context
> and write data to the disk in plaintext, when it is supposed to be encrypted.

No. The "blkfilter" mechanism should not affect the operation of blk-crypto.
It does not change the bio.
Only a module that has been attached and provides its own filtering algorithm,
such as blksnap, can violate the logic of blk-crypto.
Therefore, until the blksnap module is loaded, blk-crypto should work as before.
  
Eric Biggers June 14, 2023, 5:22 p.m. UTC | #5
On Tue, Jun 13, 2023 at 12:12:19PM +0200, Sergei Shtepa wrote:
> On 6/12/23 18:19, Eric Biggers wrote:
> > On Mon, Jun 12, 2023 at 03:52:17PM +0200, Sergei Shtepa wrote:
> > > Hi all.
> > >
> > > I am happy to offer a improved version of the Block Devices Snapshots
> > > Module. It allows to create non-persistent snapshots of any block devices.
> > > The main purpose of such snapshots is to provide backups of block devices.
> > > See more in Documentation/block/blksnap.rst.
> > 
> > How does blksnap interact with blk-crypto?
> > 
> > I.e., what happens if a bio with a ->bi_crypt_context set is submitted to a
> > block device that has blksnap active?
> > 
> > If you are unfamiliar with blk-crypto, please read
> > Documentation/block/inline-encryption.rst
> 
> Thank you, this is an important point. Yes, that's right.
> The current version of blksnap can cause blk-crypto to malfunction while
> holding a snapshot. When handling bios from the file system, the
> ->bi_crypt_context is preserved. But the bio requests serving the snapshot
> are executed without context. I think that the snapshot will be unreadable.

Well not only would the resulting snapshot be unreadable, but plaintext data
would be written to disk, contrary to the intent of the submitter of the bios.
That would be a security vulnerability.

If the initial version of blksnap isn't going to be compatible with blk-crypto,
that is tolerable for now, but there needs to be an explicit check to cause an
error to be returned if the two features are combined, before anything is
written to disk.
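The explicit per-bio check requested here can be modelled as follows: the filter rejects any I/O unit it cannot handle faithfully before any copy-on-write data is written. This is a hypothetical user-space sketch (`struct toy_io`, `toy_snapshot_submit`, and the status codes are invented for illustration); in kernel code the equivalent tests would presumably use helpers such as `bio_has_crypt_ctx()` and `bio_integrity()` and complete the bio with an error status such as `BLK_STS_NOTSUPP`, but how the patch would wire that in is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical status codes, loosely modelled on blk_status_t. */
enum toy_status { TOY_STS_OK = 0, TOY_STS_NOTSUPP };

struct toy_io {
	const void *crypt_ctx; /* stands in for ->bi_crypt_context */
	const void *integrity; /* stands in for an integrity payload */
	bool is_write;
	enum toy_status status;
};

struct toy_cow_state {
	unsigned long cow_writes; /* chunks copied to difference storage */
};

/*
 * Filter hook: fail unsupported I/O *before* anything is copied to the
 * difference storage, so plaintext never reaches the disk.
 */
static bool toy_snapshot_submit(struct toy_cow_state *st, struct toy_io *io)
{
	assert(st != NULL && io != NULL);
	if (io->crypt_ctx || io->integrity) {
		io->status = TOY_STS_NOTSUPP;
		return false; /* completed with an error, nothing written */
	}
	if (io->is_write)
		st->cow_writes++; /* copy-on-write would happen here */
	io->status = TOY_STS_OK;
	return true;
}
```

Checking at submission time complements an attach-time restriction: even if an encrypted bio reaches a device that was accepted at attach time, it is failed rather than silently copied in plaintext.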

- Eric