[RFQ] dm-integrity: Add a lazy commit mode for journal

Message ID 20240209192542.449367-1-simone.weiss@elektrobit.com
State New
Series [RFQ] dm-integrity: Add a lazy commit mode for journal

Commit Message

Weiß, Simone Feb. 9, 2024, 7:25 p.m. UTC
Extend the dm-integrity driver to omit writing unused journal data sectors.
Instead of filling up the whole journal section, mark the last used
sector with a special commit ID. The commit ID still uses the same base value,
but the section number and sector number are inverted. At replay, when commit
IDs are analyzed, this special commit ID is detected as the end of valid data
for this section. The main goal is to prolong the lifetime of e.g. eMMCs by
avoiding writing the whole journal data sectors.
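
The marker scheme described above can be sketched in plain userspace C as
follows. This is only an illustration under assumptions: struct lazy_id_ctx,
lazy_commit_id() and classify_commit_id() are hypothetical names, the base
values are arbitrary, and the XOR of the base value with the section/sector
position is modelled loosely on the driver's dm_integrity_commit_id() without
its endianness handling.

```c
#include <stdbool.h>
#include <stdint.h>

#define N_COMMIT_IDS 4

/* Hypothetical stand-in for the per-sequence base values kept by the driver. */
struct lazy_id_ctx {
	uint64_t base[N_COMMIT_IDS];
};

/* Regular ID: base value mixed with the section (i) and sector (j) position. */
static uint64_t commit_id(const struct lazy_id_ctx *c, uint32_t i, uint32_t j,
			  unsigned int seq)
{
	return c->base[seq] ^ (((uint64_t)i << 32) ^ j);
}

/* Lazy end marker: same base value, but section and sector are inverted. */
static uint64_t lazy_commit_id(const struct lazy_id_ctx *c, uint32_t i,
			       uint32_t j, unsigned int seq)
{
	return commit_id(c, ~i, ~j, seq);
}

/*
 * Classify an ID read back from position (i, j). Returns the sequence
 * number, or -1 if the ID matches neither form. *lazy is set when the
 * end marker matched.
 */
static int classify_commit_id(const struct lazy_id_ctx *c, uint32_t i,
			      uint32_t j, uint64_t id, bool *lazy)
{
	unsigned int k;

	*lazy = false;
	for (k = 0; k < N_COMMIT_IDS; k++)
		if (commit_id(c, i, j, k) == id)
			return (int)k;
	for (k = 0; k < N_COMMIT_IDS; k++)
		if (lazy_commit_id(c, i, j, k) == id) {
			*lazy = true;
			return (int)k;
		}
	return -1;
}
```

In this model, inverting both numbers makes the lazy marker the bitwise
complement of the regular ID for the same position, so the two forms can only
be confused if one base value happens to be the complement of another.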

The change is currently experimental and is applied only if
CONFIG_DMINT_LAZY_COMMIT is set to y. Please note that this build-time switch
is NOT planned for a final version of the changes. I would make it
configurable via flags passed e.g. via dmsetup and stored in the superblock.

Architectural Limitations:
- A dm-integrity partition that was previously used with lazy commit
 can't be replayed with a dm-integrity driver not using lazy commit.
- A dm-integrity driver that uses lazy commit is expected
 to be able to cope with a partition that was created and used without
 lazy commit.
- With dm-integrity lazy commit, a partially written journal (e.g. due to a
 power cut) can cause a tag mismatch during replay if the journal entry marking
 the end of the journal section is missing. Due to lazy commit, older journal
 entries are not erased and might be processed if they have the same commit ID
 as adjacent newer journal entries. If dm-integrity detects bad sections while
 replaying the journal, it keeps track of those sections and tries to at least
 replay older, good sections.
 This is based on the assumption that the newest section(s) are the most
 likely to be damaged, as they might have been only partially written
 due to a sudden reset. Previously, the whole journal would be cleared in
 such a case.
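
The bookkeeping for bad sections in the last limitation can be sketched as
below. The three cases mirror the replay_journal() hunk of this patch;
count_dead_sections() is a hypothetical helper name that does not exist in
the driver, and it assumes it is only called after at least one bad section
was seen (first_bad != 0). As in the patch, first_bad and last_bad are
1-based section numbers, and write_start is the 0-based section at which
replay would start.

```c
/*
 * Number of sections directly before write_start (wrapping around the
 * journal) that are treated as dead because a bad section was found.
 * A result of 0 corresponds to the case whose diagram in the patch marks
 * every section as discarded.
 */
static unsigned int count_dead_sections(unsigned int journal_sections,
					unsigned int write_start,
					unsigned int first_bad,
					unsigned int last_bad)
{
	/* Bad range lies entirely before write_start. */
	if (last_bad <= write_start)
		return write_start - first_bad + 1;

	/* Bad range lies entirely after write_start: wrap around. */
	if (first_bad > write_start)
		return journal_sections + write_start - first_bad + 1;

	/* Bad range straddles write_start. */
	return 0;
}
```

With the values from the diagrams in the patch (8 sections, write_start=4),
first_bad=3/last_bad=4 gives 2 dead sections and first_bad=7/last_bad=8
gives 6.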

Signed-off-by: Simone Weiß <simone.weiss@elektrobit.com>
Signed-off-by: Kai Tomerius <kai.tomerius@elektrobit.com>

---
This is only a very early version; please bear that in mind. I would like to
get feedback about the general idea and am aware that further work is needed.

Tests done so far:
- Tests were executed on qemu.
- Test scripts can be found under:
  git@github.com:simone-weiss/dm-integrity-lazy-commit.git
- Suggestions on how to test this further and which test cases to run this
  against are appreciated.

Further work:
- The superblock should carry information about lazy-commit. Should the
  version be increased for this?
- Add handling/logging for a partition that was created with lazy commits
  but gets replayed with a "normal" journal mode.
- Allow configuring whether lazy commits or normal commits are used in the
  journal if lazy commits are enabled.
- Userspace setup tooling like dmsetup should be adapted accordingly.

 drivers/md/Kconfig        |  10 ++
 drivers/md/dm-integrity.c | 250 ++++++++++++++++++++++++++++++++------
 2 files changed, 222 insertions(+), 38 deletions(-)
  

Comments

Mikulas Patocka Feb. 20, 2024, 6:52 p.m. UTC | #1
On Fri, 9 Feb 2024, Simone Weiß wrote:

> Extend the dm-integrity driver to omit writing unused journal data sectors.
> Instead of filling up the whole journal section, mark the last used
> sector with a special commit ID. The commit ID still uses the same base value,
> but section number and sector number are inverted. At replay when commit IDs
> are analyzed this special commit ID is detected as end of valid data for this
> section. The main goal is to prolong the live times of e.g. eMMCs by avoiding
> to write the whole journal data sectors.
> 
> The change is right now to be seen as experimental and gets applied if
> CONFIG_DMINT_LAZY_COMMIT is set to y. Note please that this is NOT
> planned for a final version of the changes. I would make it configurable
> via flags passed e.g. via dmsetup and stored in the superblock.
> 
> Architectural Limitations:
> - A dm-integrity partition, that was previously used with lazy commit,
>  can't be replayed with a dm-integrity driver not using lazy commit.
> - A dm-integrity driver that uses lazy commit is expected
>  to be able to cope with a partition that was created and used without
>  lazy commit.
> - With dm-integrity lazy commit, a partially written journal (e.g. due to a
>  power cut) can cause a tag mismatch during replay if the journal entry marking
>  the end of the journal section is missing. Due to lazy commit, older journal
>  entries are not erased and might be processed if they have the same commit ID
>  as adjacent newer journal entries.

Hi

I was thinking about it and I think that this problem is a showstopper.

Suppose that a journal section contains these commit IDs:

	2	2	2	2(EOF)	3	3	3	3

The IDs "3" are left over from previous iterations. The IDs "2" contain 
the current data. And now, the journal rolls over and we attempt to write 
all 8 pages with the ID "3". However, a power failure happens and we only 
write 4 pages with the ID "3". So, the journal will look like:

	3(new)	3(new)	3(new)	3(new)	3(old)	3(old)	3(old)	3(old)

After a reboot, the journal-replay logic will falsely believe that the 
whole journal section is consistent and it will attempt to replay it.

This could be fixed by having always increasing commit IDs - the commit 
IDs have 8 bytes, so we can assume that they never roll-over and it would 
prevent us from mixing old IDs into the current transaction.

Mikulas

>  If dm-integrity detects bad sections while
>  replaying the journal, keep track about those sections and try to at least
>  replay older, good sections.
>  This is based on the assumption that most likely the newest
>  section(s) will be damaged, which might have been only partially written
>  due to a sudden reset. Previously, the whole journal would be cleared in
>  such a case.
> 
> Signed-off-by: Simone Weiß <simone.weiss@elektrobit.com>
> Signed-off-by: Kai Tomerius <kai.tomerius@elektrobit.com>
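
The partial-write scenario from comment #1 can be reproduced with a small
userspace model. This is only a sketch under assumptions: struct sector_id,
partial_write() and section_looks_consistent() are hypothetical names, a
section has 8 sectors as in the example, and the consistency check is a
deliberately simplified stand-in for the real replay logic.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SECTORS_PER_SECTION 8

struct sector_id {
	uint64_t seq;	/* commit sequence stored in the sector */
	bool eof;	/* lazy end-of-section marker */
};

/* Overwrite the first n sectors with seq, as an interrupted full write would. */
static void partial_write(struct sector_id *s, size_t n, uint64_t seq)
{
	size_t j;

	for (j = 0; j < n && j < SECTORS_PER_SECTION; j++) {
		s[j].seq = seq;
		s[j].eof = false;
	}
}

/*
 * Simplified replay check: the section is accepted if every sector up to
 * the EOF marker (or the end of the section) carries the same sequence.
 */
static bool section_looks_consistent(const struct sector_id *s)
{
	size_t j;

	for (j = 0; j < SECTORS_PER_SECTION; j++) {
		if (s[j].seq != s[0].seq)
			return false;
		if (s[j].eof)
			return true;
	}
	return true;
}
```

Starting from "2 2 2 2(EOF) 3 3 3 3" and calling partial_write(s, 4, 3)
leaves eight sectors with sequence 3, which the check accepts even though
only half of them are new. If the stale sectors carried an older, never
reused value instead, the same interrupted write would be rejected.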
  
Weiß, Simone Feb. 23, 2024, 5:53 p.m. UTC | #2
On Tue, 2024-02-20 at 19:52 +0100, Mikulas Patocka wrote:
> On Fri, 9 Feb 2024, Simone Weiß wrote:
> 
> > Extend the dm-integrity driver to omit writing unused journal data sectors.
> > Instead of filling up the whole journal section, mark the last used
> > sector with a special commit ID. The commit ID still uses the same base
> > value,
> > but section number and sector number are inverted. At replay when commit IDs
> > are analyzed this special commit ID is detected as end of valid data for
> > this
> > section. The main goal is to prolong the live times of e.g. eMMCs by
> > avoiding
> > to write the whole journal data sectors.
> > 
> > The change is right now to be seen as experimental and gets applied if
> > CONFIG_DMINT_LAZY_COMMIT is set to y. Note please that this is NOT
> > planned for a final version of the changes. I would make it configurable
> > via flags passed e.g. via dmsetup and stored in the superblock.
> > 
> > Architectural Limitations:
> > - A dm-integrity partition, that was previously used with lazy commit,
> >  can't be replayed with a dm-integrity driver not using lazy commit.
> > - A dm-integrity driver that uses lazy commit is expected
> >  to be able to cope with a partition that was created and used without
> >  lazy commit.
> > - With dm-integrity lazy commit, a partially written journal (e.g. due to a
> >  power cut) can cause a tag mismatch during replay if the journal entry
> > marking
> >  the end of the journal section is missing. Due to lazy commit, older
> > journal
> >  entries are not erased and might be processed if they have the same commit
> > ID
> >  as adjacent newer journal entries.
> 
> Hi
> 
> I was thinking about it and I think that this problem is a showstopper.
> 
> Suppose that a journal section contains these commit IDs:
> 
>         2       2       2       2(EOF)  3       3       3       3
> 
> The IDs "3" are left over from previous iterations. The IDs "2" contain
> the current data. And now, the journal rolls over and we attempt to write
> all 8 pages with the ID "3". However, a power failure happens and we only
> write 4 pages with the ID "3". So, the journal will look like:
> 
>         3(new)  3(new)  3(new)  3(new)  3(old)  3(old)  3(old)  3(old)
> 
> After a reboot, the journal-replay logic will falsely believe that the
> whole journal section is consistent and it will attempt to replay it.
> 
> This could be fixed by having always increasing commit IDs - the commit
> IDs have 8 bytes, so we can assume that they never roll-over and it would
> prevent us from mixing old IDs into the current transaction.
Hi

Thanks for the review of the concept. I was out this week and could only think
about it now. Did I understand correctly that the proposal is to add an extra
value to the commit ID that is, e.g., incremented when integrity_commit is
executed?

If so, I tried this quickly and it looks good at first glance. I will check and
test further next.

Simone
> 
> Mikulas
> 
> >  If dm-integrity detects bad sections while
> >  replaying the journal, keep track about those sections and try to at least
> >  replay older, good sections.
> >  This is based on the assumption that most likely the newest
> >  section(s) will be damaged, which might have been only partially written
> >  due to a sudden reset. Previously, the whole journal would be cleared in
> >  such a case.
> > 
> > Signed-off-by: Simone Weiß <simone.weiss@elektrobit.com>
> > Signed-off-by: Kai Tomerius <kai.tomerius@elektrobit.com>
  
Mikulas Patocka Feb. 23, 2024, 8:33 p.m. UTC | #3
On Fri, 23 Feb 2024, Weiß, Simone wrote:

> On Tue, 2024-02-20 at 19:52 +0100, Mikulas Patocka wrote:
> > On Fri, 9 Feb 2024, Simone Weiß wrote:
> > 
> > > Extend the dm-integrity driver to omit writing unused journal data sectors.
> > > Instead of filling up the whole journal section, mark the last used
> > > sector with a special commit ID. The commit ID still uses the same base
> > > value,
> > > but section number and sector number are inverted. At replay when commit IDs
> > > are analyzed this special commit ID is detected as end of valid data for
> > > this
> > > section. The main goal is to prolong the live times of e.g. eMMCs by
> > > avoiding
> > > to write the whole journal data sectors.
> > > 
> > > The change is right now to be seen as experimental and gets applied if
> > > CONFIG_DMINT_LAZY_COMMIT is set to y. Note please that this is NOT
> > > planned for a final version of the changes. I would make it configurable
> > > via flags passed e.g. via dmsetup and stored in the superblock.
> > > 
> > > Architectural Limitations:
> > > - A dm-integrity partition, that was previously used with lazy commit,
> > >  can't be replayed with a dm-integrity driver not using lazy commit.
> > > - A dm-integrity driver that uses lazy commit is expected
> > >  to be able to cope with a partition that was created and used without
> > >  lazy commit.
> > > - With dm-integrity lazy commit, a partially written journal (e.g. due to a
> > >  power cut) can cause a tag mismatch during replay if the journal entry
> > > marking
> > >  the end of the journal section is missing. Due to lazy commit, older
> > > journal
> > >  entries are not erased and might be processed if they have the same commit
> > > ID
> > >  as adjacent newer journal entries.
> > 
> > Hi
> > 
> > I was thinking about it and I think that this problem is a showstopper.
> > 
> > Suppose that a journal section contains these commit IDs:
> > 
> >         2       2       2       2(EOF)  3       3       3       3
> > 
> > The IDs "3" are left over from previous iterations. The IDs "2" contain
> > the current data. And now, the journal rolls over and we attempt to write
> > all 8 pages with the ID "3". However, a power failure happens and we only
> > write 4 pages with the ID "3". So, the journal will look like:
> > 
> >         3(new)  3(new)  3(new)  3(new)  3(old)  3(old)  3(old)  3(old)
> > 
> > After a reboot, the journal-replay logic will falsely believe that the
> > whole journal section is consistent and it will attempt to replay it.
> > 
> > This could be fixed by having always increasing commit IDs - the commit
> > IDs have 8 bytes, so we can assume that they never roll-over and it would
> > prevent us from mixing old IDs into the current transaction.
> Hi
> 
> Thanks for the review of the concept. I was out this week and could only think
> about it now. I understood it right, that the proposal is to add an extra value
> to the commit ID, that is e.g. incremented when integrity_commit is executed?
> 
> If so, I tried this quickly and looks good on first glance. Will check and test
> further next.
> 
> Simone

I propose to use the commit ID 0 when writing the journal for the first
time, then 1 when the journal rolls over, 2 when it rolls over again, 3
when it rolls over again, 4 on another rollover and so on up to
0x7fffffffffffffff (which will never be reached in practice).

And use the top bit as an end-of-section marker. As the commit IDs will 
never roll over, it won't happen that an old transaction would be mixed 
into a new transaction on partial journal write.

Mikulas
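
The encoding proposed in comment #3 could look roughly like this in userspace
C. This is a sketch under assumptions, not the driver's implementation:
COMMIT_EOS_BIT, make_commit_id() and valid_sectors() are hypothetical names,
the XOR with the section/sector position used by the existing driver is left
out, and the check assumes that the last used sector of a section always
carries the end-of-section marker.

```c
#include <stdbool.h>
#include <stdint.h>

#define COMMIT_EOS_BIT	(UINT64_C(1) << 63)	/* top bit: end-of-section marker */
#define COMMIT_SEQ_MASK	(~COMMIT_EOS_BIT)	/* low 63 bits: rollover counter */

static uint64_t make_commit_id(uint64_t rollover, bool end_of_section)
{
	return (rollover & COMMIT_SEQ_MASK) |
	       (end_of_section ? COMMIT_EOS_BIT : 0);
}

static uint64_t commit_rollover(uint64_t id)
{
	return id & COMMIT_SEQ_MASK;
}

static bool commit_is_end_of_section(uint64_t id)
{
	return (id & COMMIT_EOS_BIT) != 0;
}

/*
 * Return how many sectors at the start of the section belong to the
 * transaction with the given rollover count, or -1 if the section is
 * incomplete: a sector from another rollover shows up before the marker,
 * or no marker is found at all.
 */
static int valid_sectors(const uint64_t *ids, unsigned int n, uint64_t rollover)
{
	unsigned int j;

	for (j = 0; j < n; j++) {
		if (commit_rollover(ids[j]) != rollover)
			return -1;
		if (commit_is_end_of_section(ids[j]))
			return (int)(j + 1);
	}
	return -1;
}
```

Because the counter only grows, stale sectors left behind by an earlier pass
always carry a smaller rollover value and cannot be mistaken for part of the
current transaction after an interrupted write.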
  
Weiß, Simone Feb. 26, 2024, 6:47 a.m. UTC | #4
On Fri, 2024-02-23 at 21:33 +0100, Mikulas Patocka wrote:
> On Fri, 23 Feb 2024, Weiß, Simone wrote:
> 
> > On Tue, 2024-02-20 at 19:52 +0100, Mikulas Patocka wrote:
> > > On Fri, 9 Feb 2024, Simone Weiß wrote:
> > > 
> > > > Extend the dm-integrity driver to omit writing unused journal data
> > > > sectors.
> > > > Instead of filling up the whole journal section, mark the last used
> > > > sector with a special commit ID. The commit ID still uses the same base
> > > > value,
> > > > but section number and sector number are inverted. At replay when commit
> > > > IDs
> > > > are analyzed this special commit ID is detected as end of valid data for
> > > > this
> > > > section. The main goal is to prolong the live times of e.g. eMMCs by
> > > > avoiding
> > > > to write the whole journal data sectors.
> > > > 
> > > > The change is right now to be seen as experimental and gets applied if
> > > > CONFIG_DMINT_LAZY_COMMIT is set to y. Note please that this is NOT
> > > > planned for a final version of the changes. I would make it configurable
> > > > via flags passed e.g. via dmsetup and stored in the superblock.
> > > > 
> > > > Architectural Limitations:
> > > > - A dm-integrity partition, that was previously used with lazy commit,
> > > >  can't be replayed with a dm-integrity driver not using lazy commit.
> > > > - A dm-integrity driver that uses lazy commit is expected
> > > >  to be able to cope with a partition that was created and used without
> > > >  lazy commit.
> > > > - With dm-integrity lazy commit, a partially written journal (e.g. due
> > > > to a
> > > >  power cut) can cause a tag mismatch during replay if the journal entry
> > > > marking
> > > >  the end of the journal section is missing. Due to lazy commit, older
> > > > journal
> > > >  entries are not erased and might be processed if they have the same
> > > > commit
> > > > ID
> > > >  as adjacent newer journal entries.
> > > 
> > > Hi
> > > 
> > > I was thinking about it and I think that this problem is a showstopper.
> > > 
> > > Suppose that a journal section contains these commit IDs:
> > > 
> > >         2       2       2       2(EOF)  3       3       3       3
> > > 
> > > The IDs "3" are left over from previous iterations. The IDs "2" contain
> > > the current data. And now, the journal rolls over and we attempt to write
> > > all 8 pages with the ID "3". However, a power failure happens and we only
> > > write 4 pages with the ID "3". So, the journal will look like:
> > > 
> > >         3(new)  3(new)  3(new)  3(new)  3(old)  3(old)  3(old)  3(old)
> > > 
> > > After a reboot, the journal-replay logic will falsely believe that the
> > > whole journal section is consistent and it will attempt to replay it.
> > > 
> > > This could be fixed by having always increasing commit IDs - the commit
> > > IDs have 8 bytes, so we can assume that they never roll-over and it would
> > > prevent us from mixing old IDs into the current transaction.
> > Hi
> > 
> > Thanks for the review of the concept. I was out this week and could only
> > think
> > about it now. I understood it right, that the proposal is to add an extra
> > value
> > to the commit ID, that is e.g. incremented when integrity_commit is
> > executed?
> > 
> > If so, I tried this quickly and looks good on first glance. Will check and
> > test
> > further next.
> > 
> > Simone
> 
> I propose to use the commit ID 0 when writing the journal for the first
> time, then 1 when the journal rolls over, 2 when it rolls over again, 3
> when it rolls over again, 4 on another roll over and so on up to
> 0x7fffffffffffffff (which will be never reached in practice).
> 
> And use the top bit as an end-of-section marker. As the commit IDs will
> never roll over, it won't happen that an old transaction would be mixed
> into a new transaction on partial journal write.
> 
> Mikulas
Hi,

I can certainly do it this way as well. Another point still on my mind is the
superblock: I would like to get rid of the build-time switch and carry the
information that lazy commits are enabled in the superblock. As there are
already the modes J, B, D and R, a new mode L or similar could be added. I will
work on this and also take a look at tools like dmsetup to check whether
something would be needed there. If anyone has further points for now, please
let me know.

Best,
Simone
  
Milan Broz Feb. 26, 2024, 8:48 a.m. UTC | #5
On 2/26/24 7:47 AM, Weiß, Simone wrote:
..
> I can do it this way for sure as well. Another point still in my mind is the
> superblock: I would like to get rid of the build time switch and carry
> information about lazy commits enabled in the superblock. As there is J, B, D
> and R as mode already, a new mode L or such could be added. I will work on this
> and also take a look at stuff like dmsetup to check if something would be needed
> there. If there are further points for now on anyone's mind, please tell.

Just FYI: I do not think you need to add anything to dmsetup, but integritysetup
(part of the cryptsetup project) needs to understand new metadata and dm-integrity
table options.

And I guess it needs to add a new option to use the new mode.

Perhaps it is best to create an issue for cryptsetup to discuss it, but it will not
be merged until the kernel code is on the way to mainline.

Milan
  

Patch

diff --git a/drivers/md/Kconfig b/drivers/md/Kconfig
index 68ce56fc61d0..d28a65dd54ad 100644
--- a/drivers/md/Kconfig
+++ b/drivers/md/Kconfig
@@ -604,6 +604,16 @@  config DM_INTEGRITY
 	  To compile this code as a module, choose M here: the module will
 	  be called dm-integrity.
 
+config DMINT_LAZY_COMMIT
+	bool "Lazy commit for dm-integrity target"
+	depends on DM_INTEGRITY
+	default n
+	help
+	  Extend the dm-integrity driver to omit writing unused journal data.
+	  Instead use a special lazy commit id that marks the end of the data
+	  in the journal.
+	  To be seen as experimental.
+
 config DM_ZONED
 	tristate "Drive-managed zoned block device target support"
 	depends on BLK_DEV_DM
diff --git a/drivers/md/dm-integrity.c b/drivers/md/dm-integrity.c
index ed45411eb68d..d521b5d4d2d5 100644
--- a/drivers/md/dm-integrity.c
+++ b/drivers/md/dm-integrity.c
@@ -1083,18 +1083,19 @@  static void rw_journal_sectors(struct dm_integrity_c *ic, blk_opf_t opf,
 }
 
 static void rw_journal(struct dm_integrity_c *ic, blk_opf_t opf,
-		       unsigned int section, unsigned int n_sections,
-		       struct journal_completion *comp)
+		      unsigned int section, unsigned int n_sections,
+		      unsigned int omit_sectors, struct journal_completion *comp)
 {
 	unsigned int sector, n_sectors;
 
 	sector = section * ic->journal_section_sectors;
-	n_sectors = n_sections * ic->journal_section_sectors;
+	n_sectors = n_sections * ic->journal_section_sectors - omit_sectors;
 
 	rw_journal_sectors(ic, opf, sector, n_sectors, comp);
 }
 
-static void write_journal(struct dm_integrity_c *ic, unsigned int commit_start, unsigned int commit_sections)
+static void write_journal(struct dm_integrity_c *ic, unsigned int commit_start,
+			  unsigned int commit_sections, unsigned int omit_sectors)
 {
 	struct journal_completion io_comp;
 	struct journal_completion crypt_comp_1;
@@ -1117,7 +1118,7 @@  static void write_journal(struct dm_integrity_c *ic, unsigned int commit_start,
 				rw_section_mac(ic, commit_start + i, true);
 		}
 		rw_journal(ic, REQ_OP_WRITE | REQ_FUA | REQ_SYNC, commit_start,
-			   commit_sections, &io_comp);
+			   commit_sections, omit_sectors, &io_comp);
 	} else {
 		unsigned int to_end;
 
@@ -1130,7 +1131,7 @@  static void write_journal(struct dm_integrity_c *ic, unsigned int commit_start,
 			encrypt_journal(ic, true, commit_start, to_end, &crypt_comp_1);
 			if (try_wait_for_completion(&crypt_comp_1.comp)) {
 				rw_journal(ic, REQ_OP_WRITE | REQ_FUA,
-					   commit_start, to_end, &io_comp);
+					   commit_start, to_end, 0, &io_comp);
 				reinit_completion(&crypt_comp_1.comp);
 				crypt_comp_1.in_flight = (atomic_t)ATOMIC_INIT(0);
 				encrypt_journal(ic, true, 0, commit_sections - to_end, &crypt_comp_1);
@@ -1141,17 +1142,19 @@  static void write_journal(struct dm_integrity_c *ic, unsigned int commit_start,
 				crypt_comp_2.in_flight = (atomic_t)ATOMIC_INIT(0);
 				encrypt_journal(ic, true, 0, commit_sections - to_end, &crypt_comp_2);
 				wait_for_completion_io(&crypt_comp_1.comp);
-				rw_journal(ic, REQ_OP_WRITE | REQ_FUA, commit_start, to_end, &io_comp);
+				rw_journal(ic, REQ_OP_WRITE | REQ_FUA, commit_start, to_end, 0,
+					   &io_comp);
 				wait_for_completion_io(&crypt_comp_2.comp);
 			}
 		} else {
 			for (i = 0; i < to_end; i++)
 				rw_section_mac(ic, commit_start + i, true);
-			rw_journal(ic, REQ_OP_WRITE | REQ_FUA, commit_start, to_end, &io_comp);
+			rw_journal(ic, REQ_OP_WRITE | REQ_FUA, commit_start, to_end, 0, &io_comp);
 			for (i = 0; i < commit_sections - to_end; i++)
 				rw_section_mac(ic, i, true);
 		}
-		rw_journal(ic, REQ_OP_WRITE | REQ_FUA, 0, commit_sections - to_end, &io_comp);
+		rw_journal(ic, REQ_OP_WRITE | REQ_FUA, 0, commit_sections - to_end,
+			   omit_sectors, &io_comp);
 	}
 
 	wait_for_completion_io(&io_comp.comp);
@@ -1777,7 +1780,6 @@  static void integrity_metadata(struct work_struct *w)
 			if (unlikely(r)) {
 				if (r > 0) {
 					sector_t s;
-
 					s = sector - ((r + ic->tag_size - 1) / ic->tag_size);
 					DMERR_LIMIT("%pg: Checksum failed at sector 0x%llx",
 						    bio->bi_bdev, s);
@@ -2355,6 +2357,9 @@  static void integrity_commit(struct work_struct *w)
 	unsigned int commit_start, commit_sections;
 	unsigned int i, j, n;
 	struct bio *flushes;
+#ifdef CONFIG_DMINT_LAZY_COMMIT
+	unsigned int used_sectors;
+#endif
 
 	del_timer(&ic->autocommit_timer);
 
@@ -2366,6 +2371,15 @@  static void integrity_commit(struct work_struct *w)
 		goto release_flush_bios;
 	}
 
+#ifdef CONFIG_DMINT_LAZY_COMMIT
+	if (ic->free_section_entry)
+		used_sectors = (ic->free_section_entry <<
+				ic->sb->log2_sectors_per_block) +
+			JOURNAL_BLOCK_SECTORS;
+	else
+		used_sectors = ic->journal_section_sectors;
+#endif
+
 	pad_uncommitted(ic);
 	commit_start = ic->uncommitted_section;
 	commit_sections = ic->n_uncommitted_sections;
@@ -2388,6 +2402,16 @@  static void integrity_commit(struct work_struct *w)
 			struct journal_sector *js;
 
 			js = access_journal(ic, i, j);
+#ifdef CONFIG_DMINT_LAZY_COMMIT
+			if (n == commit_sections-1 && j == used_sectors-1) {
+				js->commit_id = dm_integrity_commit_id(ic, ~i,
+								       ~j, ic->commit_seq);
+				DEBUG_print("Lazy commit id=0x%llx: Sections %u.%u. Last section with %u sectors\n",
+					    js->commit_id, commit_start, i,
+					    used_sectors);
+				break;
+			}
+#endif
 			js->commit_id = dm_integrity_commit_id(ic, i, j, ic->commit_seq);
 		}
 		i++;
@@ -2397,7 +2421,12 @@  static void integrity_commit(struct work_struct *w)
 	}
 	smp_rmb();
 
-	write_journal(ic, commit_start, commit_sections);
+#ifdef CONFIG_DMINT_LAZY_COMMIT
+	write_journal(ic, commit_start, commit_sections,
+		      ic->journal_section_sectors-used_sectors);
+#else
+	write_journal(ic, commit_start, commit_sections, 0);
+#endif
 
 	spin_lock_irq(&ic->endio_wait.lock);
 	ic->uncommitted_section += commit_sections;
@@ -2443,12 +2472,13 @@  static void restore_last_bytes(struct dm_integrity_c *ic, struct journal_sector
 	} while (++s < ic->sectors_per_block);
 }
 
-static void do_journal_write(struct dm_integrity_c *ic, unsigned int write_start,
-			     unsigned int write_sections, bool from_replay)
+static int do_journal_write(struct dm_integrity_c *ic, unsigned int write_start,
+			    unsigned int write_sections, bool from_replay)
 {
 	unsigned int i, j, n;
 	struct journal_completion comp;
 	struct blk_plug plug;
+	int rc = 0;
 
 	blk_start_plug(&plug);
 
@@ -2465,7 +2495,7 @@  static void do_journal_write(struct dm_integrity_c *ic, unsigned int write_start
 		for (j = 0; j < ic->journal_section_entries; j++) {
 			struct journal_entry *je = access_journal_entry(ic, i, j);
 			sector_t sec, area, offset;
-			unsigned int k, l, next_loop;
+			unsigned int k, l, next_loop, end;
 			sector_t metadata_block;
 			unsigned int metadata_offset;
 			struct journal_io *io;
@@ -2543,6 +2573,7 @@  static void do_journal_write(struct dm_integrity_c *ic, unsigned int write_start
 			spin_unlock_irq(&ic->endio_wait.lock);
 
 			metadata_block = get_metadata_sector_and_offset(ic, area, offset, &metadata_offset);
+			end = k;
 			for (l = j; l < k; l++) {
 				int r;
 				struct journal_entry *je2 = access_journal_entry(ic, i, l);
@@ -2557,8 +2588,24 @@  static void do_journal_write(struct dm_integrity_c *ic, unsigned int write_start
 					integrity_sector_checksum(ic, sec + ((l - j) << ic->sb->log2_sectors_per_block),
 								  (char *)access_journal_data(ic, i, l), test_tag);
 					if (unlikely(memcmp(test_tag, journal_entry_tag(ic, je2), ic->tag_size))) {
+#ifdef CONFIG_DMINT_LAZY_COMMIT
+						if (!from_replay)
+							dm_integrity_io_error(ic, "tag mismatch when writing journal",
+									      -EILSEQ);
+
+						/*
+						 * during replay, continue processing and discard
+						 * data with a tag mismatch
+						 */
+						rc = -1;
+						if (end > l)
+							end = l;
+
+						DEBUG_print("tag mismatch at section %u entry %u\n", n, l);
+#else
 						dm_integrity_io_error(ic, "tag mismatch when replaying journal", -EILSEQ);
 						dm_audit_log_target(DM_MSG_PREFIX, "integrity-replay-journal", ic->ti, 0);
+#endif
 					}
 				}
 
@@ -2569,11 +2616,15 @@  static void do_journal_write(struct dm_integrity_c *ic, unsigned int write_start
 					dm_integrity_io_error(ic, "reading tags", r);
 			}
 
-			atomic_inc(&comp.in_flight);
-			copy_from_journal(ic, i, j << ic->sb->log2_sectors_per_block,
-					  (k - j) << ic->sb->log2_sectors_per_block,
-					  get_data_sector(ic, area, offset),
-					  complete_copy_from_journal, io);
+			/* copy data that has not been discarded */
+			if (end > j) {
+				atomic_inc(&comp.in_flight);
+				copy_from_journal(ic, i, j << ic->sb->log2_sectors_per_block,
+						  (end - j) << ic->sb->log2_sectors_per_block,
+						  get_data_sector(ic, area, offset),
+						  complete_copy_from_journal, io);
+			}
+
 skip_io:
 			j = next_loop;
 		}
@@ -2587,6 +2638,8 @@  static void do_journal_write(struct dm_integrity_c *ic, unsigned int write_start
 	wait_for_completion_io(&comp.comp);
 
 	dm_integrity_flush_buffers(ic, true);
+
+	return rc;
 }
 
 static void integrity_writer(struct work_struct *w)
@@ -2603,7 +2656,8 @@  static void integrity_writer(struct work_struct *w)
 	if (!write_sections)
 		return;
 
-	do_journal_write(ic, write_start, write_sections, false);
+	if (do_journal_write(ic, write_start, write_sections, false) < 0)
+		write_sections = ~0;
 
 	spin_lock_irq(&ic->endio_wait.lock);
 
@@ -2914,7 +2968,7 @@  static void init_journal(struct dm_integrity_c *ic, unsigned int start_section,
 		}
 	}
 
-	write_journal(ic, start_section, n_sections);
+	write_journal(ic, start_section, n_sections, 0);
 }
 
 static int find_commit_seq(struct dm_integrity_c *ic, unsigned int i, unsigned int j, commit_id_t id)
@@ -2929,6 +2983,50 @@  static int find_commit_seq(struct dm_integrity_c *ic, unsigned int i, unsigned i
 	return -EIO;
 }
 
+#ifdef CONFIG_DMINT_LAZY_COMMIT
+static int find_commit_seq_lazy(struct dm_integrity_c *ic, unsigned int i,
+	unsigned int j, commit_id_t id, bool *lazy)
+{
+	unsigned char k;
+	*lazy = false;
+	for (k = 0; k < N_COMMIT_IDS; k++) {
+		if (dm_integrity_commit_id(ic, i, j, k) == id)
+			return k;
+	}
+	for (k = 0; k < N_COMMIT_IDS; k++) {
+		if (dm_integrity_commit_id(ic, ~i, ~j, k) == id) {
+			DEBUG_print("Found a lazy commit id at %d:%d\n", i, j);
+			*lazy = true;
+			return k;
+		}
+	}
+	dm_integrity_io_error(ic, "journal commit id", -EIO);
+	return -EIO;
+}
+
+static bool journal_check_lazy_commit(struct dm_integrity_c *ic,
+	unsigned int i, unsigned int sector)
+{
+	unsigned int j;
+
+	if (sector%ic->sectors_per_block) {
+		DEBUG_print("The lazy commit id is not aligned to the block size. Not replaying section\n");
+		return false;
+	}
+
+	for (j = sector>>ic->sb->log2_sectors_per_block;
+		j < ic->journal_section_entries; j++) {
+		struct journal_entry *je = access_journal_entry(ic, i, j);
+
+		if (!journal_entry_is_unused(je)) {
+			DEBUG_print("Found used journal entry after lazy commit. Not replaying section\n");
+			return false;
+		}
+	}
+	return true;
+}
+#endif
+
 static void replay_journal(struct dm_integrity_c *ic)
 {
 	unsigned int i, j;
@@ -2938,6 +3036,7 @@  static void replay_journal(struct dm_integrity_c *ic)
 	unsigned int continue_section;
 	bool journal_empty;
 	unsigned char unused, last_used, want_commit_seq;
+	unsigned int first_bad, last_bad, dead;
 
 	if (ic->mode == 'R')
 		return;
@@ -2947,10 +3046,13 @@  static void replay_journal(struct dm_integrity_c *ic)
 
 	last_used = 0;
 	write_start = 0;
+	first_bad = 0;
+	last_bad = 0;
+	dead = 0;
 
 	if (!ic->just_formatted) {
 		DEBUG_print("reading journal\n");
-		rw_journal(ic, REQ_OP_READ, 0, ic->journal_sections, NULL);
+		rw_journal(ic, REQ_OP_READ, 0, ic->journal_sections, 0, NULL);
 		if (ic->journal_io)
 			DEBUG_bytes(lowmem_page_address(ic->journal_io[0].page), 64, "read journal");
 		if (ic->journal_io) {
@@ -2972,17 +3074,32 @@  static void replay_journal(struct dm_integrity_c *ic)
 	memset(used_commit_ids, 0, sizeof(used_commit_ids));
 	memset(max_commit_id_sections, 0, sizeof(max_commit_id_sections));
 	for (i = 0; i < ic->journal_sections; i++) {
+		bool bad = false;
 		for (j = 0; j < ic->journal_section_sectors; j++) {
 			int k;
 			struct journal_sector *js = access_journal(ic, i, j);
-
+#ifndef CONFIG_DMINT_LAZY_COMMIT
 			k = find_commit_seq(ic, i, j, js->commit_id);
-			if (k < 0)
-				goto clear_journal;
+#else
+			bool lazy;
+
+			k = find_commit_seq_lazy(ic, i, j, js->commit_id,
+						 &lazy);
+			if (lazy)
+				j = ic->journal_section_sectors;
+#endif
+			if (k < 0) {
+				/* remember the first and last bad section (1-based; 0 means none) */
+				bad = true;
+				if (!first_bad)
+					first_bad = i + 1;
+				last_bad = i + 1;
+				break;
+			}
 			used_commit_ids[k] = true;
 			max_commit_id_sections[k] = i;
 		}
-		if (journal_empty) {
+		if (!bad && journal_empty) {
 			for (j = 0; j < ic->journal_section_entries; j++) {
 				struct journal_entry *je = access_journal_entry(ic, i, j);
 
@@ -3022,21 +3139,75 @@  static void replay_journal(struct dm_integrity_c *ic)
 		want_commit_seq = next_commit_seq(want_commit_seq);
 	wraparound_section(ic, &write_start);
 
+	if (unlikely(first_bad)) {
+		DEBUG_print("dm-integrity: write_start=%u first_bad=%u last_bad=%u\n",
+			    write_start, first_bad, last_bad);
+
+		if (last_bad <= write_start)
+			/*
+			 * section     0   1   2   3  | 4   5   6   7
+			 * id          2   2   2   2  | 2   1   1   1
+			 * first_bad=3         ^
+			 * last_bad=4              ^
+			 * start=4                      ^
+			 * dead=2              X   X
+			 */
+			dead = write_start - first_bad + 1;
+		else if (first_bad > write_start)
+			/*
+			 * section     0   1   2   3  | 4   5   6   7
+			 * id          2   2   2   2  | 2   1   1   1
+			 * first_bad=7                          ^
+			 * last_bad=8                               ^
+			 * start=4                      ^
+			 * dead=6      X   X   X   X            X   X
+			 */
+			dead = ic->journal_sections + write_start - first_bad + 1;
+		else
+			/*
+			 * section     0   1   2   3  | 4   5   6   7
+			 * id          2   2   2   2  | 2   1   1   1
+			 * first_bad=4             ^
+			 * last_bad=7                           ^
+			 * start=4                      ^
+			 * dead=0      X   X   X   X    X   X   X   X
+			 */
+			dead = 0; /* nothing replayable: the journal is cleared below */
+
+		DEBUG_print("dm-integrity: sections=%u, empty=%s, dead=%u\n",
+			    ic->journal_sections, journal_empty ? "true" : "false", dead);
+
+		if (journal_empty || dead == 0)
+			goto clear_journal;
+	}
+
 	i = write_start;
-	for (write_sections = 0; write_sections < ic->journal_sections; write_sections++) {
+	for (write_sections = 0; write_sections < ic->journal_sections - dead;
+	     write_sections++) {
 		for (j = 0; j < ic->journal_section_sectors; j++) {
 			struct journal_sector *js = access_journal(ic, i, j);
-
-			if (js->commit_id != dm_integrity_commit_id(ic, i, j, want_commit_seq)) {
-				/*
-				 * This could be caused by crash during writing.
-				 * We won't replay the inconsistent part of the
-				 * journal.
-				 */
-				DEBUG_print("commit id mismatch at position (%u, %u): %d != %d\n",
-					    i, j, find_commit_seq(ic, i, j, js->commit_id), want_commit_seq);
-				goto brk;
+			if (js->commit_id == dm_integrity_commit_id(ic, i, j,
+				want_commit_seq))
+				continue; /* regular commit */
+#ifdef CONFIG_DMINT_LAZY_COMMIT
+			if (js->commit_id == dm_integrity_commit_id(ic, ~i, ~j,
+				want_commit_seq)) {
+				/* Lazy commit */
+				DEBUG_print("Found lazy commit in replay: %u, %u\n",
+					i, j);
+				if (journal_check_lazy_commit(ic, i, j + 1))
+					break;
 			}
+#endif
+			/*
+			 * This could be caused by crash during writing.
+			 * We won't replay the inconsistent part of the
+			 * journal.
+			 */
+			DEBUG_print("commit id mismatch at position (%u, %u): %d != %d\n",
+				i, j, find_commit_seq(ic, i, j,
+				js->commit_id), want_commit_seq);
+			goto brk;
 		}
 		i++;
 		if (unlikely(i >= ic->journal_sections))
@@ -3785,7 +3956,10 @@  static int create_journal(struct dm_integrity_c *ic, char **error)
 	if (ic->journal_crypt_alg.alg_string) {
 		unsigned int ivsize, blocksize;
 		struct journal_completion comp;
-
+#ifdef CONFIG_DMINT_LAZY_COMMIT
+		*error = "Lazy commit with journal encryption is currently not supported";
+		goto bad;
+#endif
 		comp.ic = ic;
 		ic->journal_crypt = crypto_alloc_skcipher(ic->journal_crypt_alg.alg_string, 0, CRYPTO_ALG_ALLOCATES_MEMORY);
 		if (IS_ERR(ic->journal_crypt)) {