[v2] ext4: fix corruption during on-line resize

Message ID	20240215155009.94493-1-mheyne@amazon.de
State	New
Headers	From: Maximilian Heyne <mheyne@amazon.de>
	Cc: <ravib@amazon.com>, Maximilian Heyne <mheyne@amazon.de>, <stable@vger.kernel.org>, Theodore Ts'o <tytso@mit.edu>, Andreas Dilger <adilger.kernel@dilger.ca>, Yongqiang Yang <xiaoqiangnk@gmail.com>, <linux-ext4@vger.kernel.org>, <linux-kernel@vger.kernel.org>
	Subject: [PATCH v2] ext4: fix corruption during on-line resize
	Date: Thu, 15 Feb 2024 15:50:09 +0000
	Message-ID: <20240215155009.94493-1-mheyne@amazon.de>
Series	[v2] ext4: fix corruption during on-line resize

Commit Message

Maximilian Heyne Feb. 15, 2024, 3:50 p.m. UTC

We observed a corruption during on-line resize of a file system that is
larger than 16 TiB with 4k block size. With more than 2^32 blocks,
resize_inode is turned off by default by mke2fs. For convenience, the
issue can be reproduced on a smaller file system by explicitly turning
off resize_inode. An on-line resize across an 8 GiB boundary (the size
of a meta block group in this setup) then leads to corruption:

  dev=/dev/<some_dev> # should be >= 16 GiB
  mkdir -p /corruption
  /sbin/mke2fs -t ext4 -b 4096 -O ^resize_inode $dev $((2 * 2**21 - 2**15))
  mount -t ext4 $dev /corruption

  dd if=/dev/zero bs=4096 of=/corruption/test count=$((2*2**21 - 4*2**15))
  sha1sum /corruption/test
  # 79d2658b39dcfd77274e435b0934028adafaab11  /corruption/test

  /sbin/resize2fs $dev $((2*2**21))
  # drop page cache to force reload the block from disk
  echo 1 > /proc/sys/vm/drop_caches

  sha1sum /corruption/test
  # 3c2abc63cbf1a94c9e6977e0fbd72cd832c4d5c3  /corruption/test

2^21 = 2^15 * 2^6 blocks equals 8 GiB, where 2^15 is the number of
blocks per block group and 2^6 is the number of block groups that make
up a meta block group.
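
The arithmetic above can be checked with a small userspace sketch in C.
It is not kernel code: the helper names are hypothetical, and the
64-byte group descriptor size is an assumption inferred from the 2^6
block groups per meta block group quoted above.

```c
#include <assert.h>
#include <stdint.h>

/* One block bitmap occupies one block, so a group spans 8 blocks per bitmap byte. */
static uint64_t blocks_per_group(uint32_t block_size)
{
	return (uint64_t)block_size * 8;
}

/*
 * A meta block group covers as many groups as descriptors fit in one block.
 * desc_size is an assumption (64 bytes here); it is not stated in the text.
 */
static uint64_t groups_per_meta_bg(uint32_t block_size, uint32_t desc_size)
{
	assert(desc_size != 0 && block_size % desc_size == 0);
	return block_size / desc_size;
}

/* Size of one meta block group, counted in file system blocks. */
static uint64_t blocks_per_meta_bg(uint32_t block_size, uint32_t desc_size)
{
	return blocks_per_group(block_size) *
	       groups_per_meta_bg(block_size, desc_size);
}

/* First block of a given group, assuming the first data block is block 0. */
static uint64_t group_first_block(uint32_t block_size, uint64_t group)
{
	return group * blocks_per_group(block_size);
}
```

With a 4096-byte block and a 64-byte descriptor,
blocks_per_meta_bg(4096, 64) evaluates to 2^21 blocks (8 GiB), and
group_first_block(4096, 63) gives the block number 2064384 mentioned in
this message.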

The last checksum might be different depending on how the file is laid
out across the physical blocks. The actual corruption occurs at physical
block 63*2^15 = 2064384, which would be the location of the backup of the
meta block group's block descriptor. During the on-line resize the file
system will be converted to meta_bg starting at s_first_meta_bg, which is
2 in the example - meaning all block groups after 16 GiB. However, in
ext4_flex_group_add we might add block groups that are not part of the
first meta block group yet. In the reproducer we achieved this by
subtracting the size of a whole block group from the point where the
meta block group would start. This must be considered when updating the
backup block group descriptors, which for these groups have to follow
the non-meta_bg layout. The fix is to test whether the group being added
is already part of a meta block group.

Fixes: 01f795f9e0d67 ("ext4: add online resizing support for meta_bg and 64-bit file systems")
Cc: stable@vger.kernel.org
Signed-off-by: Maximilian Heyne <mheyne@amazon.de>
---
 fs/ext4/resize.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Comments

Srivathsa Dara Feb. 19, 2024, 7:06 a.m. UTC | #1

On Thu, 15 Feb 2024 15:50:09 +0000, Maximilian Heyne wrote:

> We observed a corruption during on-line resize of a file system that is
> larger than 16 TiB with 4k block size. With more than 2^32 blocks,
> resize_inode is turned off by default by mke2fs. The issue can be
> reproduced on a smaller file system for convenience by explicitly
> turning off resize_inode. An on-line resize across an 8 GiB boundary (the
> size of a meta block group in this setup) then leads to a corruption:
>
>  dev=/dev/<some_dev> # should be >= 16 GiB
>  mkdir -p /corruption
>  /sbin/mke2fs -t ext4 -b 4096 -O ^resize_inode $dev $((2 * 2**21 - 2**15))
>  mount -t ext4 $dev /corruption
>
>  dd if=/dev/zero bs=4096 of=/corruption/test count=$((2*2**21 - 4*2**15))
>  sha1sum /corruption/test
>  # 79d2658b39dcfd77274e435b0934028adafaab11  /corruption/test
>
>  /sbin/resize2fs $dev $((2*2**21))
>  # drop page cache to force reload the block from disk
>  echo 1 > /proc/sys/vm/drop_caches
>
>  sha1sum /corruption/test
>  # 3c2abc63cbf1a94c9e6977e0fbd72cd832c4d5c3  /corruption/test
>
> 2^21 = 2^15*2^6 equals 8 GiB whereof 2^15 is the number of blocks per
> block group and 2^6 are the number of block groups that make a meta
> block group.
>
> The last checksum might be different depending on how the file is laid
> out across the physical blocks. The actual corruption occurs at physical
> block 63*2^15 = 2064384 which would be the location of the backup of the
> meta block group's block descriptor. During the on-line resize the file
> system will be converted to meta_bg starting at s_first_meta_bg which is
> 2 in the example - meaning all block groups after 16 GiB. However, in
> ext4_flex_group_add we might add block groups that are not part of the
> first meta block group yet. In the reproducer we achieved this by
> subtracting the size of a whole block group from the point where the
> meta block group would start. This must be considered when updating the
> backup block group descriptors to follow the non-meta_bg layout. The fix
> is to add a test whether the group to add is already part of the meta
> block group or not.
>
> Fixes: 01f795f9e0d67 ("ext4: add online resizing support for meta_bg and 64-bit file systems")
> Cc: stable@vger.kernel.org

Tested the patch across file systems of various sizes and block sizes.
The patch prevents the corruption.

> Signed-off-by: Maximilian Heyne <mheyne@amazon.de>

Tested-by: Srivathsa Dara <srivathsa.d.dara@oracle.com>
Reviewed-by: Srivathsa Dara <srivathsa.d.dara@oracle.com>

> ---
> fs/ext4/resize.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/fs/ext4/resize.c b/fs/ext4/resize.c
> index 4d4a5a32e310..3c0d12382e06 100644
> --- a/fs/ext4/resize.c
> +++ b/fs/ext4/resize.c
> @@ -1602,7 +1602,8 @@ static int ext4_flex_group_add(struct super_block *sb,
> 		int gdb_num = group / EXT4_DESC_PER_BLOCK(sb);
> 		int gdb_num_end = ((group + flex_gd->count - 1) /
> 				   EXT4_DESC_PER_BLOCK(sb));
> -		int meta_bg = ext4_has_feature_meta_bg(sb);
> +		int meta_bg = ext4_has_feature_meta_bg(sb) &&
> +			      gdb_num >= le32_to_cpu(es->s_first_meta_bg);
> 		sector_t padding_blocks = meta_bg ? 0 : sbi->s_sbh->b_blocknr -
> 					 ext4_group_first_block_no(sb, 0);
> 
> -- 
> 2.40.1

Theodore Ts'o Feb. 22, 2024, 3:54 p.m. UTC | #2

On Thu, 15 Feb 2024 15:50:09 +0000, Maximilian Heyne wrote:
> We observed a corruption during on-line resize of a file system that is
> larger than 16 TiB with 4k block size. With more than 2^32 blocks,
> resize_inode is turned off by default by mke2fs. The issue can be
> reproduced on a smaller file system for convenience by explicitly
> turning off resize_inode. An on-line resize across an 8 GiB boundary (the
> size of a meta block group in this setup) then leads to a corruption:
> 
> [...]

Applied, thanks!

[1/1] ext4: fix corruption during on-line resize
      commit: 3a944549dd26ccaf1f898a4be952e75a42bf37dd

Best regards,

Patch

diff --git a/fs/ext4/resize.c b/fs/ext4/resize.c
index 4d4a5a32e310..3c0d12382e06 100644
--- a/fs/ext4/resize.c
+++ b/fs/ext4/resize.c
@@ -1602,7 +1602,8 @@  static int ext4_flex_group_add(struct super_block *sb,
 		int gdb_num = group / EXT4_DESC_PER_BLOCK(sb);
 		int gdb_num_end = ((group + flex_gd->count - 1) /
 				   EXT4_DESC_PER_BLOCK(sb));
-		int meta_bg = ext4_has_feature_meta_bg(sb);
+		int meta_bg = ext4_has_feature_meta_bg(sb) &&
+			      gdb_num >= le32_to_cpu(es->s_first_meta_bg);
 		sector_t padding_blocks = meta_bg ? 0 : sbi->s_sbh->b_blocknr -
 					 ext4_group_first_block_no(sb, 0);
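
The effect of the patched condition can be modelled outside the kernel.
The following is a minimal userspace sketch, not the kernel
implementation: the function name and its plain-integer parameters are
hypothetical stand-ins for ext4_has_feature_meta_bg(sb),
EXT4_DESC_PER_BLOCK(sb) and le32_to_cpu(es->s_first_meta_bg).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of the patched test: a newly added group uses the
 * meta_bg descriptor layout only if the feature is enabled AND the group's
 * descriptor block index has reached first_meta_bg.
 */
static bool uses_meta_bg_layout(bool has_meta_bg_feature, uint32_t group,
				uint32_t descs_per_block,
				uint32_t first_meta_bg)
{
	uint32_t gdb_num;

	assert(descs_per_block != 0);
	/* Index of the descriptor block that holds this group's descriptor. */
	gdb_num = group / descs_per_block;

	return has_meta_bg_feature && gdb_num >= first_meta_bg;
}
```

Assuming 64 descriptors per block and an s_first_meta_bg of 2, as in the
reproducer, group 127 (the one added by the resize) yields gdb_num 1 and
is handled with the non-meta_bg layout, while group 128 would be the
first to use the meta_bg layout.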