ext4: fix underflow in group bitmap calculation

Message ID 20221222020244.1821308-1-jun.nie@linaro.org
State New

Commit Message

Jun Nie Dec. 22, 2022, 2:02 a.m. UTC
There is a case where s_first_data_block is not 0 and the block number is
smaller than s_first_data_block when calculating the group bitmap during
allocation. The resulting underflow makes the index exceed
es->s_groups_count in ext4_get_group_info() and triggers the BUG_ON.

Fix it by guarding against the underflow.

Fixes: 72b64b594081ef ("ext4 uninline ext4_get_group_no_and_offset()")
Link: https://syzkaller.appspot.com/bug?id=79d5768e9bfe362911ac1a5057a36fc6b5c30002
Reported-by: syzbot+6be2b977c89f79b6b153@syzkaller.appspotmail.com
Signed-off-by: Jun Nie <jun.nie@linaro.org>
---
 fs/ext4/balloc.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
  

Comments

Theodore Ts'o Dec. 22, 2022, 5:41 p.m. UTC | #1
On Thu, Dec 22, 2022 at 10:02:44AM +0800, Jun Nie wrote:
> There is a case where s_first_data_block is not 0 and the block number is
> smaller than s_first_data_block when calculating the group bitmap during
> allocation. The resulting underflow makes the index exceed
> es->s_groups_count in ext4_get_group_info() and triggers the BUG_ON.
> 
> Fix it by guarding against the underflow.

When was this happening, and why?  If blocknr is less than
s_first_data_block, this is either insufficient input validation,
insufficient validation to detect file system corruption, or some
other kernel bug.

Looking quickly at the code and the repro, it appears that the issue is
that FS_IOC_GETFSMAP is getting passed a starting physical block of 0
in fmh_keys[0] when on a file system with a blocksize of 1k (in which
case s_first_data_block is 1).  It's unclear to me what
FS_IOC_GETFSMAP should *do* when passed a value which requests that it
provide a mapping for a block which is out of bounds (either too big
or too small).  Should it return an error?  Should it simply not
return a mapping?  The man page for ioctl_getfsmap() doesn't shed any
light on this question.
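
To make the arithmetic concrete, here is a minimal user-space sketch of the
wraparound described above.  It is not the kernel code: group_of_block() is
a hypothetical stand-in for the subtraction and division done in
ext4_get_group_no_and_offset(), and the values used in the note below
(s_first_data_block of 1, 8192 blocks per group) are assumed for a typical
1k-block file system.

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for the group calculation; the real function
 * uses do_div() and stores the result in a narrower group-number type.
 */
static uint64_t group_of_block(uint64_t blocknr, uint32_t first_data_block,
                               uint32_t blocks_per_group)
{
    /* Unsigned subtraction wraps around when blocknr < first_data_block. */
    blocknr -= first_data_block;
    return blocknr / blocks_per_group;
}
```

Calling group_of_block(0, 1, 8192) yields UINT64_MAX / 8192 rather than a
small group number, while group_of_block(1, 1, 8192) yields 0 as expected.
Any group count a real file system could have is far below the first
result, which is how the index ends up out of range.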

Darrick, you designed the interface and wrote most of fs/ext4/fsmap.c.
Can you let us know what is supposed to happen in this case?  Many
thanks!!

> Fixes: 72b64b594081ef ("ext4 uninline ext4_get_group_no_and_offset()")

This makes ***no*** sense; the commit in question is from 2006, which
means that in some jurisdictions it's old enough to drive a car.  :-)
Furthermore, all it does is move the function from an inline function
to a C file (in this case, balloc.c).  It also long predates the
introduction of FS_IOC_GETFSMAP support, which was in 2017.

I'm guessing you just did a "git blame" and blindly assumed that
whatever commit last touched the C code in question was what
introduced the problem?

Anyway, please try to understand what is going on instead of doing the
moral equivalent of taking a sledgehammer to the code until the
reproducer stops triggering a BUG.  It's not enough to shut up the
reproducer; you should understand what is happening, and why, and then
strive to find the best fix to the problem.  Papering over problems in
the end will result in more fragile code, and the goal of syzkaller is
to improve kernel quality.  But syzkaller is just a tool and used
wrongly, it can have the opposite effect.

Regards,

					- Ted
  
Darrick J. Wong Dec. 22, 2022, 6:08 p.m. UTC | #2
On Thu, Dec 22, 2022 at 12:41:58PM -0500, Theodore Ts'o wrote:
> On Thu, Dec 22, 2022 at 10:02:44AM +0800, Jun Nie wrote:
> > There is a case where s_first_data_block is not 0 and the block number is
> > smaller than s_first_data_block when calculating the group bitmap during
> > allocation. The resulting underflow makes the index exceed
> > es->s_groups_count in ext4_get_group_info() and triggers the BUG_ON.
> > 
> > Fix it by guarding against the underflow.
> 
> When was this happening, and why?  If blocknr is less than
> s_first_data_block, this is either insufficient input validation,
> insufficient validation to detect file system corruption, or some
> other kernel bug.
> 
> Looking quickly at the code and the repro, it appears that the issue is
> that FS_IOC_GETFSMAP is getting passed a starting physical block of 0
> in fmh_keys[0] when on a file system with a blocksize of 1k (in which
> case s_first_data_block is 1).  It's unclear to me what

Question -- on a 1k-block filesystem, are the first 1024 bytes of the
device *reserved* by ext4 for whatever bootloader crud goes in there?
Or is that space undefined in the filesystem specification?

I never did figure that out when I was writing the ondisk specification
that's in the kernel, but maybe you remember?

> FS_IOC_GETFSMAP should *do* when passed a value which requests that it
> provide a mapping for a block which is out of bounds (either too big
> or too small).  Should it return an error?  Should it simply not
> return a mapping?  The man page for ioctl_getfsmap() doesn't shed any
> light on this question.
> 
> Darrick, you designed the interface and wrote most of fs/ext4/fsmap.c.
> Can you let us know what is supposed to happen in this case?  Many
> thanks!!

If those first 1024 bytes are defined to be reserved in the ondisk
format, then you could return a mapping for those bytes with the owner
code set to EXT4_FMR_OWN_UNKNOWN.

If, however, the space is undefined, then going off this statement in
the manpage:

"For example, if the low key (fsmap_head.fmh_keys[0]) is set to (8:0,
36864, 0, 0, 0), the filesystem  will  only  return  records for extents
starting at or above 36 KiB on disk."

I think the 'at or above' clause means that ext4 should not pass back
any mapping for the byte range 0-1023 on a 1k-block filesystem.

If the low key is set to (8:0, 0, 0, 0, 0) and high key is set to (8:0,
1023, 0, 0, 0) then ext4 shouldn't return any mapping at all, because
there's no space usage defined for that region of the disk.

If the low key is set to (8:0, 0, 0, 0, 0) and high key is set to all
ones, then ext4 can return mappings for the primary superblock at offset
1024.
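
The three cases above can be sketched as a small range clamp.  This is only
an illustration of the semantics being proposed, not code from
fs/ext4/fsmap.c: clamp_query() and its parameters are hypothetical, keys are
reduced to plain byte offsets, and first_defined is assumed to be 1024 on a
1k-block filesystem.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: clamp a query byte range [low, high] to the part
 * of the device that the filesystem defines.  Returns false when the
 * whole range lies in undefined space, so no mapping should be returned.
 */
static bool clamp_query(uint64_t low, uint64_t high, uint64_t first_defined,
                        uint64_t *start)
{
    if (low > high || high < first_defined)
        return false;

    /* Skip the undefined region; records start "at or above" this offset. */
    *start = low < first_defined ? first_defined : low;
    return true;
}
```

With first_defined set to 1024, a query of [0, 1023] reports nothing, a
query of [0, all ones] starts at offset 1024 (the primary superblock), and a
query starting at 36864 is left unchanged.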

--D

> 
> > Fixes: 72b64b594081ef ("ext4 uninline ext4_get_group_no_and_offset()")
> 
> This makes ***no*** sense; the commit in question is from 2006, which
> means that in some jurisdictions it's old enough to drive a car.  :-)
> Furthermore, all it does is move the function from an inline function
> to a C file (in this case, balloc.c).  It also long predates the
> introduction of FS_IOC_GETFSMAP support, which was in 2017.
> 
> I'm guessing you just did a "git blame" and blindly assumed that
> whatever commit last touched the C code in question was what
> introduced the problem?
> 
> Anyway, please try to understand what is going on instead of doing the
> moral equivalent of taking a sledgehammer to the code until the
> reproducer stops triggering a BUG.  It's not enough to shut up the
> reproducer; you should understand what is happening, and why, and then
> strive to find the best fix to the problem.  Papering over problems in
> the end will result in more fragile code, and the goal of syzkaller is
> to improve kernel quality.  But syzkaller is just a tool and used
> wrongly, it can have the opposite effect.
> 
> Regards,
> 
> 					- Ted
  
Theodore Ts'o Dec. 23, 2022, 5:08 a.m. UTC | #3
On Thu, Dec 22, 2022 at 10:08:59AM -0800, Darrick J. Wong wrote:
> 
> Question -- on a 1k-block filesystem, are the first 1024 bytes of the
> device *reserved* by ext4 for whatever bootloader crud goes in there?
> Or is that space undefined in the filesystem specification?
> 
> I never did figure that out when I was writing the ondisk specification
> that's in the kernel, but maybe you remember?

That's an interesting (and philosophical) question.  The ext2 file
system never had a formal specification, and this part of the file
system format was devised by Remy Card before I had gotten involved
with ext2.  (I first got started by writing e2fsprogs, which replaced
the previous file system utilities, which were forked from minix's
tools and were quite inefficient.)

In favor of it being undefined, the first 1024 bytes are not part of
any block group in an ext2 file system with a 1k block size.  (The
first block group is composed of physical blocks 1 through 8192
inclusive when the block size is 1k.  Whereas if the blocksize is 4k,
the first block group is composed of physical blocks 0 through 32767.)
In addition, the status of the first 1024 bytes is not controlled by
an ext2 block allocation bitmap.
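
The block ranges quoted above follow from two values, which a short sketch
can make explicit.  The helper below is hypothetical and rests on stated
assumptions: no bigalloc, one bitmap block per group (so 8 * block_size
blocks per group), and a first data block of 1 only when the block size
is 1k.

```c
#include <stdint.h>

struct block_range {
    uint64_t first;
    uint64_t last;   /* inclusive */
};

/* Hypothetical helper: physical block range of the first block group. */
static struct block_range first_group_range(uint32_t block_size)
{
    /* Assumed: block 0 is outside any group only on 1k-block layouts. */
    uint64_t first = (block_size == 1024) ? 1 : 0;
    /* One bitmap block holds 8 bits per byte, one bit per block. */
    uint64_t per_group = 8ULL * block_size;
    struct block_range r = { first, first + per_group - 1 };

    return r;
}
```

For a block size of 1024 this gives blocks 1 through 8192, and for 4096 it
gives blocks 0 through 32767, matching the figures in the paragraph above.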

One could also argue from ext2's lineage: ext2 was derived from the ext
file system, which in turn was derived from Minix, and the Minix File
System (which does have a specification) explicitly states that "block
0" is reserved for the bootloader, with "block 1" being the location
of the superblock.  But Minix only supports a 1k blocksize, and
doesn't have the concept of FFS-style block (cylinder) groups.

So I'd come down on the side which states that the first 1024 bytes
are "undefined" on a 1k block file system.

(One could also argue that they are "undefined" on a 2k and 4k block
file system, but the first 1024 bytes are part of "block 0", and on 2k
and 4k block file systems, "block 0" is part of a block group.)

> If those first 1024 bytes are defined to be reserved in the ondisk
> format, then you could return a mapping for those bytes with the owner
> code set to EXT4_FMR_OWN_UNKNOWN.
> 
> If, however, the space is undefined, then going off this statement in
> the manpage:
> 
> "For example, if the low key (fsmap_head.fmh_keys[0]) is set to (8:0,
> 36864, 0, 0, 0), the filesystem  will  only  return  records for extents
> starting at or above 36 KiB on disk."
> 
> I think the 'at or above' clause means that ext4 should not pass back
> any mapping for the byte range 0-1023 on a 1k-block filesystem.

Sure, sounds good to me.

						- Ted
  

Patch

diff --git a/fs/ext4/balloc.c b/fs/ext4/balloc.c
index 8ff4b9192a9f..177ef6bd635a 100644
--- a/fs/ext4/balloc.c
+++ b/fs/ext4/balloc.c
@@ -56,7 +56,8 @@  void ext4_get_group_no_and_offset(struct super_block *sb, ext4_fsblk_t blocknr,
 	struct ext4_super_block *es = EXT4_SB(sb)->s_es;
 	ext4_grpblk_t offset;
 
-	blocknr = blocknr - le32_to_cpu(es->s_first_data_block);
+	blocknr = blocknr > le32_to_cpu(es->s_first_data_block) ?
+		blocknr - le32_to_cpu(es->s_first_data_block) : 0;
 	offset = do_div(blocknr, EXT4_BLOCKS_PER_GROUP(sb)) >>
 		EXT4_SB(sb)->s_cluster_bits;
 	if (offsetp)