
ext4: reject 1k block fs on the first block of disk

Message ID	20221229014502.2322727-1-jun.nie@linaro.org
State	New
Headers	Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20;
	From: Jun Nie <jun.nie@linaro.org>
	To: djwong@kernel.org, tytso@mit.edu, adilger.kernel@dilger.ca, linux-ext4@vger.kernel.org, linux-kernel@vger.kernel.org
	Cc: tudor.ambarus@linaro.org
	Subject: [PATCH] ext4: reject 1k block fs on the first block of disk
	Date: Thu, 29 Dec 2022 09:45:02 +0800
	Message-Id: <20221229014502.2322727-1-jun.nie@linaro.org>
	MIME-Version: 1.0
	Content-Transfer-Encoding: 8bit
	Precedence: bulk
Series	ext4: reject 1k block fs on the first block of disk

Commit Message

Jun Nie Dec. 29, 2022, 1:45 a.m. UTC

  For 1k-block filesystems, the filesystem starts at block 1, not block 0.
If start_fsb is 0, it is bumped up to s_first_data_block, but an end_fsb
of 0 is left as it is. ext4_get_group_no_and_offset() then doesn't know
what to do and returns garbage results (blockgroup 2^32-1). The underflow
makes the group index exceed es->s_groups_count in ext4_get_group_info()
and triggers the BUG_ON.

Fixes: 4a4956249dac0 ("ext4: fix off-by-one fsmap error on 1k block filesystems")
Link: https://syzkaller.appspot.com/bug?id=79d5768e9bfe362911ac1a5057a36fc6b5c30002
Reported-by: syzbot+6be2b977c89f79b6b153@syzkaller.appspotmail.com
Signed-off-by: Jun Nie <jun.nie@linaro.org>
---
 fs/ext4/fsmap.c | 6 ++++++
 1 file changed, 6 insertions(+)

Comments

Darrick J. Wong Jan. 3, 2023, 7:17 p.m. UTC | #1

On Thu, Dec 29, 2022 at 09:45:02AM +0800, Jun Nie wrote:
> For 1k-block filesystems, the filesystem starts at block 1, not block 0.
> If start_fsb is 0, it will be bump up to s_first_data_block. Then
> ext4_get_group_no_and_offset don't know what to do and return garbage
> results (blockgroup 2^32-1). The underflow make index
> exceed es->s_groups_count in ext4_get_group_info() and trigger the BUG_ON.
> 
> Fixes: 4a4956249dac0 ("ext4: fix off-by-one fsmap error on 1k block filesystems")
> Link: https://syzkaller.appspot.com/bug?id=79d5768e9bfe362911ac1a5057a36fc6b5c30002
> Reported-by: syzbot+6be2b977c89f79b6b153@syzkaller.appspotmail.com
> Signed-off-by: Jun Nie <jun.nie@linaro.org>
> ---
>  fs/ext4/fsmap.c | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/fs/ext4/fsmap.c b/fs/ext4/fsmap.c
> index 4493ef0c715e..1aef127b0634 100644
> --- a/fs/ext4/fsmap.c
> +++ b/fs/ext4/fsmap.c
> @@ -702,6 +702,12 @@ int ext4_getfsmap(struct super_block *sb, struct ext4_fsmap_head *head,
>  		if (handlers[i].gfd_dev > head->fmh_keys[0].fmr_device)
>  			memset(&dkeys[0], 0, sizeof(struct ext4_fsmap));
>  
> +		/*
> +		 * Re-check the range after above limit operation and reject
> +		 * 1K fs on block 0 as fs should start block 1. */
> +		if (dkeys[0].fmr_physical ==0 && dkeys[1].fmr_physical == 0)
> +			continue;

...and if this filesystem has 4k blocks, and therefore *does* define a
block 0?

--D

> +
>  		info.gfi_dev = handlers[i].gfd_dev;
>  		info.gfi_last = false;
>  		info.gfi_agno = -1;
> -- 
> 2.34.1
>

Jun Nie Jan. 4, 2023, 1:58 a.m. UTC | #2

Darrick J. Wong <djwong@kernel.org> wrote on Wed, Jan 4, 2023 at 03:17:
>
> On Thu, Dec 29, 2022 at 09:45:02AM +0800, Jun Nie wrote:
> > For 1k-block filesystems, the filesystem starts at block 1, not block 0.
> > If start_fsb is 0, it will be bump up to s_first_data_block. Then
> > ext4_get_group_no_and_offset don't know what to do and return garbage
> > results (blockgroup 2^32-1). The underflow make index
> > exceed es->s_groups_count in ext4_get_group_info() and trigger the BUG_ON.
> >
> > Fixes: 4a4956249dac0 ("ext4: fix off-by-one fsmap error on 1k block filesystems")
> > Link: https://syzkaller.appspot.com/bug?id=79d5768e9bfe362911ac1a5057a36fc6b5c30002
> > Reported-by: syzbot+6be2b977c89f79b6b153@syzkaller.appspotmail.com
> > Signed-off-by: Jun Nie <jun.nie@linaro.org>
> > ---
> >  fs/ext4/fsmap.c | 6 ++++++
> >  1 file changed, 6 insertions(+)
> >
> > diff --git a/fs/ext4/fsmap.c b/fs/ext4/fsmap.c
> > index 4493ef0c715e..1aef127b0634 100644
> > --- a/fs/ext4/fsmap.c
> > +++ b/fs/ext4/fsmap.c
> > @@ -702,6 +702,12 @@ int ext4_getfsmap(struct super_block *sb, struct ext4_fsmap_head *head,
> >               if (handlers[i].gfd_dev > head->fmh_keys[0].fmr_device)
> >                       memset(&dkeys[0], 0, sizeof(struct ext4_fsmap));
> >
> > +             /*
> > +              * Re-check the range after above limit operation and reject
> > +              * 1K fs on block 0 as fs should start block 1. */
> > +             if (dkeys[0].fmr_physical ==0 && dkeys[1].fmr_physical == 0)
> > +                     continue;
>
> ...and if this filesystem has 4k blocks, and therefore *does* define a
> block 0?

Yes, this is a real corner case test :-)

>
> --D
>
> > +
> >               info.gfi_dev = handlers[i].gfd_dev;
> >               info.gfi_last = false;
> >               info.gfi_agno = -1;
> > --
> > 2.34.1
> >

Theodore Ts'o Feb. 15, 2023, 4:32 a.m. UTC | #3

On Wed, Jan 04, 2023 at 09:58:03AM +0800, Jun Nie wrote:
> Darrick J. Wong <djwong@kernel.org> wrote on Wed, Jan 4, 2023 at 03:17:
> >
> > On Thu, Dec 29, 2022 at 09:45:02AM +0800, Jun Nie wrote:
> > > For 1k-block filesystems, the filesystem starts at block 1, not block 0.
> > > If start_fsb is 0, it will be bump up to s_first_data_block. Then
> > > ext4_get_group_no_and_offset don't know what to do and return garbage
> > > results (blockgroup 2^32-1). The underflow make index
> > > exceed es->s_groups_count in ext4_get_group_info() and trigger the BUG_ON.
> > >
> > > Fixes: 4a4956249dac0 ("ext4: fix off-by-one fsmap error on 1k block filesystems")
> > > Link: https://syzkaller.appspot.com/bug?id=79d5768e9bfe362911ac1a5057a36fc6b5c30002
> > > Reported-by: syzbot+6be2b977c89f79b6b153@syzkaller.appspotmail.com
> > > Signed-off-by: Jun Nie <jun.nie@linaro.org>
> > > ---
> > >  fs/ext4/fsmap.c | 6 ++++++
> > >  1 file changed, 6 insertions(+)
> > >
> > > diff --git a/fs/ext4/fsmap.c b/fs/ext4/fsmap.c
> > > index 4493ef0c715e..1aef127b0634 100644
> > > --- a/fs/ext4/fsmap.c
> > > +++ b/fs/ext4/fsmap.c
> > > @@ -702,6 +702,12 @@ int ext4_getfsmap(struct super_block *sb, struct ext4_fsmap_head *head,
> > >               if (handlers[i].gfd_dev > head->fmh_keys[0].fmr_device)
> > >                       memset(&dkeys[0], 0, sizeof(struct ext4_fsmap));
> > >
> > > +             /*
> > > +              * Re-check the range after above limit operation and reject
> > > +              * 1K fs on block 0 as fs should start block 1. */
> > > +             if (dkeys[0].fmr_physical ==0 && dkeys[1].fmr_physical == 0)
> > > +                     continue;
> >
> > ...and if this filesystem has 4k blocks, and therefore *does* define a
> > block 0?
> 
> Yes, this is a real corner case test :-)

So I'm really nervous about this change.  I don't understand the code;
and I don't understand how the reproducer works.  I can certainly
reproduce it using the reproducer found here[1], but it seems to
require running multiple processes all creating loop devices and then
running FS_IOC_GETFSMAP.

[1] https://syzkaller.appspot.com/bug?id=79d5768e9bfe362911ac1a5057a36fc6b5c30002

If I change the reproducer to run execute_one() just once, it
doesn't trigger the bug.  It seems to only trigger when you have
multiple processes all racing to create a loop device, mount the file
system, try running FS_IOC_GETFSMAP --- and then delete the loop device
without actually unmounting the file system.  Which is **weird**.

I've tried taking the image, and just running "xfs_io -c fsmap /mnt",
and that doesn't trigger it either.

And I don't understand the reply to Darrick's question about why it's
safe to add the check since for 4k block file systems, block 0 *is*
valid.

So if someone can explain to me what is going on here with this code
(there are too many abstractions and what's going on with keys is just
making my head hurt), *and* what the change actually does, and how to
reproduce the problem with a ***simple*** reproducer -- the syzbot
mess doesn't count, that would be great.  But applying a change that I
don't understand to code I don't understand, to fix a reproducer which
I also don't understand, just doesn't make me feel comfortable.

Regards,

					- Ted

Tudor Ambarus Feb. 15, 2023, 11:46 a.m. UTC | #4

Hi, Ted!

On 2/15/23 04:32, Theodore Ts'o wrote:
> On Wed, Jan 04, 2023 at 09:58:03AM +0800, Jun Nie wrote:
>> Darrick J. Wong <djwong@kernel.org> wrote on Wed, Jan 4, 2023 at 03:17:
>>>
>>> On Thu, Dec 29, 2022 at 09:45:02AM +0800, Jun Nie wrote:
>>>> For 1k-block filesystems, the filesystem starts at block 1, not block 0.
>>>> If start_fsb is 0, it will be bump up to s_first_data_block. Then
>>>> ext4_get_group_no_and_offset don't know what to do and return garbage
>>>> results (blockgroup 2^32-1). The underflow make index
>>>> exceed es->s_groups_count in ext4_get_group_info() and trigger the BUG_ON.
>>>>
>>>> Fixes: 4a4956249dac0 ("ext4: fix off-by-one fsmap error on 1k block filesystems")
>>>> Link: https://syzkaller.appspot.com/bug?id=79d5768e9bfe362911ac1a5057a36fc6b5c30002
>>>> Reported-by: syzbot+6be2b977c89f79b6b153@syzkaller.appspotmail.com
>>>> Signed-off-by: Jun Nie <jun.nie@linaro.org>
>>>> ---
>>>>   fs/ext4/fsmap.c | 6 ++++++
>>>>   1 file changed, 6 insertions(+)
>>>>
>>>> diff --git a/fs/ext4/fsmap.c b/fs/ext4/fsmap.c
>>>> index 4493ef0c715e..1aef127b0634 100644
>>>> --- a/fs/ext4/fsmap.c
>>>> +++ b/fs/ext4/fsmap.c
>>>> @@ -702,6 +702,12 @@ int ext4_getfsmap(struct super_block *sb, struct ext4_fsmap_head *head,
>>>>                if (handlers[i].gfd_dev > head->fmh_keys[0].fmr_device)
>>>>                        memset(&dkeys[0], 0, sizeof(struct ext4_fsmap));
>>>>
>>>> +             /*
>>>> +              * Re-check the range after above limit operation and reject
>>>> +              * 1K fs on block 0 as fs should start block 1. */
>>>> +             if (dkeys[0].fmr_physical ==0 && dkeys[1].fmr_physical == 0)
>>>> +                     continue;
>>>
>>> ...and if this filesystem has 4k blocks, and therefore *does* define a
>>> block 0?
>>
>> Yes, this is a real corner case test :-)
> 
> So I'm really nervous about this change.  I don't understand the code;
> and I don't understand how the reproducer works.  I can certainly
> reproduce it using the reproducer found here[1], but it seems to
> require running multiple processes all creating loop devices and then
> running FS_IOC_GETMAP.
> 
> [1] https://syzkaller.appspot.com/bug?id=79d5768e9bfe362911ac1a5057a36fc6b5c30002
> 
> If I change the reproducer to just run the execute_one() once, it
> doesn't trigger the bug.  It seems to only trigger when you have
> multiple processes all racing to create a loop device, mount the file
> system, try running FS_IOC_GETMAP --- and then delete the loop device
> without actually unmounting the file system.  Which is **weird***.
> 
> I've tried taking the image, and just running "xfs_io -c fsmap /mnt",
> and that doesn't trigger it either.
> 
> And I don't understand the reply to Darrick's question about why it's
> safe to add the check since for 4k block file systems, block 0 *is*
> valid.
> 
> So if someone can explain to me what is going on here with this code
> (there are too many abstractions and what's going on with keys is just
> making my head hurt), *and* what the change actually does, and how to
> reproduce the problem with a ***simple*** reproducer -- the syzbot
> mess doesn't count, that would be great.  But applying a change that I
> don't understand to code I don't understand, to fix a reproducer which
> I also doesn't understand, just doesn't make me feel comfortable.
> 

Let me share what I have understood so far. The low key is zeroed. The
high key is defined and has an fmr_physical of zero, which is below the
first data block of the 1k-block ext4 fs (which starts at byte offset
1024).

-> ext4_getfsmap_datadev()
   keys[0].fmr_physical = 0, keys[1].fmr_physical = 0
   bofs = le32_to_cpu(sbi->s_es->s_first_data_block) = 1, eofs = 256
   start_fsb = keys[0].fmr_physical = 1, end_fsb = keys[1].fmr_physical = 0
   -> ext4_get_group_no_and_offset()
     blocknr = 1, le32_to_cpu(es->s_first_data_block) =1
   start_ag = 0, first_cluster = 0
   ->
     blocknr = 0, le32_to_cpu(es->s_first_data_block) =1
   end_ag = 4294967295, last_cluster = 8191

   Then there's a loop that keeps running while info->gfi_agno <= end_ag;
it triggers the BUG_ON in ext4_get_group_info() once the group number
reaches EXT4_SB(sb)->s_groups_count:
   -> ext4_mballoc_query_range()
     -> ext4_mb_load_buddy()
       -> ext4_mb_load_buddy_gfp()
         -> ext4_get_group_info()
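
To make the end_ag value above concrete, here is a small userspace sketch
of the arithmetic. It is a model, not the kernel function: the struct and
function names are made up for illustration, and 8192 blocks per group is
an assumption chosen to match last_cluster = 8191 in the trace.

```c
#include <stdint.h>

/* Hypothetical userspace model of the group/offset split; not kernel code. */
struct group_pos {
	uint32_t group;   /* models a 32-bit unsigned group number */
	uint32_t offset;  /* block offset inside the group */
};

static struct group_pos model_group_no_and_offset(uint64_t blocknr,
						  uint32_t first_data_block,
						  uint32_t blocks_per_group)
{
	struct group_pos pos;

	/* Unsigned subtraction: 0 - 1 wraps around to 2^64 - 1. */
	blocknr -= first_data_block;

	pos.offset = (uint32_t)(blocknr % blocks_per_group);
	/* Storing the 64-bit quotient in 32 bits keeps only the low bits. */
	pos.group = (uint32_t)(blocknr / blocks_per_group);
	return pos;
}
```

With first_data_block = 1 and 8192 blocks per group, block 1 maps to
group 0, offset 0, while block 0 wraps to group 4294967295, offset 8191,
the same values as in the trace.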

It's an out-of-bounds request, and Darrick suggested not returning any
mapping for the byte range 0-1023 on a 1k-block filesystem. The
alternative would be to return -EINVAL when the high key has an
fmr_physical of zero on a 1k-block fs.
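
A minimal sketch of the "return no mapping" option, assuming the keys have
already been converted to filesystem blocks. The helper name and its
boolean return are hypothetical and this is not the merged fix; bofs and
eofs are used as in the trace above, and eofs is assumed to be nonzero.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: clamp a [low, high] block range to
 * [bofs, eofs - 1] and report whether anything is left to query.
 */
static bool model_clamp_range(uint64_t *low, uint64_t *high,
			      uint64_t bofs, uint64_t eofs)
{
	if (*low >= eofs)
		return false;      /* starts past the end of the fs */
	if (*low < bofs)
		*low = bofs;       /* bump the start to the first data block */
	if (*high >= eofs)
		*high = eofs - 1;
	/* High key below the (bumped) low key: nothing to map. */
	return *high >= *low;
}
```

For the trace above (bofs = 1, eofs = 256, both keys 0) this reports an
empty range, while on a 4k-block fs (bofs = 0) the same keys still select
block 0. Returning -EINVAL instead would only change what the caller does
with the false result.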

In order to reproduce this, one would have to create a 1k-block ext4 fs
and pass a high key with an fmr_physical of zero, so I would expect to
reproduce it with something like this:
xfs_io -c 'fsmap -d 0 0' /mnt/scratch

However, when doing this I notice that in xfsprogs-dev/io/fsmap.c
l->fmr_device and h->fmr_device are both zero; FS_IOC_GETFSMAP is called
and we receive no entries (head->fmh_entries = 0). Now I'm trying to see
what I'm doing wrong and how to reproduce the bug.

Cheers,
ta

Tudor Ambarus Feb. 15, 2023, 11:53 a.m. UTC | #5

On 2/15/23 11:46, Tudor Ambarus wrote:
> Hi, Ted!
> 
> On 2/15/23 04:32, Theodore Ts'o wrote:
>> On Wed, Jan 04, 2023 at 09:58:03AM +0800, Jun Nie wrote:
>>> Darrick J. Wong <djwong@kernel.org> wrote on Wed, Jan 4, 2023 at 03:17:
>>>>
>>>> On Thu, Dec 29, 2022 at 09:45:02AM +0800, Jun Nie wrote:
>>>>> For 1k-block filesystems, the filesystem starts at block 1, not 
>>>>> block 0.
>>>>> If start_fsb is 0, it will be bump up to s_first_data_block. Then
>>>>> ext4_get_group_no_and_offset don't know what to do and return garbage
>>>>> results (blockgroup 2^32-1). The underflow make index
>>>>> exceed es->s_groups_count in ext4_get_group_info() and trigger the 
>>>>> BUG_ON.
>>>>>
>>>>> Fixes: 4a4956249dac0 ("ext4: fix off-by-one fsmap error on 1k block 
>>>>> filesystems")
>>>>> Link: 
>>>>> https://syzkaller.appspot.com/bug?id=79d5768e9bfe362911ac1a5057a36fc6b5c30002
>>>>> Reported-by: syzbot+6be2b977c89f79b6b153@syzkaller.appspotmail.com
>>>>> Signed-off-by: Jun Nie <jun.nie@linaro.org>
>>>>> ---
>>>>>   fs/ext4/fsmap.c | 6 ++++++
>>>>>   1 file changed, 6 insertions(+)
>>>>>
>>>>> diff --git a/fs/ext4/fsmap.c b/fs/ext4/fsmap.c
>>>>> index 4493ef0c715e..1aef127b0634 100644
>>>>> --- a/fs/ext4/fsmap.c
>>>>> +++ b/fs/ext4/fsmap.c
>>>>> @@ -702,6 +702,12 @@ int ext4_getfsmap(struct super_block *sb, 
>>>>> struct ext4_fsmap_head *head,
>>>>>                if (handlers[i].gfd_dev > head->fmh_keys[0].fmr_device)
>>>>>                        memset(&dkeys[0], 0, sizeof(struct 
>>>>> ext4_fsmap));
>>>>>
>>>>> +             /*
>>>>> +              * Re-check the range after above limit operation and 
>>>>> reject
>>>>> +              * 1K fs on block 0 as fs should start block 1. */
>>>>> +             if (dkeys[0].fmr_physical ==0 && 
>>>>> dkeys[1].fmr_physical == 0)
>>>>> +                     continue;
>>>>
>>>> ...and if this filesystem has 4k blocks, and therefore *does* define a
>>>> block 0?
>>>
>>> Yes, this is a real corner case test :-)
>>
>> So I'm really nervous about this change.  I don't understand the code;
>> and I don't understand how the reproducer works.  I can certainly
>> reproduce it using the reproducer found here[1], but it seems to
>> require running multiple processes all creating loop devices and then
>> running FS_IOC_GETMAP.
>>
>> [1] 
>> https://syzkaller.appspot.com/bug?id=79d5768e9bfe362911ac1a5057a36fc6b5c30002
>>
>> If I change the reproducer to just run the execute_one() once, it
>> doesn't trigger the bug.  It seems to only trigger when you have
>> multiple processes all racing to create a loop device, mount the file
>> system, try running FS_IOC_GETMAP --- and then delete the loop device
>> without actually unmounting the file system.  Which is **weird***.
>>
>> I've tried taking the image, and just running "xfs_io -c fsmap /mnt",
>> and that doesn't trigger it either.
>>
>> And I don't understand the reply to Darrick's question about why it's
>> safe to add the check since for 4k block file systems, block 0 *is*
>> valid.
>>
>> So if someone can explain to me what is going on here with this code
>> (there are too many abstractions and what's going on with keys is just
>> making my head hurt), *and* what the change actually does, and how to
>> reproduce the problem with a ***simple*** reproducer -- the syzbot
>> mess doesn't count, that would be great.  But applying a change that I
>> don't understand to code I don't understand, to fix a reproducer which
>> I also doesn't understand, just doesn't make me feel comfortable.
>>
> 
> Let me share what I understood until now. The low key is zeroed. The
> high key is defined and uses a fmr_physical of value zero, which is
> smaller than the first data block for the 1k-block ext4 fs (which starts
> at offset 1024).
> 
> -> ext4_getfsmap_datadev()
>    keys[0].fmr_physical = 0, keys[1].fmr_physical = 0
>    bofs = le32_to_cpu(sbi->s_es->s_first_data_block) = 1, eofs = 256
>    start_fsb = keys[0].fmr_physical = 1, end_fsb = keys[1].fmr_physical = 0
>    -> ext4_get_group_no_and_offset()
>      blocknr = 1, le32_to_cpu(es->s_first_data_block) =1
>    start_ag = 0, first_cluster = 0
>    ->
>      blocknr = 0, le32_to_cpu(es->s_first_data_block) =1
>    end_ag = 4294967295, last_cluster = 8191

Because of poor key validation we get a wrong end_ag, which eventually
causes the BUG_ON.

> 
>    Then there's a loop that stops when info->gfi_agno <= end_ag; that 
> will trigger the BUG_ON in ext4_get_group_info() as the group nr exceeds 
> EXT4_SB(sb)->s_groups_count)
>    -> ext4_mballoc_query_range()
>      -> ext4_mb_load_buddy()
>        -> ext4_mb_load_buddy_gfp()
>          -> ext4_get_group_info()
> 
> It's an out of bounds request and Darrick suggested to not return any
> mapping for the byte range 0-1023 for the 1k-block filesystem. The
> alternative would be to return -EINVAL when the high key starts at
> fmr_phisical of value zero for the 1k-block fs.
> 
> In order to reproduce this one would have to create an 1k-block ext4 fs
> and to pass a high key with fmr_physical of value zero, thus I would
> expect to reproduce it with something like this:
> xfs_io -c 'fsmap -d 0 0' /mnt/scratch
> 
> However when doing this I notice that in
> xfsprogs-dev/io/fsmap.c l->fmr_device and h->fmr_device will have value
> zero, FS_IOC_GETFSMAP is called and then we receive no entries
> (head->fmh_entries = 0). Now I'm trying to see what I do wrong, and how
> to reproduce the bug.
> 
> Cheers,
> ta

Tudor Ambarus Feb. 15, 2023, 4:26 p.m. UTC | #6

On 2/15/23 11:53, Tudor Ambarus wrote:
> 
> 
> On 2/15/23 11:46, Tudor Ambarus wrote:
>> Hi, Ted!
>>
>> On 2/15/23 04:32, Theodore Ts'o wrote:
>>> On Wed, Jan 04, 2023 at 09:58:03AM +0800, Jun Nie wrote:
>>>> Darrick J. Wong <djwong@kernel.org> wrote on Wed, Jan 4, 2023 at 03:17:
>>>>>
>>>>> On Thu, Dec 29, 2022 at 09:45:02AM +0800, Jun Nie wrote:
>>>>>> For 1k-block filesystems, the filesystem starts at block 1, not 
>>>>>> block 0.
>>>>>> If start_fsb is 0, it will be bump up to s_first_data_block. Then
>>>>>> ext4_get_group_no_and_offset don't know what to do and return garbage
>>>>>> results (blockgroup 2^32-1). The underflow make index
>>>>>> exceed es->s_groups_count in ext4_get_group_info() and trigger the 
>>>>>> BUG_ON.
>>>>>>
>>>>>> Fixes: 4a4956249dac0 ("ext4: fix off-by-one fsmap error on 1k 
>>>>>> block filesystems")
>>>>>> Link: 
>>>>>> https://syzkaller.appspot.com/bug?id=79d5768e9bfe362911ac1a5057a36fc6b5c30002
>>>>>> Reported-by: syzbot+6be2b977c89f79b6b153@syzkaller.appspotmail.com
>>>>>> Signed-off-by: Jun Nie <jun.nie@linaro.org>
>>>>>> ---
>>>>>>   fs/ext4/fsmap.c | 6 ++++++
>>>>>>   1 file changed, 6 insertions(+)
>>>>>>
>>>>>> diff --git a/fs/ext4/fsmap.c b/fs/ext4/fsmap.c
>>>>>> index 4493ef0c715e..1aef127b0634 100644
>>>>>> --- a/fs/ext4/fsmap.c
>>>>>> +++ b/fs/ext4/fsmap.c
>>>>>> @@ -702,6 +702,12 @@ int ext4_getfsmap(struct super_block *sb, 
>>>>>> struct ext4_fsmap_head *head,
>>>>>>                if (handlers[i].gfd_dev > 
>>>>>> head->fmh_keys[0].fmr_device)
>>>>>>                        memset(&dkeys[0], 0, sizeof(struct 
>>>>>> ext4_fsmap));
>>>>>>
>>>>>> +             /*
>>>>>> +              * Re-check the range after above limit operation 
>>>>>> and reject
>>>>>> +              * 1K fs on block 0 as fs should start block 1. */
>>>>>> +             if (dkeys[0].fmr_physical ==0 && 
>>>>>> dkeys[1].fmr_physical == 0)
>>>>>> +                     continue;
>>>>>
>>>>> ...and if this filesystem has 4k blocks, and therefore *does* define a
>>>>> block 0?
>>>>
>>>> Yes, this is a real corner case test :-)
>>>
>>> So I'm really nervous about this change.  I don't understand the code;
>>> and I don't understand how the reproducer works.  I can certainly
>>> reproduce it using the reproducer found here[1], but it seems to
>>> require running multiple processes all creating loop devices and then
>>> running FS_IOC_GETMAP.
>>>
>>> [1] 
>>> https://syzkaller.appspot.com/bug?id=79d5768e9bfe362911ac1a5057a36fc6b5c30002
>>>
>>> If I change the reproducer to just run the execute_one() once, it
>>> doesn't trigger the bug.  It seems to only trigger when you have
>>> multiple processes all racing to create a loop device, mount the file
>>> system, try running FS_IOC_GETMAP --- and then delete the loop device
>>> without actually unmounting the file system.  Which is **weird***.
>>>
>>> I've tried taking the image, and just running "xfs_io -c fsmap /mnt",
>>> and that doesn't trigger it either.
>>>
>>> And I don't understand the reply to Darrick's question about why it's
>>> safe to add the check since for 4k block file systems, block 0 *is*
>>> valid.
>>>
>>> So if someone can explain to me what is going on here with this code
>>> (there are too many abstractions and what's going on with keys is just
>>> making my head hurt), *and* what the change actually does, and how to
>>> reproduce the problem with a ***simple*** reproducer -- the syzbot
>>> mess doesn't count, that would be great.  But applying a change that I
>>> don't understand to code I don't understand, to fix a reproducer which
>>> I also doesn't understand, just doesn't make me feel comfortable.
>>>
>>
>> Let me share what I understood until now. The low key is zeroed. The
>> high key is defined and uses a fmr_physical of value zero, which is
>> smaller than the first data block for the 1k-block ext4 fs (which starts
>> at offset 1024).
>>
>> -> ext4_getfsmap_datadev()
>>    keys[0].fmr_physical = 0, keys[1].fmr_physical = 0
>>    bofs = le32_to_cpu(sbi->s_es->s_first_data_block) = 1, eofs = 256
>>    start_fsb = keys[0].fmr_physical = 1, end_fsb = 
>> keys[1].fmr_physical = 0
>>    -> ext4_get_group_no_and_offset()
>>      blocknr = 1, le32_to_cpu(es->s_first_data_block) =1
>>    start_ag = 0, first_cluster = 0
>>    ->
>>      blocknr = 0, le32_to_cpu(es->s_first_data_block) =1
>>    end_ag = 4294967295, last_cluster = 8191
> 
> because of poor key validation we get a wrong end_ag which eventually
> causes the BUG_ON.
> 
>>
>>    Then there's a loop that stops when info->gfi_agno <= end_ag; that 
>> will trigger the BUG_ON in ext4_get_group_info() as the group nr 
>> exceeds EXT4_SB(sb)->s_groups_count)
>>    -> ext4_mballoc_query_range()
>>      -> ext4_mb_load_buddy()
>>        -> ext4_mb_load_buddy_gfp()
>>          -> ext4_get_group_info()
>>
>> It's an out of bounds request and Darrick suggested to not return any
>> mapping for the byte range 0-1023 for the 1k-block filesystem. The
>> alternative would be to return -EINVAL when the high key starts at
>> fmr_phisical of value zero for the 1k-block fs.
>>
>> In order to reproduce this one would have to create an 1k-block ext4 fs
>> and to pass a high key with fmr_physical of value zero, thus I would
>> expect to reproduce it with something like this:
>> xfs_io -c 'fsmap -d 0 0' /mnt/scratch
>>
>> However when doing this I notice that in
>> xfsprogs-dev/io/fsmap.c l->fmr_device and h->fmr_device will have value
>> zero, FS_IOC_GETFSMAP is called and then we receive no entries
>> (head->fmh_entries = 0). Now I'm trying to see what I do wrong, and how
>> to reproduce the bug.
>>


What I think happens with the reproducer that I proposed is that when
both {l, h}->fmr_device are zero, the code exits early before getting
the fsmap:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/ext4/fsmap.c?h=v6.2-rc8#n691
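
My reading of that early exit, modelled in userspace. The enum and
function names are made up, and the comparison order is an assumption
based on the loop at the link above, not a copy of it.

```c
#include <stdint.h>

/* Hypothetical outcome of the per-device range check. */
enum dev_action { DEV_SKIP, DEV_STOP, DEV_QUERY };

static enum dev_action model_device_filter(uint32_t low_dev,
					   uint32_t high_dev,
					   uint32_t handler_dev)
{
	if (low_dev > handler_dev)
		return DEV_SKIP;   /* query starts on a later device */
	if (high_dev < handler_dev)
		return DEV_STOP;   /* query ends before this device */
	return DEV_QUERY;          /* device is inside the key range */
}
```

With both key devices at zero and any nonzero handler device, the model
stops before the handler runs, which would match the empty result I saw.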

Also, to my untrained fs eye it seems that xfs_io's [-d|-l|-r] fsmap
options are intended only for XFS, as the {data, log, realtime} sections
are XFS-specific. I wonder why "struct fs_path" from libfrog/paths.h is
not renamed to "struct xfs_path"; that would have been less confusing.

It looks like xfs_io has no support for querying a start and end
offset when asking for an fsmap on an ext4 fs. I'm checking how I can
extend xfs_io's fsmap support for ext4 to validate my assumptions.

Cheers,
ta

Tudor Ambarus Feb. 22, 2023, 3:27 p.m. UTC | #7

Hi!

On 2/15/23 04:32, Theodore Ts'o wrote:
> So if someone can explain to me what is going on here with this code
> (there are too many abstractions and what's going on with keys is just
> making my head hurt),*and*  what the change actually does, and how to
> reproduce the problem with a ***simple*** reproducer -- the syzbot
> mess doesn't count, that would be great.  But applying a change that I

I proposed a patch fixing this at:
https://lore.kernel.org/linux-ext4/20230222131211.3898066-1-tudor.ambarus@linaro.org/T/

Darrick proposed a similar one at:
https://lore.kernel.org/linux-ext4/Y+58NPTH7VNGgzdd@magnolia/

I explained the difference between the two in my cover letter.

Cheers,
ta


Patch

diff --git a/fs/ext4/fsmap.c b/fs/ext4/fsmap.c
index 4493ef0c715e..1aef127b0634 100644
--- a/fs/ext4/fsmap.c
+++ b/fs/ext4/fsmap.c
@@ -702,6 +702,12 @@  int ext4_getfsmap(struct super_block *sb, struct ext4_fsmap_head *head,
 		if (handlers[i].gfd_dev > head->fmh_keys[0].fmr_device)
 			memset(&dkeys[0], 0, sizeof(struct ext4_fsmap));
 
+		/*
+		 * Re-check the range after above limit operation and reject
+		 * 1K fs on block 0 as fs should start at block 1. */
+		if (dkeys[0].fmr_physical == 0 && dkeys[1].fmr_physical == 0)
+			continue;
+
 		info.gfi_dev = handlers[i].gfd_dev;
 		info.gfi_last = false;
 		info.gfi_agno = -1;