Message ID | 20221229014502.2322727-1-jun.nie@linaro.org |
---|---|
State | New |
Headers |
From: Jun Nie <jun.nie@linaro.org>
To: djwong@kernel.org, tytso@mit.edu, adilger.kernel@dilger.ca, linux-ext4@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: tudor.ambarus@linaro.org
Subject: [PATCH] ext4: reject 1k block fs on the first block of disk
Date: Thu, 29 Dec 2022 09:45:02 +0800
Message-Id: <20221229014502.2322727-1-jun.nie@linaro.org>
X-Mailer: git-send-email 2.34.1
List-ID: <linux-kernel.vger.kernel.org> |
Series |
ext4: reject 1k block fs on the first block of disk
|
|
Commit Message
Jun Nie
Dec. 29, 2022, 1:45 a.m. UTC
For 1k-block filesystems, the filesystem starts at block 1, not block 0.
If start_fsb is 0, it is bumped up to s_first_data_block, but end_fsb
stays at 0. ext4_get_group_no_and_offset() then subtracts
s_first_data_block from block 0, underflows, and returns a garbage
result (block group 2^32-1). That index exceeds es->s_groups_count
in ext4_get_group_info() and triggers the BUG_ON.
Fixes: 4a4956249dac0 ("ext4: fix off-by-one fsmap error on 1k block filesystems")
Link: https://syzkaller.appspot.com/bug?id=79d5768e9bfe362911ac1a5057a36fc6b5c30002
Reported-by: syzbot+6be2b977c89f79b6b153@syzkaller.appspotmail.com
Signed-off-by: Jun Nie <jun.nie@linaro.org>
---
fs/ext4/fsmap.c | 6 ++++++
1 file changed, 6 insertions(+)
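The wraparound described in the commit message can be reproduced outside the kernel. The sketch below is a user-space model of the arithmetic only, not the kernel's ext4_get_group_no_and_offset(); the struct name, the function name, and the value of 8192 blocks per group are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type, used only by this illustration. */
struct group_pos {
	uint32_t group;   /* block group number, kept in 32 bits */
	uint32_t offset;  /* block offset inside that group */
};

/*
 * Model of the arithmetic: subtract the first data block, then split
 * the result into a group number and an in-group offset.  blocknr is
 * unsigned, so blocknr < first_data_block wraps around instead of
 * going negative.
 */
static struct group_pos model_group_no_and_offset(uint64_t blocknr,
						   uint32_t first_data_block,
						   uint32_t blocks_per_group)
{
	struct group_pos pos;

	assert(blocks_per_group > 0);
	blocknr -= first_data_block;
	pos.offset = (uint32_t)(blocknr % blocks_per_group);
	/* The quotient is truncated to 32 bits, like a 32-bit group index. */
	pos.group = (uint32_t)(blocknr / blocks_per_group);
	return pos;
}
```

With first_data_block = 1 and 8192 blocks per group, block 1 lands in group 0 at offset 0, while block 0 wraps around to group 4294967295 (2^32-1) at offset 8191.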
Comments
On Thu, Dec 29, 2022 at 09:45:02AM +0800, Jun Nie wrote:
> diff --git a/fs/ext4/fsmap.c b/fs/ext4/fsmap.c
> index 4493ef0c715e..1aef127b0634 100644
> --- a/fs/ext4/fsmap.c
> +++ b/fs/ext4/fsmap.c
> @@ -702,6 +702,12 @@ int ext4_getfsmap(struct super_block *sb, struct ext4_fsmap_head *head,
>  		if (handlers[i].gfd_dev > head->fmh_keys[0].fmr_device)
>  			memset(&dkeys[0], 0, sizeof(struct ext4_fsmap));
>
> +		/*
> +		 * Re-check the range after above limit operation and reject
> +		 * 1K fs on block 0 as fs should start block 1. */
> +		if (dkeys[0].fmr_physical ==0 && dkeys[1].fmr_physical == 0)
> +			continue;

...and if this filesystem has 4k blocks, and therefore *does* define a
block 0?

--D

> +
>  		info.gfi_dev = handlers[i].gfd_dev;
>  		info.gfi_last = false;
>  		info.gfi_agno = -1;
> --
> 2.34.1
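Darrick's objection can be made concrete with a small user-space sketch. It assumes that s_first_data_block is 1 on a 1k-block filesystem and 0 on a 4k-block one, as the thread implies. Both function names are hypothetical, and the second predicate is only one possible way to account for the block size, not the fix that was eventually proposed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The condition from the patch: it looks only at the two keys. */
static bool patch_check_skips(uint64_t low_physical, uint64_t high_physical)
{
	return low_physical == 0 && high_physical == 0;
}

/*
 * Hypothetical alternative: skip the query only when the whole range
 * ends before the first block the filesystem actually defines.
 */
static bool range_below_first_data_block(uint64_t high_fsb,
					 uint32_t first_data_block)
{
	return high_fsb < first_data_block;
}
```

For a query covering only block 0, patch_check_skips(0, 0) is true regardless of block size, so a 4k-block filesystem would lose a valid block. range_below_first_data_block(0, 1) is true for the 1k-block case and range_below_first_data_block(0, 0) is false for the 4k-block case.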
Darrick J. Wong <djwong@kernel.org> wrote on Wed, Jan 4, 2023 at 03:17:
>
> On Thu, Dec 29, 2022 at 09:45:02AM +0800, Jun Nie wrote:
> > +		if (dkeys[0].fmr_physical ==0 && dkeys[1].fmr_physical == 0)
> > +			continue;
>
> ...and if this filesystem has 4k blocks, and therefore *does* define a
> block 0?

Yes, this is a real corner case test :-)
On Wed, Jan 04, 2023 at 09:58:03AM +0800, Jun Nie wrote:
> > ...and if this filesystem has 4k blocks, and therefore *does* define a
> > block 0?
>
> Yes, this is a real corner case test :-)

So I'm really nervous about this change.  I don't understand the code,
and I don't understand how the reproducer works.  I can certainly
reproduce it using the reproducer found here[1], but it seems to
require running multiple processes all creating loop devices and then
running FS_IOC_GETFSMAP.

[1] https://syzkaller.appspot.com/bug?id=79d5768e9bfe362911ac1a5057a36fc6b5c30002

If I change the reproducer to run execute_one() just once, it
doesn't trigger the bug.  It seems to trigger only when you have
multiple processes all racing to create a loop device, mount the file
system, try running FS_IOC_GETFSMAP --- and then delete the loop device
without actually unmounting the file system.  Which is **weird**.

I've tried taking the image, and just running "xfs_io -c fsmap /mnt",
and that doesn't trigger it either.

And I don't understand the reply to Darrick's question about why it's
safe to add the check, since for 4k block file systems, block 0 *is*
valid.

So if someone can explain to me what is going on here with this code
(there are too many abstractions and what's going on with keys is just
making my head hurt), *and* what the change actually does, and how to
reproduce the problem with a ***simple*** reproducer -- the syzbot
mess doesn't count, that would be great.  But applying a change that I
don't understand to code I don't understand, to fix a reproducer which
I also don't understand, just doesn't make me feel comfortable.

Regards,

					- Ted
Hi, Ted!

On 2/15/23 04:32, Theodore Ts'o wrote:
> So if someone can explain to me what is going on here with this code
> (there are too many abstractions and what's going on with keys is just
> making my head hurt), *and* what the change actually does, and how to
> reproduce the problem with a ***simple*** reproducer -- the syzbot
> mess doesn't count, that would be great.  But applying a change that I
> don't understand to code I don't understand, to fix a reproducer which
> I also don't understand, just doesn't make me feel comfortable.

Let me share what I have understood so far. The low key is zeroed. The
high key is defined and has an fmr_physical of zero, which is
smaller than the first data block of the 1k-block ext4 fs (which starts
at offset 1024).

-> ext4_getfsmap_datadev()
   keys[0].fmr_physical = 0, keys[1].fmr_physical = 0
   bofs = le32_to_cpu(sbi->s_es->s_first_data_block) = 1, eofs = 256
   start_fsb = keys[0].fmr_physical = 1, end_fsb = keys[1].fmr_physical = 0
   -> ext4_get_group_no_and_offset()
      blocknr = 1, le32_to_cpu(es->s_first_data_block) = 1
      start_ag = 0, first_cluster = 0
   ->
      blocknr = 0, le32_to_cpu(es->s_first_data_block) = 1
      end_ag = 4294967295, last_cluster = 8191

Then there's a loop that keeps running while info->gfi_agno <= end_ag; it
triggers the BUG_ON in ext4_get_group_info() once the group number exceeds
EXT4_SB(sb)->s_groups_count:
-> ext4_mballoc_query_range()
   -> ext4_mb_load_buddy()
      -> ext4_mb_load_buddy_gfp()
         -> ext4_get_group_info()

It's an out-of-bounds request, and Darrick suggested not returning any
mapping for the byte range 0-1023 on a 1k-block filesystem. The
alternative would be to return -EINVAL when the high key has an
fmr_physical of zero on a 1k-block fs.

To reproduce this, one would have to create a 1k-block ext4 fs
and pass a high key with an fmr_physical of zero, so I would
expect to reproduce it with something like this:
xfs_io -c 'fsmap -d 0 0' /mnt/scratch

However, when doing this I notice that in
xfsprogs-dev/io/fsmap.c both l->fmr_device and h->fmr_device are
zero, FS_IOC_GETFSMAP is called, and we receive no entries
(head->fmh_entries = 0). Now I'm trying to see what I'm doing wrong and how
to reproduce the bug.

Cheers,
ta
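The two options mentioned above (return no mapping, or return -EINVAL) both come down to validating the high key before clamping. The following user-space sketch models the first option. The names clamp_query_range, fsb_range, RANGE_OK and RANGE_EMPTY are hypothetical, the bofs/eofs values come from the trace, and this is an illustration of the idea rather than the code of either proposed fix.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical outcome codes for this sketch. */
enum range_status {
	RANGE_OK = 0,     /* out->start_fsb..out->end_fsb can be walked */
	RANGE_EMPTY = 1,  /* the request lies outside the filesystem */
};

struct fsb_range {
	uint64_t start_fsb;
	uint64_t end_fsb;
};

/*
 * Clamp a [low, high] block range to [bofs, eofs - 1], where bofs is
 * the first data block and eofs the block count.  A high key below
 * bofs is reported as empty instead of being passed on; returning
 * -EINVAL at that point would be the other option from the thread.
 */
static enum range_status clamp_query_range(uint64_t low, uint64_t high,
					   uint64_t bofs, uint64_t eofs,
					   struct fsb_range *out)
{
	assert(out != NULL && bofs < eofs);
	if (low >= eofs || high < bofs)
		return RANGE_EMPTY;
	out->start_fsb = low < bofs ? bofs : low;
	out->end_fsb = high >= eofs ? eofs - 1 : high;
	return RANGE_OK;
}
```

With bofs = 1 and eofs = 256, the keys (0, 0) from the trace yield RANGE_EMPTY, so no group lookup is attempted for block 0. With bofs = 0, as on a 4k-block filesystem, the same keys yield RANGE_OK with start_fsb = end_fsb = 0.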
On 2/15/23 11:46, Tudor Ambarus wrote:
>    ->
>       blocknr = 0, le32_to_cpu(es->s_first_data_block) =1
>       end_ag = 4294967295, last_cluster = 8191

Because of poor key validation we get a wrong end_ag, which eventually
causes the BUG_ON.
On 2/15/23 11:53, Tudor Ambarus wrote:
> On 2/15/23 11:46, Tudor Ambarus wrote:
>> In order to reproduce this one would have to create an 1k-block ext4 fs
>> and to pass a high key with fmr_physical of value zero, thus I would
>> expect to reproduce it with something like this:
>> xfs_io -c 'fsmap -d 0 0' /mnt/scratch
>>
>> However when doing this I notice that in
>> xfsprogs-dev/io/fsmap.c l->fmr_device and h->fmr_device will have value
>> zero, FS_IOC_GETFSMAP is called and then we receive no entries
>> (head->fmh_entries = 0). Now I'm trying to see what I do wrong, and how
>> to reproduce the bug.

What I think happens with the reproducer I proposed is that when both
{l, h}->fmr_device are zero, the code exits early before
getting the fsmap:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/ext4/fsmap.c?h=v6.2-rc8#n691

Also, to my untrained fs eye, it seems that xfs_io's [-d|-l|-r] fsmap
options are intended only for XFS, as the {data, log, realtime} sections
are XFS-specific. I wonder why "struct fs_path" from libfrog/paths.h is
not named "struct xfs_path"; it would have been less confusing.

It looks like xfs_io has no support for querying a start and end offset
when asking for an fsmap on an ext4 fs. I'm checking how I can extend
xfs_io's fsmap support for ext4 to validate my assumptions.

Cheers,
ta
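Until xfs_io can pass explicit keys on ext4, the query can be issued directly through the ioctl. The sketch below uses struct fsmap_head and FS_IOC_GETFSMAP from <linux/fsmap.h>; the helper names are hypothetical, and the idea of putting a real device number in both keys (to get past the early exit) is an assumption that the thread does not confirm as a working reproducer. On an affected kernel such a query might hit the BUG_ON, so it belongs in a disposable VM.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/fsmap.h>

/*
 * Prepare a count-only request: fmh_count stays 0, so no record array
 * is needed.  dev is the device number to place in both keys; how the
 * caller obtains it is left open here (assumption).
 */
static void fsmap_query_init(struct fsmap_head *head, uint32_t dev,
			     uint64_t low_physical, uint64_t high_physical)
{
	assert(head != NULL);
	memset(head, 0, sizeof(*head));  /* reserved fields must be zero */
	head->fmh_keys[0].fmr_device = dev;
	head->fmh_keys[0].fmr_physical = low_physical;
	head->fmh_keys[1].fmr_device = dev;
	head->fmh_keys[1].fmr_physical = high_physical;
}

/* Issue the query on an open fd that refers to the mounted filesystem. */
static int fsmap_query_run(int fd, struct fsmap_head *head)
{
	return ioctl(fd, FS_IOC_GETFSMAP, head);
}
```

A caller would open the mount point, call fsmap_query_init(&head, dev, 0, 0), then fsmap_query_run(fd, &head), and inspect the return value and head.fmh_entries.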
Hi!

On 2/15/23 04:32, Theodore Ts'o wrote:
> So if someone can explain to me what is going on here with this code
> (there are too many abstractions and what's going on with keys is just
> making my head hurt), *and* what the change actually does, and how to
> reproduce the problem with a ***simple*** reproducer -- the syzbot
> mess doesn't count, that would be great.  But applying a change that I

I proposed a patch fixing this at:
https://lore.kernel.org/linux-ext4/20230222131211.3898066-1-tudor.ambarus@linaro.org/T/

Darrick proposed a similar one at:
https://lore.kernel.org/linux-ext4/Y+58NPTH7VNGgzdd@magnolia/

I explained the difference between the two in my cover letter.

Cheers,
ta
diff --git a/fs/ext4/fsmap.c b/fs/ext4/fsmap.c
index 4493ef0c715e..1aef127b0634 100644
--- a/fs/ext4/fsmap.c
+++ b/fs/ext4/fsmap.c
@@ -702,6 +702,12 @@ int ext4_getfsmap(struct super_block *sb, struct ext4_fsmap_head *head,
 		if (handlers[i].gfd_dev > head->fmh_keys[0].fmr_device)
 			memset(&dkeys[0], 0, sizeof(struct ext4_fsmap));
 
+		/*
+		 * Re-check the range after above limit operation and reject
+		 * 1K fs on block 0 as fs should start block 1. */
+		if (dkeys[0].fmr_physical ==0 && dkeys[1].fmr_physical == 0)
+			continue;
+
 		info.gfi_dev = handlers[i].gfd_dev;
 		info.gfi_last = false;
 		info.gfi_agno = -1;