[12/13] ext4: remove unnecessary check for avoiding multiple update_backups in ext4_flex_group_add
Message ID | 20230629120044.1261968-13-shikemeng@huaweicloud.com |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:994d:0:b0:3d9:f83d:47d9 with SMTP id k13csp9389623vqr; Wed, 28 Jun 2023 21:31:58 -0700 (PDT) X-Google-Smtp-Source: ACHHUZ4WP9PvxQMDa0pKucm715lmfRaSHgr6otwTy5nfyTxAIjnQzSS/wd1UeOBU/nRZ19bXmuHQ X-Received: by 2002:a17:903:40c6:b0:1b8:3ddb:f9aa with SMTP id t6-20020a17090340c600b001b83ddbf9aamr3240547pld.14.1688013117709; Wed, 28 Jun 2023 21:31:57 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1688013117; cv=none; d=google.com; s=arc-20160816; b=TZPZ2dOWXXhpvmPfZtHpkGykt4xu2QHe+HGSXT8r5Mil9wIw+Sp3r/IkD4fQ6+b79j YEbE4n1hs9emgNsKTctFoOkRO3kenXe/fqQ2p7wxuh72zH4s1KyV9VBpnZwhxl+D+fmi 7S7FUtCKvc4krGDM6UBS+aI+GRmHbGu/cgI6ROaa2uYKyo4b5MoifpgSqP32IAYrKkMo 76RDoreLwPVXxCibTBg4wMg4mBrgutU+9sWr/kdTkfpgufVARSjdrpPWgxx4MQyo5eQt J3X4cTV4p65VFCnz/XnJoqAEOK7ReNwC5IFPEFG+hhZe91RKz7Z5QKyniboxisIhEZsw SFYQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from; bh=e9veEI1H38Bstlt5sUtgdoGTH0Um9U45TCpwtJ9GXyg=; fh=eu2llfKjKKC0QBUtW6YWJ13du6subP7SB8ZN6DjSWMA=; b=o+Ko6Bx3IWckZuZE/tj/3fsjUgXYIOdPodajODYU0+39pGHwo2Z0DOMA0BmJdfApzr zRmADHYvRBJbh2IyxY7l6zcWQRpw14koTpVpO5yO1dbN0HZnJLdXFC7720eeNoI1Bar1 ZjyLQFVjDcJujW9cTCHqE1p0aYJo8hvWhrJG71MhOQ+40kfezx85nIy/RHu4u/Ov9mo1 /T7m7Tb4cvC/mtkrLK06LUK+83fUSQIpZn2HarSLnJx6bPHr8aXq9C7tsDJ4RJ1pt9sN NaHRZ/YSSWzqnUhGrA9mmSa+JwrDjCDSBecPK3u0cvYuQ5h8exN5H1Dp81D5g2//u7Ga P8VA== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id ji20-20020a170903325400b001b8216cfbe0si3828758plb.138.2023.06.28.21.31.43; Wed, 28 Jun 2023 21:31:57 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232397AbjF2EBe (ORCPT <rfc822;adanhawthorn@gmail.com> + 99 others); Thu, 29 Jun 2023 00:01:34 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:44338 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231925AbjF2D7z (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Wed, 28 Jun 2023 23:59:55 -0400 Received: from dggsgout11.his.huawei.com (unknown [45.249.212.51]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 85C992D66; Wed, 28 Jun 2023 20:59:53 -0700 (PDT) Received: from mail02.huawei.com (unknown [172.30.67.143]) by dggsgout11.his.huawei.com (SkyGuard) with ESMTP id 4Qs4Tf1L0Vz4f3lXF; Thu, 29 Jun 2023 11:59:50 +0800 (CST) Received: from huaweicloud.com (unknown [10.175.124.27]) by APP2 (Coremail) with SMTP id Syh0CgC3quiwAZ1kw6DMMg--.26995S14; Thu, 29 Jun 2023 11:59:50 +0800 (CST) From: Kemeng Shi <shikemeng@huaweicloud.com> To: tytso@mit.edu, adilger.kernel@dilger.ca Cc: linux-ext4@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH 12/13] ext4: remove unnecessary check for avoiding multiple update_backups in ext4_flex_group_add Date: Thu, 29 Jun 2023 20:00:43 +0800 Message-Id: <20230629120044.1261968-13-shikemeng@huaweicloud.com> X-Mailer: git-send-email 2.30.0 In-Reply-To: <20230629120044.1261968-1-shikemeng@huaweicloud.com> References: <20230629120044.1261968-1-shikemeng@huaweicloud.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-CM-TRANSID: Syh0CgC3quiwAZ1kw6DMMg--.26995S14 X-Coremail-Antispam: 1UD129KBjvJXoW7ZF1DJFy5trWkGF18AFyDGFg_yoW8Gr1fpw n0y3W8GrWjgr4v9a17Z34jg392kw48Ja4UZFyak3WYkrWUJas7WF97tF9YyF45tFZxAwn2 qF1YvrW7Aw1IgaDanT9S1TB71UUUUUUqnTZGkaVYY2UrUUUUjbIjqfuFe4nvWSU5nxnvy2 9KBjDU0xBIdaVrnRJUUUBIb4IE77IF4wAFF20E14v26rWj6s0DM7CY07I20VC2zVCF04k2 6cxKx2IYs7xG6rWj6s0DM7CIcVAFz4kK6r1j6r18M280x2IEY4vEnII2IxkI6r1a6r45M2 8IrcIa0xkI8VA2jI8067AKxVWUAVCq3wA2048vs2IY020Ec7CjxVAFwI0_Xr0E3s1l8cAv FVAK0II2c7xJM28CjxkF64kEwVA0rcxSw2x7M28EF7xvwVC0I7IYx2IY67AKxVW7JVWDJw A2z4x0Y4vE2Ix0cI8IcVCY1x0267AKxVW8Jr0_Cr1UM28EF7xvwVC2z280aVAFwI0_GcCE 3s1l84ACjcxK6I8E87Iv6xkF7I0E14v26rxl6s0DM2AIxVAIcxkEcVAq07x20xvEncxIr2 1l5I8CrVACY4xI64kE6c02F40Ex7xfMcIj6xIIjxv20xvE14v26r1j6r18McIj6I8E87Iv 67AKxVWUJVW8JwAm72CE4IkC6x0Yz7v_Jr0_Gr1lF7xvr2IYc2Ij64vIr41l42xK82IYc2 Ij64vIr41l4I8I3I0E4IkC6x0Yz7v_Jr0_Gr1lx2IqxVAqx4xG67AKxVWUJVWUGwC20s02 6x8GjcxK67AKxVWUGVWUWwC2zVAF1VAY17CE14v26r126r1DMIIYrxkI7VAKI48JMIIF0x vE2Ix0cI8IcVAFwI0_Gr0_Xr1lIxAIcVC0I7IYx2IY6xkF7I0E14v26F4j6r4UJwCI42IY 6xAIw20EY4v20xvaj40_Jr0_JF4lIxAIcVC2z280aVAFwI0_Gr0_Cr1lIxAIcVC2z280aV CY1x0267AKxVW8JVW8JrUvcSsGvfC2KfnxnUUI43ZEXa7IU0TqcUUUUUU== X-CM-SenderInfo: 5vklyvpphqwq5kxd4v5lfo033gof0z/ X-CFilter-Loop: Reflected X-Spam-Status: No, score=0.0 required=5.0 tests=BAYES_00,DATE_IN_FUTURE_06_12, KHOP_HELO_FCRDNS,SPF_HELO_NONE,SPF_NONE,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1770010042690528913?= X-GMAIL-MSGID: =?utf-8?q?1770010042690528913?= |
Series |
fixes and cleanups to ext4 resize
|
|
Commit Message
Kemeng Shi
June 29, 2023, noon UTC
Commit 0acdb8876fead ("ext4: don't call update_backups() multiple times
for the same bg") add check in ext4_flex_group_add to avoid call
update_backups multiple times for block group descriptors in the same
group descriptor block. However, we have already call update_backups in
block step, so the added check is unnecessary.
Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
---
fs/ext4/resize.c | 4 ----
1 file changed, 4 deletions(-)
Comments
On Thu, Jun 29, 2023 at 08:00:43PM +0800, Kemeng Shi wrote: > Commit 0acdb8876fead ("ext4: don't call update_backups() multiple times > for the same bg") add check in ext4_flex_group_add to avoid call > update_backups multiple times for block group descriptors in the same > group descriptor block. However, we have already call update_backups in > block step, so the added check is unnecessary. I'm having trouble understaind this comit. What do you mean by the "block step" in the last paragraph? - Ted
on 8/17/2023 10:11 PM, Theodore Ts'o wrote: > On Thu, Aug 17, 2023 at 11:53:52AM +0800, Kemeng Shi wrote: >> >> >> on 8/16/2023 11:47 AM, Theodore Ts'o wrote: >>> On Thu, Jun 29, 2023 at 08:00:43PM +0800, Kemeng Shi wrote: >>>> Commit 0acdb8876fead ("ext4: don't call update_backups() multiple times >>>> for the same bg") add check in ext4_flex_group_add to avoid call >>>> update_backups multiple times for block group descriptors in the same >>>> group descriptor block. However, we have already call update_backups in >>>> block step, so the added check is unnecessary. >>> >>> I'm having trouble understaind this comit. What do you mean by the >>> "block step" in the last paragraph? >>> >> Sorry for the confusing stuff. I mean we foreach in group descriptor block >> step instead of foreach in group descriptor step to update backup. >> So there is no chance to call update_backups for different descriptors >> in the same bg. > > I'm still confused. This commit is not so much removing an > unnecessary check as much as forcing update_backups() to be called for > every single block group. Right? > Ah, I guess here is the thing I missed that make this confusing: sbi->s_group_desc contains only primary block of each group. i.e. sbi->s_group_desc['i'] is the primary gdb block of group 'i'. It's clear that primary gdb blocks of different groups will not be the same. Then all blocks in s_group_desc[*] should be different and the removed check is unnecessary. From add_new_gdb and add_new_gdb_meta_bg, we can find that we always add primary gdb block of new group to s_group_desc. To be more specific: add_new_gdb gdblock = EXT4_SB(sb)->s_sbh->b_blocknr + 1 + gdb_num; gdb_bh = ext4_sb_bread(sb, gdblock, 0); n_group_desc[gdb_num] = gdb_bh; add_new_gdb_meta_bg gdblock = ext4_meta_bg_first_block_no(sb, group) + ext4_bg_has_super(sb, group); gdb_bh = ext4_sb_bread(sb, gdblock, 0); n_group_desc[gdb_num] = gdb_bh; > So either we're doing extra work, or (b) we're fixing a problem > because we actually *need* to call update_backups() for every single > block group. > > More generally, this whole patch series is making it clear that the > online resize code is hard to understand, because it's super > complicated, leading to potential bugs, and very clearly code which is > very hard to maintain. So this may mean we need better comments to > make it clear *when* the backup block groups are going to be > accomplished, given the various different cases (e.g., no flex_bg or > meta_bg, with flex_bg, or with meat_bg). > Yes, I agree with this. I wonder if a series to add comments of some common rules is good to you. Like the information mentioned above that s_group_desc contains primary gdb block of each group. > - Ted >
on 8/18/2023 1:00 PM, Theodore Ts'o wrote: > On Fri, Aug 18, 2023 at 11:16:35AM +0800, Kemeng Shi wrote: >> Ah, I guess here is the thing I missed that make this confusing: >> sbi->s_group_desc contains only primary block of each group. i.e. >> sbi->s_group_desc['i'] is the primary gdb block of group 'i'. > > Correct. In fact, when we need to modify a block group descriptor for > a group, we call ext4_get_group_desc(), and it references > sbi->s_group_desc to find the appropriate buffer_head for the bg > descriptor that we want to modify. > > I'm not sure "only" is the right adjective to use here, since the > whole *point* of s_group_desc[] is to keep the buffer_heads for the > block group descriptor blocks in memory, so we can modify them when we > allocate or free blocks, inodes, etc. And we only modify the primary > block group descriptors. > > The secondary, or backup block group descriptors are only by used > e2fsck when the primary block group descriptor has been overwritten, > so we can find the inode table, allocation bitmaps, and so on. So we > do *not* modify them in the course of normal operations, and that's by > design. The only time the kernel will update those block group > descriptors is when we are doing an online resize, and we need make > sure the backup descriptors are updated, so that if the primary > descriptors get completely smashed, we can still get critical > information such as the location of that block group's inode table. > >> From add_new_gdb and add_new_gdb_meta_bg, we can find that we always >> add primary gdb block of new group to s_group_desc. To be more specific: >> add_new_gdb >> gdblock = EXT4_SB(sb)->s_sbh->b_blocknr + 1 + gdb_num; >> gdb_bh = ext4_sb_bread(sb, gdblock, 0); >> n_group_desc[gdb_num] = gdb_bh; >> >> add_new_gdb_meta_bg >> gdblock = ext4_meta_bg_first_block_no(sb, group) + >> ext4_bg_has_super(sb, group); >> gdb_bh = ext4_sb_bread(sb, gdblock, 0); >> n_group_desc[gdb_num] = gdb_bh; > > Put another way, there are EXT4_DESC_PER_BLOCK(sb) bg descriptors in a > block. For a file system with the 64-bit feature enabled, the size of > the block group descriptor is 128. If the block size is 4096, then we > can fit 32 block group descriptors in a block. > > When we add a new block group such that its block group descriptor > will spill into a new block, then we need to expand s_group_desc[] > array, and initialize the new block group descriptor block. And > that's the job of add_new_gdb() and add_new_gdb_meta_bg(). > Hi Ted, thanks for the explain. Here are more updates from what I found: I find that descriptor_loc show layout of gdb blocks in s_group_desc[] which is: block of s_group_desc[i] will be superblock + i + 1 for non-meta_bg and 'first block of meta_bg' + has_super for meta_bg. Although descriptor_loc is called to initialize s_group_desc[], the expanded gdb block to s_group_desc from add_new_gdb obeys the same layout. Back to the original purpose of this patch which is to remove unnecessary of equality check of s_group_desc[gdb_num - 1].b_blocknr and s_group_desc[gdb_num].b_blocknr, we can see each s_group_desc has it's own block from layout above and the check should be unnecessary. But no insistant on this if you still have concern about it. >>> More generally, this whole patch series is making it clear that the >>> online resize code is hard to understand, because it's super >>> complicated, leading to potential bugs, and very clearly code which is >>> very hard to maintain. So this may mean we need better comments to >>> make it clear *when* the backup block groups are going to be >>> accomplished, given the various different cases (e.g., no flex_bg or >>> meta_bg, with flex_bg, or with meat_bg). >>> >> Yes, I agree with this. I wonder if a series to add comments of some >> common rules is good to you. Like the information mentioned above >> that s_group_desc contains primary gdb block of each group. > > Well, the meaning of s_group_desc[] was obvious to me, but that's why > it's useful to have somone with "new eyes" take a look at code, since > what may be obvious to old hands might not be obvious to someone > looking at the code for the first time --- and so yes, it's probably Yes. this is just for anyone starting to read this code. > worth documenting. The question is where is the best place to put it, > since the primary place where s_group_desc[] is *not* online resize. > > s_group_desc[] is initialized in ext4_group_desc_init() in > fs/ext4/super.c, and it is used in fs/ext4/balloc.c, and of course, it > is defined in fs/ext4.h. I plan to add comment in fs/ext4.h as following: struct ext4_sb_info { ... struct buffer_head * __rcu *s_group_desc; /* Primary gdb blocks of online groups */ But I'm not sure it's proper now as you menthioned s_group_desc[] is to keep the buffer_heads for the block group descriptor blocks in memory and it contains primary gdb block is a coincidence that we only modify primary block in kernel. Besides, I plan to go through the resize code again in fulture and add some comments to make it easy for anyone starting read this or make it easy to maintain. Please let me if you disklike it. > > - Ted >
on 8/19/2023 12:00 AM, Theodore Ts'o wrote: > On Fri, Aug 18, 2023 at 04:42:31PM +0800, Kemeng Shi wrote: >>> s_group_desc[] is initialized in ext4_group_desc_init() in >>> fs/ext4/super.c, and it is used in fs/ext4/balloc.c, and of course, it >>> is defined in fs/ext4.h. >> I plan to add comment in fs/ext4.h as following: >> struct ext4_sb_info { >> ... >> struct buffer_head * __rcu *s_group_desc; /* Primary gdb blocks of online groups */ >> But I'm not sure it's proper now as you menthioned s_group_desc[] is to >> keep the buffer_heads for the block group descriptor blocks in memory >> and it contains primary gdb block is a coincidence that we only modify >> primary block in kernel. > > In general, the terminology that ext4 developers have used is "block > group descriptors" and "backup block group descriptors". The kernel > never *uses* the backup block group users; and with the sparse_super2 > feature, the "backup superblocks" and "backup block group descriptors" > are optional. > Sure. > They are used by e2fsck if we need to recover a trashed superblock and > block group descriptors, which is why code that is resizing the file > system, or updating the label or the UUID need to update the backup > superblocks and/or backup block group descriptors so that we can > better recover disaster. OK, kernel updates backup blocks and userspace tool recovers from backup in case that blocks used by kernel is corrupted. > > So I'd just suggest changing the comment above to "array of bh's for > the block group descriptors". > Sure, I will do this in next version. Besides, what about the check this patch tries to remove. Would you prefer to keep it or remove it with better git description. Both ways are acceptable to me. Thanks. > Cheers, > > - Ted > >> Besides, I plan to go through the resize code again in fulture and >> add some comments to make it easy for anyone starting read this >> or make it easy to maintain. Please let me if you disklike it. > > P.S. > > BTW, a useful test program to add is one that checks to make sure that > the "static" parts of the superblock and block group descriptors > (i.e., the parts that don't get changed under normal operation while > the file system is mounted when the kernel *isn't* trying to do a > resize or change the label, UUID, or in the future, the new ioctl's to > update the parts of the superblock that can get modified by tune2fs), > and to make sure that all of the backup superblock and block group > descriptors have gotten updated> > Some of the bugs that you found may have resulted in some of the > backup bg descriptors not getting updataed, which we wouldn't > necessarily notice unless we had a test program that explicitly > checked for them. > > And truth to tell, the only backup superblock and block group > descriptor that actually gets used to recover the file system is the > first one, since that's the one e2fsck will fall back to > automatically. An expert might try to use one of the other backup > block groups as a desperate measure, and there might be some automated > programs that might be smart enough to use the backup block groups > when trying to recover the location of the partition table when the > file system and partition table is very badly damaged --- so that's > one of the reasons why with sparse_super2, the number of backup block > group descriptors can be limited to (for example) one located in the > first block group, and one located in the very last block group. > > Thanks for let me know and I do have no knowldge about how backup is used in usersapce.
diff --git a/fs/ext4/resize.c b/fs/ext4/resize.c index da832466ce74..d2b3ee50af31 100644 --- a/fs/ext4/resize.c +++ b/fs/ext4/resize.c @@ -1589,7 +1589,6 @@ static int ext4_flex_group_add(struct super_block *sb, int meta_bg = ext4_has_feature_meta_bg(sb); sector_t padding_blocks = meta_bg ? 0 : sbi->s_sbh->b_blocknr - ext4_group_first_block_no(sb, 0); - sector_t old_gdb = 0; update_backups(sb, ext4_group_first_block_no(sb, 0), (char *)es, sizeof(struct ext4_super_block), 0); @@ -1598,11 +1597,8 @@ static int ext4_flex_group_add(struct super_block *sb, gdb_bh = sbi_array_rcu_deref(sbi, s_group_desc, gdb_num); - if (old_gdb == gdb_bh->b_blocknr) - continue; update_backups(sb, gdb_bh->b_blocknr - padding_blocks, gdb_bh->b_data, gdb_bh->b_size, meta_bg); - old_gdb = gdb_bh->b_blocknr; } } exit: