From patchwork Thu Sep 28 16:04:01 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kemeng Shi X-Patchwork-Id: 145849 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:cae8:0:b0:403:3b70:6f57 with SMTP id r8csp3152514vqu; Thu, 28 Sep 2023 01:18:24 -0700 (PDT) X-Google-Smtp-Source: AGHT+IGXf0gSfGPd1ss3lxvqKQJ5n13gjkLpBdt6oS2vm5mdm3s8+jPL50sSUrT2RCjSX/dEmbfI X-Received: by 2002:a05:6a00:3912:b0:690:2ecd:a593 with SMTP id fh18-20020a056a00391200b006902ecda593mr618617pfb.26.1695889104657; Thu, 28 Sep 2023 01:18:24 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1695889104; cv=none; d=google.com; s=arc-20160816; b=wBpA1T5lTRgbz6MNIoMnaBH3O7QSKS0asyfOo2RZHtCA1FOtbVLlD2FX49DTGYKQQV CsYrNo39+PP+hOnPBoU1IO3hsixN55fRJHWV/uGdOX3B8OZM1UCkVjm9rQNdaV07GDNJ ROmXOL5x+vMuTzqV/9+lfHSZaA26KsVk0NLUFx1vZTxtyLcmgcxqF64TiT3PlWgWBqjo qzyLo1teIk7bsQyLJIayAlMQYrLjUizhrG0bPz+8dzN6g3Bn2oSF127UislC8TMe+flY 6AcTth7Np53veAqF5KnbFK5Q3Vs2EQxkUaZmnRiUmilQd1JoEpQTcLqA4dAA3Jf+2Gvg 1PwA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from; bh=8LDqVyufqp6/mt19FsEPdxtgNlZmMF316wL1B7J3CkI=; fh=qR1AtRP5/Uzr7BtlZREjHAGeAkw+P94zJeruTfY3MzU=; b=UK9wP/RmqPF9FxKArIxDxUzcSKVoHnXRidb6P6PApDZotPZlGZ2TfVLXDqgYlB6NgV e+6s7/6Hs8KiAbXL/WGcz6GBWgwHL6SmjWRwaFYR6HD14FriBtUFyeruLtrFN1W9CFua G9P62R/amy/8GsvsQM3AkcszBcZJPyfCuLgEoh41++FSVCNMLYkOzz3e3NB7jg+HGvQ7 TZ2Akw4/59XZJtdA1Fz3HDyjk90P6MTvRpaohvBNYDo48D27HbnszXMjzIDaIq2tzwRa XI183R2NBIAH+KJcW8uGYRQdreas0fqJPws9hc9qfizbBqQKWQLyLw/56y4cCRzpUcbY RcJw== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.35 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: from groat.vger.email (groat.vger.email. [23.128.96.35]) by mx.google.com with ESMTPS id u9-20020a056a00158900b0069100d1fc37si19062614pfk.49.2023.09.28.01.18.24 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 28 Sep 2023 01:18:24 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.35 as permitted sender) client-ip=23.128.96.35; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.35 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: from out1.vger.email (depot.vger.email [IPv6:2620:137:e000::3:0]) by groat.vger.email (Postfix) with ESMTP id D9BB9801B804; Thu, 28 Sep 2023 01:07:08 -0700 (PDT) X-Virus-Status: Clean X-Virus-Scanned: clamav-milter 0.103.10 at groat.vger.email Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231221AbjI1IFU (ORCPT + 20 others); Thu, 28 Sep 2023 04:05:20 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:57948 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230509AbjI1IE5 (ORCPT ); Thu, 28 Sep 2023 04:04:57 -0400 Received: from dggsgout11.his.huawei.com (dggsgout11.his.huawei.com [45.249.212.51]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id EE8E9AC; Thu, 28 Sep 2023 01:04:53 -0700 (PDT) Received: from mail02.huawei.com (unknown [172.30.67.143]) by dggsgout11.his.huawei.com (SkyGuard) with ESMTP id 4Rx5cK2QnZz4f3n6p; Thu, 28 Sep 2023 16:04:49 +0800 (CST) Received: from huaweicloud.com (unknown [10.175.124.27]) by APP4 (Coremail) with SMTP id gCh0CgC3Td2eMxVlgAtdBg--.36922S8; Thu, 28 Sep 2023 16:04:51 +0800 (CST) From: Kemeng Shi To: ritesh.list@gmail.com, tytso@mit.edu, adilger.kernel@dilger.ca Cc: ojaswin@linux.ibm.com, linux-ext4@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH v8 06/12] ext4: Separate block bitmap and buddy bitmap freeing in ext4_mb_clear_bb() Date: Fri, 29 Sep 2023 00:04:01 +0800 Message-Id: <20230928160407.142069-7-shikemeng@huaweicloud.com> X-Mailer: git-send-email 2.30.0 In-Reply-To: <20230928160407.142069-1-shikemeng@huaweicloud.com> References: <20230928160407.142069-1-shikemeng@huaweicloud.com> MIME-Version: 1.0 X-CM-TRANSID: gCh0CgC3Td2eMxVlgAtdBg--.36922S8 X-Coremail-Antispam: 1UD129KBjvJXoW3WFWDAF4fCryrKw1fXF4ruFg_yoW3JFyxpr 9FyFnrCrn5GrnF9F40k34jq3WSkw48ua1DGrW3uryxCr1ayr9akFZ7tF93ZFWUtFZ7X3Z0 qr1Y93y8Cr4ag37anT9S1TB71UUUUUUqnTZGkaVYY2UrUUUUjbIjqfuFe4nvWSU5nxnvy2 9KBjDU0xBIdaVrnRJUUUB2b4IE77IF4wAFF20E14v26rWj6s0DM7CY07I20VC2zVCF04k2 6cxKx2IYs7xG6rWj6s0DM7CIcVAFz4kK6r1j6r18M280x2IEY4vEnII2IxkI6r1a6r45M2 8IrcIa0xkI8VA2jI8067AKxVWUAVCq3wA2048vs2IY020Ec7CjxVAFwI0_Xr0E3s1l8cAv FVAK0II2c7xJM28CjxkF64kEwVA0rcxSw2x7M28EF7xvwVC0I7IYx2IY67AKxVWDJVCq3w A2z4x0Y4vE2Ix0cI8IcVCY1x0267AKxVW8Jr0_Cr1UM28EF7xvwVC2z280aVAFwI0_GcCE 3s1l84ACjcxK6I8E87Iv6xkF7I0E14v26rxl6s0DM2AIxVAIcxkEcVAq07x20xvEncxIr2 1l5I8CrVACY4xI64kE6c02F40Ex7xfMcIj6xIIjxv20xvE14v26r1j6r18McIj6I8E87Iv 67AKxVWUJVW8JwAm72CE4IkC6x0Yz7v_Jr0_Gr1lF7xvr2IYc2Ij64vIr41l42xK82IYc2 Ij64vIr41l4I8I3I0E4IkC6x0Yz7v_Jr0_Gr1lx2IqxVAqx4xG67AKxVWUJVWUGwC20s02 6x8GjcxK67AKxVWUGVWUWwC2zVAF1VAY17CE14v26r126r1DMIIYrxkI7VAKI48JMIIF0x vE2Ix0cI8IcVAFwI0_JFI_Gr1lIxAIcVC0I7IYx2IY6xkF7I0E14v26r4j6F4UMIIF0xvE 42xK8VAvwI8IcIk0rVWUJVWUCwCI42IY6I8E87Iv67AKxVWUJVW8JwCI42IY6I8E87Iv6x kF7I0E14v26r4j6r4UJbIYCTnIWIevJa73UjIFyTuYvjxUIL05UUUUU X-CM-SenderInfo: 5vklyvpphqwq5kxd4v5lfo033gof0z/ X-CFilter-Loop: Reflected X-Spam-Status: No, score=-0.8 required=5.0 tests=DATE_IN_FUTURE_06_12, HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,SPF_HELO_NONE, SPF_PASS autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on groat.vger.email Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.6.4 (groat.vger.email [0.0.0.0]); Thu, 28 Sep 2023 01:07:08 -0700 (PDT) X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1778268613570292377 X-GMAIL-MSGID: 1778268613570292377 This patch separates block bitmap and buddy bitmap freeing in order to update block bitmap with ext4_mb_mark_context in following patch. Separated freeing is safe with concurrent allocation as long as: 1. Firstly allocate block in buddy bitmap, and then in block bitmap. 2. Firstly free block in block bitmap, and then buddy bitmap. Then freed block will only be available to allocation when both buddy bitmap and block bitmap are updated by freeing. Allocation obeys rule 1 already, just do sperated freeing with rule 2. Separated freeing has no race with generate_buddy as: Once ext4_mb_load_buddy_gfp is executed successfully, the update-to-date buddy page can be found in sbi->s_buddy_cache and no more buddy initialization of the buddy page will be executed concurrently until buddy page is unloaded. As we always do free in "load buddy, free, unload buddy" sequence, separated freeing has no race with generate_buddy. Signed-off-by: Kemeng Shi Reviewed-by: Ritesh Harjani (IBM) --- fs/ext4/mballoc.c | 98 +++++++++++++++++++++++------------------------ 1 file changed, 49 insertions(+), 49 deletions(-) diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c index 236540a178ff..79d62078da9a 100644 --- a/fs/ext4/mballoc.c +++ b/fs/ext4/mballoc.c @@ -6423,7 +6423,7 @@ static void ext4_mb_clear_bb(handle_t *handle, struct inode *inode, ext4_error(sb, "Freeing blocks in system zone - " "Block = %llu, count = %lu", block, count); /* err = 0. ext4_std_error should be a no op */ - goto error_return; + goto error_out; } flags |= EXT4_FREE_BLOCKS_VALIDATED; @@ -6447,31 +6447,39 @@ static void ext4_mb_clear_bb(handle_t *handle, struct inode *inode, flags &= ~EXT4_FREE_BLOCKS_VALIDATED; } count_clusters = EXT4_NUM_B2C(sbi, count); + trace_ext4_mballoc_free(sb, inode, block_group, bit, count_clusters); + + /* __GFP_NOFAIL: retry infinitely, ignore TIF_MEMDIE and memcg limit. */ + err = ext4_mb_load_buddy_gfp(sb, block_group, &e4b, + GFP_NOFS|__GFP_NOFAIL); + if (err) + goto error_out; + + if (!(flags & EXT4_FREE_BLOCKS_VALIDATED) && + !ext4_inode_block_valid(inode, block, count)) { + ext4_error(sb, "Freeing blocks in system zone - " + "Block = %llu, count = %lu", block, count); + /* err = 0. ext4_std_error should be a no op */ + goto error_clean; + } + bitmap_bh = ext4_read_block_bitmap(sb, block_group); if (IS_ERR(bitmap_bh)) { err = PTR_ERR(bitmap_bh); bitmap_bh = NULL; - goto error_return; + goto error_clean; } gdp = ext4_get_group_desc(sb, block_group, &gd_bh); if (!gdp) { err = -EIO; - goto error_return; - } - - if (!(flags & EXT4_FREE_BLOCKS_VALIDATED) && - !ext4_inode_block_valid(inode, block, count)) { - ext4_error(sb, "Freeing blocks in system zone - " - "Block = %llu, count = %lu", block, count); - /* err = 0. ext4_std_error should be a no op */ - goto error_return; + goto error_clean; } BUFFER_TRACE(bitmap_bh, "getting write access"); err = ext4_journal_get_write_access(handle, sb, bitmap_bh, EXT4_JTR_NONE); if (err) - goto error_return; + goto error_clean; /* * We are about to modify some metadata. Call the journal APIs @@ -6481,7 +6489,7 @@ static void ext4_mb_clear_bb(handle_t *handle, struct inode *inode, BUFFER_TRACE(gd_bh, "get_write_access"); err = ext4_journal_get_write_access(handle, sb, gd_bh, EXT4_JTR_NONE); if (err) - goto error_return; + goto error_clean; #ifdef AGGRESSIVE_CHECK { int i; @@ -6489,13 +6497,30 @@ static void ext4_mb_clear_bb(handle_t *handle, struct inode *inode, BUG_ON(!mb_test_bit(bit + i, bitmap_bh->b_data)); } #endif - trace_ext4_mballoc_free(sb, inode, block_group, bit, count_clusters); + ext4_lock_group(sb, block_group); + mb_clear_bits(bitmap_bh->b_data, bit, count_clusters); + ret = ext4_free_group_clusters(sb, gdp) + count_clusters; + ext4_free_group_clusters_set(sb, gdp, ret); + ext4_block_bitmap_csum_set(sb, gdp, bitmap_bh); + ext4_group_desc_csum_set(sb, block_group, gdp); + ext4_unlock_group(sb, block_group); - /* __GFP_NOFAIL: retry infinitely, ignore TIF_MEMDIE and memcg limit. */ - err = ext4_mb_load_buddy_gfp(sb, block_group, &e4b, - GFP_NOFS|__GFP_NOFAIL); - if (err) - goto error_return; + if (sbi->s_log_groups_per_flex) { + ext4_group_t flex_group = ext4_flex_group(sbi, block_group); + atomic64_add(count_clusters, + &sbi_array_rcu_deref(sbi, s_flex_groups, + flex_group)->free_clusters); + } + + /* We dirtied the bitmap block */ + BUFFER_TRACE(bitmap_bh, "dirtied bitmap block"); + err = ext4_handle_dirty_metadata(handle, NULL, bitmap_bh); + + /* And the group descriptor block */ + BUFFER_TRACE(gd_bh, "dirtied group descriptor block"); + ret = ext4_handle_dirty_metadata(handle, NULL, gd_bh); + if (!err) + err = ret; /* * We need to make sure we don't reuse the freed block until after the @@ -6519,13 +6544,8 @@ static void ext4_mb_clear_bb(handle_t *handle, struct inode *inode, new_entry->efd_tid = handle->h_transaction->t_tid; ext4_lock_group(sb, block_group); - mb_clear_bits(bitmap_bh->b_data, bit, count_clusters); ext4_mb_free_metadata(handle, &e4b, new_entry); } else { - /* need to update group_info->bb_free and bitmap - * with group lock held. generate_buddy look at - * them with group lock_held - */ if (test_opt(sb, DISCARD)) { err = ext4_issue_discard(sb, block_group, bit, count_clusters, NULL); @@ -6538,23 +6558,11 @@ static void ext4_mb_clear_bb(handle_t *handle, struct inode *inode, EXT4_MB_GRP_CLEAR_TRIMMED(e4b.bd_info); ext4_lock_group(sb, block_group); - mb_clear_bits(bitmap_bh->b_data, bit, count_clusters); mb_free_blocks(inode, &e4b, bit, count_clusters); } - ret = ext4_free_group_clusters(sb, gdp) + count_clusters; - ext4_free_group_clusters_set(sb, gdp, ret); - ext4_block_bitmap_csum_set(sb, gdp, bitmap_bh); - ext4_group_desc_csum_set(sb, block_group, gdp); ext4_unlock_group(sb, block_group); - if (sbi->s_log_groups_per_flex) { - ext4_group_t flex_group = ext4_flex_group(sbi, block_group); - atomic64_add(count_clusters, - &sbi_array_rcu_deref(sbi, s_flex_groups, - flex_group)->free_clusters); - } - /* * on a bigalloc file system, defer the s_freeclusters_counter * update to the caller (ext4_remove_space and friends) so they @@ -6567,28 +6575,20 @@ static void ext4_mb_clear_bb(handle_t *handle, struct inode *inode, count_clusters); } - ext4_mb_unload_buddy(&e4b); - - /* We dirtied the bitmap block */ - BUFFER_TRACE(bitmap_bh, "dirtied bitmap block"); - err = ext4_handle_dirty_metadata(handle, NULL, bitmap_bh); - - /* And the group descriptor block */ - BUFFER_TRACE(gd_bh, "dirtied group descriptor block"); - ret = ext4_handle_dirty_metadata(handle, NULL, gd_bh); - if (!err) - err = ret; - if (overflow && !err) { block += count; count = overflow; + ext4_mb_unload_buddy(&e4b); put_bh(bitmap_bh); /* The range changed so it's no longer validated */ flags &= ~EXT4_FREE_BLOCKS_VALIDATED; goto do_more; } -error_return: + +error_clean: + ext4_mb_unload_buddy(&e4b); brelse(bitmap_bh); +error_out: ext4_std_error(sb, err); }