From patchwork Wed Feb 14 01:34:26 2024
X-Patchwork-Submitter: Yonggil Song
X-Patchwork-Id: 200780
X-Mailing-List: linux-kernel@vger.kernel.org
Subject: [PATCH v6] f2fs: New victim selection for GC
Reply-To: yonggil.song@samsung.com
From: Yonggil Song
To: "jaegeuk@kernel.org", "chao@kernel.org", "linux-f2fs-devel@lists.sourceforge.net", "linux-kernel@vger.kernel.org", Dongjin Kim, Daejun Park, Siwoo Jung
Message-ID: <20240214013426epcms2p655328452ef7fac82f3df56855d7dd99b@epcms2p6>
Date: Wed, 14 Feb 2024 10:34:26 +0900

Overview
========

This patch introduces a new way to prefer data sections when selecting GC
victims. Migrating data blocks invalidates the node blocks that point to
them. Therefore, when GC runs frequently, selecting data sections as
victims first can reduce unnecessary block migration, since the node
blocks invalidated along the way no longer need to be migrated. In
exceptional situations where free sections are insufficient, node
sections are selected as victims instead of data sections to reclaim
extra free sections.
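The data-first preference can be modeled outside the kernel as a greedy
search over a weighted cost. This is a minimal user-space sketch, not
f2fs code: struct toy_section, toy_greedy_cost() and toy_pick_victim()
are hypothetical stand-ins for IS_NODESEG(), get_valid_blocks(),
BLKS_PER_SEC() and the victim search, assuming (as the patch does in
get_gc_cost()) that a node section is penalized by one section's worth of
blocks unless node GC is required.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified section descriptor (not a real f2fs type). */
struct toy_section {
	bool is_node;              /* stands in for IS_NODESEG(type) */
	unsigned int valid_blocks; /* stands in for get_valid_blocks() */
};

/* Mirrors __skip_node_gc(): penalize node sections unless node GC is required. */
static bool toy_skip_node_gc(const struct toy_section *sec, bool require_node_gc)
{
	return sec->is_node && !require_node_gc;
}

/* Greedy cost plus node weight; blks_per_sec stands in for BLKS_PER_SEC(sbi). */
static unsigned int toy_greedy_cost(const struct toy_section *sec,
				    unsigned int blks_per_sec,
				    bool require_node_gc)
{
	unsigned int weight = 0;

	assert(sec->valid_blocks <= blks_per_sec);
	if (toy_skip_node_gc(sec, require_node_gc))
		weight = blks_per_sec;
	return sec->valid_blocks + weight;
}

/* Index of the lowest-cost section (first one wins ties), or SIZE_MAX if n == 0. */
static size_t toy_pick_victim(const struct toy_section *secs, size_t n,
			      unsigned int blks_per_sec, bool require_node_gc)
{
	size_t best = SIZE_MAX;
	unsigned int best_cost = UINT_MAX;

	for (size_t i = 0; i < n; i++) {
		unsigned int cost = toy_greedy_cost(&secs[i], blks_per_sec,
						    require_node_gc);
		if (cost < best_cost) {
			best_cost = cost;
			best = i;
		}
	}
	return best;
}
```

For example, with 512 blocks per section, a node section holding 10 valid
blocks costs 522 and loses to a data section holding 300; once
require_node_gc is set it costs 10 and is picked first. Because the weight
is a full section, every penalized node section costs at least as much as
any data section, so the weight acts as a strict priority rather than a
tunable bias.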
Problem
=======

If the total size of node blocks is larger than one section, nodes occupy
multiple sections. Node sections are then often selected as victims,
because data block migration during GC invalidates node blocks and lowers
the GC cost of node sections. Since migrating data sections keeps
producing cheap node victims, victim thrashing occurs among node
sections. This increases WAF.

Experiment
==========

The test environment is as follows.

System info
  - 3.6GHz, 16-core CPU
  - 36GiB memory
Device info
  - a conventional null_blk with 228MiB
  - a sequential null_blk with 4068 zones of 8MiB
Format
  - mkfs.f2fs -c -m -Z 8 -o 3.89
Mount
  - mount
Fio script
  - fio --rw=randwrite --bs=4k --ba=4k --filesize=31187m --norandommap --overwrite=1 --name=job1 --filename=./mnt/sustain --io_size=128g
WAF calculation
  - (IOs on conv. null_blk + IOs on seq. null_blk) / random write IOs

Conclusion
==========

This experiment showed that selecting data sections first as GC victims
reduced WAF by 29% (18.75 -> 13.3). This was achieved by reducing node
block migration by 69.4% (253,131,743 blks -> 77,463,278 blks). This
victim selection method can therefore achieve low WAF in environments
where the section size is relatively small.
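The WAF definition and the percentages in the conclusion can be checked
with a few lines of C. waf() and reduction_pct() are illustrative helpers
written for this note, not part of f2fs or fio; the only inputs used are
the figures already reported above.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Write amplification factor as defined in the experiment:
 * (IOs on conv. null_blk + IOs on seq. null_blk) / random write IOs.
 */
static double waf(uint64_t conv_ios, uint64_t seq_ios, uint64_t user_write_ios)
{
	assert(user_write_ios > 0);
	return (double)(conv_ios + seq_ios) / (double)user_write_ios;
}

/* Relative reduction in percent: (before - after) / before * 100. */
static double reduction_pct(double before, double after)
{
	assert(before > 0.0);
	return (before - after) / before * 100.0;
}
```

reduction_pct(18.75, 13.3) evaluates to about 29.07 and
reduction_pct(253131743, 77463278) to about 69.40, consistent with the
reported 29% and 69.4%.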
Signed-off-by: Yonggil Song
---
 fs/f2fs/f2fs.h |  1 +
 fs/f2fs/gc.c   | 96 +++++++++++++++++++++++++++++++++++++++-----------
 fs/f2fs/gc.h   |  6 ++++
 3 files changed, 82 insertions(+), 21 deletions(-)

diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 65294e3b0bef..b129f62ba541 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -1654,6 +1654,7 @@ struct f2fs_sb_info {
 	struct f2fs_mount_info mount_opt;	/* mount options */
 
 	/* for cleaning operations */
+	bool require_node_gc;			/* flag for node GC */
 	struct f2fs_rwsem gc_lock;		/*
 						 * semaphore for GC, avoid
 						 * race between GC and GC or CP
diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
index a079eebfb080..53a51a668567 100644
--- a/fs/f2fs/gc.c
+++ b/fs/f2fs/gc.c
@@ -341,6 +341,14 @@ static unsigned int get_cb_cost(struct f2fs_sb_info *sbi, unsigned int segno)
 	unsigned int i;
 	unsigned int usable_segs_per_sec = f2fs_usable_segs_in_sec(sbi, segno);
 
+	/*
+	 * When BG_GC selects victims based on age, it prevents node victims
+	 * from being selected. This is because node blocks can be invalidated
+	 * by moving data blocks.
+	 */
+	if (__skip_node_gc(sbi, segno))
+		return UINT_MAX;
+
 	for (i = 0; i < usable_segs_per_sec; i++)
 		mtime += get_seg_entry(sbi, start + i)->mtime;
 	vblocks = get_valid_blocks(sbi, segno, true);
@@ -369,10 +377,24 @@ static inline unsigned int get_gc_cost(struct f2fs_sb_info *sbi,
 		return get_seg_entry(sbi, segno)->ckpt_valid_blocks;
 
 	/* alloc_mode == LFS */
-	if (p->gc_mode == GC_GREEDY)
-		return get_valid_blocks(sbi, segno, true);
-	else if (p->gc_mode == GC_CB)
+	if (p->gc_mode == GC_GREEDY) {
+		/*
+		 * If the data block that the node block pointed to is GCed,
+		 * the node block is invalidated. For this reason, we add a
+		 * weight to cost of node victims to give priority to data
+		 * victims during the gc process. However, in a situation
+		 * where we run out of free sections, we remove the weight
+		 * because we need to clean up node blocks.
+		 */
+		unsigned int weight = 0;
+
+		if (__skip_node_gc(sbi, segno))
+			weight = BLKS_PER_SEC(sbi);
+
+		return get_valid_blocks(sbi, segno, true) + weight;
+	} else if (p->gc_mode == GC_CB) {
 		return get_cb_cost(sbi, segno);
+	}
 
 	f2fs_bug_on(sbi, 1);
 	return 0;
@@ -557,6 +579,14 @@ static void atgc_lookup_victim(struct f2fs_sb_info *sbi,
 		if (ve->mtime >= max_mtime || ve->mtime < min_mtime)
 			goto skip;
 
+		/*
+		 * When BG_GC selects victims based on age, it prevents node victims
+		 * from being selected. This is because node blocks can be invalidated
+		 * by moving data blocks.
+		 */
+		if (__skip_node_gc(sbi, ve->segno))
+			goto skip;
+
 		/* age = 10000 * x% * 60 */
 		age = div64_u64(accu * (max_mtime - ve->mtime), total_time) *
 								age_weight;
@@ -1827,8 +1857,27 @@ int f2fs_gc(struct f2fs_sb_info *sbi, struct f2fs_gc_control *gc_control)
 		goto stop;
 	}
 
+	__get_secs_required(sbi, NULL, &upper_secs, NULL);
+
+	/*
+	 * Write checkpoint to reclaim prefree segments.
+	 * We need more three extra sections for writer's data/node/dentry.
+	 */
+	if (free_sections(sbi) <= upper_secs + NR_GC_CHECKPOINT_SECS) {
+		sbi->require_node_gc = true;
+
+		if (prefree_segments(sbi)) {
+			stat_inc_cp_call_count(sbi, TOTAL_CALL);
+			ret = f2fs_write_checkpoint(sbi, &cpc);
+			if (ret)
+				goto stop;
+			/* Reset due to checkpoint */
+			sec_freed = 0;
+		}
+	}
+
 	/* Let's run FG_GC, if we don't have enough space. */
-	if (has_not_enough_free_secs(sbi, 0, 0)) {
+	if (gc_type == BG_GC && has_not_enough_free_secs(sbi, 0, 0)) {
 		gc_type = FG_GC;
 
 		/*
@@ -1863,6 +1912,18 @@ int f2fs_gc(struct f2fs_sb_info *sbi, struct f2fs_gc_control *gc_control)
 		goto stop;
 	}
 
+	if (sbi->require_node_gc &&
+	    IS_DATASEG(get_seg_entry(sbi, segno)->type)) {
+		/*
+		 * We need to clean node sections. but, data victim
+		 * cost is the lowest. If free sections are enough,
+		 * stop cleaning node victim. If not, it goes on
+		 * by GCing data victims.
+		 */
+		if (has_enough_free_secs(sbi, sec_freed, 0))
+			goto stop;
+	}
+
 	seg_freed = do_garbage_collect(sbi, segno, &gc_list, gc_type,
 				gc_control->should_migrate_blocks);
 	if (seg_freed < 0)
@@ -1882,7 +1943,13 @@ int f2fs_gc(struct f2fs_sb_info *sbi, struct f2fs_gc_control *gc_control)
 		if (!gc_control->no_bg_gc &&
 		    total_sec_freed < gc_control->nr_free_secs)
 			goto go_gc_more;
-		goto stop;
+		/*
+		 * If require_node_gc flag is set even though there
+		 * are enough free sections, node cleaning will
+		 * continue.
+		 */
+		if (!sbi->require_node_gc)
+			goto stop;
 	}
 	if (sbi->skipped_gc_rwsem)
 		skipped_round++;
@@ -1897,21 +1964,6 @@ int f2fs_gc(struct f2fs_sb_info *sbi, struct f2fs_gc_control *gc_control)
 		goto stop;
 	}
 
-	__get_secs_required(sbi, NULL, &upper_secs, NULL);
-
-	/*
-	 * Write checkpoint to reclaim prefree segments.
-	 * We need more three extra sections for writer's data/node/dentry.
-	 */
-	if (free_sections(sbi) <= upper_secs + NR_GC_CHECKPOINT_SECS &&
-				prefree_segments(sbi)) {
-		stat_inc_cp_call_count(sbi, TOTAL_CALL);
-		ret = f2fs_write_checkpoint(sbi, &cpc);
-		if (ret)
-			goto stop;
-		/* Reset due to checkpoint */
-		sec_freed = 0;
-	}
 go_gc_more:
 	segno = NULL_SEGNO;
 	goto gc_more;
@@ -1920,8 +1972,10 @@ int f2fs_gc(struct f2fs_sb_info *sbi, struct f2fs_gc_control *gc_control)
 	SIT_I(sbi)->last_victim[ALLOC_NEXT] = 0;
 	SIT_I(sbi)->last_victim[FLUSH_DEVICE] = gc_control->victim_segno;
 
-	if (gc_type == FG_GC)
+	if (gc_type == FG_GC) {
 		f2fs_unpin_all_sections(sbi, true);
+		sbi->require_node_gc = false;
+	}
 
 	trace_f2fs_gc_end(sbi->sb, ret, total_freed, total_sec_freed,
 				get_pages(sbi, F2FS_DIRTY_NODES),
diff --git a/fs/f2fs/gc.h b/fs/f2fs/gc.h
index 28a00942802c..cd07bf125177 100644
--- a/fs/f2fs/gc.h
+++ b/fs/f2fs/gc.h
@@ -166,3 +166,9 @@ static inline bool has_enough_invalid_blocks(struct f2fs_sb_info *sbi)
 		free_user_blocks(sbi) <
 			limit_free_user_blocks(invalid_user_blocks));
 }
+
+static inline bool __skip_node_gc(struct f2fs_sb_info *sbi, unsigned int segno)
+{
+	return (IS_NODESEG(get_seg_entry(sbi, segno)->type) &&
+				!sbi->require_node_gc);
+}
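The lifecycle of require_node_gc in f2fs_gc() can be summarized as a toy
state machine. This is a hedged user-space sketch, not kernel code:
struct toy_gc_state, TOY_NR_GC_CHECKPOINT_SECS (set to 3 after the "three
extra sections" comment) and the simplified toy_has_enough_free_secs()
threshold are assumptions rather than the real f2fs definitions, and
checkpointing and the BG_GC to FG_GC switch are left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for NR_GC_CHECKPOINT_SECS (data/node/dentry). */
#define TOY_NR_GC_CHECKPOINT_SECS 3

/* Hypothetical, simplified GC state (not struct f2fs_sb_info). */
struct toy_gc_state {
	unsigned int free_secs;   /* stands in for free_sections(sbi) */
	unsigned int upper_secs;  /* stands in for upper_secs from __get_secs_required() */
	unsigned int needed_secs; /* assumed threshold for "enough free sections" */
	bool require_node_gc;
};

/* Entry of f2fs_gc(): arm node GC when free space is close to the limit. */
static void toy_gc_begin(struct toy_gc_state *s)
{
	if (s->free_secs <= s->upper_secs + TOY_NR_GC_CHECKPOINT_SECS)
		s->require_node_gc = true;
}

/* Toy has_enough_free_secs(): sec_freed sections were reclaimed this round. */
static bool toy_has_enough_free_secs(const struct toy_gc_state *s,
				     unsigned int sec_freed)
{
	return s->free_secs + sec_freed >= s->needed_secs;
}

/*
 * Per-victim check: while node GC is required, a data victim means no node
 * section is cheaper, so stop once space suffices; otherwise keep collecting.
 */
static bool toy_should_stop(const struct toy_gc_state *s, bool victim_is_data,
			    unsigned int sec_freed)
{
	return s->require_node_gc && victim_is_data &&
	       toy_has_enough_free_secs(s, sec_freed);
}

/* Exit of f2fs_gc(): only a foreground round clears the flag. */
static void toy_gc_end(struct toy_gc_state *s, bool was_fg_gc)
{
	if (was_fg_gc)
		s->require_node_gc = false;
}
```

Because the flag is cleared only when a foreground round ends, a
background round that arms it leaves it set, so later victim searches keep
considering node sections without the weight until an FG_GC round
completes.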