From patchwork Thu May 18 07:50:41 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Chao Yu
X-Patchwork-Id: 95693
From: Chao Yu
To: jaegeuk@kernel.org
Cc: linux-f2fs-devel@lists.sourceforge.net, linux-kernel@vger.kernel.org,
 Chao Yu, Weichao Guo
Subject: [PATCH] f2fs: support background_gc=adjust mount option
Date: Thu, 18 May 2023 15:50:41 +0800
Message-Id: <20230518075041.38786-1-chao@kernel.org>
X-Mailer: git-send-email 2.40.1

As JuHyung reported in [1]:

"In most consumer-grade blackbox SSDs, device-side GCs are handled
automatically for various workloads. f2fs, however, leaves that
responsibility to the userspace with conservative tuning on the
kernel-side by default. Android handles this by init.rc tunings and a
separate code running in vold to trigger gc_urgent. For regular Linux
desktop distros, f2fs just runs on the default configuration set on the
kernel and unless it’s running 24/7 with plentiful idle time, it quickly
runs out of free segments and starts triggering foreground GC. This is
giving people the wrong impression that f2fs slows down far drastically
than other file-systems when that’s quite the contrary (i.e., less
fragmentation overtime)."

This patch supports background_gc=adjust mount option.
If background_gc=adjust, gc will adjust its policy depending on
conditions: speed up if there are few free segments, and slow down if
there is little free space.

The main logic is as below:

1. performance mode
 - condition: if free_segments is no more than 10 * ovp_segments and
   reclaimable_block is at least 20% of unused_user_block
 - action: reduce sleep time of GC thread based on free user block
   ratio, that is to say, the more reclaimable blocks, the less time
   the thread will sleep
2. lifetime mode
 - condition: if free space is no more than 10% of user space
 - action:
   a) reset min_sleep_time to default 30000 ms
   b) reduce cost weight of age when calculating cost of dirty segment,
      so that GC may select a victim which contains fewer valid blocks
3. balance mode
 - condition: it is the default mode
 - action: reduce min_sleep_time from 30000 ms to 10000 ms

[1] https://lore.kernel.org/linux-f2fs-devel/CAD14+f3z=kS9E+NTKH7t1J2xL1PpLOVMNx=CabD_t2K6U=T9uQ@mail.gmail.com

The original patch was developed by Weichao Guo; I refactored it a bit
and rebased the code.

Signed-off-by: Weichao Guo
Signed-off-by: Chao Yu
---
 Documentation/filesystems/f2fs.rst |  7 ++-
 fs/f2fs/f2fs.h                     |  4 ++
 fs/f2fs/gc.c                       | 92 +++++++++++++++++++++++++++++-
 fs/f2fs/gc.h                       | 23 ++++++++
 fs/f2fs/super.c                    |  4 ++
 5 files changed, 126 insertions(+), 4 deletions(-)

diff --git a/Documentation/filesystems/f2fs.rst b/Documentation/filesystems/f2fs.rst
index 9359978a5af2..764301f7391e 100644
--- a/Documentation/filesystems/f2fs.rst
+++ b/Documentation/filesystems/f2fs.rst
@@ -112,8 +112,11 @@ background_gc=%s	 Turn on/off cleaning operations, namely garbage
 			 collection and if background_gc=off, garbage collection
 			 will be turned off. If background_gc=sync, it will turn
 			 on synchronous garbage collection running in background.
-			 Default value for this option is on. So garbage
-			 collection is on by default.
+			 If background_gc=adjust, gc will adjust its policy depending
+			 on conditions: speed up if there are few free segments, and
+			 slow down if there is little free space.
+			 Default value for this option is on. So garbage collection
+			 is on by default.
 gc_merge		 When background_gc is on, this option can be enabled to
 			 let background GC thread to handle foreground GC requests,
 			 it can eliminate the sluggish issue caused by slow foreground
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 8d4eaf4d2246..4c2f65d3c208 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -1333,6 +1333,10 @@ enum {
 				 * background gc is on, migrating blocks
 				 * like foreground gc
 				 */
+	BGGC_MODE_ADJUST,	/*
+				 * background gc is on, and tune its speed
+				 * depending on conditions
+				 */
 };
 
 enum {
diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
index 51d7e8d29bf1..43f935c2502a 100644
--- a/fs/f2fs/gc.c
+++ b/fs/f2fs/gc.c
@@ -28,6 +28,67 @@ static struct kmem_cache *victim_entry_slab;
 static unsigned int count_bits(const unsigned long *addr,
 				unsigned int offset, unsigned int len);
 
+static inline int free_user_block_ratio(struct f2fs_sb_info *sbi)
+{
+	block_t unused_user_blocks = sbi->user_block_count -
+						written_block_count(sbi);
+	return unused_user_blocks == 0 ? 100 :
+			(100 * free_user_blocks(sbi) / unused_user_blocks);
+}
+
+static bool has_few_free_segments(struct f2fs_sb_info *sbi)
+{
+	unsigned int free_segs = free_segments(sbi);
+	unsigned int ovp_segs = overprovision_segments(sbi);
+
+	return free_segs <= DEF_FEW_FREE_SEGMENT_MULTIPLE * ovp_segs;
+}
+
+static bool has_few_free_space(struct f2fs_sb_info *sbi)
+{
+	block_t total_user_block = sbi->user_block_count;
+	block_t free_user_blocks = total_user_block - written_block_count(sbi);
+
+	return 100 * free_user_blocks / total_user_block <=
+			DEF_FEW_FREE_SPACE_RATIO;
+}
+
+static bool has_enough_reclaimable_blocks(struct f2fs_sb_info *sbi)
+{
+	return 100 - free_user_block_ratio(sbi) >=
+			DEF_ENOUGH_RECLAIMABLE_BLOCK_RATIO;
+}
+
+static void adjust_gc_preference(struct f2fs_sb_info *sbi,
+						unsigned int *wait_ms)
+{
+	struct f2fs_gc_kthread *gc_th = sbi->gc_thread;
+
+	if (has_few_free_space(sbi))
+		gc_th->gc_preference = GC_LIFETIME_MODE;
+	else if (has_few_free_segments(sbi) &&
+			has_enough_reclaimable_blocks(sbi))
+		gc_th->gc_preference = GC_PERFORMANCE_MODE;
+	else
+		gc_th->gc_preference = GC_BALANCE_MODE;
+
+	switch (gc_th->gc_preference) {
+	case GC_PERFORMANCE_MODE:
+		*wait_ms = max(DEF_GC_BALANCE_MIN_SLEEP_TIME *
+				free_user_block_ratio(sbi) / 100,
+				DEF_GC_PERFORMANCE_MIN_SLEEP_TIME);
+		break;
+	case GC_LIFETIME_MODE:
+		gc_th->min_sleep_time = DEF_GC_THREAD_MIN_SLEEP_TIME;
+		break;
+	case GC_BALANCE_MODE:
+		gc_th->min_sleep_time = DEF_GC_BALANCE_MIN_SLEEP_TIME;
+		break;
+	default:
+		f2fs_bug_on(sbi, 1);
+	}
+}
+
 static int gc_thread_func(void *data)
 {
 	struct f2fs_sb_info *sbi = data;
@@ -46,6 +107,9 @@ static int gc_thread_func(void *data)
 	do {
 		bool sync_mode, foreground = false;
 
+		if (F2FS_OPTION(sbi).bggc_mode == BGGC_MODE_ADJUST)
+			adjust_gc_preference(sbi, &wait_ms);
+
 		wait_event_interruptible_timeout(*wq,
 				kthread_should_stop() || freezing(current) ||
 				waitqueue_active(fggc_wq) ||
@@ -109,7 +173,8 @@ static int gc_thread_func(void *data)
 			goto next;
 		}
 
-		if (!is_idle(sbi, GC_TIME)) {
+		if (!is_idle(sbi, GC_TIME) &&
+			F2FS_OPTION(sbi).bggc_mode != BGGC_MODE_ADJUST) {
 			increase_sleep_time(gc_th, &wait_ms);
 			f2fs_up_write(&sbi->gc_lock);
 			stat_io_skip_bggc_count(sbi);
@@ -183,6 +248,8 @@ int f2fs_start_gc_thread(struct f2fs_sb_info *sbi)
 	gc_th->max_sleep_time = DEF_GC_THREAD_MAX_SLEEP_TIME;
 	gc_th->no_gc_sleep_time = DEF_GC_THREAD_NOGC_SLEEP_TIME;
 
+	gc_th->gc_preference = GC_BALANCE_MODE;
+
 	gc_th->gc_wake = false;
 
 	sbi->gc_thread = gc_th;
@@ -329,6 +396,23 @@ static unsigned int check_bg_victims(struct f2fs_sb_info *sbi)
 	return NULL_SEGNO;
 }
 
+static unsigned char get_max_age(struct f2fs_sb_info *sbi)
+{
+	struct f2fs_gc_kthread *gc_th = sbi->gc_thread;
+	unsigned char max_age = 100;
+	unsigned char ratio;
+
+	if (!gc_th || gc_th->gc_preference != GC_LIFETIME_MODE)
+		goto out;
+
+	/* if free block count is less than 10%, reduce cost weight of age */
+	ratio = free_user_block_ratio(sbi);
+	if (ratio <= DEF_FEW_FREE_SEGMENT_RATIO)
+		max_age = max(10 * ratio, 1);
+out:
+	return max_age;
+}
+
 static unsigned int get_cb_cost(struct f2fs_sb_info *sbi, unsigned int segno)
 {
 	struct sit_info *sit_i = SIT_I(sbi);
@@ -336,6 +420,7 @@ static unsigned int get_cb_cost(struct f2fs_sb_info *sbi, unsigned int segno)
 	unsigned int start = GET_SEG_FROM_SEC(sbi, secno);
 	unsigned long long mtime = 0;
 	unsigned int vblocks;
+	unsigned char max_age;
 	unsigned char age = 0;
 	unsigned char u;
 	unsigned int i;
@@ -355,8 +440,11 @@ static unsigned int get_cb_cost(struct f2fs_sb_info *sbi, unsigned int segno)
 		sit_i->min_mtime = mtime;
 	if (mtime > sit_i->max_mtime)
 		sit_i->max_mtime = mtime;
+
+	max_age = get_max_age(sbi);
+
 	if (sit_i->max_mtime != sit_i->min_mtime)
-		age = 100 - div64_u64(100 * (mtime - sit_i->min_mtime),
+		age = max_age - div64_u64(max_age * (mtime - sit_i->min_mtime),
 				sit_i->max_mtime - sit_i->min_mtime);
 
 	return UINT_MAX - ((100 * (100 - u) * age) / (100 + u));
diff --git a/fs/f2fs/gc.h b/fs/f2fs/gc.h
index 28a00942802c..66f6a30dd494 100644
--- a/fs/f2fs/gc.h
+++ b/fs/f2fs/gc.h
@@ -15,6 +15,14 @@
 #define DEF_GC_THREAD_MAX_SLEEP_TIME	60000
 #define DEF_GC_THREAD_NOGC_SLEEP_TIME	300000	/* wait 5 min */
 
+/* for BGGC_MODE_ADJUST */
+#define DEF_GC_PERFORMANCE_MIN_SLEEP_TIME	100	/* 100 ms */
+#define DEF_GC_BALANCE_MIN_SLEEP_TIME		10000	/* 10 sec */
+#define DEF_FEW_FREE_SPACE_RATIO		10	/* few free space ratio */
+#define DEF_FEW_FREE_SEGMENT_MULTIPLE		10	/* few free segments multiple */
+#define DEF_ENOUGH_RECLAIMABLE_BLOCK_RATIO	20	/* enough reclaimable block ratio */
+#define DEF_FEW_FREE_SEGMENT_RATIO		10	/* few free segment ratio */
+
 /* choose candidates from sections which has age of more than 7 days */
 #define DEF_GC_THREAD_AGE_THRESHOLD		(60 * 60 * 24 * 7)
 #define DEF_GC_THREAD_CANDIDATE_RATIO		20	/* select 20% oldest sections as candidates */
@@ -32,6 +40,19 @@
 
 #define NR_GC_CHECKPOINT_SECS (3)	/* data/node/dentry sections */
 
+/* GC preference */
+enum {
+	GC_PERFORMANCE_MODE,	/*
+				 * speed up background gc to recycle
+				 * slack space for better performance
+				 */
+	GC_LIFETIME_MODE,	/*
+				 * slow down background gc to avoid high
+				 * WAF if there is less free space.
+				 */
+	GC_BALANCE_MODE,	/* tradeoff in between perf and lifetime */
+};
+
 struct f2fs_gc_kthread {
 	struct task_struct *f2fs_gc_task;
 	wait_queue_head_t gc_wait_queue_head;
@@ -42,6 +63,8 @@ struct f2fs_gc_kthread {
 	unsigned int max_sleep_time;
 	unsigned int no_gc_sleep_time;
 
+	unsigned char gc_preference;	/* gc preference */
+
 	/* for changing gc mode */
 	bool gc_wake;
 
diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
index f19217219c3b..806c8119f021 100644
--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -693,6 +693,8 @@ static int parse_options(struct super_block *sb, char *options, bool is_remount)
 				F2FS_OPTION(sbi).bggc_mode = BGGC_MODE_OFF;
 			} else if (!strcmp(name, "sync")) {
 				F2FS_OPTION(sbi).bggc_mode = BGGC_MODE_SYNC;
+			} else if (!strcmp(name, "adjust")) {
+				F2FS_OPTION(sbi).bggc_mode = BGGC_MODE_ADJUST;
 			} else {
 				kfree(name);
 				return -EINVAL;
@@ -1927,6 +1929,8 @@ static int f2fs_show_options(struct seq_file *seq, struct dentry *root)
 		seq_printf(seq, ",background_gc=%s", "on");
 	else if (F2FS_OPTION(sbi).bggc_mode == BGGC_MODE_OFF)
 		seq_printf(seq, ",background_gc=%s", "off");
+	else if (F2FS_OPTION(sbi).bggc_mode == BGGC_MODE_ADJUST)
+		seq_printf(seq, ",background_gc=%s", "adjust");
 
 	if (test_opt(sbi, GC_MERGE))
 		seq_puts(seq, ",gc_merge");
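
Note for readers: the mode selection this patch adds to fs/f2fs/gc.c can
be tried outside the kernel with a small userspace model. The following
is only a sketch under stated assumptions, not the kernel code: struct
fs_snapshot, unused_user_blocks(), select_gc_preference() and
performance_wait_ms() are hypothetical names made up for illustration,
and the counters are passed in by hand instead of being read from
struct f2fs_sb_info. The thresholds are the ones the patch adds to
fs/f2fs/gc.h, and the order of the checks (lifetime first, then
performance, else balance) follows the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Thresholds as added to fs/f2fs/gc.h by the patch. */
#define DEF_GC_PERFORMANCE_MIN_SLEEP_TIME	100	/* ms */
#define DEF_GC_BALANCE_MIN_SLEEP_TIME		10000	/* ms */
#define DEF_FEW_FREE_SPACE_RATIO		10
#define DEF_FEW_FREE_SEGMENT_MULTIPLE		10
#define DEF_ENOUGH_RECLAIMABLE_BLOCK_RATIO	20

enum gc_pref { GC_PERFORMANCE_MODE, GC_LIFETIME_MODE, GC_BALANCE_MODE };

/* Hypothetical snapshot standing in for counters the patch reads from sbi. */
struct fs_snapshot {
	uint32_t user_blocks;		/* sbi->user_block_count */
	uint32_t written_blocks;	/* written_block_count(sbi) */
	uint32_t free_blocks;		/* free_user_blocks(sbi) */
	uint32_t free_segs;		/* free_segments(sbi) */
	uint32_t ovp_segs;		/* overprovision_segments(sbi) */
};

/* User blocks that are not written; rejects inconsistent snapshots. */
static uint32_t unused_user_blocks(const struct fs_snapshot *s)
{
	assert(s->user_blocks > 0 && s->written_blocks <= s->user_blocks);
	return s->user_blocks - s->written_blocks;
}

/* Percentage of unused user blocks that are in free segments. */
static int free_user_block_ratio(const struct fs_snapshot *s)
{
	uint32_t unused = unused_user_blocks(s);

	if (unused == 0)
		return 100;
	/* Assumption of this sketch: widen to 64 bits before scaling. */
	return (int)((uint64_t)100 * s->free_blocks / unused);
}

static enum gc_pref select_gc_preference(const struct fs_snapshot *s)
{
	uint64_t unused = unused_user_blocks(s);

	/* Lifetime mode wins: at most 10% of user space is unused. */
	if (100 * unused / s->user_blocks <= DEF_FEW_FREE_SPACE_RATIO)
		return GC_LIFETIME_MODE;

	/* Performance mode: few free segments, yet plenty to reclaim. */
	if ((uint64_t)s->free_segs <=
	    (uint64_t)DEF_FEW_FREE_SEGMENT_MULTIPLE * s->ovp_segs &&
	    100 - free_user_block_ratio(s) >= DEF_ENOUGH_RECLAIMABLE_BLOCK_RATIO)
		return GC_PERFORMANCE_MODE;

	return GC_BALANCE_MODE;
}

/* Sleep time chosen in performance mode, clamped to the 100 ms floor. */
static unsigned int performance_wait_ms(const struct fs_snapshot *s)
{
	int ms = DEF_GC_BALANCE_MIN_SLEEP_TIME * free_user_block_ratio(s) / 100;

	return ms > DEF_GC_PERFORMANCE_MIN_SLEEP_TIME ?
			(unsigned int)ms : DEF_GC_PERFORMANCE_MIN_SLEEP_TIME;
}
```

With 1000 user blocks of which 500 are written, 100 blocks in free
segments, 5 free segments and 1 overprovision segment, the model picks
performance mode and a 2000 ms sleep. Raising the written blocks to 950
switches it to lifetime mode, because only 5% of user space is left. The
sketch widens to 64 bits before multiplying by 100 so that the
percentage cannot wrap for large block counts; that is a choice of the
sketch, not something the patch does.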