Message ID | 20240122-reclaim-fix-v1-2-761234a6d005@wdc.com |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel+bounces-32791-ouuuleilei=gmail.com@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:a05:7301:2bc4:b0:101:a8e8:374 with SMTP id hx4csp2489741dyb; Mon, 22 Jan 2024 02:52:39 -0800 (PST) X-Google-Smtp-Source: AGHT+IGILqSbCoQO4BKsjI4dKr4U147wMXQzZLmlCb+Hy1+Yrfw7chRcoZiIf549k3PjQmfxmoZE X-Received: by 2002:a19:2d45:0:b0:50e:d4b3:df21 with SMTP id t5-20020a192d45000000b0050ed4b3df21mr1491196lft.34.1705920759202; Mon, 22 Jan 2024 02:52:39 -0800 (PST) ARC-Seal: i=2; a=rsa-sha256; t=1705920759; cv=pass; d=google.com; s=arc-20160816; b=qJX9RAKjkUBaclwFvIzW6FbMsjkJHpCqizuPg/U50SO+vvbb0GFp4RLQm55myLX/v3 AknqVqGhBWxniqh7BhxVd4a0ffz/KWN2tyxLXDAY2GjieW33AURUds6UENHWHaQFZ/ym 7UAZwfVEdA6sVmSH41S/MmPpn9OAmIXBBXHRAgYX3qGAMDdQ3zt2VhyWBpVxD6NiKtWL cLs/6plUAgJ+WglUHLMs6igwg0B4sNUvnSaQ9LJGYtEdBF8lfwKX45uMwblM5SSpTZCe hTDFm6mVxafAz2h0x02d/jvBV6HGAdHXpk2ciUvUtc6UlLEufp76Ig31ED5GAv/ul6Tc Oi2g== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=cc:to:in-reply-to:references:message-id:content-transfer-encoding :mime-version:list-unsubscribe:list-subscribe:list-id:precedence :subject:date:from:wdcironportexception:ironport-sdr:ironport-sdr :dkim-signature; bh=Vp+BFMz/Dgh55Fk/cRnsV6CYDjJytn/VMO7p30T2KEA=; fh=owwVPd3KXP7+r4IPWuBOqKSvRpyfd4ISITVwdR5C8Qk=; b=A9GvIBNd2GUkzf+zqXxr5vnB9xnMqf0P5q1ExrrbqGMW8Nw2ic2cTOXwRM8ZH87hgm JnT2yOhZ8RGUbWO70iNGm4Zc2/SSEX3AAkjz1GdthZG+ALcHQHNUzxbALwdU9Vbo32Pb +8jXp+bmH3A08rdRxRpBTEysUG2OTszKj2eOHggzxQpj+xkx32FzV9MByth35MSY8017 ZuIwKOkwY9W+v7vRH01JpxYz8apeCzgZnUWmEcoJfCfCBrvzpwQUQ9U8vKc6oVh6CXvl NwUkNOMflIkIQ+6wH6zmQkOa3V4hK/sGU+nBuCEihyZnUWkBDecJvgtEfg7VtWFCGhQK /TSA== ARC-Authentication-Results: i=2; mx.google.com; dkim=pass header.i=@wdc.com header.s=dkim.wdc.com header.b=Lfs6mLb4; arc=pass (i=1 spf=pass spfdomain=wdc.com dkim=pass dkdomain=wdc.com dmarc=pass fromdomain=wdc.com); spf=pass (google.com: domain of linux-kernel+bounces-32791-ouuuleilei=gmail.com@vger.kernel.org designates 2604:1380:4601:e00::3 as permitted sender) smtp.mailfrom="linux-kernel+bounces-32791-ouuuleilei=gmail.com@vger.kernel.org"; dmarc=pass (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=wdc.com Received: from am.mirrors.kernel.org (am.mirrors.kernel.org. [2604:1380:4601:e00::3]) by mx.google.com with ESMTPS id t9-20020a17090616c900b00a2c86ebe164si10182072ejd.954.2024.01.22.02.52.39 for <ouuuleilei@gmail.com> (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 22 Jan 2024 02:52:39 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel+bounces-32791-ouuuleilei=gmail.com@vger.kernel.org designates 2604:1380:4601:e00::3 as permitted sender) client-ip=2604:1380:4601:e00::3; Authentication-Results: mx.google.com; dkim=pass header.i=@wdc.com header.s=dkim.wdc.com header.b=Lfs6mLb4; arc=pass (i=1 spf=pass spfdomain=wdc.com dkim=pass dkdomain=wdc.com dmarc=pass fromdomain=wdc.com); spf=pass (google.com: domain of linux-kernel+bounces-32791-ouuuleilei=gmail.com@vger.kernel.org designates 2604:1380:4601:e00::3 as permitted sender) smtp.mailfrom="linux-kernel+bounces-32791-ouuuleilei=gmail.com@vger.kernel.org"; dmarc=pass (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=wdc.com Received: from smtp.subspace.kernel.org (wormhole.subspace.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by am.mirrors.kernel.org (Postfix) with ESMTPS id A32FD1F234E0 for <ouuuleilei@gmail.com>; Mon, 22 Jan 2024 10:52:30 +0000 (UTC) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by smtp.subspace.kernel.org (Postfix) with ESMTP id AF94E3B795; Mon, 22 Jan 2024 10:51:19 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=wdc.com header.i=@wdc.com header.b="Lfs6mLb4" Received: from esa2.hgst.iphmx.com (esa2.hgst.iphmx.com [68.232.143.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 68A0C3A8CD; Mon, 22 Jan 2024 10:51:15 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=68.232.143.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1705920677; cv=none; b=pUgH5GxBwUM3rs/Ui5/zMNSgCymGVf3C3e48loaZfImHVsJv5St25lPcr3DQ6+nFu0ReQeV5SV6b8V9UHo4MuhnVCOcLrc9vCooLgqs5m4d88jKS+tKkdWnqQShYYSFIAWeUhRTjrxVZoyReuLdOkLvKeq2Mw9T4Hk6Y94+kM6w= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1705920677; c=relaxed/simple; bh=svg98VKnZP3fRHaDjBbkju9KVDMrZOgy1g4WP5WhQD8=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=MW+h/1nXpPEX6xsag4jv/WhBLHlqaG92TxmqblvS5wy0CUfdUZYU6DLygfJk1T3heW+r0DUFF+vLTQ8EOg8N0WLPbUGP80oEAYkxWFQnc+pw4/5L0DpMQ8p3Yr/yk+UCUuKs9jnmYq0wKsl3ww49slYgmV4bcwKSE9vOzUq/7r0= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=wdc.com; spf=pass smtp.mailfrom=wdc.com; dkim=pass (2048-bit key) header.d=wdc.com header.i=@wdc.com header.b=Lfs6mLb4; arc=none smtp.client-ip=68.232.143.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=wdc.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=wdc.com DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=wdc.com; i=@wdc.com; q=dns/txt; s=dkim.wdc.com; t=1705920675; x=1737456675; h=from:date:subject:mime-version:content-transfer-encoding: message-id:references:in-reply-to:to:cc; bh=svg98VKnZP3fRHaDjBbkju9KVDMrZOgy1g4WP5WhQD8=; b=Lfs6mLb4NIc/zjcrtvaiG3jfqtHe6vBEs+cxB4txS4HQ1Np2BtDAexTE OmTzZDxy2TbDbpU31DWuuh1MXIbTTMPm3jePEm8TUcGOYfp7EB1W6W0nQ IGr4f4L8tt93TZbTuvGioKtdLuqPeeSwAFXnMnZm9CV+KXieycLl/V6Eo F/oG9S3UG5Waagfh/uDdeLlHaiGuuAB7kQ62IsiC+GX5Bn7yXqg1sWcIx C85PZR42Pf0vkYVPpzcwF/WGRKNSGLVvZ+v1sT0vlW5DgipdQ3+8wTzky 1+277gGJOjFoT0+z2Hzy6Zsyc6LNCmX/DTkvKK+nvOd2JNd0R5LKIIyBk g==; X-CSE-ConnectionGUID: fAln4nUlTd+41qLbaJkDGw== X-CSE-MsgGUID: fLY4dCxFTJadzOIiXJQpDA== X-IronPort-AV: E=Sophos;i="6.05,211,1701100800"; d="scan'208";a="7427196" Received: from h199-255-45-14.hgst.com (HELO uls-op-cesaep01.wdc.com) ([199.255.45.14]) by ob1.hgst.iphmx.com with ESMTP; 22 Jan 2024 18:51:14 +0800 IronPort-SDR: Ol0B+PAA1Xfy4fEmP4YDPQ+AyE0T1Nw4ssoKvgy9fwd7nAiCDylfiv13aGgQ7+l1j1K8QCED70 c3TD8o/6KR1A== Received: from uls-op-cesaip02.wdc.com ([10.248.3.37]) by uls-op-cesaep01.wdc.com with ESMTP/TLS/ECDHE-RSA-AES128-GCM-SHA256; 22 Jan 2024 02:01:18 -0800 IronPort-SDR: HN3211Og+EnTAn1/uYEQkSJ5IpLy/3YoVt9P+QxZvh1hwrxvbYrIaEPgmZ3kVPmAxboLudfs7h T0v8FSZEZkvQ== WDCIronportException: Internal Received: from unknown (HELO redsun91.ssa.fujisawa.hgst.com) ([10.149.66.6]) by uls-op-cesaip02.wdc.com with ESMTP; 22 Jan 2024 02:51:13 -0800 From: Johannes Thumshirn <johannes.thumshirn@wdc.com> Date: Mon, 22 Jan 2024 02:51:04 -0800 Subject: [PATCH 2/2] btrfs: zoned: wake up cleaner sooner if needed Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: <linux-kernel.vger.kernel.org> List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org> List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org> MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: 7bit Message-Id: <20240122-reclaim-fix-v1-2-761234a6d005@wdc.com> References: <20240122-reclaim-fix-v1-0-761234a6d005@wdc.com> In-Reply-To: <20240122-reclaim-fix-v1-0-761234a6d005@wdc.com> To: Josef Bacik <josef@toxicpanda.com>, David Sterba <dsterba@suse.com> Cc: Naohiro Aota <naohiro.aota@wdc.com>, linux-btrfs@vger.kernel.org, linux-kernel@vger.kernel.org, Johannes Thumshirn <johannes.thumshirn@wdc.com>, Damien Le Moal <dlemoal@kernel.org> X-Mailer: b4 0.12.3 X-Developer-Signature: v=1; a=ed25519-sha256; t=1705920670; l=1609; i=johannes.thumshirn@wdc.com; s=20230613; h=from:subject:message-id; bh=svg98VKnZP3fRHaDjBbkju9KVDMrZOgy1g4WP5WhQD8=; b=xWcXukl8CocPEfPTRefwvLuPkvBUrurbPRTiQsU8YMfBGpQ8agEMYORtkjbOPJVHTXE2+g1cq xtjSqYemPEBBX1S9tIcXZ8j9wWBY88FBT3OSNbny+hF+tbfujjBD5Km X-Developer-Key: i=johannes.thumshirn@wdc.com; a=ed25519; pk=TGmHKs78FdPi+QhrViEvjKIGwReUGCfa+3LEnGoR2KM= X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1788787565959885619 X-GMAIL-MSGID: 1788787565959885619 |
Series |
btrfs: zoned: kick reclaim earlier on fast zoned devices
|
|
Commit Message
Johannes Thumshirn
Jan. 22, 2024, 10:51 a.m. UTC
On very fast but small devices, waiting for a transaction commit can be
too long of a wait in order to wake up the cleaner kthread to remove unused
and reclaimable block-groups.
Check every time we're adding back free space to a block group, if we need
to activate the cleaner kthread.
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
fs/btrfs/free-space-cache.c | 6 ++++++
1 file changed, 6 insertions(+)
Comments
On Mon, Jan 22, 2024 at 02:51:04AM -0800, Johannes Thumshirn wrote: > On very fast but small devices, waiting for a transaction commit can be > too long of a wait in order to wake up the cleaner kthread to remove unused > and reclaimable block-groups. > > Check every time we're adding back free space to a block group, if we need > to activate the cleaner kthread. > > Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> > --- > fs/btrfs/free-space-cache.c | 6 ++++++ > 1 file changed, 6 insertions(+) > > diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c > index d372c7ce0e6b..2d98b9ca0e83 100644 > --- a/fs/btrfs/free-space-cache.c > +++ b/fs/btrfs/free-space-cache.c > @@ -30,6 +30,7 @@ > #include "file-item.h" > #include "file.h" > #include "super.h" > +#include "zoned.h" > > #define BITS_PER_BITMAP (PAGE_SIZE * 8UL) > #define MAX_CACHE_BYTES_PER_GIG SZ_64K > @@ -2694,6 +2695,7 @@ int __btrfs_add_free_space(struct btrfs_block_group *block_group, > static int __btrfs_add_free_space_zoned(struct btrfs_block_group *block_group, > u64 bytenr, u64 size, bool used) > { > + struct btrfs_fs_info *fs_info = block_group->fs_info; > struct btrfs_space_info *sinfo = block_group->space_info; > struct btrfs_free_space_ctl *ctl = block_group->free_space_ctl; > u64 offset = bytenr - block_group->start; > @@ -2745,6 +2747,10 @@ static int __btrfs_add_free_space_zoned(struct btrfs_block_group *block_group, > btrfs_mark_bg_to_reclaim(block_group); > } > > + if (btrfs_zoned_should_reclaim(fs_info) && > + !test_bit(BTRFS_FS_CLEANER_RUNNING, &fs_info->flags)) > + wake_up_process(fs_info->cleaner_kthread); > + Isn't it too costly to call btrfs_zoned_should_reclaim() every time something updated? Can we wake up it in btrfs_mark_bg_to_reclaim and btrfs_mark_bg_unused ? Also, looking into btrfs_zoned_should_reclaim(), it sums device->bytes_used for each fs_devices->devices. And, device->bytes_used is set at create_chunk() or at btrfs_remove_chunk(). Isn't it feasible to do the calculation only there? > return 0; > } > > > -- > 2.43.0 >
On 22.01.24 13:22, Naohiro Aota wrote: > On Mon, Jan 22, 2024 at 02:51:04AM -0800, Johannes Thumshirn wrote: >> On very fast but small devices, waiting for a transaction commit can be >> too long of a wait in order to wake up the cleaner kthread to remove unused >> and reclaimable block-groups. >> >> Check every time we're adding back free space to a block group, if we need >> to activate the cleaner kthread. >> >> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> >> --- >> fs/btrfs/free-space-cache.c | 6 ++++++ >> 1 file changed, 6 insertions(+) >> >> diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c >> index d372c7ce0e6b..2d98b9ca0e83 100644 >> --- a/fs/btrfs/free-space-cache.c >> +++ b/fs/btrfs/free-space-cache.c >> @@ -30,6 +30,7 @@ >> #include "file-item.h" >> #include "file.h" >> #include "super.h" >> +#include "zoned.h" >> >> #define BITS_PER_BITMAP (PAGE_SIZE * 8UL) >> #define MAX_CACHE_BYTES_PER_GIG SZ_64K >> @@ -2694,6 +2695,7 @@ int __btrfs_add_free_space(struct btrfs_block_group *block_group, >> static int __btrfs_add_free_space_zoned(struct btrfs_block_group *block_group, >> u64 bytenr, u64 size, bool used) >> { >> + struct btrfs_fs_info *fs_info = block_group->fs_info; >> struct btrfs_space_info *sinfo = block_group->space_info; >> struct btrfs_free_space_ctl *ctl = block_group->free_space_ctl; >> u64 offset = bytenr - block_group->start; >> @@ -2745,6 +2747,10 @@ static int __btrfs_add_free_space_zoned(struct btrfs_block_group *block_group, >> btrfs_mark_bg_to_reclaim(block_group); >> } >> >> + if (btrfs_zoned_should_reclaim(fs_info) && >> + !test_bit(BTRFS_FS_CLEANER_RUNNING, &fs_info->flags)) >> + wake_up_process(fs_info->cleaner_kthread); >> + > > Isn't it too costly to call btrfs_zoned_should_reclaim() every time > something updated? Can we wake up it in btrfs_mark_bg_to_reclaim and > btrfs_mark_bg_unused ? Hmm yes, I've thought of adding a list_count() for the reclaim and unused lists, and only calling into should_reclaim if these lists have entries. Or even better !list_is_singular(). > > Also, looking into btrfs_zoned_should_reclaim(), it sums device->bytes_used > for each fs_devices->devices. And, device->bytes_used is set at > create_chunk() or at btrfs_remove_chunk(). Isn't it feasible to do the > calculation only there? Oh sh*t! Right we should check bytes_used from all space_infos in btrfs_zoned_should_reclaim() and compare that to the disk total bytes.
On Mon, Jan 22, 2024 at 12:30:52PM +0000, Johannes Thumshirn wrote: > On 22.01.24 13:22, Naohiro Aota wrote: > > On Mon, Jan 22, 2024 at 02:51:04AM -0800, Johannes Thumshirn wrote: > >> On very fast but small devices, waiting for a transaction commit can be > >> too long of a wait in order to wake up the cleaner kthread to remove unused > >> and reclaimable block-groups. > >> > >> Check every time we're adding back free space to a block group, if we need > >> to activate the cleaner kthread. > >> > >> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> > >> --- > >> fs/btrfs/free-space-cache.c | 6 ++++++ > >> 1 file changed, 6 insertions(+) > >> > >> diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c > >> index d372c7ce0e6b..2d98b9ca0e83 100644 > >> --- a/fs/btrfs/free-space-cache.c > >> +++ b/fs/btrfs/free-space-cache.c > >> @@ -30,6 +30,7 @@ > >> #include "file-item.h" > >> #include "file.h" > >> #include "super.h" > >> +#include "zoned.h" > >> > >> #define BITS_PER_BITMAP (PAGE_SIZE * 8UL) > >> #define MAX_CACHE_BYTES_PER_GIG SZ_64K > >> @@ -2694,6 +2695,7 @@ int __btrfs_add_free_space(struct btrfs_block_group *block_group, > >> static int __btrfs_add_free_space_zoned(struct btrfs_block_group *block_group, > >> u64 bytenr, u64 size, bool used) > >> { > >> + struct btrfs_fs_info *fs_info = block_group->fs_info; > >> struct btrfs_space_info *sinfo = block_group->space_info; > >> struct btrfs_free_space_ctl *ctl = block_group->free_space_ctl; > >> u64 offset = bytenr - block_group->start; > >> @@ -2745,6 +2747,10 @@ static int __btrfs_add_free_space_zoned(struct btrfs_block_group *block_group, > >> btrfs_mark_bg_to_reclaim(block_group); > >> } > >> > >> + if (btrfs_zoned_should_reclaim(fs_info) && > >> + !test_bit(BTRFS_FS_CLEANER_RUNNING, &fs_info->flags)) > >> + wake_up_process(fs_info->cleaner_kthread); > >> + > > > > Isn't it too costly to call btrfs_zoned_should_reclaim() every time > > something updated? Can we wake up it in btrfs_mark_bg_to_reclaim and > > btrfs_mark_bg_unused ? > > Hmm yes, I've thought of adding a list_count() for the reclaim and > unused lists, and only calling into should_reclaim if these lists have > entries. Or even better !list_is_singular(). That sounds reasonable. > > > > Also, looking into btrfs_zoned_should_reclaim(), it sums device->bytes_used > > for each fs_devices->devices. And, device->bytes_used is set at > > create_chunk() or at btrfs_remove_chunk(). Isn't it feasible to do the > > calculation only there? > > Oh sh*t! Right we should check bytes_used from all space_infos in > btrfs_zoned_should_reclaim() and compare that to the disk total bytes. You mean device->bytes_used? space_info->bytes_used does not count free space and zone_unusable in BGs, so using that changes the behavior. Even, it won't kick the thread if there are many zone_unusable but small used space. >
On 22.01.24 15:39, Naohiro Aota wrote: >>> >>> Also, looking into btrfs_zoned_should_reclaim(), it sums device->bytes_used >>> for each fs_devices->devices. And, device->bytes_used is set at >>> create_chunk() or at btrfs_remove_chunk(). Isn't it feasible to do the >>> calculation only there? >> >> Oh sh*t! Right we should check bytes_used from all space_infos in >> btrfs_zoned_should_reclaim() and compare that to the disk total bytes. > > You mean device->bytes_used? space_info->bytes_used does not count free > space and zone_unusable in BGs, so using that changes the behavior. Even, > it won't kick the thread if there are many zone_unusable but small used > space. > I did mean btrfs_space_info_used(): diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c index b7e7b5a5a6fa..d5242c96c97c 100644 --- a/fs/btrfs/zoned.c +++ b/fs/btrfs/zoned.c @@ -2414,6 +2414,7 @@ bool btrfs_zoned_should_reclaim(struct btrfs_fs_info *fs_info) { struct btrfs_fs_devices *fs_devices = fs_info->fs_devices; struct btrfs_device *device; + struct btrfs_space_info *space_info; u64 used = 0; u64 total = 0; u64 factor; @@ -2429,10 +2430,15 @@ bool btrfs_zoned_should_reclaim(struct btrfs_fs_info *fs_info) continue; total += device->disk_total_bytes; - used += device->bytes_used; } rcu_read_unlock(); + list_for_each_entry(space_info, &fs_info->space_info, list) { + spin_lock(&space_info->lock); + used += btrfs_space_info_used(space_info, true); + spin_unlock(&space_info->lock); + } + factor = div64_u64(used * 100, total); return factor >= fs_info->bg_reclaim_threshold; }
On Mon, Jan 22, 2024 at 02:43:03PM +0000, Johannes Thumshirn wrote: > On 22.01.24 15:39, Naohiro Aota wrote: > >>> > >>> Also, looking into btrfs_zoned_should_reclaim(), it sums device->bytes_used > >>> for each fs_devices->devices. And, device->bytes_used is set at > >>> create_chunk() or at btrfs_remove_chunk(). Isn't it feasible to do the > >>> calculation only there? > >> > >> Oh sh*t! Right we should check bytes_used from all space_infos in > >> btrfs_zoned_should_reclaim() and compare that to the disk total bytes. > > > > You mean device->bytes_used? space_info->bytes_used does not count free > > space and zone_unusable in BGs, so using that changes the behavior. Even, > > it won't kick the thread if there are many zone_unusable but small used > > space. > > > > I did mean btrfs_space_info_used(): > > diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c > index b7e7b5a5a6fa..d5242c96c97c 100644 > --- a/fs/btrfs/zoned.c > +++ b/fs/btrfs/zoned.c > @@ -2414,6 +2414,7 @@ bool btrfs_zoned_should_reclaim(struct > btrfs_fs_info *fs_info) > { > struct btrfs_fs_devices *fs_devices = fs_info->fs_devices; > struct btrfs_device *device; > + struct btrfs_space_info *space_info; > u64 used = 0; > u64 total = 0; > u64 factor; > @@ -2429,10 +2430,15 @@ bool btrfs_zoned_should_reclaim(struct > btrfs_fs_info *fs_info) > continue; > > total += device->disk_total_bytes; > - used += device->bytes_used; > } > rcu_read_unlock(); > > + list_for_each_entry(space_info, &fs_info->space_info, list) { > + spin_lock(&space_info->lock); > + used += btrfs_space_info_used(space_info, true); > + spin_unlock(&space_info->lock); > + } > + > factor = div64_u64(used * 100, total); > return factor >= fs_info->bg_reclaim_threshold; > } > This changes the behavior. btrfs_space_info_used() excludes unallocated space. Also, if we calculate it with device->disk_total_bytes, it screws up on DUP/RAID* profile because btrfs_space_info_used() counts the logical space vs disk_total_bytes counts the physical space.
On Mon, Jan 22, 2024 at 02:51:04AM -0800, Johannes Thumshirn wrote: > On very fast but small devices, waiting for a transaction commit can be > too long of a wait in order to wake up the cleaner kthread to remove unused > and reclaimable block-groups. > > Check every time we're adding back free space to a block group, if we need > to activate the cleaner kthread. > > Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> > --- > fs/btrfs/free-space-cache.c | 6 ++++++ > 1 file changed, 6 insertions(+) > > diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c > index d372c7ce0e6b..2d98b9ca0e83 100644 > --- a/fs/btrfs/free-space-cache.c > +++ b/fs/btrfs/free-space-cache.c > @@ -30,6 +30,7 @@ > #include "file-item.h" > #include "file.h" > #include "super.h" > +#include "zoned.h" > > #define BITS_PER_BITMAP (PAGE_SIZE * 8UL) > #define MAX_CACHE_BYTES_PER_GIG SZ_64K > @@ -2694,6 +2695,7 @@ int __btrfs_add_free_space(struct btrfs_block_group *block_group, > static int __btrfs_add_free_space_zoned(struct btrfs_block_group *block_group, > u64 bytenr, u64 size, bool used) > { I thought add_free_space are only called from various error/backout conditions and then for real from unpin_extent_range, which is also in the transaction commit. The normal reclaim/unused decision is made in btrfs_update_block_group for that reason. OTOH, I assume you had some repro that was performing poorly and this patch fixed it so, I am very likely missing something. Thanks, Boris > + struct btrfs_fs_info *fs_info = block_group->fs_info; > struct btrfs_space_info *sinfo = block_group->space_info; > struct btrfs_free_space_ctl *ctl = block_group->free_space_ctl; > u64 offset = bytenr - block_group->start; > @@ -2745,6 +2747,10 @@ static int __btrfs_add_free_space_zoned(struct btrfs_block_group *block_group, > btrfs_mark_bg_to_reclaim(block_group); > } > > + if (btrfs_zoned_should_reclaim(fs_info) && > + !test_bit(BTRFS_FS_CLEANER_RUNNING, &fs_info->flags)) > + wake_up_process(fs_info->cleaner_kthread); > + > return 0; > } > > > -- > 2.43.0 >
diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c index d372c7ce0e6b..2d98b9ca0e83 100644 --- a/fs/btrfs/free-space-cache.c +++ b/fs/btrfs/free-space-cache.c @@ -30,6 +30,7 @@ #include "file-item.h" #include "file.h" #include "super.h" +#include "zoned.h" #define BITS_PER_BITMAP (PAGE_SIZE * 8UL) #define MAX_CACHE_BYTES_PER_GIG SZ_64K @@ -2694,6 +2695,7 @@ int __btrfs_add_free_space(struct btrfs_block_group *block_group, static int __btrfs_add_free_space_zoned(struct btrfs_block_group *block_group, u64 bytenr, u64 size, bool used) { + struct btrfs_fs_info *fs_info = block_group->fs_info; struct btrfs_space_info *sinfo = block_group->space_info; struct btrfs_free_space_ctl *ctl = block_group->free_space_ctl; u64 offset = bytenr - block_group->start; @@ -2745,6 +2747,10 @@ static int __btrfs_add_free_space_zoned(struct btrfs_block_group *block_group, btrfs_mark_bg_to_reclaim(block_group); } + if (btrfs_zoned_should_reclaim(fs_info) && + !test_bit(BTRFS_FS_CLEANER_RUNNING, &fs_info->flags)) + wake_up_process(fs_info->cleaner_kthread); + return 0; }