[1/2] btrfs: zoned: use rcu list for iterating devices to collect stats

Message ID 20240122-reclaim-fix-v1-1-761234a6d005@wdc.com
State New
Series btrfs: zoned: kick reclaim earlier on fast zoned devices

Commit Message

Johannes Thumshirn Jan. 22, 2024, 10:51 a.m. UTC
As btrfs_zoned_should_reclaim only has to iterate the device list in order
to collect stats on the device's total and used bytes, we don't need to
take the full-blown mutex, but can iterate the device list in an rcu_read
context.

Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
 fs/btrfs/zoned.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)
  

Comments

Naohiro Aota Jan. 22, 2024, 12:12 p.m. UTC | #1
On Mon, Jan 22, 2024 at 02:51:03AM -0800, Johannes Thumshirn wrote:
> As btrfs_zoned_should_reclaim only has to iterate the device list in order
> to collect stats on the device's total and used bytes, we don't need to
> take the full blown mutex, but can iterate the device list in a rcu_read
> context.
> 
> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>

Looks good.

Reviewed-by: Naohiro Aota <naohiro.aota@wdc.com>
  
David Sterba Jan. 22, 2024, 9:34 p.m. UTC | #2
On Mon, Jan 22, 2024 at 02:51:03AM -0800, Johannes Thumshirn wrote:
> As btrfs_zoned_should_reclaim only has to iterate the device list in order
> to collect stats on the device's total and used bytes, we don't need to
> take the full blown mutex, but can iterate the device list in a rcu_read
> context.
> 
> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
> ---
>  fs/btrfs/zoned.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
> index 168af9d000d1..b7e7b5a5a6fa 100644
> --- a/fs/btrfs/zoned.c
> +++ b/fs/btrfs/zoned.c
> @@ -2423,15 +2423,15 @@ bool btrfs_zoned_should_reclaim(struct btrfs_fs_info *fs_info)
>  	if (fs_info->bg_reclaim_threshold == 0)
>  		return false;
>  
> -	mutex_lock(&fs_devices->device_list_mutex);
> -	list_for_each_entry(device, &fs_devices->devices, dev_list) {
> +	rcu_read_lock();
> +	list_for_each_entry_rcu(device, &fs_devices->devices, dev_list) {
>  		if (!device->bdev)
>  			continue;
>  
>  		total += device->disk_total_bytes;
>  		used += device->bytes_used;
>  	}
> -	mutex_unlock(&fs_devices->device_list_mutex);
> +	rcu_read_unlock();

This is basically only a hint and inaccuracies in the total or used
values would be transient, right? The sum is calculated each time the
function is called, not stored anywhere, so in the unlikely case of
device removal it may skip reclaim once, but then pick it up later.
Any actual removal of the block groups is verified again and properly
locked in btrfs_reclaim_bgs_work().
  
Johannes Thumshirn Jan. 23, 2024, 7:49 a.m. UTC | #3
On 22.01.24 22:35, David Sterba wrote:
> On Mon, Jan 22, 2024 at 02:51:03AM -0800, Johannes Thumshirn wrote:
>> As btrfs_zoned_should_reclaim only has to iterate the device list in order
>> to collect stats on the device's total and used bytes, we don't need to
>> take the full blown mutex, but can iterate the device list in a rcu_read
>> context.
>>
>> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
>> ---
>>   fs/btrfs/zoned.c | 6 +++---
>>   1 file changed, 3 insertions(+), 3 deletions(-)
>>
>> diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
>> index 168af9d000d1..b7e7b5a5a6fa 100644
>> --- a/fs/btrfs/zoned.c
>> +++ b/fs/btrfs/zoned.c
>> @@ -2423,15 +2423,15 @@ bool btrfs_zoned_should_reclaim(struct btrfs_fs_info *fs_info)
>>   	if (fs_info->bg_reclaim_threshold == 0)
>>   		return false;
>>   
>> -	mutex_lock(&fs_devices->device_list_mutex);
>> -	list_for_each_entry(device, &fs_devices->devices, dev_list) {
>> +	rcu_read_lock();
>> +	list_for_each_entry_rcu(device, &fs_devices->devices, dev_list) {
>>   		if (!device->bdev)
>>   			continue;
>>   
>>   		total += device->disk_total_bytes;
>>   		used += device->bytes_used;
>>   	}
>> -	mutex_unlock(&fs_devices->device_list_mutex);
>> +	rcu_read_unlock();
> 
> This is basically only a hint and inaccuracies in the total or used
> values would be transient, right? The sum is calculated each time the
> function is called, not stored anywhere, so in the unlikely case of
> device removal it may skip reclaim once, but then pick it up later.
> Any actual removal of the block groups is verified again and properly
> locked in btrfs_reclaim_bgs_work().
> 

Yes.
  
David Sterba Jan. 23, 2024, 6:35 p.m. UTC | #4
On Tue, Jan 23, 2024 at 07:49:22AM +0000, Johannes Thumshirn wrote:
> On 22.01.24 22:35, David Sterba wrote:
> > On Mon, Jan 22, 2024 at 02:51:03AM -0800, Johannes Thumshirn wrote:
> >> As btrfs_zoned_should_reclaim only has to iterate the device list in order
> >> to collect stats on the device's total and used bytes, we don't need to
> >> take the full blown mutex, but can iterate the device list in a rcu_read
> >> context.
> >>
> >> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
> >> ---
> >>   fs/btrfs/zoned.c | 6 +++---
> >>   1 file changed, 3 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
> >> index 168af9d000d1..b7e7b5a5a6fa 100644
> >> --- a/fs/btrfs/zoned.c
> >> +++ b/fs/btrfs/zoned.c
> >> @@ -2423,15 +2423,15 @@ bool btrfs_zoned_should_reclaim(struct btrfs_fs_info *fs_info)
> >>   	if (fs_info->bg_reclaim_threshold == 0)
> >>   		return false;
> >>   
> >> -	mutex_lock(&fs_devices->device_list_mutex);
> >> -	list_for_each_entry(device, &fs_devices->devices, dev_list) {
> >> +	rcu_read_lock();
> >> +	list_for_each_entry_rcu(device, &fs_devices->devices, dev_list) {
> >>   		if (!device->bdev)
> >>   			continue;
> >>   
> >>   		total += device->disk_total_bytes;
> >>   		used += device->bytes_used;
> >>   	}
> >> -	mutex_unlock(&fs_devices->device_list_mutex);
> >> +	rcu_read_unlock();
> > 
> > This is basically only a hint and inaccuracies in the total or used
> > values would be transient, right? The sum is calculated each time the
> > function is called, not stored anywhere, so in the unlikely case of
> > device removal it may skip reclaim once, but then pick it up later.
> > Any actual removal of the block groups is verified again and properly
> > locked in btrfs_reclaim_bgs_work().
> > 
> 
> Yes.

So please add it to the changelog as an explanation why the mutex -> rcu
switch is safe, thanks.
  

Patch

diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
index 168af9d000d1..b7e7b5a5a6fa 100644
--- a/fs/btrfs/zoned.c
+++ b/fs/btrfs/zoned.c
@@ -2423,15 +2423,15 @@  bool btrfs_zoned_should_reclaim(struct btrfs_fs_info *fs_info)
 	if (fs_info->bg_reclaim_threshold == 0)
 		return false;
 
-	mutex_lock(&fs_devices->device_list_mutex);
-	list_for_each_entry(device, &fs_devices->devices, dev_list) {
+	rcu_read_lock();
+	list_for_each_entry_rcu(device, &fs_devices->devices, dev_list) {
 		if (!device->bdev)
 			continue;
 
 		total += device->disk_total_bytes;
 		used += device->bytes_used;
 	}
-	mutex_unlock(&fs_devices->device_list_mutex);
+	rcu_read_unlock();
 
 	factor = div64_u64(used * 100, total);
 	return factor >= fs_info->bg_reclaim_threshold;
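
For readers following the thread, the check that this loop feeds can be sketched in plain userspace C. This is only an illustration of the arithmetic (sum total and used bytes over devices that have a block device, then compare the used percentage against the threshold), not the kernel code. The struct dev_stats type, the has_bdev flag and the should_reclaim() name are hypothetical stand-ins, and the early return for total == 0 is an assumption of this sketch rather than something shown in the patch. No locking or RCU is modelled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the struct btrfs_device fields read by the loop. */
struct dev_stats {
	bool has_bdev;             /* models the device->bdev != NULL check */
	uint64_t disk_total_bytes;
	uint64_t bytes_used;
};

/*
 * Return true when used bytes, as an integer percentage of total bytes,
 * reach the threshold. A threshold of 0 disables the check, as in the patch.
 */
static bool should_reclaim(const struct dev_stats *devs, size_t n,
			   unsigned int threshold)
{
	uint64_t total = 0;
	uint64_t used = 0;

	assert(devs != NULL || n == 0);

	if (threshold == 0)
		return false;

	for (size_t i = 0; i < n; i++) {
		/* Devices without a block device are skipped. */
		if (!devs[i].has_bdev)
			continue;

		total += devs[i].disk_total_bytes;
		used += devs[i].bytes_used;
	}

	/* Assumption of this sketch: nothing to reclaim if no bytes were counted. */
	if (total == 0)
		return false;

	/* Integer division, standing in for div64_u64(used * 100, total). */
	return (used * 100) / total >= threshold;
}
```

With two counted devices of 1000 bytes each, holding 800 and 700 used bytes, the sum is 1500 of 2000, so the check passes for a threshold of 75 and fails for 76. Because the sums are recomputed on every call, a stale value seen during a concurrent device removal only affects that one call, which is the point raised in comment #2.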