Message ID | 20230226160259.18354-1-ammarfaizi2@gnuweeb.org |
---|---|
Headers |
From: Ammar Faizi <ammarfaizi2@gnuweeb.org>
To: Chris Mason <clm@fb.com>, Josef Bacik <josef@toxicpanda.com>, David Sterba <dsterba@suse.com>, Tejun Heo <tj@kernel.org>
Cc: Ammar Faizi <ammarfaizi2@gnuweeb.org>, Lai Jiangshan <jiangshanlai@gmail.com>, Filipe Manana <fdmanana@suse.com>, Linux Btrfs Mailing List <linux-btrfs@vger.kernel.org>, Linux Kernel Mailing List <linux-kernel@vger.kernel.org>, Linux Fsdevel Mailing List <linux-fsdevel@vger.kernel.org>, GNU/Weeb Mailing List <gwml@vger.gnuweeb.org>
Subject: [RFC PATCH v1 0/6] Introducing `wq_cpu_set` mount option for btrfs
Date: Sun, 26 Feb 2023 23:02:53 +0700
Message-Id: <20230226160259.18354-1-ammarfaizi2@gnuweeb.org> |
Series |
Introducing `wq_cpu_set` mount option for btrfs
|
|
Message
Ammar Faizi
Feb. 26, 2023, 4:02 p.m. UTC
Hi,
This is an RFC patchset that introduces the `wq_cpu_set` mount option.
This option lets the user specify a CPU set that the Btrfs workqueues
will use.
Btrfs workqueues can slow sensitive user tasks down because they can use
any online CPU to perform heavy workloads on an SMP system. Add a mount
option to isolate the Btrfs workqueues to a set of CPUs. This helps avoid
having sensitive user tasks preempted by heavy Btrfs workqueues.
This option is similar to the taskset bitmask except that the comma
separator is replaced with a dot. The reason for this is that the mount
option parser uses commas to separate mount options.
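As a sketch of that mapping, the dot-separated value builds the same bitmask `taskset -c` would build from a comma-separated list; here is an illustrative userspace shell equivalent (not the in-kernel parser from this series):

```shell
#!/bin/sh
# Illustrative only: turn a wq_cpu_set value into a taskset-style hex mask.
cpu_set="0.4.1.5"   # CPUs 0, 4, 1 and 5, dot-separated as in the mount option
mask=0
for cpu in $(echo "$cpu_set" | tr '.' ' '); do
    mask=$(( mask | (1 << cpu) ))   # set the bit for each listed CPU
done
printf '0x%x\n' "$mask"             # prints 0x33 (bits 0, 1, 4, 5 set)
```

So `wq_cpu_set=0.4.1.5` corresponds to the `taskset` mask 0x33.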
Figure (CPU usage when `wq_cpu_set` is used vs. when it is not):
https://gist.githubusercontent.com/ammarfaizi2/a10f8073e58d1712c1ed49af83ae4ad1/raw/a4f7cbc4eb163db792a669d570ff542495e8c704/wq_cpu_set.png
A simple stress test:
1. Open htop.
2. Open a new terminal.
3. Mount and perform a heavy workload on the mounted Btrfs filesystem.
## Test without wq_cpu_set
sudo mount -t btrfs -o rw,compress-force=zstd:15,commit=1500 /dev/sda2 hdd/a;
cp -rf /path/folder_with_many_large_files/ hdd/a/test;
sync; # See the CPU usage in htop.
sudo umount hdd/a;
## Test wq_cpu_set
sudo mount -t btrfs -o rw,compress-force=zstd:15,commit=1500,wq_cpu_set=0.4.1.5 /dev/sda2 hdd/a;
cp -rf /path/folder_with_many_large_files/ hdd/a/test;
sync; # See the CPU usage in htop.
sudo umount hdd/a;
Signed-off-by: Ammar Faizi <ammarfaizi2@gnuweeb.org>
---
Ammar Faizi (6):
workqueue: Add set_workqueue_cpumask() helper function
btrfs: Change `mount_opt` type in `struct btrfs_fs_info` to `u64`
btrfs: Create btrfs CPU set struct and helpers
btrfs: Add wq_cpu_set=%s mount option
btrfs: Adjust the default thread pool size when `wq_cpu_set` option is used
btrfs: Add `BTRFS_DEFAULT_MAX_THREAD_POOL_SIZE` macro
fs/btrfs/async-thread.c | 51 ++++++++++++++++++++
fs/btrfs/async-thread.h | 3 ++
fs/btrfs/disk-io.c | 6 ++-
fs/btrfs/fs.c | 97 +++++++++++++++++++++++++++++++++++++++
fs/btrfs/fs.h | 12 ++++-
fs/btrfs/super.c | 83 +++++++++++++++++++++++++++++++++
include/linux/workqueue.h | 3 ++
kernel/workqueue.c | 19 ++++++++
8 files changed, 271 insertions(+), 3 deletions(-)
base-commit: 2fcd07b7ccd5fd10b2120d298363e4e6c53ccf9c
Comments
Hello,

On Sun, 26 Feb 2023 07:01:41 -1000, Tejun Heo wrote:
> Hmm... the allowed cpumasks for unbounded workqueues can already be set
> through /sys/devices/virtual/workqueue/cpumask and also each individual
> workqueue can be exposed in the matching subdirectory by setting WQ_SYSFS.
> Wouldn't the proposed btrfs option be a bit redundant?

Thank you for the comment. I have just learned about the sysfs facility
for this, so I will take a deeper look into it. For now, I have several
reasons to use the `wq_cpu_set` option:

1. Currently, there are 15 btrfs workqueues. It would not be convenient
   to make the user manage each of them via sysfs. Using the `wq_cpu_set`
   option at mount time lets the user set all of them in one shot.

   (For the btrfs maintainers:) I am also not sure whether the number of
   btrfs workqueues is stable enough that users can rely on the WQ_SYSFS
   facility.

2. I looked at /sys/devices/virtual/workqueue/ and saw its
   subdirectories. The directory name is taken from wq->name, so how do
   we distinguish multiple workqueues with the same name? Each btrfs
   mount will at least do this:

   alloc_workqueue("btrfs-compressed-write", flags, max_active);

   When we do:

   mount -t -o rw btrfs /dev/sda1 a;
   mount -t -o rw btrfs /dev/sda2 b;
   mount -t -o rw btrfs /dev/sda3 c;
   mount -t -o rw btrfs /dev/sda4 d;

   is there a way to identify which sysfs devices correspond to the
   workqueues of a specific mounted btrfs filesystem? Let's say I want
   each mount to have a different CPU mask.
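For reference, the sysfs facility Tejun mentions looks like this (a sketch; the per-workqueue subdirectories exist only for workqueues created with WQ_SYSFS, and writes require root):

```shell
# Allowed CPUs for all unbound workqueues, as a hex mask
cat /sys/devices/virtual/workqueue/cpumask

# Restrict unbound workqueues to CPUs 0-3 (mask 0xf)
echo f | sudo tee /sys/devices/virtual/workqueue/cpumask

# Per-workqueue knobs appear here only for WQ_SYSFS workqueues
ls /sys/devices/virtual/workqueue/
```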
On Mon, Feb 27, 2023 at 01:26:24AM +0700, Ammar Faizi wrote:
> mount -t -o rw btrfs /dev/sda1 a;
> mount -t -o rw btrfs /dev/sda2 b;
> mount -t -o rw btrfs /dev/sda3 c;
> mount -t -o rw btrfs /dev/sda4 d;

Excuse the wrong mount commands; they should be something like this:

mount -t btrfs -o rw /dev/sda1 a;
mount -t btrfs -o rw /dev/sda2 b;
mount -t btrfs -o rw /dev/sda3 c;
mount -t btrfs -o rw /dev/sda4 d;
On Mon, Feb 27, 2023 at 11:04:38AM +0800, Hillf Danton wrote:
> Are the sensitive user tasks likely also preempted by the
> kblockd_workqueue [1], for instance?
>
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/block/blk-mq.c#n2258

I didn't see any kblockd_workqueue influence. From my observation, the
issue is btrfs-specific, especially for write operations with high-level
compression. It does not happen with ext4 (though ext4 does not compress
the data, so no doubt its workload is lighter).

Seen from htop, several kthreads consume CPU time:

  - kworker btrfs-delalloc
  - kworker btrfs-endio-write
  - kworker btrfs-worker-high
  - kworker btrfs-compressed-write
  - ... there are more, but all of them are btrfs workqueues.

They perform heavy work when there is an intensive write operation. For
read operations, however, there is no visible issue.

Increasing vm.dirty_ratio and vm.dirty_background_ratio, plus using the
-o commit=N mount option where N is a large number, helps a lot. But it
slows my entire system down once it starts syncing the dirty writes to
disk: it freezes the UI for 2 to 3 seconds and causes audio lag.
Isolating the btrfs workqueues and the UI tasks on different sets of
CPUs, as proposed in this series, solves the issue.
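The tuning described in that reply can be expressed as follows (a sketch; the exact values are illustrative, the author does not state the numbers used):

```shell
# Let more dirty data accumulate before writeback kicks in
# (canonical sysctl names: vm.dirty_ratio, vm.dirty_background_ratio)
sudo sysctl vm.dirty_ratio=40 vm.dirty_background_ratio=20

# Lengthen the btrfs commit interval (seconds) at mount time
sudo mount -t btrfs -o rw,commit=1500 /dev/sda2 hdd/a
```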
On 2023/2/27 00:02, Ammar Faizi wrote:
> Hi,
>
> This is an RFC patchset that introduces the `wq_cpu_set` mount option.
> This option lets the user specify a CPU set that the Btrfs workqueues
> will use.
>
> Btrfs workqueues can slow sensitive user tasks down because they can use
> any online CPU to perform heavy workloads on an SMP system. Add a mount
> option to isolate the Btrfs workqueues to a set of CPUs. It is helpful
> to avoid sensitive user tasks being preempted by Btrfs heavy workqueues.

I'm not sure if pinning the wq is really the best way to solve your
problem.

Yes, I understand you want to limit the CPU usage of btrfs workqueues,
but have you tried the "thread_pool=" mount option?

That mount option should limit the max amount of in-flight work items,
and thus at least limit the CPU usage.

For the wq CPU pinning part, I'm not sure it's really needed, although
it's known that CPU pinning can affect some performance characteristics.

Thanks,
Qu

[...]
On Sun, Feb 26, 2023 at 4:31 PM Ammar Faizi <ammarfaizi2@gnuweeb.org> wrote:
>
> Hi,
>
> This is an RFC patchset that introduces the `wq_cpu_set` mount option.
> This option lets the user specify a CPU set that the Btrfs workqueues
> will use.
>
> Btrfs workqueues can slow sensitive user tasks down because they can use
> any online CPU to perform heavy workloads on an SMP system. Add a mount
> option to isolate the Btrfs workqueues to a set of CPUs. It is helpful
> to avoid sensitive user tasks being preempted by Btrfs heavy workqueues.
>
> This option is similar to the taskset bitmask except that the comma
> separator is replaced with a dot. The reason for this is that the mount
> option parser uses commas to separate mount options.
>
> Figure (the CPU usage when `wq_cpu_set` is used VS when it is not):
> https://gist.githubusercontent.com/ammarfaizi2/a10f8073e58d1712c1ed49af83ae4ad1/raw/a4f7cbc4eb163db792a669d570ff542495e8c704/wq_cpu_set.png

I haven't read the patchset.

It's great that it reduces CPU usage. But does it also provide other
performance benefits, like lower latency or higher throughput for some
workloads? Or does using less CPU also negatively affect those other
aspects?

Thanks.

[...]
On 2023/2/27 19:02, Filipe Manana wrote:
> On Sun, Feb 26, 2023 at 4:31 PM Ammar Faizi <ammarfaizi2@gnuweeb.org> wrote:
>>
>> Hi,
>>
>> This is an RFC patchset that introduces the `wq_cpu_set` mount option.
>> This option lets the user specify a CPU set that the Btrfs workqueues
>> will use.
>>
>> [...]
>>
>> Figure (the CPU usage when `wq_cpu_set` is used VS when it is not):
>> https://gist.githubusercontent.com/ammarfaizi2/a10f8073e58d1712c1ed49af83ae4ad1/raw/a4f7cbc4eb163db792a669d570ff542495e8c704/wq_cpu_set.png
>
> I haven't read the patchset.
>
> It's great that it reduces CPU usage.
> But does it also provide other performance benefits, like lower
> latency or higher throughput for some workloads? Or using less CPU
> also affects negatively in those other aspects?

So far it looks like it just sets a CPU mask for each workqueue.

Thus, while it reduces CPU usage, it also takes a longer time to finish
the workload (compression, csum calculation, etc.).

Thanks,
Qu

[...]
On Mon, Feb 27, 2023 at 06:18:43PM +0800, Qu Wenruo wrote:
> I'm not sure if pinning the wq is really the best way to your problem.
>
> Yes, I understand you want to limit the CPU usage of btrfs workqueues,
> but have you tried "thread_pool=" mount option?
>
> That mount option should limit the max amount of in-flight work items,
> thus at least limit the CPU usage.

I have tried the thread_pool=%u mount option previously, but I didn't
observe its effect closely. I'll play with this option more and see
whether it can yield the desired behavior.

> For the wq CPU pinning part, I'm not sure if it's really needed,
> although it's known CPU pinning can affect some performance
> characteristics.

What I like about CPU pinning is that we can dedicate CPUs to specific
workloads, so they won't cause scheduling noise to the app we've
dedicated the other CPUs to.
On 2/27/23 6:46 PM, Qu Wenruo wrote:
> On 2023/2/27 19:02, Filipe Manana wrote:
>> On Sun, Feb 26, 2023 at 4:31 PM Ammar Faizi <ammarfaizi2@gnuweeb.org> wrote:
>>> Figure (the CPU usage when `wq_cpu_set` is used VS when it is not):
>>> https://gist.githubusercontent.com/ammarfaizi2/a10f8073e58d1712c1ed49af83ae4ad1/raw/a4f7cbc4eb163db792a669d570ff542495e8c704/wq_cpu_set.png
>>
>> I haven't read the patchset.
>>
>> It's great that it reduces CPU usage. But does it also provide
>> other performance benefits, like lower latency or higher throughput
>> for some workloads? Or using less CPU also affects negatively in
>> those other aspects?

Based on my testing, it gives lower latency for a browser app playing a
YouTube video.

Without this proposed option, high-level compression on btrfs storage is
real noise to user-space apps. It periodically freezes the UI for 2 to 3
seconds and causes audio lag; this mostly happens when the dirty data
starts being written to disk.

It's reasonably easy to reproduce by making a large dirty write and
invoking a "sync" command.

Side note: pin user apps to CPUs a,b,c,d and btrfs workqueues to CPUs
w,x,y,z.

> So far it looks like to just set CPU masks for each workqueue.
>
> Thus if it's reducing CPU usage, it also takes longer time to finish
> the workload (compression,csum calculation etc).

Yes, that's correct. I see this as a good mount option for btrfs because
the btrfs workload in question is CPU-bound, specifically for write
operations. While it may slow the btrfs workload down because we limit
the number of usable CPUs, there are situations where users don't
prioritize writing to disk. Let's say: I want to run a smooth app with
video. I also want high-level compression for my btrfs storage. But I
don't want the compression and checksum work to bother my video; so
here, I give you CPUs x,y,z for the btrfs work, and CPUs a,b,c,d,e,f for
the video work.

I have a similar case on a torrent seeder server where high-level
compression is expected. And I believe there are more cases where this
option is advantageous.

Thank you all for the comments,
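The user-side half of that pinning can be done with taskset (a sketch; the CPU numbers and app name are illustrative only):

```shell
# Keep a latency-sensitive app off the CPUs given to the btrfs
# workqueues (the cover letter gives CPUs 0,4,1,5 to btrfs).
taskset -c 2,3,6,7 some_sensitive_app &

# Or re-pin an already running task by PID:
taskset -cp 2,3,6,7 "$(pidof some_sensitive_app)"
```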
On Mon, 27 Feb 2023 20:45:26 +0700 Ammar Faizi <ammarfaizi2@gmail.com> wrote:
> Based on my testing, it gives lower latency for a browser app playing
> a YouTube video.
>
> Without this proposed option, high-level compression on a btrfs
> storage is a real noise to user space apps. It periodically freezes
> the UI for 2 to 3 seconds and causes audio lag; it mostly happens when
> it starts writing the dirty write to the disk.
>
> It's reasonably easy to reproduce by making a large dirty write and
> invoking a "sync" command.
>
> Side note: Pin user apps to CPUs a,b,c,d and btrfs workqueues to CPUs
> w,x,y,z.

The end user should not be expected to do that. (At least, that is my
opinion as an end user. :)

The worst part: at times when btrfs has no current work to do, the user
apps are hard-capped to 4 cores for no good reason while the other 4 sit
idle. Even with a 6/2 split, this still looks like giving up on the
achievements of multitasking operating systems and falling back to some
antique state of fixed separation.

> I want to run a smooth app with video. I also want to have high-level
> compression for my btrfs storage. But I don't want the compression and
> checksum work to bother my video; here, I give you CPU x,y,z for the
> btrfs work. And here I give you CPU a,b,c,d,e,f for the video work.
>
> I have a similar case on a torrent seeder server where high-level
> compression is expected. And I believe there are more cases where this
> option is advantageous.

I really hope there can be some other approach, such as an adjustment to
kworker processing so that it is not as aggressive in starving
userspace. There is no possibility of running kernel tasks at a lower
priority, right?

But if there is no other way, maybe an option like this is good to have
for the moment.
On 2023/2/27 21:42, Ammar Faizi wrote:
> On Mon, Feb 27, 2023 at 06:18:43PM +0800, Qu Wenruo wrote:
>> I'm not sure if pinning the wq is really the best way to your problem.
>>
>> Yes, I understand you want to limit the CPU usage of btrfs workqueues,
>> but have you tried "thread_pool=" mount option?
>>
>> That mount option should limit the max amount of in-flight work items,
>> thus at least limit the CPU usage.
>
> I have tried to use the thread_pool=%u mount option previously. But I
> didn't observe the effect intensively. I'll try to play with this
> option more and see if it can yield the desired behavior.

The behavior change from the thread_pool mount option is much harder to
observe. A wq pinned to one or two CPUs is easy to observe using htop,
while an unbounded wq, even with thread_pool, is much harder to observe.
Thus it needs more systematic testing to find the difference.

>> For the wq CPU pinning part, I'm not sure if it's really needed,
>> although it's known CPU pinning can affect some performance
>> characteristics.
>
> What I like about CPU pinning is that we can dedicate CPUs for specific
> workloads so it won't cause scheduling noise to the app we've dedicated
> other CPUs for.

I'm not 100% sure we're really any better than the scheduler developers,
as there are a lot more things to consider.

E.g., recent Intel CPUs have BIG and little cores, and the BIG cores
even have SMT support. Current kernels' scheduler avoids putting
workloads onto threads sharing the same physical core. That can result
in a seemingly weird priority order like BIG core thread 1 > little core
> BIG core thread 2, but it gives better overall performance.

So overall, unless it is necessary, I'd avoid manual CPU pinning.

Thanks,
Qu
On Tue, Feb 28, 2023 at 09:17:45AM +1100, Dave Chinner wrote:
> This seems like the wrong model for setting cpu locality for
> internal filesystem threads.
>
> Users are used to controlling cpu sets and other locality behaviour
> of a task with wrapper tools like numactl. Wrap the mount command
> with a numactl command to limit the CPU set, then have the btrfs
> fill_super() callback set the cpu mask for the work queues it
> creates based on the cpu mask that has been set for the mount task.
>
> That is, I think the model should be "inherit cpu mask from parent
> task" rather than adding mount options. This model allows anything
> that numactl can control (e.g. memory locality) to also influence
> the filesystem default behaviour without having to add yet more
> mount options in the future....

Good idea on the tooling part. I like the idea of using numactl to
determine a proper CPU set. But users may also use /etc/fstab to mount
their btrfs storage; in that case, a mount option is still handy.

Also, if we always inherit the CPU mask from the parent task that calls
mount, we will break CPU affinity for old users who inadvertently invoke
their mount command with a random CPU mask. We should keep the old
behavior and let users opt in if they want to. Maybe something like:

numactl -N 0 mount -t btrfs -o rw,wq_cpu_set=inherit /dev/bla bla