Message ID | 20240123183332.876854-3-shikemeng@huaweicloud.com |
---|---|
State | New |
Headers |
From: Kemeng Shi <shikemeng@huaweicloud.com>
To: willy@infradead.org, akpm@linux-foundation.org
Cc: tj@kernel.org, hcochran@kernelspring.com, mszeredi@redhat.com, axboe@kernel.dk, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: [PATCH 2/5] mm: correct calculation of cgroup wb's bg_thresh in wb_over_bg_thresh
Date: Wed, 24 Jan 2024 02:33:29 +0800
Message-Id: <20240123183332.876854-3-shikemeng@huaweicloud.com>
In-Reply-To: <20240123183332.876854-1-shikemeng@huaweicloud.com>
References: <20240123183332.876854-1-shikemeng@huaweicloud.com>
Series | Fix and cleanups to page-writeback |
Commit Message
Kemeng Shi
Jan. 23, 2024, 6:33 p.m. UTC
wb_calc_thresh() calculates the wb's share in the global wb domain. For
the mdtc we need the wb's share in its mem_cgroup_wb_domain. Call
__wb_calc_thresh() instead of wb_calc_thresh() to fix this.
Fixes: 74d369443325 ("writeback: Fix performance regression in wb_over_bg_thresh()")
Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
---
mm/page-writeback.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
Comments
On Wed, Jan 24, 2024 at 02:33:29AM +0800, Kemeng Shi wrote:
> wb_calc_thresh() calculates the wb's share in the global wb domain. For
> the mdtc we need the wb's share in its mem_cgroup_wb_domain. Call
> __wb_calc_thresh() instead of wb_calc_thresh() to fix this.

That function calculates the wb's portion of the writeback activity in the
whole system so that the threshold can be distributed accordingly, so it
has to be compared in the global domain. If you look at the comment on top
of struct wb_domain, it says:

/*
 * A wb_domain represents a domain that wb's (bdi_writeback's) belong to
 * and are measured against each other in. There always is one global
 * domain, global_wb_domain, that every wb in the system is a member of.
 * This allows measuring the relative bandwidth of each wb to distribute
 * dirtyable memory accordingly.
 */

Also, how is this tested? Was there a case where the existing code
misbehaved that's improved by this patch? Or is this just from reading the
code?

Thanks.
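The disagreement above turns on which wb_domain a dirty_throttle_control
(dtc) is measured against. As a reference point, here is a simplified
paraphrase of how mm/page-writeback.c wires this up; GDTC_INIT, MDTC_INIT
and dtc_dom() are real upstream names, but the bodies below are a sketch,
not verbatim kernel code:

/* Simplified paraphrase (not verbatim): each dirty_throttle_control
 * carries the wb_domain it is measured in. */
#define GDTC_INIT(__wb)		.wb = (__wb),				\
				.dom = &global_wb_domain,		\
				.wb_completions = &(__wb)->completions

#define MDTC_INIT(__wb, __gdtc)	.wb = (__wb),				\
				.dom = mem_cgroup_wb_domain(__wb),	\
				.gdtc = (__gdtc),			\
				.wb_completions = &(__wb)->memcg_completions

static struct wb_domain *dtc_dom(struct dirty_throttle_control *dtc)
{
	/* global domain for a gdtc, the memcg's domain for an mdtc */
	return dtc->dom;
}

A wb's share is therefore always relative to dtc_dom(dtc): the same wb has
one share within global_wb_domain and a different, typically larger, share
within its memcg's domain.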
on 1/24/2024 4:43 AM, Tejun Heo wrote:
> On Wed, Jan 24, 2024 at 02:33:29AM +0800, Kemeng Shi wrote:
>> wb_calc_thresh() calculates the wb's share in the global wb domain. For
>> the mdtc we need the wb's share in its mem_cgroup_wb_domain. Call
>> __wb_calc_thresh() instead of wb_calc_thresh() to fix this.
>
> That function calculates the wb's portion of the writeback activity in
> the whole system so that the threshold can be distributed accordingly,
> so it has to be compared in the global domain. If you look at the
> comment on top of struct wb_domain, it says:
>
> /*
>  * A wb_domain represents a domain that wb's (bdi_writeback's) belong to
>  * and are measured against each other in. There always is one global
>  * domain, global_wb_domain, that every wb in the system is a member of.
>  * This allows measuring the relative bandwidth of each wb to distribute
>  * dirtyable memory accordingly.
>  */

Hi Tejun, thanks for the reply. A cgroup wb belongs both to the global wb
domain and to a cgroup domain. I agree with how we calculate a wb's
threshold in the global domain as you described above. This patch tries to
fix the calculation of the wb's threshold in the cgroup domain, which is
currently wb_calc_thresh(mdtc->wb, mdtc->bg_thresh), meaning:
  (wb bandwidth) / (system bandwidth) * (cgroup domain threshold)
where the cgroup domain threshold is
  (memory of cgroup domain) / (memory of system) * (system threshold).
The wb's threshold in the cgroup domain is therefore smaller than expected.

Consider the following domain hierarchy:
                  global domain (100G)
                  /                  \
    cgroup domain1 (50G)      cgroup domain2 (50G)
              |                        |
      bdi    wb1                      wb2

Assume wb1 and wb2 have the same bandwidth, the global domain bg_thresh is
10G and each cgroup domain bg_thresh is 5G. Then we have:
  wb's thresh in global domain = 10G * (wb bandwidth) / (system bandwidth)
                               = 10G * 1/2 = 5G
  wb's thresh in cgroup domain = 5G * (wb bandwidth) / (system bandwidth)
                               = 5G * 1/2 = 2.5G
In the end, wb1 and wb2 are each limited at 2.5G and the system is limited
at 5G, which is less than the global domain bg_thresh of 10G.

After the fix, the threshold in the cgroup domain becomes:
  (wb bandwidth) / (cgroup bandwidth) * (cgroup domain threshold)
so wb1 and wb2 are each limited at 5G and the system is limited at 10G,
which equals the global domain bg_thresh of 10G.

As I didn't take a deep look into the memory cgroup code, please correct me
if anything is wrong. Thanks!

> Also, how is this tested? Was there a case where the existing code
> misbehaved that's improved by this patch? Or is this just from reading
> the code?

This is just from reading the code. I hope the case shown above is
convincing. Looking forward to your reply, thanks!
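The arithmetic in the mail above is easy to check mechanically. The
following is a minimal userspace sketch; all names are invented for
illustration, and it only loosely mirrors the proportional split that
__wb_calc_thresh() performs with the domain's fprop tree:

#include <stdio.h>

/* Toy domain: just the summed write bandwidth of its member wbs. */
struct toy_domain {
	const char *name;
	double total_bw;
};

/* Split a domain's threshold by the wb's bandwidth share within that
 * domain, loosely mirroring __wb_calc_thresh(). */
static double toy_wb_thresh(const struct toy_domain *dom, double wb_bw,
			    double domain_thresh)
{
	return domain_thresh * (wb_bw / dom->total_bw);
}

int main(void)
{
	/* wb1 and wb2 with equal bandwidth, as in the example above */
	struct toy_domain global  = { "global",  2.0 };	/* wb1 + wb2 */
	struct toy_domain cgroup1 = { "cgroup1", 1.0 };	/* wb1 only  */
	double cgroup_bg_thresh = 5.0;			/* GiB */

	/* buggy: cgroup thresh scaled by the wb's *global* share */
	printf("before fix: wb1 cgroup thresh = %.1fG\n",
	       toy_wb_thresh(&global, 1.0, cgroup_bg_thresh));
	/* fixed: scaled by the wb's share within its own cgroup domain */
	printf("after fix:  wb1 cgroup thresh = %.1fG\n",
	       toy_wb_thresh(&cgroup1, 1.0, cgroup_bg_thresh));
	return 0;
}

It prints 2.5G for the buggy form and 5.0G for the fixed one, matching the
mail: scaling each cgroup threshold by the wb's global share halves it
again, capping the two wbs at 5G total instead of the intended 10G.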
Hello,

On Wed, Jan 24, 2024 at 10:01:47AM +0800, Kemeng Shi wrote:
> Hi Tejun, thanks for the reply. A cgroup wb belongs both to the global wb
> domain and to a cgroup domain. I agree with how we calculate a wb's
> threshold in the global domain as you described above. This patch tries
> to fix the calculation of the wb's threshold in the cgroup domain, which
> is currently wb_calc_thresh(mdtc->wb, mdtc->bg_thresh), meaning:
>   (wb bandwidth) / (system bandwidth) * (cgroup domain threshold)
> where the cgroup domain threshold is
>   (memory of cgroup domain) / (memory of system) * (system threshold).
> The wb's threshold in the cgroup domain is therefore smaller than
> expected.
>
> Consider the following domain hierarchy:
>                   global domain (100G)
>                   /                  \
>     cgroup domain1 (50G)      cgroup domain2 (50G)
>               |                        |
>       bdi    wb1                      wb2
>
> Assume wb1 and wb2 have the same bandwidth, the global domain bg_thresh
> is 10G and each cgroup domain bg_thresh is 5G. Then we have:
>   wb's thresh in global domain = 10G * (wb bandwidth) / (system bandwidth)
>                                = 10G * 1/2 = 5G
>   wb's thresh in cgroup domain = 5G * (wb bandwidth) / (system bandwidth)
>                                = 5G * 1/2 = 2.5G
> In the end, wb1 and wb2 are each limited at 2.5G and the system is
> limited at 5G, which is less than the global domain bg_thresh of 10G.
>
> After the fix, the threshold in the cgroup domain becomes:
>   (wb bandwidth) / (cgroup bandwidth) * (cgroup domain threshold)
> so wb1 and wb2 are each limited at 5G and the system is limited at 10G,
> which equals the global domain bg_thresh of 10G.
>
> As I didn't take a deep look into the memory cgroup code, please correct
> me if anything is wrong. Thanks!
>
> This is just from reading the code. I hope the case shown above is
> convincing. Looking forward to your reply, thanks!

So, the explanation makes some sense to me, but can you please construct a
case that actually demonstrates the problem and the fix? I don't think it'd
be wise to apply the change without actually observing that the code change
does what it says it does.

Thanks.
on 1/30/2024 5:00 AM, Tejun Heo wrote:
> Hello,
>
> On Wed, Jan 24, 2024 at 10:01:47AM +0800, Kemeng Shi wrote:
>> Hi Tejun, thanks for the reply. A cgroup wb belongs both to the global
>> wb domain and to a cgroup domain. I agree with how we calculate a wb's
>> threshold in the global domain as you described above. This patch tries
>> to fix the calculation of the wb's threshold in the cgroup domain, which
>> is currently wb_calc_thresh(mdtc->wb, mdtc->bg_thresh), meaning:
>>   (wb bandwidth) / (system bandwidth) * (cgroup domain threshold)
>> where the cgroup domain threshold is
>>   (memory of cgroup domain) / (memory of system) * (system threshold).
>> The wb's threshold in the cgroup domain is therefore smaller than
>> expected.
>>
>> Consider the following domain hierarchy:
>>                   global domain (100G)
>>                   /                  \
>>     cgroup domain1 (50G)      cgroup domain2 (50G)
>>               |                        |
>>       bdi    wb1                      wb2
>>
>> Assume wb1 and wb2 have the same bandwidth, the global domain bg_thresh
>> is 10G and each cgroup domain bg_thresh is 5G. Then we have:
>>   wb's thresh in global domain = 10G * (wb bandwidth) / (system bandwidth)
>>                                = 10G * 1/2 = 5G
>>   wb's thresh in cgroup domain = 5G * (wb bandwidth) / (system bandwidth)
>>                                = 5G * 1/2 = 2.5G
>> In the end, wb1 and wb2 are each limited at 2.5G and the system is
>> limited at 5G, which is less than the global domain bg_thresh of 10G.
>>
>> After the fix, the threshold in the cgroup domain becomes:
>>   (wb bandwidth) / (cgroup bandwidth) * (cgroup domain threshold)
>> so wb1 and wb2 are each limited at 5G and the system is limited at 10G,
>> which equals the global domain bg_thresh of 10G.
>>
>> As I didn't take a deep look into the memory cgroup code, please correct
>> me if anything is wrong. Thanks!
>>> Also, how is this tested? Was there a case where the existing code
>>> misbehaved that's improved by this patch? Or is this just from reading
>>> the code?
>>
>> This is just from reading the code. I hope the case shown above is
>> convincing. Looking forward to your reply, thanks!
>
> So, the explanation makes some sense to me, but can you please construct
> a case that actually demonstrates the problem and the fix? I don't think
> it'd be wise to apply the change without actually observing that the code
> change does what it says it does.

Hi Tejun, sorry for the delay: I found an issue that keeps triggering
writeback even when the number of dirty pages is under the dirty background
threshold. That issue made it difficult to observe the expected improvement
from this patch. I try to fix it in [1] and tested this patch on top of
those fix patches.

Run the test as follows:

/* make background writeback easier to observe */
echo 300000 > /proc/sys/vm/dirty_expire_centisecs
echo 100 > /proc/sys/vm/dirty_writeback_centisecs

/* enable memory and io cgroup */
echo "+memory +io" > /sys/fs/cgroup/cgroup.subtree_control

/* run fio in group1 with one shell */
cd /sys/fs/cgroup
mkdir group1
cd group1
echo 10G > memory.high
echo 10G > memory.max
echo $$ > cgroup.procs
mkfs.ext4 -F /dev/vdb
mount /dev/vdb /bdi1/
fio -name test -filename=/bdi1/file -size=800M -ioengine=libaio -bs=4K \
    -iodepth=1 -rw=write -direct=0 --time_based -runtime=60 -invalidate=0

/* run another fio in group2 with another shell */
cd /sys/fs/cgroup
mkdir group2
cd group2
echo 10G > memory.high
echo 10G > memory.max
echo $$ > cgroup.procs
mkfs.ext4 -F /dev/vdc
mount /dev/vdc /bdi2/
fio -name test -filename=/bdi2/file -size=800M -ioengine=libaio -bs=4K \
    -iodepth=1 -rw=write -direct=0 --time_based -runtime=60 -invalidate=0

Before the fix we got (results of three runs):
fio1
  WRITE: bw=1304MiB/s (1367MB/s), 1304MiB/s-1304MiB/s (1367MB/s-1367MB/s), io=76.4GiB (82.0GB), run=60001-60001msec
  WRITE: bw=1351MiB/s (1417MB/s), 1351MiB/s-1351MiB/s (1417MB/s-1417MB/s), io=79.2GiB (85.0GB), run=60001-60001msec
  WRITE: bw=1373MiB/s (1440MB/s), 1373MiB/s-1373MiB/s (1440MB/s-1440MB/s), io=80.5GiB (86.4GB), run=60001-60001msec
fio2
  WRITE: bw=1134MiB/s (1190MB/s), 1134MiB/s-1134MiB/s (1190MB/s-1190MB/s), io=66.5GiB (71.4GB), run=60001-60001msec
  WRITE: bw=1414MiB/s (1483MB/s), 1414MiB/s-1414MiB/s (1483MB/s-1483MB/s), io=82.8GiB (88.0GB), run=60001-60001msec
  WRITE: bw=1469MiB/s (1540MB/s), 1469MiB/s-1469MiB/s (1540MB/s-1540MB/s), io=86.0GiB (92.4GB), run=60001-60001msec

After the fix we got (results of three runs):
fio1
  WRITE: bw=1719MiB/s (1802MB/s), 1719MiB/s-1719MiB/s (1802MB/s-1802MB/s), io=101GiB (108GB), run=60001-60001msec
  WRITE: bw=1723MiB/s (1806MB/s), 1723MiB/s-1723MiB/s (1806MB/s-1806MB/s), io=101GiB (108GB), run=60001-60001msec
  WRITE: bw=1691MiB/s (1774MB/s), 1691MiB/s-1691MiB/s (1774MB/s-1774MB/s), io=99.2GiB (106GB), run=60036-60036msec
fio2
  WRITE: bw=1692MiB/s (1774MB/s), 1692MiB/s-1692MiB/s (1774MB/s-1774MB/s), io=99.1GiB (106GB), run=60001-60001msec
  WRITE: bw=1681MiB/s (1763MB/s), 1681MiB/s-1681MiB/s (1763MB/s-1763MB/s), io=98.5GiB (106GB), run=60001-60001msec
  WRITE: bw=1671MiB/s (1752MB/s), 1671MiB/s-1671MiB/s (1752MB/s-1752MB/s), io=97.9GiB (105GB), run=60001-60001msec

I also added code to print the number of pages written by background
writeback: with this fix that count drops a lot, and such writes become
rare.

[1] https://lore.kernel.org/linux-fsdevel/20240208172024.23625-2-shikemeng@huaweicloud.com/T/#u
Hello, Kemeng.

On Thu, Feb 08, 2024 at 05:26:10PM +0800, Kemeng Shi wrote:
> Hi Tejun, sorry for the delay: I found an issue that keeps triggering
> writeback even when the number of dirty pages is under the dirty
> background threshold. That issue made it difficult to observe the
> expected improvement from this patch. I try to fix it in [1] and tested
> this patch on top of those fix patches.
> Run the test as follows:

Ah, that looks promising, and thanks a lot for looking into this. It's
great to have someone actually poring over the code and behavior.
Understanding the wb and cgroup wb behaviors has always been challenging
because the only thing we have is the tracepoints, and it's really tedious
and difficult to build an overall understanding from the trace outputs.
Can I persuade you into writing a drgn monitoring script similar to e.g.
tools/workqueue/wq_monitor.py? I think there's a pretty good chance the
visibility can be improved substantially.

Thanks.
on 2/9/2024 3:32 AM, Tejun Heo wrote:
> Hello, Kemeng.
>
> Ah, that looks promising, and thanks a lot for looking into this. It's
> great to have someone actually poring over the code and behavior.
> Understanding the wb and cgroup wb behaviors has always been challenging
> because the only thing we have is the tracepoints, and it's really
> tedious and difficult to build an overall understanding from the trace
> outputs. Can I persuade you into writing a drgn monitoring script similar
> to e.g. tools/workqueue/wq_monitor.py? I think there's a pretty good
> chance the visibility can be improved substantially.

Hi Tejun, sorry for the late reply; I was on vacation these days.
I agree that visibility is poor: I had to add some printks to debug.
Actually, I have already added per-wb stats to improve visibility, as we
only have per-bdi stats (/sys/kernel/debug/bdi/xxx:x/stats) now, and I plan
to submit them in a new series.
I'd like to add a script to improve visibility further, but I can't
guarantee when I will find the time to do it. I would submit the monitoring
script along with the per-wb stats if the timing does not bother you.
Thanks.
Hello,

On Sun, Feb 18, 2024 at 10:35:41AM +0800, Kemeng Shi wrote:
> Hi Tejun, sorry for the late reply; I was on vacation these days.
> I agree that visibility is poor: I had to add some printks to debug.
> Actually, I have already added per-wb stats to improve visibility, as we
> only have per-bdi stats (/sys/kernel/debug/bdi/xxx:x/stats) now, and I
> plan to submit them in a new series.
> I'd like to add a script to improve visibility further, but I can't
> guarantee when I will find the time to do it. I would submit the
> monitoring script along with the per-wb stats if the timing does not
> bother you.

It has had poor visibility for many, many years; I don't think we're in
any hurry.

Thanks.
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 9268859722c4..f6c7f3b0f495 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -2118,7 +2118,7 @@ bool wb_over_bg_thresh(struct bdi_writeback *wb)
 		if (mdtc->dirty > mdtc->bg_thresh)
 			return true;
 
-		thresh = wb_calc_thresh(mdtc->wb, mdtc->bg_thresh);
+		thresh = __wb_calc_thresh(mdtc, mdtc->bg_thresh);
 		if (thresh < 2 * wb_stat_error())
 			reclaimable = wb_stat_sum(wb, WB_RECLAIMABLE);
 		else
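For context on why this one-liner switches domains: wb_calc_thresh() wraps
the wb in a freshly built global-domain dtc before delegating, so it can
only ever answer "share of global_wb_domain". Below is a simplified
paraphrase of the wrapper, written against the two-argument
__wb_calc_thresh() used in this series; a sketch, not verbatim kernel
code:

/* Paraphrased: wb_calc_thresh() always constructs a *global* dtc, so the
 * wb's share is computed against global_wb_domain regardless of which
 * domain the caller actually cared about. */
unsigned long wb_calc_thresh(struct bdi_writeback *wb, unsigned long thresh)
{
	struct dirty_throttle_control gdtc = { GDTC_INIT(wb) };

	return __wb_calc_thresh(&gdtc, thresh);
}

Passing the mdtc directly to __wb_calc_thresh() instead keeps dtc_dom()
pointing at mem_cgroup_wb_domain(wb), so the background threshold is split
by the wb's share of its own cgroup domain, which is exactly what the
commit message intends.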