From: Jingbo Xu
To: tj@kernel.org, guro@fb.com, jack@suse.cz
Cc: lizefan.x@bytedance.com, hannes@cmpxchg.org, cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, viro@zeniv.linux.org.uk, brauner@kernel.org, willy@infradead.org, joseph.qi@linux.alibaba.com
Subject: [PATCH v3] writeback, cgroup: switch inodes with dirty timestamps to release dying cgwbs
Date: Sat, 14 Oct 2023 20:55:11 +0800
Message-Id: <20231014125511.102978-1-jefflexu@linux.alibaba.com>

The cgwb cleanup routine tries to release a dying cgwb by switching its
attached inodes to a live cgwb. It fetches those inodes from the
wb->b_attached list only, overlooking the fact that inodes with only dirty
timestamps reside on the wb->b_dirty_time list, which is the case when
lazytime is enabled.

As a result, an enormous number of zombie memory cgroups accumulate when
lazytime is enabled, because inodes with dirty timestamps cannot be
switched to a live cgwb for a long time.

It is reasonable not to switch the cgwb of inodes with dirty data, as doing
so may break the bandwidth restrictions. However, since writeback of inode
metadata is not accounted for, also switch inodes with dirty timestamps to
avoid zombie memory and block cgroups when lazytime is enabled.
Fixes: c22d70a162d3 ("writeback, cgroup: release dying cgwbs by switching attached inodes")
Reviewed-by: Jan Kara
Signed-off-by: Jingbo Xu
Acked-by: Tejun Heo
---
v3: fix spelling of "Fixes"; add "Reviewed-by" tag from Jan Kara (Thanks!)
v1: https://lore.kernel.org/all/20231011084228.77615-1-jefflexu@linux.alibaba.com/
v2: https://lore.kernel.org/all/20231013055208.15457-1-jefflexu@linux.alibaba.com/
---
 fs/fs-writeback.c | 41 +++++++++++++++++++++++++++++------------
 1 file changed, 29 insertions(+), 12 deletions(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index c1af01b2c42d..1767493dffda 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -613,6 +613,24 @@ static void inode_switch_wbs(struct inode *inode, int new_wb_id)
 	kfree(isw);
 }
 
+static bool isw_prepare_wbs_switch(struct inode_switch_wbs_context *isw,
+				   struct list_head *list, int *nr)
+{
+	struct inode *inode;
+
+	list_for_each_entry(inode, list, i_io_list) {
+		if (!inode_prepare_wbs_switch(inode, isw->new_wb))
+			continue;
+
+		isw->inodes[*nr] = inode;
+		(*nr)++;
+
+		if (*nr >= WB_MAX_INODES_PER_ISW - 1)
+			return true;
+	}
+	return false;
+}
+
 /**
  * cleanup_offline_cgwb - detach associated inodes
  * @wb: target wb
@@ -625,7 +643,6 @@ bool cleanup_offline_cgwb(struct bdi_writeback *wb)
 {
 	struct cgroup_subsys_state *memcg_css;
 	struct inode_switch_wbs_context *isw;
-	struct inode *inode;
 	int nr;
 	bool restart = false;
 
@@ -647,17 +664,17 @@ bool cleanup_offline_cgwb(struct bdi_writeback *wb)
 
 	nr = 0;
 	spin_lock(&wb->list_lock);
-	list_for_each_entry(inode, &wb->b_attached, i_io_list) {
-		if (!inode_prepare_wbs_switch(inode, isw->new_wb))
-			continue;
-
-		isw->inodes[nr++] = inode;
-
-		if (nr >= WB_MAX_INODES_PER_ISW - 1) {
-			restart = true;
-			break;
-		}
-	}
+	/*
+	 * In addition to the inodes that have completed writeback, also switch
+	 * cgwbs for those inodes only with dirty timestamps. Otherwise, those
+	 * inodes won't be written back for a long time when lazytime is
+	 * enabled, and thus pinning the dying cgwbs. It won't break the
+	 * bandwidth restrictions, as writeback of inode metadata is not
+	 * accounted for.
+	 */
+	restart = isw_prepare_wbs_switch(isw, &wb->b_attached, &nr);
+	if (!restart)
+		restart = isw_prepare_wbs_switch(isw, &wb->b_dirty_time, &nr);
 	spin_unlock(&wb->list_lock);
 
 	/* no attached inodes? bail out */