From patchwork Fri Oct 13 05:52:08 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jingbo Xu
X-Patchwork-Id: 152317
From: Jingbo Xu
To: tj@kernel.org, guro@fb.com
Cc: lizefan.x@bytedance.com, hannes@cmpxchg.org, cgroups@vger.kernel.org, jack@suse.cz,
	linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	viro@zeniv.linux.org.uk, brauner@kernel.org, willy@infradead.org,
	joseph.qi@linux.alibaba.com
Subject: [PATCH v2] writeback, cgroup: switch inodes with dirty timestamps to release dying cgwbs
Date: Fri, 13 Oct 2023 13:52:08 +0800
Message-Id: <20231013055208.15457-1-jefflexu@linux.alibaba.com>
X-Mailer: git-send-email 2.19.1.6.gb485710b
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

The cgwb cleanup routine tries to release a dying cgwb by switching its
attached inodes to a live one. It fetches those inodes from the
wb->b_attached list only, overlooking the fact that inodes with only
dirty timestamps sit on the wb->b_dirty_time list when lazytime is
enabled. As a result, such inodes cannot be switched to a live cgwb for
a long time, which leaves a large number of zombie memory cgroups behind.

It is reasonable not to switch the cgwb for inodes with dirty data, as
doing so may break the bandwidth restrictions. However, since writeback
of inode metadata is not accounted for, also switch inodes with dirty
timestamps to avoid zombie memory and block cgroups when lazytime is
enabled.

Fixes: c22d70a162d3 ("writeback, cgroup: release dying cgwbs by switching attached inodes")
Signed-off-by: Jingbo Xu
Reviewed-by: Jan Kara
Acked-by: Tejun Heo
---
v2: add comment explaining why switching for inodes with dirty
    timestamps is needed
v1: https://lore.kernel.org/all/20231011084228.77615-1-jefflexu@linux.alibaba.com/
---
 fs/fs-writeback.c | 41 +++++++++++++++++++++++++++++------------
 1 file changed, 29 insertions(+), 12 deletions(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index c1af01b2c42d..1767493dffda 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -613,6 +613,24 @@ static void inode_switch_wbs(struct inode *inode, int new_wb_id)
 	kfree(isw);
 }
 
+static bool isw_prepare_wbs_switch(struct inode_switch_wbs_context *isw,
+				   struct list_head *list, int *nr)
+{
+	struct inode *inode;
+
+	list_for_each_entry(inode, list, i_io_list) {
+		if (!inode_prepare_wbs_switch(inode, isw->new_wb))
+			continue;
+
+		isw->inodes[*nr] = inode;
+		(*nr)++;
+
+		if (*nr >= WB_MAX_INODES_PER_ISW - 1)
+			return true;
+	}
+	return false;
+}
+
 /**
  * cleanup_offline_cgwb - detach associated inodes
  * @wb: target wb
@@ -625,7 +643,6 @@ bool cleanup_offline_cgwb(struct bdi_writeback *wb)
 {
 	struct cgroup_subsys_state *memcg_css;
 	struct inode_switch_wbs_context *isw;
-	struct inode *inode;
 	int nr;
 	bool restart = false;
 
@@ -647,17 +664,17 @@ bool cleanup_offline_cgwb(struct bdi_writeback *wb)
 
 	nr = 0;
 	spin_lock(&wb->list_lock);
-	list_for_each_entry(inode, &wb->b_attached, i_io_list) {
-		if (!inode_prepare_wbs_switch(inode, isw->new_wb))
-			continue;
-
-		isw->inodes[nr++] = inode;
-
-		if (nr >= WB_MAX_INODES_PER_ISW - 1) {
-			restart = true;
-			break;
-		}
-	}
+	/*
+	 * In addition to the inodes that have completed writeback, also switch
+	 * cgwbs for those inodes only with dirty timestamps. Otherwise, those
+	 * inodes won't be written back for a long time when lazytime is
+	 * enabled, and thus pinning the dying cgwbs. It won't break the
+	 * bandwidth restrictions, as writeback of inode metadata is not
+	 * accounted for.
+	 */
+	restart = isw_prepare_wbs_switch(isw, &wb->b_attached, &nr);
+	if (!restart)
+		restart = isw_prepare_wbs_switch(isw, &wb->b_dirty_time, &nr);
 	spin_unlock(&wb->list_lock);
 
 	/* no attached inodes? bail out */
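
For readers following the control flow, here is a minimal userspace sketch of the two-list scan that the patch introduces in cleanup_offline_cgwb(). It is only a model under stated assumptions, not kernel code: the toy_* names and MAX_BATCH are hypothetical, plain arrays stand in for wb->b_attached and wb->b_dirty_time, and the `switchable` flag stands in for inode_prepare_wbs_switch() returning true. Locking, reference counting and the NULL-terminated isw->inodes array are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical batch limit; plays the role of WB_MAX_INODES_PER_ISW - 1. */
#define MAX_BATCH 4

/* Toy stand-in for struct inode. */
struct toy_inode {
	int ino;
	bool switchable;	/* models inode_prepare_wbs_switch() succeeding */
};

/* Toy stand-in for struct inode_switch_wbs_context. */
struct toy_batch {
	const struct toy_inode *inodes[MAX_BATCH];
};

/*
 * Append the switchable inodes of one list to the batch, continuing from
 * the shared counter *nr. Returns true once the batch is full, meaning the
 * caller should run the cleanup again later for the remaining inodes.
 */
static bool toy_collect(struct toy_batch *batch,
			const struct toy_inode *list, size_t len, int *nr)
{
	for (size_t i = 0; i < len; i++) {
		if (!list[i].switchable)
			continue;

		assert(*nr < MAX_BATCH);
		batch->inodes[*nr] = &list[i];
		(*nr)++;

		if (*nr >= MAX_BATCH)
			return true;
	}
	return false;
}

/*
 * Scan the "attached" list first and the "dirty time" list second. The
 * second list is only visited when the first one did not fill the batch.
 */
static bool toy_cleanup(struct toy_batch *batch,
			const struct toy_inode *attached, size_t n_attached,
			const struct toy_inode *dirty_time, size_t n_dirty_time,
			int *nr)
{
	bool restart;

	*nr = 0;
	restart = toy_collect(batch, attached, n_attached, nr);
	if (!restart)
		restart = toy_collect(batch, dirty_time, n_dirty_time, nr);
	return restart;
}
```

Passing the counter by pointer is what lets both lists share a single batch: the second scan starts where the first one stopped, and a full batch from either list results in the same restart request.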