From patchwork Fri Dec 8 06:14:04 2023
X-Patchwork-Submitter: Yu Zhao
X-Patchwork-Id: 175604
Date: Thu, 7 Dec 2023 23:14:04 -0700
Message-ID: <20231208061407.2125867-1-yuzhao@google.com>
Subject: [PATCH mm-unstable v1 1/4] mm/mglru: fix underprotected page cache
From: Yu Zhao
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Yu Zhao,
    Charan Teja Kalla, Kalesh Singh, stable@vger.kernel.org
Unmapped folios accessed through file descriptors can be underprotected.
Those folios are added to the oldest generation based on:

1. The fact that they are less costly to reclaim (no need to walk the
   rmap and flush the TLB) and have less impact on performance (don't
   cause major PFs and can be non-blocking if needed again).
2. The observation that they are likely to be single-use. E.g., for
   client use cases like Android, its apps parse configuration files and
   store the data in heap (anon); for server use cases like MySQL, it
   reads from InnoDB files and holds the cached data for tables in
   buffer pools (anon).

However, the oldest generation can be very short-lived, and if so, it
doesn't give the PID controller enough time to respond to a surge of
refaults. (Note that the PID controller uses weighted refaults, and
refaults from evicted generations only carry half of the whole weight.)
In other words, for a short-lived generation, the moving average smooths
out the spike quickly.

To fix the problem:

1. For folios that are already on the LRU, if they can be beyond the
   tracking range of tiers, i.e., five accesses through file
   descriptors, move them to the second oldest generation to give them
   more time to age. (Note that tiers are used by the PID controller to
   statistically determine whether folios accessed multiple times
   through file descriptors are worth protecting.)
2. When adding unmapped folios to the LRU, adjust their placement so
   that they are not too close to the tail. The effect of this is
   similar to the above.
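As a rough model of the placement change in (2), the following standalone sketch mirrors the generation-selection logic of the patched lru_gen_add_folio() (this is a simplified, hypothetical re-statement, not kernel code: folio state is passed in as booleans, and MIN_NR_GENS is assumed to be 2, its minimum in the kernel):

```c
#include <stdbool.h>

#define MIN_NR_GENS 2 /* assumed: the kernel's minimum number of generations */

/* Pick the target generation (as a sequence number) for a folio being
 * added to the LRU. Simplified from the patched lru_gen_add_folio(). */
unsigned long choose_seq(bool active, bool cannot_evict_now, bool reclaiming,
                         unsigned long min_seq, unsigned long max_seq)
{
	if (active)
		return max_seq;         /* hot: youngest generation */
	if (cannot_evict_now)
		return max_seq - 1;     /* e.g. dirty, pending writeback */
	if (reclaiming || min_seq + MIN_NR_GENS >= max_seq)
		return min_seq;         /* oldest generation */
	return min_seq + 1;             /* the fix: not too close to the tail */
}
```

With min_seq=10 and max_seq=14, an unmapped clean folio added outside reclaim now lands in generation 11 (second oldest) instead of 10, giving the PID controller more time to observe refaults before eviction.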
On Android, launching 55 apps sequentially:

                            Before      After      Change
  workingset_refault_anon   25641024    25598972   0%
  workingset_refault_file   115016834   106178438  -8%

Fixes: ac35a4902374 ("mm: multi-gen LRU: minimal implementation")
Signed-off-by: Yu Zhao
Reported-by: Charan Teja Kalla
Tested-by: Kalesh Singh
Cc: stable@vger.kernel.org
---
 include/linux/mm_inline.h | 23 ++++++++++++++---------
 mm/vmscan.c               |  2 +-
 mm/workingset.c           |  6 +++---
 3 files changed, 18 insertions(+), 13 deletions(-)

diff --git a/include/linux/mm_inline.h b/include/linux/mm_inline.h
index 9ae7def16cb2..f4fe593c1400 100644
--- a/include/linux/mm_inline.h
+++ b/include/linux/mm_inline.h
@@ -232,22 +232,27 @@ static inline bool lru_gen_add_folio(struct lruvec *lruvec, struct folio *folio,
 	if (folio_test_unevictable(folio) || !lrugen->enabled)
 		return false;
 	/*
-	 * There are three common cases for this page:
-	 * 1. If it's hot, e.g., freshly faulted in or previously hot and
-	 *    migrated, add it to the youngest generation.
-	 * 2. If it's cold but can't be evicted immediately, i.e., an anon page
-	 *    not in swapcache or a dirty page pending writeback, add it to the
-	 *    second oldest generation.
-	 * 3. Everything else (clean, cold) is added to the oldest generation.
+	 * There are four common cases for this page:
+	 * 1. If it's hot, i.e., freshly faulted in, add it to the youngest
+	 *    generation, and it's protected over the rest below.
+	 * 2. If it can't be evicted immediately, i.e., a dirty page pending
+	 *    writeback, add it to the second youngest generation.
+	 * 3. If it should be evicted first, e.g., cold and clean from
+	 *    folio_rotate_reclaimable(), add it to the oldest generation.
+	 * 4. Everything else falls between 2 & 3 above and is added to the
+	 *    second oldest generation if it's considered inactive, or the
+	 *    oldest generation otherwise. See lru_gen_is_active().
 	 */
 	if (folio_test_active(folio))
 		seq = lrugen->max_seq;
 	else if ((type == LRU_GEN_ANON && !folio_test_swapcache(folio)) ||
 		 (folio_test_reclaim(folio) &&
 		  (folio_test_dirty(folio) || folio_test_writeback(folio))))
-		seq = lrugen->min_seq[type] + 1;
-	else
+		seq = lrugen->max_seq - 1;
+	else if (reclaiming || lrugen->min_seq[type] + MIN_NR_GENS >= lrugen->max_seq)
 		seq = lrugen->min_seq[type];
+	else
+		seq = lrugen->min_seq[type] + 1;

 	gen = lru_gen_from_seq(seq);
 	flags = (gen + 1UL) << LRU_GEN_PGOFF;

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 4e3b835c6b4a..e67631c60ac0 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -4260,7 +4260,7 @@ static bool sort_folio(struct lruvec *lruvec, struct folio *folio, struct scan_c
 	}

 	/* protected */
-	if (tier > tier_idx) {
+	if (tier > tier_idx || refs == BIT(LRU_REFS_WIDTH)) {
 		int hist = lru_hist_from_seq(lrugen->min_seq[type]);

 		gen = folio_inc_gen(lruvec, folio, false);

diff --git a/mm/workingset.c b/mm/workingset.c
index 7d3dacab8451..2a2a34234df9 100644
--- a/mm/workingset.c
+++ b/mm/workingset.c
@@ -313,10 +313,10 @@ static void lru_gen_refault(struct folio *folio, void *shadow)
 	 * 1. For pages accessed through page tables, hotter pages pushed out
 	 *    hot pages which refaulted immediately.
 	 * 2. For pages accessed multiple times through file descriptors,
-	 *    numbers of accesses might have been out of the range.
+	 *    they would have been protected by sort_folio().
 	 */
-	if (lru_gen_in_fault() || refs == BIT(LRU_REFS_WIDTH)) {
-		folio_set_workingset(folio);
+	if (lru_gen_in_fault() || refs >= BIT(LRU_REFS_WIDTH) - 1) {
+		set_mask_bits(&folio->flags, 0, LRU_REFS_MASK | BIT(PG_workingset));
 		mod_lruvec_state(lruvec, WORKINGSET_RESTORE_BASE + type, delta);
 	}
 unlock:

From patchwork Fri Dec 8 06:14:05 2023
X-Patchwork-Submitter: Yu Zhao
X-Patchwork-Id: 175605
Date: Thu, 7 Dec 2023 23:14:05 -0700
In-Reply-To: <20231208061407.2125867-1-yuzhao@google.com>
References: <20231208061407.2125867-1-yuzhao@google.com>
Message-ID: <20231208061407.2125867-2-yuzhao@google.com>
Subject: [PATCH mm-unstable v1 2/4] mm/mglru: try to stop at high watermarks
From: Yu Zhao
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Yu Zhao,
    Charan Teja Kalla, Jaroslav Pulchart, Kalesh Singh, stable@vger.kernel.org

The initial MGLRU patchset didn't include the memcg LRU support, and it
relied on should_abort_scan(), added by commit f76c83378851 ("mm:
multi-gen LRU: optimize multiple memcgs"), to "backoff to avoid
overshooting their aggregate reclaim target by too much".

Later on when the memcg LRU was added, should_abort_scan() was deemed
unnecessary, and the test results [1] showed no side effects after it
was removed by commit a579086c99ed ("mm: multi-gen LRU: remove eviction
fairness safeguard").

However, that test used memory.reclaim, which sets nr_to_reclaim to
SWAP_CLUSTER_MAX. So it can overshoot only by SWAP_CLUSTER_MAX-1 pages,
i.e., from nr_reclaimed=nr_to_reclaim-1 to
nr_reclaimed=nr_to_reclaim+SWAP_CLUSTER_MAX-1. Compared with the batch
size kswapd sets to nr_to_reclaim, SWAP_CLUSTER_MAX is tiny. Therefore
that test isn't able to reproduce the worst case scenario, i.e., kswapd
overshooting GBs on large systems and "consuming 100% CPU" (see the
Closes tag).
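The overshoot bound above can be checked with a toy model (a standalone sketch, not kernel code; the batch-based loop and names are illustrative): a reclaimer that frees whole batches and re-checks the target between batches can exceed the target by at most one batch minus one page.

```c
/* Toy model of batched reclaim: each scan frees a full batch, and the
 * target is only re-checked between batches, so the final tally can
 * exceed nr_to_reclaim by at most (batch - 1) pages. */
unsigned long reclaim(unsigned long nr_to_reclaim, unsigned long batch)
{
	unsigned long nr_reclaimed = 0;

	while (nr_reclaimed < nr_to_reclaim)
		nr_reclaimed += batch;

	return nr_reclaimed;
}
```

With memory.reclaim's nr_to_reclaim equal to SWAP_CLUSTER_MAX, the worst-case overshoot is bounded by SWAP_CLUSTER_MAX-1; when kswapd's target is GBs, the same mechanism without an abort check can overshoot by a full scan's worth of pages.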
Bring back a simplified version of should_abort_scan() on top of the
memcg LRU, so that kswapd stops when all eligible zones are above their
respective high watermarks plus a small delta to lower the chance of
KSWAPD_HIGH_WMARK_HIT_QUICKLY. Note that this only applies to order-0
reclaim, meaning compaction-induced reclaim can still run wild (which is
a different problem).

On Android, launching 55 apps sequentially:

            Before      After      Change
  pgpgin    838377172   802955040  -4%
  pgpgout   38037080    34336300   -10%

[1] https://lore.kernel.org/20221222041905.2431096-1-yuzhao@google.com/

Fixes: a579086c99ed ("mm: multi-gen LRU: remove eviction fairness safeguard")
Signed-off-by: Yu Zhao
Reported-by: Charan Teja Kalla
Reported-by: Jaroslav Pulchart
Closes: https://lore.kernel.org/CAK8fFZ4DY+GtBA40Pm7Nn5xCHy+51w3sfxPqkqpqakSXYyX+Wg@mail.gmail.com/
Tested-by: Jaroslav Pulchart
Tested-by: Kalesh Singh
Cc: stable@vger.kernel.org
---
 mm/vmscan.c | 36 ++++++++++++++++++++++++++++--------
 1 file changed, 28 insertions(+), 8 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index e67631c60ac0..10e964cd0efe 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -4676,20 +4676,41 @@ static long get_nr_to_scan(struct lruvec *lruvec, struct scan_control *sc, bool
 	return try_to_inc_max_seq(lruvec, max_seq, sc, can_swap, false) ? -1 : 0;
 }

-static unsigned long get_nr_to_reclaim(struct scan_control *sc)
+static bool should_abort_scan(struct lruvec *lruvec, struct scan_control *sc)
 {
+	int i;
+	enum zone_watermarks mark;
+
 	/* don't abort memcg reclaim to ensure fairness */
 	if (!root_reclaim(sc))
-		return -1;
+		return false;

-	return max(sc->nr_to_reclaim, compact_gap(sc->order));
+	if (sc->nr_reclaimed >= max(sc->nr_to_reclaim, compact_gap(sc->order)))
+		return true;
+
+	/* check the order to exclude compaction-induced reclaim */
+	if (!current_is_kswapd() || sc->order)
+		return false;
+
+	mark = sysctl_numa_balancing_mode & NUMA_BALANCING_MEMORY_TIERING ?
+	       WMARK_PROMO : WMARK_HIGH;
+
+	for (i = 0; i <= sc->reclaim_idx; i++) {
+		struct zone *zone = lruvec_pgdat(lruvec)->node_zones + i;
+		unsigned long size = wmark_pages(zone, mark) + MIN_LRU_BATCH;
+
+		if (managed_zone(zone) && !zone_watermark_ok(zone, 0, size, sc->reclaim_idx, 0))
+			return false;
+	}
+
+	/* kswapd should abort if all eligible zones are safe */
+	return true;
 }

 static bool try_to_shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc)
 {
 	long nr_to_scan;
 	unsigned long scanned = 0;
-	unsigned long nr_to_reclaim = get_nr_to_reclaim(sc);
 	int swappiness = get_swappiness(lruvec, sc);

 	/* clean file folios are more likely to exist */
@@ -4711,7 +4732,7 @@ static bool try_to_shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc)
 		if (scanned >= nr_to_scan)
 			break;

-		if (sc->nr_reclaimed >= nr_to_reclaim)
+		if (should_abort_scan(lruvec, sc))
 			break;

 		cond_resched();
@@ -4772,7 +4793,6 @@ static void shrink_many(struct pglist_data *pgdat, struct scan_control *sc)
 	struct lru_gen_folio *lrugen;
 	struct mem_cgroup *memcg;
 	const struct hlist_nulls_node *pos;
-	unsigned long nr_to_reclaim = get_nr_to_reclaim(sc);

 	bin = first_bin = get_random_u32_below(MEMCG_NR_BINS);
 restart:
@@ -4805,7 +4825,7 @@ static void shrink_many(struct pglist_data *pgdat, struct scan_control *sc)

 		rcu_read_lock();

-		if (sc->nr_reclaimed >= nr_to_reclaim)
+		if (should_abort_scan(lruvec, sc))
 			break;
 	}
@@ -4816,7 +4836,7 @@ static void shrink_many(struct pglist_data *pgdat, struct scan_control *sc)

 	mem_cgroup_put(memcg);

-	if (sc->nr_reclaimed >= nr_to_reclaim)
+	if (!is_a_nulls(pos))
 		return;

 	/* restart if raced with lru_gen_rotate_memcg() */

From patchwork Fri Dec 8 06:14:06 2023
X-Patchwork-Submitter: Yu Zhao
X-Patchwork-Id: 175606
b=DYMfm5eproKgtvufpK7UWF4bD5BIst+8BSRd4OIrXlzLFrW+BE4SangKzXmimaiMvU mhPWsiJi2CNur9lIL21AjXP+8mMRZ3olgR/DGFDMfasTqoaCil282hBIMrIrocOLfNfd Td9/+FHAKknL9PzG53MJ6nZUjz5sqVh0ZqL+KhHjMSFjLiJkbsk+Gr1mRhvjViEumZL6 nCzeIf2h+08CS6q6STOPN1nPJojI4r+BtIohMd+qYEohNrtwgzRZBOExsjrbNwj06qj0 w2jhn3Jquo0nNYJbHGVH3o8q+5rce0RHycxEsfggL3Qa1/pGdi4GO0chXzOVPjRToGnd RDYw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1702016058; x=1702620858; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=bVLrJAGnIVd+3pf7kb6jXJD9XT/HKbFyDQ4DzyYCxV8=; b=EC4rvbxu3jvXSWZ4i6t2H2DrkRzuOmzndyPwhsH+qYY4DhxTXG/dGeAkTYbgRqRtP9 Q2jKy3cqYNjrKmwJQrfXk7pEfoOS6nMY4qeXfh137bXT6AQ93/Jzby3bUrZyoXbje8oQ G/q9lQA42ShVJaUqXQwsLHhLh3Yqiw6NVtvmc3Idb4veE8XiQCaIwH6TAWqAa8jBgw+l HRAfYLGFsvjDpxWIvEQv9E6m4S853prkMq1gitGx9oQ8krOqlFm+gimzcx6LVu6FDBCM MZ8PTZWuBdexCrfyoR8gldYHsqAjGbQK8DegPj9FG2kYwJPxQQY6h+aTuMVPcnUwNJJL lC5Q== X-Gm-Message-State: AOJu0YzbgqPX1oUOez7culpXpraCi9ZCiG+ZNbIm7epWin1KQz0EQ8Zm /kMCqvS4lgP8d1s0H6b3x4AJP4AwBgg= X-Received: from yuzhao2.bld.corp.google.com ([100.64.188.49]) (user=yuzhao job=sendgmr) by 2002:a81:af51:0:b0:5d7:1941:61d3 with SMTP id x17-20020a81af51000000b005d7194161d3mr44861ywj.9.1702016057776; Thu, 07 Dec 2023 22:14:17 -0800 (PST) Date: Thu, 7 Dec 2023 23:14:06 -0700 In-Reply-To: <20231208061407.2125867-1-yuzhao@google.com> Mime-Version: 1.0 References: <20231208061407.2125867-1-yuzhao@google.com> X-Mailer: git-send-email 2.43.0.472.g3155946c3a-goog Message-ID: <20231208061407.2125867-3-yuzhao@google.com> Subject: [PATCH mm-unstable v1 3/4] mm/mglru: respect min_ttl_ms with memcgs From: Yu Zhao To: Andrew Morton Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Yu Zhao , "T . J . 
Mercier" , stable@vger.kernel.org X-Spam-Status: No, score=-8.4 required=5.0 tests=DKIMWL_WL_MED,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE, USER_IN_DEF_DKIM_WL autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on howler.vger.email Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.6.4 (howler.vger.email [0.0.0.0]); Thu, 07 Dec 2023 22:14:31 -0800 (PST) X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1784693207442899893 X-GMAIL-MSGID: 1784693207442899893 While investigating kswapd "consuming 100% CPU" [1] (also see "mm/mglru: try to stop at high watermarks"), it was discovered that the memcg LRU can breach the thrashing protection imposed by min_ttl_ms. Before the memcg LRU: kswapd() shrink_node_memcgs() mem_cgroup_iter() inc_max_seq() // always hit a different memcg lru_gen_age_node() mem_cgroup_iter() check the timestamp of the oldest generation After the memcg LRU: kswapd() shrink_many() restart: iterate the memcg LRU: inc_max_seq() // occasionally hit the same memcg if raced with lru_gen_rotate_memcg(): goto restart lru_gen_age_node() mem_cgroup_iter() check the timestamp of the oldest generation Specifically, when the restart happens in shrink_many(), it needs to stick with the (memcg LRU) generation it began with. In other words, it should neither re-read memcg_lru->seq nor age an lruvec of a different generation. Otherwise it can hit the same memcg multiple times without giving lru_gen_age_node() a chance to check the timestamp of that memcg's oldest generation (against min_ttl_ms). [1] https://lore.kernel.org/CAK8fFZ4DY+GtBA40Pm7Nn5xCHy+51w3sfxPqkqpqakSXYyX+Wg@mail.gmail.com/ Fixes: e4dde56cd208 ("mm: multi-gen LRU: per-node lru_gen_folio lists") Signed-off-by: Yu Zhao Tested-by: T.J. 
 Mercier
Cc: stable@vger.kernel.org
---
 include/linux/mmzone.h | 30 +++++++++++++++++-------------
 mm/vmscan.c            | 30 ++++++++++++++++--------------
 2 files changed, 33 insertions(+), 27 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index b23bc5390240..e3093ef9530f 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -510,33 +510,37 @@ void lru_gen_look_around(struct page_vma_mapped_walk *pvmw);
  * the old generation, is incremented when all its bins become empty.
  *
  * There are four operations:
- * 1. MEMCG_LRU_HEAD, which moves an memcg to the head of a random bin in its
+ * 1. MEMCG_LRU_HEAD, which moves a memcg to the head of a random bin in its
  *    current generation (old or young) and updates its "seg" to "head";
- * 2. MEMCG_LRU_TAIL, which moves an memcg to the tail of a random bin in its
+ * 2. MEMCG_LRU_TAIL, which moves a memcg to the tail of a random bin in its
  *    current generation (old or young) and updates its "seg" to "tail";
- * 3. MEMCG_LRU_OLD, which moves an memcg to the head of a random bin in the old
+ * 3. MEMCG_LRU_OLD, which moves a memcg to the head of a random bin in the old
  *    generation, updates its "gen" to "old" and resets its "seg" to "default";
- * 4. MEMCG_LRU_YOUNG, which moves an memcg to the tail of a random bin in the
+ * 4. MEMCG_LRU_YOUNG, which moves a memcg to the tail of a random bin in the
  *    young generation, updates its "gen" to "young" and resets its "seg" to
  *    "default".
  *
  * The events that trigger the above operations are:
  * 1. Exceeding the soft limit, which triggers MEMCG_LRU_HEAD;
- * 2. The first attempt to reclaim an memcg below low, which triggers
+ * 2. The first attempt to reclaim a memcg below low, which triggers
  *    MEMCG_LRU_TAIL;
- * 3. The first attempt to reclaim an memcg below reclaimable size threshold,
+ * 3. The first attempt to reclaim a memcg below reclaimable size threshold,
  *    which triggers MEMCG_LRU_TAIL;
- * 4. The second attempt to reclaim an memcg below reclaimable size threshold,
+ * 4. The second attempt to reclaim a memcg below reclaimable size threshold,
  *    which triggers MEMCG_LRU_YOUNG;
- * 5. Attempting to reclaim an memcg below min, which triggers MEMCG_LRU_YOUNG;
+ * 5. Attempting to reclaim a memcg below min, which triggers MEMCG_LRU_YOUNG;
  * 6. Finishing the aging on the eviction path, which triggers MEMCG_LRU_YOUNG;
- * 7. Offlining an memcg, which triggers MEMCG_LRU_OLD.
+ * 7. Offlining a memcg, which triggers MEMCG_LRU_OLD.
  *
- * Note that memcg LRU only applies to global reclaim, and the round-robin
- * incrementing of their max_seq counters ensures the eventual fairness to all
- * eligible memcgs. For memcg reclaim, it still relies on mem_cgroup_iter().
+ * Notes:
+ * 1. Memcg LRU only applies to global reclaim, and the round-robin incrementing
+ *    of their max_seq counters ensures the eventual fairness to all eligible
+ *    memcgs. For memcg reclaim, it still relies on mem_cgroup_iter().
+ * 2. There are only two valid generations: old (seq) and young (seq+1).
+ *    MEMCG_NR_GENS is set to three so that when reading the generation counter
+ *    locklessly, a stale value (seq-1) does not wraparound to young.
  */
-#define MEMCG_NR_GENS	2
+#define MEMCG_NR_GENS	3
 #define MEMCG_NR_BINS	8
 
 struct lru_gen_memcg {

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 10e964cd0efe..cac38e9cac86 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -4117,6 +4117,9 @@ static void lru_gen_rotate_memcg(struct lruvec *lruvec, int op)
 	else
 		VM_WARN_ON_ONCE(true);
 
+	WRITE_ONCE(lruvec->lrugen.seg, seg);
+	WRITE_ONCE(lruvec->lrugen.gen, new);
+
 	hlist_nulls_del_rcu(&lruvec->lrugen.list);
 
 	if (op == MEMCG_LRU_HEAD || op == MEMCG_LRU_OLD)
@@ -4127,9 +4130,6 @@ static void lru_gen_rotate_memcg(struct lruvec *lruvec, int op)
 	pgdat->memcg_lru.nr_memcgs[old]--;
 	pgdat->memcg_lru.nr_memcgs[new]++;
 
-	lruvec->lrugen.gen = new;
-	WRITE_ONCE(lruvec->lrugen.seg, seg);
-
 	if (!pgdat->memcg_lru.nr_memcgs[old] && old == get_memcg_gen(pgdat->memcg_lru.seq))
 		WRITE_ONCE(pgdat->memcg_lru.seq, pgdat->memcg_lru.seq + 1);
@@ -4152,11 +4152,11 @@ void lru_gen_online_memcg(struct mem_cgroup *memcg)
 
 		gen = get_memcg_gen(pgdat->memcg_lru.seq);
 
+		lruvec->lrugen.gen = gen;
+
 		hlist_nulls_add_tail_rcu(&lruvec->lrugen.list,
 					 &pgdat->memcg_lru.fifo[gen][bin]);
 		pgdat->memcg_lru.nr_memcgs[gen]++;
 
-		lruvec->lrugen.gen = gen;
-
 		spin_unlock_irq(&pgdat->memcg_lru.lock);
 	}
 }
@@ -4663,7 +4663,7 @@ static long get_nr_to_scan(struct lruvec *lruvec, struct scan_control *sc, bool
 	DEFINE_MAX_SEQ(lruvec);
 
 	if (mem_cgroup_below_min(sc->target_mem_cgroup, memcg))
-		return 0;
+		return -1;
 
 	if (!should_run_aging(lruvec, max_seq, sc, can_swap, &nr_to_scan))
 		return nr_to_scan;
@@ -4738,7 +4738,7 @@ static bool try_to_shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc)
 		cond_resched();
 	}
 
-	/* whether try_to_inc_max_seq() was successful */
+	/* whether this lruvec should be rotated */
 	return nr_to_scan < 0;
 }
@@ -4792,13 +4792,13 @@ static void shrink_many(struct pglist_data *pgdat, struct scan_control *sc)
 	struct lruvec *lruvec;
 	struct lru_gen_folio *lrugen;
 	struct mem_cgroup *memcg;
-	const struct hlist_nulls_node *pos;
+	struct hlist_nulls_node *pos;
+
+	gen = get_memcg_gen(READ_ONCE(pgdat->memcg_lru.seq));
 
 	bin = first_bin = get_random_u32_below(MEMCG_NR_BINS);
 restart:
 	op = 0;
 	memcg = NULL;
-	gen = get_memcg_gen(READ_ONCE(pgdat->memcg_lru.seq));
 
 	rcu_read_lock();
@@ -4809,6 +4809,10 @@ static void shrink_many(struct pglist_data *pgdat, struct scan_control *sc)
 		}
 
 		mem_cgroup_put(memcg);
+		memcg = NULL;
+
+		if (gen != READ_ONCE(lrugen->gen))
+			continue;
 
 		lruvec = container_of(lrugen, struct lruvec, lrugen);
 		memcg = lruvec_memcg(lruvec);
@@ -4893,16 +4897,14 @@ static void set_initial_priority(struct pglist_data *pgdat, struct scan_control
 	if (sc->priority != DEF_PRIORITY || sc->nr_to_reclaim < MIN_LRU_BATCH)
 		return;
 
 	/*
-	 * Determine the initial priority based on ((total / MEMCG_NR_GENS) >>
-	 * priority) * reclaimed_to_scanned_ratio = nr_to_reclaim, where the
-	 * estimated reclaimed_to_scanned_ratio = inactive / total.
+	 * Determine the initial priority based on
+	 * (total >> priority) * reclaimed_to_scanned_ratio = nr_to_reclaim,
+	 * where reclaimed_to_scanned_ratio = inactive / total.
 	 */
 	reclaimable = node_page_state(pgdat, NR_INACTIVE_FILE);
 	if (get_swappiness(lruvec, sc))
 		reclaimable += node_page_state(pgdat, NR_INACTIVE_ANON);
 
-	reclaimable /= MEMCG_NR_GENS;
-
 	/* round down reclaimable and round up sc->nr_to_reclaim */
 	priority = fls_long(reclaimable) - 1 - fls_long(sc->nr_to_reclaim - 1);

From patchwork Fri Dec 8 06:14:07 2023
X-Patchwork-Submitter: Yu Zhao
X-Patchwork-Id: 175607
Date: Thu, 7 Dec 2023 23:14:07 -0700
In-Reply-To: <20231208061407.2125867-1-yuzhao@google.com>
References: <20231208061407.2125867-1-yuzhao@google.com>
X-Mailer: git-send-email 2.43.0.472.g3155946c3a-goog
Message-ID: <20231208061407.2125867-4-yuzhao@google.com>
Subject: [PATCH mm-unstable v1 4/4] mm/mglru: reclaim offlined memcgs harder
From: Yu Zhao
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Yu Zhao ,
 "T . J . Mercier" , stable@vger.kernel.org

In the effort to reduce zombie memcgs [1], it was discovered that the
memcg LRU doesn't apply enough pressure on offlined memcgs.
Specifically, instead of rotating them to the tail of the current
generation (MEMCG_LRU_TAIL) for a second attempt, it moves them to the
next generation (MEMCG_LRU_YOUNG) after the first attempt. Not applying
enough pressure on offlined memcgs can cause them to build up, and this
can be particularly harmful to memory-constrained systems.

On Pixel 8 Pro, launching apps for 50 cycles:

                 Before  After  Change
  Zombie memcgs      45     35    -22%

[1] https://lore.kernel.org/CABdmKX2M6koq4Q0Cmp_-=wbP0Qa190HdEGGaHfxNS05gAkUtPA@mail.gmail.com/

Fixes: e4dde56cd208 ("mm: multi-gen LRU: per-node lru_gen_folio lists")
Signed-off-by: Yu Zhao
Reported-by: T.J. Mercier
Tested-by: T.J.
 Mercier
Cc: stable@vger.kernel.org
---
 include/linux/mmzone.h |  8 ++++----
 mm/vmscan.c            | 24 ++++++++++++++++--------
 2 files changed, 20 insertions(+), 12 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index e3093ef9530f..2efd3be484fd 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -524,10 +524,10 @@ void lru_gen_look_around(struct page_vma_mapped_walk *pvmw);
  * 1. Exceeding the soft limit, which triggers MEMCG_LRU_HEAD;
  * 2. The first attempt to reclaim a memcg below low, which triggers
  *    MEMCG_LRU_TAIL;
- * 3. The first attempt to reclaim a memcg below reclaimable size threshold,
- *    which triggers MEMCG_LRU_TAIL;
- * 4. The second attempt to reclaim a memcg below reclaimable size threshold,
- *    which triggers MEMCG_LRU_YOUNG;
+ * 3. The first attempt to reclaim a memcg offlined or below reclaimable size
+ *    threshold, which triggers MEMCG_LRU_TAIL;
+ * 4. The second attempt to reclaim a memcg offlined or below reclaimable size
+ *    threshold, which triggers MEMCG_LRU_YOUNG;
  * 5. Attempting to reclaim a memcg below min, which triggers MEMCG_LRU_YOUNG;
  * 6. Finishing the aging on the eviction path, which triggers MEMCG_LRU_YOUNG;
  * 7. Offlining a memcg, which triggers MEMCG_LRU_OLD.
diff --git a/mm/vmscan.c b/mm/vmscan.c
index cac38e9cac86..dad4b80b04cd 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -4626,7 +4626,12 @@ static bool should_run_aging(struct lruvec *lruvec, unsigned long max_seq,
 	}
 
 	/* try to scrape all its memory if this memcg was deleted */
-	*nr_to_scan = mem_cgroup_online(memcg) ? (total >> sc->priority) : total;
+	if (!mem_cgroup_online(memcg)) {
+		*nr_to_scan = total;
+		return false;
+	}
+
+	*nr_to_scan = total >> sc->priority;
 
 	/*
 	 * The aging tries to be lazy to reduce the overhead, while the eviction
@@ -4747,14 +4752,9 @@ static int shrink_one(struct lruvec *lruvec, struct scan_control *sc)
 	bool success;
 	unsigned long scanned = sc->nr_scanned;
 	unsigned long reclaimed = sc->nr_reclaimed;
-	int seg = lru_gen_memcg_seg(lruvec);
 	struct mem_cgroup *memcg = lruvec_memcg(lruvec);
 	struct pglist_data *pgdat = lruvec_pgdat(lruvec);
 
-	/* see the comment on MEMCG_NR_GENS */
-	if (!lruvec_is_sizable(lruvec, sc))
-		return seg != MEMCG_LRU_TAIL ? MEMCG_LRU_TAIL : MEMCG_LRU_YOUNG;
-
 	mem_cgroup_calculate_protection(NULL, memcg);
 
 	if (mem_cgroup_below_min(NULL, memcg))
@@ -4762,7 +4762,7 @@ static int shrink_one(struct lruvec *lruvec, struct scan_control *sc)
 
 	if (mem_cgroup_below_low(NULL, memcg)) {
 		/* see the comment on MEMCG_NR_GENS */
-		if (seg != MEMCG_LRU_TAIL)
+		if (lru_gen_memcg_seg(lruvec) != MEMCG_LRU_TAIL)
 			return MEMCG_LRU_TAIL;
 
 		memcg_memory_event(memcg, MEMCG_LOW);
@@ -4778,7 +4778,15 @@ static int shrink_one(struct lruvec *lruvec, struct scan_control *sc)
 
 	flush_reclaim_state(sc);
 
-	return success ? MEMCG_LRU_YOUNG : 0;
+	if (success && mem_cgroup_online(memcg))
+		return MEMCG_LRU_YOUNG;
+
+	if (!success && lruvec_is_sizable(lruvec, sc))
+		return 0;
+
+	/* one retry if offlined or too small */
+	return lru_gen_memcg_seg(lruvec) != MEMCG_LRU_TAIL ?
+	       MEMCG_LRU_TAIL : MEMCG_LRU_YOUNG;
 }
 
 #ifdef CONFIG_MEMCG