From patchwork Wed Jun 21 18:04:49 2023
X-Patchwork-Submitter: Yuanchu Xie
X-Patchwork-Id: 111251
Date: Wed, 21 Jun 2023 18:04:49 +0000
In-Reply-To: <20230621180454.973862-1-yuanchu@google.com>
References: <20230621180454.973862-1-yuanchu@google.com>
Message-ID: <20230621180454.973862-2-yuanchu@google.com>
Subject: [RFC PATCH v2 1/6] mm: aggregate working set information into histograms
From: Yuanchu Xie
To: Greg Kroah-Hartman, "Rafael J. Wysocki", "Michael S. Tsirkin",
 David Hildenbrand, Jason Wang, Andrew Morton, Johannes Weiner,
 Michal Hocko, Roman Gushchin, Shakeel Butt, Muchun Song, Yu Zhao,
 Kefeng Wang, Kairui Song, Yosry Ahmed, Yuanchu Xie, "T. J. Alumbaugh"
Cc: Wei Xu, SeongJae Park, Sudarshan Rajagopalan, kai.huang@intel.com,
 hch@lst.de, jon@nutanix.com, Aneesh Kumar K V, Matthew Wilcox,
 Vasily Averin, linux-kernel@vger.kernel.org,
 virtualization@lists.linux-foundation.org, linux-mm@kvack.org,
 cgroups@vger.kernel.org
Tsirkin" , David Hildenbrand , Jason Wang , Andrew Morton , Johannes Weiner , Michal Hocko , Roman Gushchin , Shakeel Butt , Muchun Song , Yu Zhao , Kefeng Wang , Kairui Song , Yosry Ahmed , Yuanchu Xie , "T . J . Alumbaugh" Cc: Wei Xu , SeongJae Park , Sudarshan Rajagopalan , kai.huang@intel.com, hch@lst.de, jon@nutanix.com, Aneesh Kumar K V , Matthew Wilcox , Vasily Averin , linux-kernel@vger.kernel.org, virtualization@lists.linux-foundation.org, linux-mm@kvack.org, cgroups@vger.kernel.org X-Spam-Status: No, score=-9.6 required=5.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE,USER_IN_DEF_DKIM_WL autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1769337679770913330?= X-GMAIL-MSGID: =?utf-8?q?1769337679770913330?= Hierarchically aggregate all memcgs' MGLRU generations and their page counts into working set histograms. The histograms break down the system's working set per-node, per-anon/file. Signed-off-by: T.J. Alumbaugh Signed-off-by: Yuanchu Xie --- drivers/base/node.c | 3 + include/linux/mmzone.h | 4 + include/linux/wsr.h | 73 +++++++++++ mm/Kconfig | 7 + mm/Makefile | 1 + mm/internal.h | 1 + mm/mmzone.c | 3 + mm/vmscan.c | 3 + mm/wsr.c | 288 +++++++++++++++++++++++++++++++++++++++++ 9 files changed, 383 insertions(+) create mode 100644 include/linux/wsr.h create mode 100644 mm/wsr.c diff --git a/drivers/base/node.c b/drivers/base/node.c index faf3597a96da9..e326debe22d8f 100644 --- a/drivers/base/node.c +++ b/drivers/base/node.c @@ -21,6 +21,7 @@ #include #include #include +#include static struct bus_type node_subsys = { .name = "node", @@ -616,6 +617,7 @@ static int register_node(struct node *node, int num) } else { hugetlb_register_node(node); compaction_register_node(node); + wsr_register_node(node); } return error; @@ -632,6 +634,7 @@ void unregister_node(struct node *node) { hugetlb_unregister_node(node); compaction_unregister_node(node); + wsr_unregister_node(node); node_remove_accesses(node); node_remove_caches(node); device_unregister(&node->dev); diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index cd28a100d9e4f..96f0d8f3584e4 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -21,6 +21,7 @@ #include #include #include +#include #include /* Free memory management - zoned buddy allocator. 
diff --git a/include/linux/wsr.h b/include/linux/wsr.h
new file mode 100644
index 0000000000000..fa46b4d61177d
--- /dev/null
+++ b/include/linux/wsr.h
@@ -0,0 +1,73 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_WSR_H
+#define _LINUX_WSR_H
+
+#include
+#include
+
+struct node;
+struct lruvec;
+struct mem_cgroup;
+struct pglist_data;
+struct scan_control;
+struct lru_gen_mm_walk;
+
+#ifdef CONFIG_WSR
+#define ANON_AND_FILE 2
+
+#define MIN_NR_BINS 4
+#define MAX_NR_BINS 16
+
+struct ws_bin {
+	unsigned long idle_age;
+	unsigned long nr_pages[ANON_AND_FILE];
+};
+
+struct wsr {
+	/* protects bins */
+	struct mutex bins_lock;
+	struct ws_bin bins[MAX_NR_BINS];
+};
+
+void wsr_register_node(struct node *node);
+void wsr_unregister_node(struct node *node);
+
+void wsr_init(struct lruvec *lruvec);
+void wsr_destroy(struct lruvec *lruvec);
+struct wsr *lruvec_wsr(struct lruvec *lruvec);
+
+ssize_t wsr_intervals_ms_parse(char *src, struct ws_bin *bins);
+
+/*
+ * wsr->bins needs to be locked
+ */
+void wsr_refresh(struct wsr *wsr, struct mem_cgroup *root,
+		 struct pglist_data *pgdat);
+#else
+struct ws_bin;
+struct wsr;
+
+static inline void wsr_register_node(struct node *node)
+{
+}
+static inline void wsr_unregister_node(struct node *node)
+{
+}
+static inline void wsr_init(struct lruvec *lruvec)
+{
+}
+static inline void wsr_destroy(struct lruvec *lruvec)
+{
+}
+/* lruvec_wsr is intentionally omitted */
+static inline ssize_t wsr_intervals_ms_parse(char *src, struct ws_bin *bins)
+{
+	return -EINVAL;
+}
+static inline void wsr_refresh(struct wsr *wsr, struct mem_cgroup *root,
+			       struct pglist_data *pgdat)
+{
+}
+#endif /* CONFIG_WSR */
+
+#endif /* _LINUX_WSR_H */
diff --git a/mm/Kconfig b/mm/Kconfig
index ff7b209dec055..8a84c1402159a 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -1183,6 +1183,13 @@ config LRU_GEN_STATS
 	  This option has a per-memcg and per-node memory overhead.
 # }

+config WSR
+	bool "Working set reporting"
+	depends on LRU_GEN
+	help
+	  This option enables working set reporting. Support for backends
+	  other than MGLRU is a work in progress; currently only MGLRU is
+	  supported.
+
 source "mm/damon/Kconfig"

 endmenu
diff --git a/mm/Makefile b/mm/Makefile
index 8e105e5b3e293..12e2da5ba2d04 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -98,6 +98,7 @@ obj-$(CONFIG_DEVICE_MIGRATION) += migrate_device.o
 obj-$(CONFIG_TRANSPARENT_HUGEPAGE) += huge_memory.o khugepaged.o
 obj-$(CONFIG_PAGE_COUNTER) += page_counter.o
 obj-$(CONFIG_MEMCG) += memcontrol.o vmpressure.o
+obj-$(CONFIG_WSR) += wsr.o
 ifdef CONFIG_SWAP
 obj-$(CONFIG_MEMCG) += swap_cgroup.o
 endif
diff --git a/mm/internal.h b/mm/internal.h
index bcf75a8b032de..88dba0b11f663 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -180,6 +180,7 @@ pgprot_t __init early_memremap_pgprot_adjust(resource_size_t phys_addr,
 /*
  * in mm/vmscan.c:
  */
+struct scan_control;
 int isolate_lru_page(struct page *page);
 int folio_isolate_lru(struct folio *folio);
 void putback_lru_page(struct page *page);
diff --git a/mm/mmzone.c b/mm/mmzone.c
index 68e1511be12de..22a8282f67150 100644
--- a/mm/mmzone.c
+++ b/mm/mmzone.c
@@ -8,6 +8,7 @@
 #include
 #include
+#include
 #include

 struct pglist_data *first_online_pgdat(void)
@@ -89,6 +90,8 @@ void lruvec_init(struct lruvec *lruvec)
 	 */
 	list_del(&lruvec->lists[LRU_UNEVICTABLE]);

+	wsr_init(lruvec);
+
 	lru_gen_init_lruvec(lruvec);
 }
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 5b7b8d4f5297f..150e3cd70c65e 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -55,6 +55,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
@@ -5890,6 +5891,8 @@ static int __init init_lru_gen(void)
 	if (sysfs_create_group(mm_kobj, &lru_gen_attr_group))
 		pr_err("lru_gen: failed to create sysfs group\n");

+	wsr_register_node(NULL);
+
 	debugfs_create_file("lru_gen", 0644, NULL, NULL, &lru_gen_rw_fops);
 	debugfs_create_file("lru_gen_full", 0444, NULL, NULL, &lru_gen_ro_fops);
diff --git a/mm/wsr.c b/mm/wsr.c
new file mode 100644
index 0000000000000..1e4c0ce69caf7
--- /dev/null
+++ b/mm/wsr.c
@@ -0,0 +1,288 @@
+// SPDX-License-Identifier: GPL-2.0
+//
+#include
+
+#include
+#include
+#include
+#include
+
+#include "internal.h"
+
+/*
+ * For now just embed wsr in the lruvec. Consider only allocating
+ * struct wsr when it's used, since sizeof(struct wsr) is ~864 bytes.
+ */
+struct wsr *lruvec_wsr(struct lruvec *lruvec)
+{
+	return &lruvec->__wsr;
+}
+
+void wsr_init(struct lruvec *lruvec)
+{
+	struct wsr *wsr = lruvec_wsr(lruvec);
+
+	mutex_init(&wsr->bins_lock);
+	wsr->bins[0].idle_age = -1;
+}
+
+void wsr_destroy(struct lruvec *lruvec)
+{
+	struct wsr *wsr = lruvec_wsr(lruvec);
+
+	mutex_destroy(&wsr->bins_lock);
+	memset(wsr, 0, sizeof(*wsr));
+}
+
+ssize_t wsr_intervals_ms_parse(char *src, struct ws_bin *bins)
+{
+	int err, i = 0;
+	char *cur, *next = strim(src);
+
+	while ((cur = strsep(&next, ","))) {
+		unsigned int msecs;
+
+		err = kstrtouint(cur, 0, &msecs);
+		if (err)
+			return err;
+
+		bins[i].idle_age = msecs_to_jiffies(msecs);
+		if (i > 0 && bins[i].idle_age <= bins[i - 1].idle_age)
+			return -EINVAL;
+
+		if (++i == MAX_NR_BINS)
+			return -ERANGE;
+	}
+
+	if (i && i < MIN_NR_BINS - 1)
+		return -ERANGE;
+
+	bins[i].idle_age = -1;
+	return 0;
+}
+
+static void collect_wsr(struct wsr *wsr, const struct lruvec *lruvec)
+{
+	int gen, type, zone;
+	const struct lru_gen_struct *lrugen = &lruvec->lrugen;
+	unsigned long curr_timestamp = jiffies;
+	unsigned long max_seq = READ_ONCE((lruvec)->lrugen.max_seq);
+	unsigned long min_seq[ANON_AND_FILE] = {
+		READ_ONCE(lruvec->lrugen.min_seq[LRU_GEN_ANON]),
+		READ_ONCE(lruvec->lrugen.min_seq[LRU_GEN_FILE]),
+	};
+
+	for (type = 0; type < ANON_AND_FILE; type++) {
+		unsigned long seq;
+		// TODO update bins hierarchically
+		struct ws_bin *bin = wsr->bins;
+
+		lockdep_assert_held(&wsr->bins_lock);
+		for (seq = max_seq; seq + 1 > min_seq[type]; seq--) {
+			unsigned long birth, gen_start = curr_timestamp, error, size = 0;
+
+			gen = lru_gen_from_seq(seq);
+
+			for (zone = 0; zone < MAX_NR_ZONES; zone++)
+				size += max(READ_ONCE(lrugen->nr_pages[gen][type][zone]),
+					    0L);
+
+			birth = READ_ONCE(lruvec->lrugen.timestamps[gen]);
+			if (seq != max_seq) {
+				int next_gen = lru_gen_from_seq(seq + 1);
+
+				gen_start = READ_ONCE(lruvec->lrugen.timestamps[next_gen]);
+			}
+
+			error = size;
+			/* gen exceeds the idle_age of bin */
+			while (bin->idle_age != -1 &&
+			       time_before(birth + bin->idle_age, curr_timestamp)) {
+				unsigned long proportion =
+					gen_start - (curr_timestamp - bin->idle_age);
+				unsigned long gen_len = gen_start - birth;
+
+				if (!gen_len)
+					break;
+				if (proportion) {
+					unsigned long split_bin =
+						size / gen_len * proportion;
+					bin->nr_pages[type] += split_bin;
+					error -= split_bin;
+				}
+				gen_start = curr_timestamp - bin->idle_age;
+				bin++;
+			}
+			bin->nr_pages[type] += error;
+		}
+	}
+}
+
+static void refresh_wsr(struct wsr *wsr, struct mem_cgroup *root,
+			struct pglist_data *pgdat)
+{
+	struct ws_bin *bin;
+	struct mem_cgroup *memcg;
+
+	lockdep_assert_held(&wsr->bins_lock);
+	VM_WARN_ON_ONCE(wsr->bins->idle_age == -1);
+
+	for (bin = wsr->bins; bin->idle_age != -1; bin++) {
+		bin->nr_pages[0] = 0;
+		bin->nr_pages[1] = 0;
+	}
+	/* the last used bin has idle_age == -1 */
+	bin->nr_pages[0] = 0;
+	bin->nr_pages[1] = 0;
+
+	memcg = mem_cgroup_iter(root, NULL, NULL);
+	do {
+		struct lruvec *lruvec = mem_cgroup_lruvec(memcg, pgdat);
+
+		collect_wsr(wsr, lruvec);
+
+		cond_resched();
+	} while ((memcg = mem_cgroup_iter(root, memcg, NULL)));
+}
+
+static struct pglist_data *kobj_to_pgdat(struct kobject *kobj)
+{
+	int nid = IS_ENABLED(CONFIG_NUMA) ? kobj_to_dev(kobj)->id :
+					    first_memory_node;
+
+	return NODE_DATA(nid);
+}
+
+static struct wsr *kobj_to_wsr(struct kobject *kobj)
+{
+	return lruvec_wsr(mem_cgroup_lruvec(NULL, kobj_to_pgdat(kobj)));
+}
+
+static ssize_t intervals_ms_show(struct kobject *kobj, struct kobj_attribute *attr,
+				 char *buf)
+{
+	struct ws_bin *bin;
+	int len = 0;
+	struct wsr *wsr = kobj_to_wsr(kobj);
+
+	mutex_lock(&wsr->bins_lock);
+
+	for (bin = wsr->bins; bin->idle_age != -1; bin++)
+		len += sysfs_emit_at(buf, len, "%u,", jiffies_to_msecs(bin->idle_age));
+
+	len += sysfs_emit_at(buf, len, "%lld\n", LLONG_MAX);
+
+	mutex_unlock(&wsr->bins_lock);
+
+	return len;
+}
+
+static ssize_t intervals_ms_store(struct kobject *kobj, struct kobj_attribute *attr,
+				  const char *src, size_t len)
+{
+	char *buf;
+	struct ws_bin *bins;
+	int err = 0;
+	struct wsr *wsr = kobj_to_wsr(kobj);
+
+	bins = kzalloc(sizeof(wsr->bins), GFP_KERNEL);
+	if (!bins)
+		return -ENOMEM;
+
+	buf = kstrdup(src, GFP_KERNEL);
+	if (!buf) {
+		err = -ENOMEM;
+		goto failed;
+	}
+
+	err = wsr_intervals_ms_parse(buf, bins);
+	if (err)
+		goto failed;
+
+	mutex_lock(&wsr->bins_lock);
+	memcpy(wsr->bins, bins, sizeof(wsr->bins));
+	mutex_unlock(&wsr->bins_lock);
+failed:
+	kfree(buf);
+	kfree(bins);
+
+	return err ?: len;
+}
+
+static struct kobj_attribute intervals_ms_attr = __ATTR_RW(intervals_ms);
+
+static ssize_t histogram_show(struct kobject *kobj, struct kobj_attribute *attr,
+			      char *buf)
+{
+	struct ws_bin *bin;
+	int len = 0;
+	struct wsr *wsr = kobj_to_wsr(kobj);
+
+	mutex_lock(&wsr->bins_lock);
+
+	refresh_wsr(wsr, NULL, kobj_to_pgdat(kobj));
+
+	for (bin = wsr->bins; bin->idle_age != -1; bin++)
+		len += sysfs_emit_at(buf, len, "%u anon=%lu file=%lu\n",
+				     jiffies_to_msecs(bin->idle_age),
+				     bin->nr_pages[0], bin->nr_pages[1]);
+
+	len += sysfs_emit_at(buf, len, "%lld anon=%lu file=%lu\n", LLONG_MAX,
+			     bin->nr_pages[0], bin->nr_pages[1]);
+
+	mutex_unlock(&wsr->bins_lock);
+
+	return len;
+}
+
+static struct kobj_attribute histogram_attr = __ATTR_RO(histogram);
+
+static struct attribute *wsr_attrs[] = {
+	&intervals_ms_attr.attr,
+	&histogram_attr.attr,
+	NULL
+};
+
+static const struct attribute_group wsr_attr_group = {
+	.name = "wsr",
+	.attrs = wsr_attrs,
+};
+
+void wsr_register_node(struct node *node)
+{
+	struct kobject *kobj = node ? &node->dev.kobj : mm_kobj;
+	struct wsr *wsr;
+
+	if (IS_ENABLED(CONFIG_NUMA) && !node)
+		return;
+
+	wsr = kobj_to_wsr(kobj);
+
+	/*
+	 * wsr should be initialized when pgdat was initialized
+	 * or when the root memcg was initialized
+	 */
+	if (sysfs_create_group(kobj, &wsr_attr_group)) {
+		pr_warn("WSR failed to create group\n");
+		return;
+	}
+}
+
+void wsr_unregister_node(struct node *node)
+{
+	struct kobject *kobj = &node->dev.kobj;
+	struct wsr *wsr;
+
+	if (IS_ENABLED(CONFIG_NUMA) && !node)
+		return;
+
+	wsr = kobj_to_wsr(kobj);
+	sysfs_remove_group(kobj, &wsr_attr_group);
+	wsr_destroy(mem_cgroup_lruvec(NULL, kobj_to_pgdat(kobj)));
+}
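Putting the pieces of this patch together, a session on a hypothetical
single-node box would look like the following (paths follow the "wsr"
attribute group registered above; all numbers are invented):

    write "1000,2000,4000" to /sys/devices/system/node/node0/wsr/intervals_ms
    read /sys/devices/system/node/node0/wsr/histogram, which returns e.g.

        1000 anon=2048 file=512
        2000 anon=1024 file=4096
        4000 anon=512 file=256
        9223372036854775807 anon=16384 file=8192

Each histogram line is jiffies_to_msecs(bin->idle_age) followed by the
anon/file page counts, with the final LLONG_MAX line covering the
catch-all bin.

The proportional split in collect_wsr() is worth checking with a worked
example (all numbers invented): take a generation of size = 600 pages
born at now - 3000 jiffies whose successor was created at
gen_start = now - 1000, and a bin boundary of idle_age = 2000. Then
gen_len = 2000 and proportion = (now - 1000) - (now - 2000) = 1000, so
split_bin = size / gen_len * proportion = (600 / 2000) * 1000 = 0,
because the integer division truncates before the multiply;
size * proportion / gen_len would give the intended 300 pages. The
truncated remainder is not lost, though: it stays in `error` and is
credited to the following bin.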
From patchwork Wed Jun 21 18:04:50 2023
X-Patchwork-Submitter: Yuanchu Xie
X-Patchwork-Id: 111264
Date: Wed, 21 Jun 2023 18:04:50 +0000
In-Reply-To: <20230621180454.973862-1-yuanchu@google.com>
References: <20230621180454.973862-1-yuanchu@google.com>
Message-ID: <20230621180454.973862-3-yuanchu@google.com>
Subject: [RFC PATCH v2 2/6] mm: add working set refresh threshold to rate-limit aggregation
From: Yuanchu Xie
Tsirkin" , David Hildenbrand , Jason Wang , Andrew Morton , Johannes Weiner , Michal Hocko , Roman Gushchin , Shakeel Butt , Muchun Song , Yu Zhao , Kefeng Wang , Kairui Song , Yosry Ahmed , Yuanchu Xie , "T . J . Alumbaugh" Cc: Wei Xu , SeongJae Park , Sudarshan Rajagopalan , kai.huang@intel.com, hch@lst.de, jon@nutanix.com, Aneesh Kumar K V , Matthew Wilcox , Vasily Averin , linux-kernel@vger.kernel.org, virtualization@lists.linux-foundation.org, linux-mm@kvack.org, cgroups@vger.kernel.org X-Spam-Status: No, score=-9.6 required=5.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE,USER_IN_DEF_DKIM_WL autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1769338701020738700?= X-GMAIL-MSGID: =?utf-8?q?1769338701020738700?= Refresh threshold is a rate limiting factor to working set histogram reads. When a working set report is generated, a timestamp is noted, and the same report will be read until it expires beyond the refresh threshold, at which point a new report is generated. Signed-off-by: T.J. Alumbaugh Signed-off-by: Yuanchu Xie --- include/linux/mmzone.h | 1 + include/linux/wsr.h | 3 +++ mm/internal.h | 11 +++++++++ mm/vmscan.c | 39 +++++++++++++++++++++++++++++-- mm/wsr.c | 52 +++++++++++++++++++++++++++++++++++++++--- 5 files changed, 101 insertions(+), 5 deletions(-) diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index 96f0d8f3584e4..bca828a16a46b 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -362,6 +362,7 @@ enum lruvec_flags { #ifndef __GENERATING_BOUNDS_H +struct node; struct lruvec; struct page_vma_mapped_walk; diff --git a/include/linux/wsr.h b/include/linux/wsr.h index fa46b4d61177d..a86105468c710 100644 --- a/include/linux/wsr.h +++ b/include/linux/wsr.h @@ -26,6 +26,8 @@ struct ws_bin { struct wsr { /* protects bins */ struct mutex bins_lock; + unsigned long timestamp; + unsigned long refresh_threshold; struct ws_bin bins[MAX_NR_BINS]; }; @@ -40,6 +42,7 @@ ssize_t wsr_intervals_ms_parse(char *src, struct ws_bin *bins); /* * wsr->bins needs to be locked + * refreshes wsr based on the refresh threshold */ void wsr_refresh(struct wsr *wsr, struct mem_cgroup *root, struct pglist_data *pgdat); diff --git a/mm/internal.h b/mm/internal.h index 88dba0b11f663..ce4757e7f8277 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -186,6 +186,17 @@ int folio_isolate_lru(struct folio *folio); void putback_lru_page(struct page *page); void folio_putback_lru(struct folio *folio); extern void reclaim_throttle(pg_data_t *pgdat, enum vmscan_throttle_state reason); +int get_swappiness(struct lruvec *lruvec, struct scan_control *sc); +bool try_to_inc_max_seq(struct lruvec *lruvec, unsigned long max_seq, + struct scan_control *sc, bool can_swap, + bool force_scan); + +/* + * in mm/wsr.c + */ +void refresh_wsr(struct wsr *wsr, struct mem_cgroup *root, + struct pglist_data *pgdat, struct scan_control *sc, + unsigned long refresh_threshold); /* * in mm/rmap.c: diff --git a/mm/vmscan.c b/mm/vmscan.c index 150e3cd70c65e..66c5df2a7f65b 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -3201,7 +3201,7 @@ static struct lruvec *get_lruvec(struct mem_cgroup *memcg, int nid) return &pgdat->__lruvec; } -static int get_swappiness(struct lruvec *lruvec, 
diff --git a/mm/wsr.c b/mm/wsr.c
index 1e4c0ce69caf7..ee295d164461e 100644
--- a/mm/wsr.c
+++ b/mm/wsr.c
@@ -125,8 +125,9 @@ static void collect_wsr(struct wsr *wsr, const struct lruvec *lruvec)
 	}
 }

-static void refresh_wsr(struct wsr *wsr, struct mem_cgroup *root,
-			struct pglist_data *pgdat)
+void refresh_wsr(struct wsr *wsr, struct mem_cgroup *root,
+		 struct pglist_data *pgdat, struct scan_control *sc,
+		 unsigned long refresh_threshold)
 {
 	struct ws_bin *bin;
 	struct mem_cgroup *memcg;
@@ -146,6 +147,24 @@
 	do {
 		struct lruvec *lruvec = mem_cgroup_lruvec(memcg, pgdat);
+		bool can_swap = get_swappiness(lruvec, sc);
+		unsigned long max_seq = READ_ONCE((lruvec)->lrugen.max_seq);
+		unsigned long min_seq[ANON_AND_FILE] = {
+			READ_ONCE(lruvec->lrugen.min_seq[LRU_GEN_ANON]),
+			READ_ONCE(lruvec->lrugen.min_seq[LRU_GEN_FILE]),
+		};
+
+		mem_cgroup_calculate_protection(root, memcg);
+		if (!mem_cgroup_below_min(root, memcg) && refresh_threshold &&
+		    min_seq[!can_swap] + MAX_NR_GENS - 1 > max_seq) {
+			int gen = lru_gen_from_seq(max_seq);
+			unsigned long birth =
+				READ_ONCE(lruvec->lrugen.timestamps[gen]);
+
+			if (time_is_before_jiffies(birth + refresh_threshold))
+				try_to_inc_max_seq(lruvec, max_seq, sc,
+						   can_swap, false);
+		}

 		collect_wsr(wsr, lruvec);
@@ -165,6 +184,32 @@ static struct wsr *kobj_to_wsr(struct kobject *kobj)
 	return lruvec_wsr(mem_cgroup_lruvec(NULL, kobj_to_pgdat(kobj)));
 }

+static ssize_t refresh_ms_show(struct kobject *kobj, struct kobj_attribute *attr,
+			       char *buf)
+{
+	struct wsr *wsr = kobj_to_wsr(kobj);
+	unsigned long threshold = READ_ONCE(wsr->refresh_threshold);
+
+	return sysfs_emit(buf, "%u\n", jiffies_to_msecs(threshold));
+}
+
+static ssize_t refresh_ms_store(struct kobject *kobj, struct kobj_attribute *attr,
+				const char *buf, size_t len)
+{
+	unsigned int msecs;
+	struct wsr *wsr = kobj_to_wsr(kobj);
+
+	if (kstrtouint(buf, 0, &msecs))
+		return -EINVAL;
+
+	WRITE_ONCE(wsr->refresh_threshold, msecs_to_jiffies(msecs));
+
+	return len;
+}
+
+static struct kobj_attribute refresh_ms_attr = __ATTR_RW(refresh_ms);
+
 static ssize_t intervals_ms_show(struct kobject *kobj, struct kobj_attribute *attr,
 				 char *buf)
 {
@@ -227,7 +272,7 @@ static ssize_t histogram_show(struct kobject *kobj, struct kobj_attribute *attr,

 	mutex_lock(&wsr->bins_lock);

-	refresh_wsr(wsr, NULL, kobj_to_pgdat(kobj));
+	wsr_refresh(wsr, NULL, kobj_to_pgdat(kobj));

 	for (bin = wsr->bins; bin->idle_age != -1; bin++)
 		len += sysfs_emit_at(buf, len, "%u anon=%lu file=%lu\n",
@@ -245,6 +290,7 @@ static ssize_t histogram_show(struct kobject *kobj, struct kobj_attribute *attr,
 static struct kobj_attribute histogram_attr = __ATTR_RO(histogram);

 static struct attribute *wsr_attrs[] = {
+	&refresh_ms_attr.attr,
 	&intervals_ms_attr.attr,
 	&histogram_attr.attr,
 	NULL
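The net effect of this patch is TTL-cache semantics for the histogram.
For illustration with invented numbers: after writing 5000 to refresh_ms,
a histogram read walks the memcg tree and fills the bins; reads within
the next five seconds are meant to return the cached bins, and the first
read after that regenerates the report. refresh_wsr() may additionally
age an lruvec (try_to_inc_max_seq()) when its youngest generation is
older than the refresh threshold and the memcg is not below its
protection minimum, so the MGLRU data stays no staler than roughly one
threshold. Note that within this patch nothing stores wsr->timestamp
yet; the WRITE_ONCE(wsr->timestamp, jiffies) arrives in the next patch's
report path, so until then the threshold check in wsr_refresh() compares
against the initial zero value.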
From patchwork Wed Jun 21 18:04:51 2023
X-Patchwork-Submitter: Yuanchu Xie
X-Patchwork-Id: 111257
Date: Wed, 21 Jun 2023 18:04:51 +0000
In-Reply-To: <20230621180454.973862-1-yuanchu@google.com>
References: <20230621180454.973862-1-yuanchu@google.com>
Message-ID: <20230621180454.973862-4-yuanchu@google.com>
Subject: [RFC PATCH v2 3/6] mm: report working set when under memory pressure
From: Yuanchu Xie
Tsirkin" , David Hildenbrand , Jason Wang , Andrew Morton , Johannes Weiner , Michal Hocko , Roman Gushchin , Shakeel Butt , Muchun Song , Yu Zhao , Kefeng Wang , Kairui Song , Yosry Ahmed , Yuanchu Xie , "T . J . Alumbaugh" Cc: Wei Xu , SeongJae Park , Sudarshan Rajagopalan , kai.huang@intel.com, hch@lst.de, jon@nutanix.com, Aneesh Kumar K V , Matthew Wilcox , Vasily Averin , linux-kernel@vger.kernel.org, virtualization@lists.linux-foundation.org, linux-mm@kvack.org, cgroups@vger.kernel.org X-Spam-Status: No, score=-9.6 required=5.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE,USER_IN_DEF_DKIM_WL autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1769338229285704995?= X-GMAIL-MSGID: =?utf-8?q?1769338229285704995?= When a system is under memory pressure and kswapd kicks in, a working set report is produced. The userspace program polling on the histogram file is notified of the new report. The report threshold acts as a rate-limiting mechanism to prevent the system from generating reports too frequently. Signed-off-by: T.J. Alumbaugh Signed-off-by: Yuanchu Xie --- include/linux/wsr.h | 2 ++ mm/vmscan.c | 37 +++++++++++++++++++++++++++++++++++++ mm/wsr.c | 29 +++++++++++++++++++++++++++++ 3 files changed, 68 insertions(+) diff --git a/include/linux/wsr.h b/include/linux/wsr.h index a86105468c710..85c901ce026b9 100644 --- a/include/linux/wsr.h +++ b/include/linux/wsr.h @@ -26,7 +26,9 @@ struct ws_bin { struct wsr { /* protects bins */ struct mutex bins_lock; + struct kernfs_node *notifier; unsigned long timestamp; + unsigned long report_threshold; unsigned long refresh_threshold; struct ws_bin bins[MAX_NR_BINS]; }; diff --git a/mm/vmscan.c b/mm/vmscan.c index 66c5df2a7f65b..c56fddcec88fb 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -4559,6 +4559,8 @@ static bool age_lruvec(struct lruvec *lruvec, struct scan_control *sc, unsigned return true; } +static void report_ws(struct pglist_data *pgdat, struct scan_control *sc); + /* to protect the working set of the last N jiffies */ static unsigned long lru_gen_min_ttl __read_mostly; @@ -4570,6 +4572,8 @@ static void lru_gen_age_node(struct pglist_data *pgdat, struct scan_control *sc) VM_WARN_ON_ONCE(!current_is_kswapd()); + report_ws(pgdat, sc); + sc->last_reclaimed = sc->nr_reclaimed; /* @@ -5933,6 +5937,39 @@ void wsr_refresh(struct wsr *wsr, struct mem_cgroup *root, } } +static void report_ws(struct pglist_data *pgdat, struct scan_control *sc) +{ + static DEFINE_RATELIMIT_STATE(rate, HZ, 3); + + struct mem_cgroup *memcg = sc->target_mem_cgroup; + struct wsr *wsr = lruvec_wsr(mem_cgroup_lruvec(memcg, pgdat)); + unsigned long threshold; + + threshold = READ_ONCE(wsr->report_threshold); + + if (sc->priority == DEF_PRIORITY) + return; + + if (READ_ONCE(wsr->bins->idle_age) == -1) + return; + + if (!threshold || time_is_after_jiffies(wsr->timestamp + threshold)) + return; + + if (!__ratelimit(&rate)) + return; + + if (!mutex_trylock(&wsr->bins_lock)) + return; + + refresh_wsr(wsr, memcg, pgdat, sc, 0); + WRITE_ONCE(wsr->timestamp, jiffies); + + mutex_unlock(&wsr->bins_lock); + + if (wsr->notifier) + kernfs_notify(wsr->notifier); +} #endif /* CONFIG_WSR */ #else /* !CONFIG_LRU_GEN */ diff --git a/mm/wsr.c b/mm/wsr.c index 
diff --git a/mm/wsr.c b/mm/wsr.c
index ee295d164461e..cd045ade5e9ba 100644
--- a/mm/wsr.c
+++ b/mm/wsr.c
@@ -24,6 +24,7 @@ void wsr_init(struct lruvec *lruvec)

 	mutex_init(&wsr->bins_lock);
 	wsr->bins[0].idle_age = -1;
+	wsr->notifier = NULL;
 }

 void wsr_destroy(struct lruvec *lruvec)
@@ -184,6 +185,30 @@ static struct wsr *kobj_to_wsr(struct kobject *kobj)
 	return lruvec_wsr(mem_cgroup_lruvec(NULL, kobj_to_pgdat(kobj)));
 }

+static ssize_t report_ms_show(struct kobject *kobj, struct kobj_attribute *attr,
+			      char *buf)
+{
+	struct wsr *wsr = kobj_to_wsr(kobj);
+	unsigned long threshold = READ_ONCE(wsr->report_threshold);
+
+	return sysfs_emit(buf, "%u\n", jiffies_to_msecs(threshold));
+}
+
+static ssize_t report_ms_store(struct kobject *kobj, struct kobj_attribute *attr,
+			       const char *buf, size_t len)
+{
+	unsigned int msecs;
+	struct wsr *wsr = kobj_to_wsr(kobj);
+
+	if (kstrtouint(buf, 0, &msecs))
+		return -EINVAL;
+
+	WRITE_ONCE(wsr->report_threshold, msecs_to_jiffies(msecs));
+
+	return len;
+}
+
+static struct kobj_attribute report_ms_attr = __ATTR_RW(report_ms);
+
 static ssize_t refresh_ms_show(struct kobject *kobj, struct kobj_attribute *attr,
 			       char *buf)
@@ -290,6 +315,7 @@ static ssize_t histogram_show(struct kobject *kobj, struct kobj_attribute *attr,
 static struct kobj_attribute histogram_attr = __ATTR_RO(histogram);

 static struct attribute *wsr_attrs[] = {
+	&report_ms_attr.attr,
 	&refresh_ms_attr.attr,
 	&intervals_ms_attr.attr,
 	&histogram_attr.attr,
@@ -318,6 +344,8 @@ void wsr_register_node(struct node *node)
 		pr_warn("WSR failed to create group\n");
 		return;
 	}
+
+	wsr->notifier = kernfs_walk_and_get(kobj->sd, "wsr/histogram");
 }

 void wsr_unregister_node(struct node *node)
@@ -329,6 +357,7 @@ void wsr_unregister_node(struct node *node)
 		return;

 	wsr = kobj_to_wsr(kobj);
+	kernfs_put(wsr->notifier);
 	sysfs_remove_group(kobj, &wsr_attr_group);
 	wsr_destroy(mem_cgroup_lruvec(NULL, kobj_to_pgdat(kobj)));
 }
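The notification added here can be consumed from userspace with a
standard sysfs poll loop. Below is a minimal sketch, assuming this
series is applied, node0 exists, and the file path from patch 1; the
buffer size is arbitrary and error handling is trimmed for brevity:

#include <fcntl.h>
#include <poll.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	char buf[4096];
	struct pollfd pfd;
	ssize_t n;

	pfd.fd = open("/sys/devices/system/node/node0/wsr/histogram", O_RDONLY);
	if (pfd.fd < 0)
		return 1;
	/* sysfs notifications from kernfs_notify() arrive as exceptional events */
	pfd.events = POLLPRI;

	/* drain the current report so the next notification is observable */
	n = read(pfd.fd, buf, sizeof(buf));

	while (poll(&pfd, 1, -1) > 0) {
		if (pfd.revents & (POLLPRI | POLLERR)) {
			/* a new report was generated under memory pressure */
			lseek(pfd.fd, 0, SEEK_SET);
			n = read(pfd.fd, buf, sizeof(buf));
			if (n > 0)
				fwrite(buf, 1, n, stdout);
		}
	}
	return 0;
}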
From patchwork Wed Jun 21 18:04:52 2023
X-Patchwork-Submitter: Yuanchu Xie
X-Patchwork-Id: 111269
Date: Wed, 21 Jun 2023 18:04:52 +0000
In-Reply-To: <20230621180454.973862-1-yuanchu@google.com>
References: <20230621180454.973862-1-yuanchu@google.com>
Message-ID: <20230621180454.973862-5-yuanchu@google.com>
Subject: [RFC PATCH v2 4/6] mm: extend working set reporting to memcgs
From: Yuanchu Xie

Break down the system-wide working set reporting into per-memcg
reports, which aggregate their children hierarchically. The per-node
working set histograms and the refresh/report threshold files are
exposed as memcg files, each showing a report covering all nodes.

Signed-off-by: T.J. Alumbaugh
Signed-off-by: Yuanchu Xie
---
 include/linux/memcontrol.h |   6 +
 include/linux/wsr.h        |   4 +
 mm/memcontrol.c            | 262 ++++++++++++++++++++++++++++++++++++-
 mm/vmscan.c                |   9 +-
 4 files changed, 277 insertions(+), 4 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 85dc9b88ea379..96971aa6a48cd 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -10,6 +10,7 @@
 #ifndef _LINUX_MEMCONTROL_H
 #define _LINUX_MEMCONTROL_H

+#include
 #include
 #include
 #include
@@ -325,6 +326,11 @@ struct mem_cgroup {
 	struct lru_gen_mm_list mm_list;
 #endif

+#ifdef CONFIG_WSR
+	int wsr_event;
+	wait_queue_head_t wsr_wait_queue;
+#endif
+
 	struct mem_cgroup_per_node *nodeinfo[];
 };
diff --git a/include/linux/wsr.h b/include/linux/wsr.h
index 85c901ce026b9..d45f7cc0672ac 100644
--- a/include/linux/wsr.h
+++ b/include/linux/wsr.h
@@ -48,6 +48,7 @@ ssize_t wsr_intervals_ms_parse(char *src, struct ws_bin *bins);
  */
 void wsr_refresh(struct wsr *wsr, struct mem_cgroup *root,
 		 struct pglist_data *pgdat);
+void report_ws(struct pglist_data *pgdat, struct scan_control *sc);
 #else
 struct ws_bin;
 struct wsr;
@@ -73,6 +74,9 @@ static inline void wsr_refresh(struct wsr *wsr, struct mem_cgroup *root,
 			       struct pglist_data *pgdat)
 {
 }
+static inline void report_ws(struct pglist_data *pgdat, struct scan_control *sc)
+{
+}
 #endif /* CONFIG_WSR */

 #endif /* _LINUX_WSR_H */
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 2eee092f8f119..edf5bb31bb19c 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -25,6 +25,7 @@
  * Copyright (C) 2020 Alibaba, Inc, Alex Shi
  */

+#include
 #include
 #include
 #include
@@ -65,6 +66,7 @@
 #include
 #include "internal.h"
 #include
+#include
 #include
 #include "slab.h"
 #include "swap.h"
@@ -5233,6 +5235,7 @@ static void free_mem_cgroup_per_node_info(struct mem_cgroup *memcg, int node)
 	if (!pn)
 		return;

+	wsr_destroy(&pn->lruvec);
 	free_percpu(pn->lruvec_stats_percpu);
 	kfree(pn);
 }
@@ -5311,6 +5314,10 @@ static struct mem_cgroup *mem_cgroup_alloc(void)
 	spin_lock_init(&memcg->deferred_split_queue.split_queue_lock);
 	INIT_LIST_HEAD(&memcg->deferred_split_queue.split_queue);
 	memcg->deferred_split_queue.split_queue_len = 0;
+#endif
+#ifdef CONFIG_WSR
+	memcg->wsr_event = 0;
+	init_waitqueue_head(&memcg->wsr_wait_queue);
 #endif
 	idr_replace(&mem_cgroup_idr, memcg, memcg->id.id);
 	lru_gen_init_memcg(memcg);
@@ -5411,6 +5418,11 @@ static void mem_cgroup_css_offline(struct cgroup_subsys_state *css)
 	}
 	spin_unlock_irq(&memcg->event_list_lock);

+#ifdef CONFIG_WSR
+	wake_up_pollfree(&memcg->wsr_wait_queue);
+	synchronize_rcu();
+#endif
+
 	page_counter_set_min(&memcg->memory, 0);
 	page_counter_set_low(&memcg->memory, 0);
@@ -6642,6 +6654,228 @@ static ssize_t memory_reclaim(struct kernfs_open_file *of, char *buf,
 	return nbytes;
 }

+#ifdef CONFIG_WSR
+static int memory_wsr_intervals_ms_show(struct seq_file *m, void *v)
+{
+	int nid;
+	struct mem_cgroup *memcg = mem_cgroup_from_seq(m);
+
+	for_each_node_state(nid, N_MEMORY) {
+		struct wsr *wsr;
+		struct ws_bin *bin;
+
+		wsr = lruvec_wsr(mem_cgroup_lruvec(memcg, NODE_DATA(nid)));
+		mutex_lock(&wsr->bins_lock);
+		seq_printf(m, "N%d=", nid);
+		for (bin = wsr->bins; bin->idle_age != -1; bin++)
+			seq_printf(m, "%u,", jiffies_to_msecs(bin->idle_age));
+		mutex_unlock(&wsr->bins_lock);
+
+		seq_printf(m, "%lld ", LLONG_MAX);
+	}
+	seq_putc(m, '\n');
+
+	return 0;
+}
+
+static ssize_t memory_wsr_intervals_ms_parse(struct kernfs_open_file *of,
+					     char *buf, size_t nbytes,
+					     unsigned int *nid_out,
+					     struct ws_bin *bins)
+{
+	char *node, *intervals;
+	unsigned int nid;
+	int err;
+
+	buf = strstrip(buf);
+	intervals = buf;
+	node = strsep(&intervals, "=");
+
+	if (*node != 'N')
+		return -EINVAL;
+
+	err = kstrtouint(node + 1, 0, &nid);
+	if (err)
+		return err;
+
+	if (nid >= nr_node_ids || !node_state(nid, N_MEMORY))
+		return -EINVAL;
+
+	err = wsr_intervals_ms_parse(intervals, bins);
+	if (err)
+		return err;
+
+	*nid_out = nid;
+	return 0;
+}
+
+static ssize_t memory_wsr_intervals_ms_write(struct kernfs_open_file *of,
+					     char *buf, size_t nbytes,
+					     loff_t off)
+{
+	unsigned int nid;
+	int err;
+	struct wsr *wsr;
+	struct ws_bin *bins;
+	struct mem_cgroup *memcg = mem_cgroup_from_css(of_css(of));
+
+	bins = kzalloc(sizeof(wsr->bins), GFP_KERNEL);
+	if (!bins)
+		return -ENOMEM;
+
+	err = memory_wsr_intervals_ms_parse(of, buf, nbytes, &nid, bins);
+	if (err)
+		goto failed;
+
+	wsr = lruvec_wsr(mem_cgroup_lruvec(memcg, NODE_DATA(nid)));
+	mutex_lock(&wsr->bins_lock);
+	memcpy(wsr->bins, bins, sizeof(wsr->bins));
+	mutex_unlock(&wsr->bins_lock);
+failed:
+	kfree(bins);
+	return err ?: nbytes;
+}
+
+static int memory_wsr_refresh_ms_show(struct seq_file *m, void *v)
+{
+	int nid;
+	struct mem_cgroup *memcg = mem_cgroup_from_seq(m);
+
+	for_each_node_state(nid, N_MEMORY) {
+		struct wsr *wsr =
+			lruvec_wsr(mem_cgroup_lruvec(memcg, NODE_DATA(nid)));
+
+		seq_printf(m, "N%d=%u ", nid,
+			   jiffies_to_msecs(READ_ONCE(wsr->refresh_threshold)));
+	}
+	seq_putc(m, '\n');
+
+	return 0;
+}
+
+static ssize_t memory_wsr_threshold_parse(char *buf, size_t nbytes,
+					  unsigned int *nid_out,
+					  unsigned int *msecs)
+{
+	char *node, *threshold;
+	unsigned int nid;
+	int err;
+
+	buf = strstrip(buf);
+	threshold = buf;
+	node = strsep(&threshold, "=");
+
+	if (*node != 'N')
+		return -EINVAL;
+
+	err = kstrtouint(node + 1, 0, &nid);
+	if (err)
+		return err;
+
+	if (nid >= nr_node_ids || !node_state(nid, N_MEMORY))
+		return -EINVAL;
+
+	err = kstrtouint(threshold, 0, msecs);
+	if (err)
+		return err;
+
+	*nid_out = nid;
+
+	return nbytes;
+}
+
+static ssize_t memory_wsr_refresh_ms_write(struct kernfs_open_file *of,
+					   char *buf, size_t nbytes, loff_t off)
+{
+	unsigned int nid, msecs;
+	struct wsr *wsr;
+	struct mem_cgroup *memcg = mem_cgroup_from_css(of_css(of));
+	ssize_t ret = memory_wsr_threshold_parse(buf, nbytes, &nid, &msecs);
+
+	if (ret < 0)
+		return ret;
+
+	wsr = lruvec_wsr(mem_cgroup_lruvec(memcg, NODE_DATA(nid)));
+	WRITE_ONCE(wsr->refresh_threshold, msecs_to_jiffies(msecs));
+	return ret;
+}
+
+static int memory_wsr_report_ms_show(struct seq_file *m, void *v)
+{
+	int nid;
+	struct mem_cgroup *memcg = mem_cgroup_from_seq(m);
+
+	for_each_node_state(nid, N_MEMORY) {
+		struct wsr *wsr =
+			lruvec_wsr(mem_cgroup_lruvec(memcg, NODE_DATA(nid)));
+
+		seq_printf(m, "N%d=%u ", nid,
+			   jiffies_to_msecs(READ_ONCE(wsr->report_threshold)));
+	}
+	seq_putc(m, '\n');
+
+	return 0;
+}
+
+static ssize_t memory_wsr_report_ms_write(struct kernfs_open_file *of,
+					  char *buf, size_t nbytes, loff_t off)
+{
+	unsigned int nid, msecs;
+	struct wsr *wsr;
+	struct mem_cgroup *memcg = mem_cgroup_from_css(of_css(of));
+	ssize_t ret = memory_wsr_threshold_parse(buf, nbytes, &nid, &msecs);
+
+	if (ret < 0)
+		return ret;
+
+	wsr = lruvec_wsr(mem_cgroup_lruvec(memcg, NODE_DATA(nid)));
+	WRITE_ONCE(wsr->report_threshold, msecs_to_jiffies(msecs));
+	return ret;
+}
+
+static int memory_wsr_histogram_show(struct seq_file *m, void *v)
+{
+	int nid;
+	struct mem_cgroup *memcg = mem_cgroup_from_seq(m);
+
+	for_each_node_state(nid, N_MEMORY) {
+		struct wsr *wsr =
+			lruvec_wsr(mem_cgroup_lruvec(memcg, NODE_DATA(nid)));
+		struct ws_bin *bin;
+
+		seq_printf(m, "N%d\n", nid);
+
+		mutex_lock(&wsr->bins_lock);
+		wsr_refresh(wsr, memcg, NODE_DATA(nid));
+		for (bin = wsr->bins; bin->idle_age != -1; bin++)
+			seq_printf(m, "%u anon=%lu file=%lu\n",
+				   jiffies_to_msecs(bin->idle_age),
+				   bin->nr_pages[0], bin->nr_pages[1]);
+
+		seq_printf(m, "%lld anon=%lu file=%lu\n", LLONG_MAX,
+			   bin->nr_pages[0], bin->nr_pages[1]);
+
+		mutex_unlock(&wsr->bins_lock);
+	}
+
+	return 0;
+}
+
+__poll_t memory_wsr_histogram_poll(struct kernfs_open_file *of,
+				   struct poll_table_struct *pt)
+{
+	struct mem_cgroup *memcg = mem_cgroup_from_css(of_css(of));
+
+	if (memcg->css.flags & CSS_DYING)
+		return DEFAULT_POLLMASK;
+
+	poll_wait(of->file, &memcg->wsr_wait_queue, pt);
+	if (cmpxchg(&memcg->wsr_event, 1, 0) == 1)
+		return DEFAULT_POLLMASK | EPOLLPRI;
+	return DEFAULT_POLLMASK;
+}
+#endif
+
 static struct cftype memory_files[] = {
 	{
 		.name = "current",
@@ -6710,7 +6944,33 @@ static struct cftype memory_files[] = {
 		.flags = CFTYPE_NS_DELEGATABLE,
 		.write = memory_reclaim,
 	},
-	{ } /* terminate */
+#ifdef CONFIG_WSR
+	{
+		.name = "wsr.intervals_ms",
+		.flags = CFTYPE_NOT_ON_ROOT | CFTYPE_NS_DELEGATABLE,
+		.seq_show = memory_wsr_intervals_ms_show,
+		.write = memory_wsr_intervals_ms_write,
+	},
+	{
+		.name = "wsr.refresh_ms",
+		.flags = CFTYPE_NOT_ON_ROOT | CFTYPE_NS_DELEGATABLE,
+		.seq_show = memory_wsr_refresh_ms_show,
+		.write = memory_wsr_refresh_ms_write,
+	},
+	{
+		.name = "wsr.report_ms",
+		.flags = CFTYPE_NOT_ON_ROOT | CFTYPE_NS_DELEGATABLE,
+		.seq_show = memory_wsr_report_ms_show,
+		.write = memory_wsr_report_ms_write,
+	},
+	{
+		.name = "wsr.histogram",
+		.flags = CFTYPE_NOT_ON_ROOT | CFTYPE_NS_DELEGATABLE,
+		.seq_show = memory_wsr_histogram_show,
+		.poll = memory_wsr_histogram_poll,
+	},
+#endif
+	{} /* terminate */
 };

 struct cgroup_subsys memory_cgrp_subsys = {
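For reference, the read formats produced by the seq_show handlers above,
on a hypothetical two-node system (all values invented):

    memory.wsr.intervals_ms: N0=1000,2000,4000,9223372036854775807 N1=1000,2000,4000,9223372036854775807
    memory.wsr.refresh_ms:   N0=5000 N1=5000
    memory.wsr.report_ms:    N0=1000 N1=1000
    memory.wsr.histogram:    N0
                             1000 anon=17 file=25
                             ...
                             N1
                             ...

Writes go one node at a time in N<id>=<value(s)> form, e.g.
"N0=1000,2000,4000" to memory.wsr.intervals_ms or "N1=5000" to
memory.wsr.refresh_ms; the parse helpers reject node ids that are out of
range or not in N_MEMORY.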
a/mm/vmscan.c b/mm/vmscan.c index c56fddcec88fb..ba254b6e91e19 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -4559,8 +4559,6 @@ static bool age_lruvec(struct lruvec *lruvec, struct scan_control *sc, unsigned return true; } -static void report_ws(struct pglist_data *pgdat, struct scan_control *sc); - /* to protect the working set of the last N jiffies */ static unsigned long lru_gen_min_ttl __read_mostly; @@ -5937,7 +5935,7 @@ void wsr_refresh(struct wsr *wsr, struct mem_cgroup *root, } } -static void report_ws(struct pglist_data *pgdat, struct scan_control *sc) +void report_ws(struct pglist_data *pgdat, struct scan_control *sc) { static DEFINE_RATELIMIT_STATE(rate, HZ, 3); @@ -5969,6 +5967,8 @@ static void report_ws(struct pglist_data *pgdat, struct scan_control *sc) if (wsr->notifier) kernfs_notify(wsr->notifier); + if (memcg && cmpxchg(&memcg->wsr_event, 0, 1) == 0) + wake_up_interruptible(&memcg->wsr_wait_queue); } #endif /* CONFIG_WSR */ @@ -6486,6 +6486,9 @@ static void shrink_zones(struct zonelist *zonelist, struct scan_control *sc) if (zone->zone_pgdat == last_pgdat) continue; last_pgdat = zone->zone_pgdat; + + if (!sc->proactive) + report_ws(zone->zone_pgdat, sc); shrink_node(zone->zone_pgdat, sc); }
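[Editor's note: to make the per-memcg interface above concrete, here is a minimal userspace consumer. It is a sketch, not part of the patch: the cgroup path and interval values are invented, while the file names, the write formats, and the EPOLLPRI wakeup come from the memory_files[] entries and memory_wsr_histogram_poll() above.]

/*
 * Minimal consumer of the per-memcg WSR files added by this patch.
 * CG is a hypothetical delegated cgroup; the file names and write
 * formats are taken from the patch above.
 */
#include <fcntl.h>
#include <poll.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define CG "/sys/fs/cgroup/workload"	/* hypothetical cgroup */

static int write_str(const char *path, const char *val)
{
	int fd = open(path, O_WRONLY);
	ssize_t n;

	if (fd < 0)
		return -1;
	n = write(fd, val, strlen(val));
	close(fd);
	return n < 0 ? -1 : 0;
}

int main(void)
{
	char buf[4096];
	ssize_t n;
	int fd;

	/* Four bins on node 0: <1s, 1-5s, 5-30s and >=30s idle. */
	write_str(CG "/memory.wsr.intervals_ms", "N0=1000,5000,30000");
	/* Report at most once per 2s of reclaim activity on node 0. */
	write_str(CG "/memory.wsr.report_ms", "N0=2000");

	fd = open(CG "/memory.wsr.histogram", O_RDONLY);
	if (fd < 0)
		return 1;
	for (;;) {
		struct pollfd pfd = { .fd = fd, .events = POLLPRI };

		if (poll(&pfd, 1, -1) < 0)
			break;
		lseek(fd, 0, SEEK_SET);	/* seq_file: re-read from offset 0 */
		n = read(fd, buf, sizeof(buf) - 1);
		if (n <= 0)
			break;
		buf[n] = '\0';
		fputs(buf, stdout);	/* "<ms> anon=<pages> file=<pages>" */
	}
	return close(fd);
}

The EPOLLPRI event fires when report_ws() runs against the memcg during non-proactive reclaim, and memory.wsr.histogram refreshes itself on read via wsr_refresh(), so the loop only wakes up when the histogram has likely changed.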
From patchwork Wed Jun 21 18:04:53 2023
X-Patchwork-Submitter: Yuanchu Xie
X-Patchwork-Id: 111254
Date: Wed, 21 Jun 2023 18:04:53 +0000
In-Reply-To: <20230621180454.973862-1-yuanchu@google.com>
References: <20230621180454.973862-1-yuanchu@google.com>
Message-ID: <20230621180454.973862-6-yuanchu@google.com>
Subject: [RFC PATCH v2 5/6] mm: add per-memcg reaccess histogram
From: Yuanchu Xie
Tsirkin" , David Hildenbrand , Jason Wang , Andrew Morton , Johannes Weiner , Michal Hocko , Roman Gushchin , Shakeel Butt , Muchun Song , Yu Zhao , Kefeng Wang , Kairui Song , Yosry Ahmed , Yuanchu Xie , "T . J . Alumbaugh" Cc: Wei Xu , SeongJae Park , Sudarshan Rajagopalan , kai.huang@intel.com, hch@lst.de, jon@nutanix.com, Aneesh Kumar K V , Matthew Wilcox , Vasily Averin , linux-kernel@vger.kernel.org, virtualization@lists.linux-foundation.org, linux-mm@kvack.org, cgroups@vger.kernel.org X-Spam-Status: No, score=-9.6 required=5.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE,USER_IN_DEF_DKIM_WL autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1769338027055155330?= X-GMAIL-MSGID: =?utf-8?q?1769338027055155330?= A reaccess refers to detecting an access on a page via refault or access bit harvesting after the initial access. Similar to the working set histogram, the reaccess histogram breaks down reaccesses into user-defined bins. Currently it only tracks reaccesses from access bit harvesting, and the plan is to include refaults in the same histogram by pulling information from folio->mapping->i_pages shadow entry for swapped out pages. Signed-off-by: T.J. Alumbaugh Signed-off-by: Yuanchu Xie --- include/linux/wsr.h | 9 +++- mm/memcontrol.c | 89 ++++++++++++++++++++++++++++++++++++++ mm/vmscan.c | 6 ++- mm/wsr.c | 101 ++++++++++++++++++++++++++++++++++++++++++++ 4 files changed, 203 insertions(+), 2 deletions(-) diff --git a/include/linux/wsr.h b/include/linux/wsr.h index d45f7cc0672ac..68246734679cd 100644 --- a/include/linux/wsr.h +++ b/include/linux/wsr.h @@ -26,11 +26,14 @@ struct ws_bin { struct wsr { /* protects bins */ struct mutex bins_lock; + /* protects reaccess_bins */ + struct mutex reaccess_bins_lock; struct kernfs_node *notifier; unsigned long timestamp; unsigned long report_threshold; unsigned long refresh_threshold; struct ws_bin bins[MAX_NR_BINS]; + struct ws_bin reaccess_bins[MAX_NR_BINS]; }; void wsr_register_node(struct node *node); @@ -48,6 +51,7 @@ ssize_t wsr_intervals_ms_parse(char *src, struct ws_bin *bins); */ void wsr_refresh(struct wsr *wsr, struct mem_cgroup *root, struct pglist_data *pgdat); +void report_reaccess(struct lruvec *lruvec, struct lru_gen_mm_walk *walk); void report_ws(struct pglist_data *pgdat, struct scan_control *sc); #else struct ws_bin; @@ -71,7 +75,10 @@ static inline ssize_t wsr_intervals_ms_parse(char *src, struct ws_bin *bins) return -EINVAL; } static inline void wsr_refresh(struct wsr *wsr, struct mem_cgroup *root, - struct pglist_data *pgdat) + struct pglist_data *pgdat) +{ +} +static inline void report_reaccess(struct lruvec *lruvec, struct lru_gen_mm_walk *walk) { } static inline void report_ws(struct pglist_data *pgdat, struct scan_control *sc) diff --git a/mm/memcontrol.c b/mm/memcontrol.c index edf5bb31bb19c..b901982d659d2 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -6736,6 +6736,56 @@ static ssize_t memory_wsr_intervals_ms_write(struct kernfs_open_file *of, return err ?: nbytes; } +static int memory_reaccess_intervals_ms_show(struct seq_file *m, void *v) +{ + int nid; + struct mem_cgroup *memcg = mem_cgroup_from_seq(m); + + for_each_node_state(nid, N_MEMORY) { + struct wsr *wsr; + struct ws_bin 
*bin; + + wsr = lruvec_wsr(mem_cgroup_lruvec(memcg, NODE_DATA(nid))); + mutex_lock(&wsr->reaccess_bins_lock); + seq_printf(m, "N%d=", nid); + for (bin = wsr->reaccess_bins; bin->idle_age != -1; bin++) + seq_printf(m, "%u,", jiffies_to_msecs(bin->idle_age)); + mutex_unlock(&wsr->reaccess_bins_lock); + + seq_printf(m, "%lld ", LLONG_MAX); + } + seq_putc(m, '\n'); + + return 0; +} + +static ssize_t memory_reaccess_intervals_ms_write(struct kernfs_open_file *of, + char *buf, size_t nbytes, + loff_t off) +{ + unsigned int nid; + int err; + struct wsr *wsr; + struct ws_bin *bins; + struct mem_cgroup *memcg = mem_cgroup_from_css(of_css(of)); + + bins = kzalloc(sizeof(wsr->reaccess_bins), GFP_KERNEL); + if (!bins) + return -ENOMEM; + + err = memory_wsr_intervals_ms_parse(of, buf, nbytes, &nid, bins); + if (err) + goto failed; + + wsr = lruvec_wsr(mem_cgroup_lruvec(memcg, NODE_DATA(nid))); + mutex_lock(&wsr->reaccess_bins_lock); + memcpy(wsr->reaccess_bins, bins, sizeof(wsr->reaccess_bins)); + mutex_unlock(&wsr->reaccess_bins_lock); +failed: + kfree(bins); + return err ?: nbytes; +} + static int memory_wsr_refresh_ms_show(struct seq_file *m, void *v) { int nid; @@ -6874,6 +6924,34 @@ __poll_t memory_wsr_histogram_poll(struct kernfs_open_file *of, return DEFAULT_POLLMASK | EPOLLPRI; return DEFAULT_POLLMASK; } + +static int memory_reaccess_histogram_show(struct seq_file *m, void *v) +{ + int nid; + struct mem_cgroup *memcg = mem_cgroup_from_seq(m); + + for_each_node_state(nid, N_MEMORY) { + struct wsr *wsr = + lruvec_wsr(mem_cgroup_lruvec(memcg, NODE_DATA(nid))); + struct ws_bin *bin; + + seq_printf(m, "N%d\n", nid); + + mutex_lock(&wsr->reaccess_bins_lock); + wsr_refresh(wsr, memcg, NODE_DATA(nid)); + for (bin = wsr->reaccess_bins; bin->idle_age != -1; bin++) + seq_printf(m, "%u anon=%lu file=%lu\n", + jiffies_to_msecs(bin->idle_age), + bin->nr_pages[0], bin->nr_pages[1]); + + seq_printf(m, "%lld anon=%lu file=%lu\n", LLONG_MAX, + bin->nr_pages[0], bin->nr_pages[1]); + + mutex_unlock(&wsr->reaccess_bins_lock); + } + + return 0; +} #endif static struct cftype memory_files[] = { @@ -6969,6 +7047,17 @@ static struct cftype memory_files[] = { .seq_show = memory_wsr_histogram_show, .poll = memory_wsr_histogram_poll, }, + { + .name = "reaccess.intervals_ms", + .flags = CFTYPE_NOT_ON_ROOT | CFTYPE_NS_DELEGATABLE, + .seq_show = memory_reaccess_intervals_ms_show, + .write = memory_reaccess_intervals_ms_write, + }, + { + .name = "reaccess.histogram", + .flags = CFTYPE_NOT_ON_ROOT | CFTYPE_NS_DELEGATABLE, + .seq_show = memory_reaccess_histogram_show, + }, #endif {} /* terminate */ }; diff --git a/mm/vmscan.c b/mm/vmscan.c index ba254b6e91e19..bc8c026ceef0d 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -4226,6 +4226,7 @@ static void walk_mm(struct lruvec *lruvec, struct mm_struct *mm, struct lru_gen_ mem_cgroup_unlock_pages(); if (walk->batched) { + report_reaccess(lruvec, walk); spin_lock_irq(&lruvec->lru_lock); reset_batch_size(lruvec, walk); spin_unlock_irq(&lruvec->lru_lock); @@ -5079,11 +5080,14 @@ static int evict_folios(struct lruvec *lruvec, struct scan_control *sc, int swap sc->nr_scanned -= folio_nr_pages(folio); } + walk = current->reclaim_state->mm_walk; + if (walk && walk->batched) + report_reaccess(lruvec, walk); + spin_lock_irq(&lruvec->lru_lock); move_folios_to_lru(lruvec, &list); - walk = current->reclaim_state->mm_walk; if (walk && walk->batched) reset_batch_size(lruvec, walk); diff --git a/mm/wsr.c b/mm/wsr.c index cd045ade5e9ba..a63d678e64f8b 100644 --- a/mm/wsr.c +++ b/mm/wsr.c @@ -23,8 
+23,10 @@ void wsr_init(struct lruvec *lruvec) struct wsr *wsr = lruvec_wsr(lruvec); mutex_init(&wsr->bins_lock); + mutex_init(&wsr->reaccess_bins_lock); wsr->bins[0].idle_age = -1; wsr->notifier = NULL; + wsr->reaccess_bins[0].idle_age = -1; } void wsr_destroy(struct lruvec *lruvec) @@ -32,6 +34,7 @@ void wsr_destroy(struct lruvec *lruvec) struct wsr *wsr = lruvec_wsr(lruvec); mutex_destroy(&wsr->bins_lock); + mutex_destroy(&wsr->reaccess_bins_lock); memset(wsr, 0, sizeof(*wsr)); } @@ -172,6 +175,104 @@ void refresh_wsr(struct wsr *wsr, struct mem_cgroup *root, cond_resched(); } while ((memcg = mem_cgroup_iter(root, memcg, NULL))); } + +static void collect_reaccess_locked(struct wsr *wsr, + struct lru_gen_struct *lrugen, + struct lru_gen_mm_walk *walk) +{ + int gen, type, zone; + unsigned long curr_timestamp = jiffies; + unsigned long max_seq = READ_ONCE(walk->max_seq); + unsigned long min_seq[ANON_AND_FILE] = { + READ_ONCE(lrugen->min_seq[LRU_GEN_ANON]), + READ_ONCE(lrugen->min_seq[LRU_GEN_FILE]), + }; + + for (type = 0; type < ANON_AND_FILE; type++) { + unsigned long seq; + struct ws_bin *bin = wsr->reaccess_bins; + + lockdep_assert_held(&wsr->reaccess_bins_lock); + /* Skip max_seq because a reaccess moves a page from another seq + * to max_seq. We use the negative change in page count from + * other seqs to track the number of reaccesses. + */ + for (seq = max_seq - 1; seq + 1 > min_seq[type]; seq--) { + long error; + int next_gen; + unsigned long birth, gen_start; + long delta = 0; + + gen = lru_gen_from_seq(seq); + + for (zone = 0; zone < MAX_NR_ZONES; zone++) { + long nr_pages = walk->nr_pages[gen][type][zone]; + + if (nr_pages < 0) + delta += -nr_pages; + } + + birth = READ_ONCE(lrugen->timestamps[gen]); + next_gen = lru_gen_from_seq(seq + 1); + gen_start = READ_ONCE(lrugen->timestamps[next_gen]); + + /* ensure gen_start is within idle_age of bin */ + while (bin->idle_age != -1 && + time_before(gen_start + bin->idle_age, + curr_timestamp)) + bin++; + + error = delta; + /* gen exceeds the idle_age of bin */ + while (bin->idle_age != -1 && + time_before(birth + bin->idle_age, + curr_timestamp)) { + unsigned long proportion = + gen_start - + (curr_timestamp - bin->idle_age); + unsigned long gen_len = gen_start - birth; + + if (!gen_len) + break; + if (proportion) { + unsigned long split_bin = + delta / gen_len * proportion; + bin->nr_pages[type] += split_bin; + error -= split_bin; + } + gen_start = curr_timestamp - bin->idle_age; + bin++; + } + bin->nr_pages[type] += error; + } + } +} + +static void collect_reaccess(struct wsr *wsr, + struct lru_gen_struct *lrugen, + struct lru_gen_mm_walk *walk) +{ + if (READ_ONCE(wsr->reaccess_bins->idle_age) == -1) + return; + + mutex_lock(&wsr->reaccess_bins_lock); + collect_reaccess_locked(wsr, lrugen, walk); + mutex_unlock(&wsr->reaccess_bins_lock); +} + +void report_reaccess(struct lruvec *lruvec, struct lru_gen_mm_walk *walk) +{ + struct lru_gen_struct *lrugen = &lruvec->lrugen; + struct mem_cgroup *memcg = lruvec_memcg(lruvec); + + while (memcg) { + collect_reaccess(lruvec_wsr(mem_cgroup_lruvec( + memcg, lruvec_pgdat(lruvec))), + lrugen, walk); + memcg = parent_mem_cgroup(memcg); + } +} + static struct pglist_data *kobj_to_pgdat(struct kobject *kobj) { int nid = IS_ENABLED(CONFIG_NUMA) ? 
kobj_to_dev(kobj)->id :
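[Editor's note: one subtlety in collect_reaccess_locked() above: reaccesses are counted per MGLRU generation, but the histogram bins are keyed by idle age, so each generation's count is split across bins in proportion to the time overlap, as if reaccesses were uniformly distributed over the generation's lifetime. The standalone model below illustrates that idea; the bin edges and counts are invented, and unlike the kernel code, which works in jiffies with integer division, it uses floating point for clarity.]

/*
 * Editorial model of the proportional split done by
 * collect_reaccess_locked(): reaccesses of a generation are assumed
 * uniform over its lifetime and spread across idle-age bins by
 * overlap. Edge and count values here are invented.
 */
#include <stdio.h>

#define NR_BINS 4

/* bin i covers idle ages [edge_ms[i], edge_ms[i + 1]); last bin is open */
static const double edge_ms[NR_BINS + 1] = { 0, 1000, 5000, 30000, 1e300 };

static void apportion(double now, double birth, double gen_end,
		      double delta, double bins[NR_BINS])
{
	double lo_age = now - gen_end;	/* youngest page in the generation */
	double hi_age = now - birth;	/* oldest page in the generation */
	double gen_len = hi_age - lo_age;
	int i;

	if (gen_len <= 0)
		return;

	for (i = 0; i < NR_BINS; i++) {
		double lo = lo_age > edge_ms[i] ? lo_age : edge_ms[i];
		double hi = hi_age < edge_ms[i + 1] ? hi_age : edge_ms[i + 1];

		if (hi > lo)	/* fraction of the generation in this bin */
			bins[i] += delta * (hi - lo) / gen_len;
	}
}

int main(void)
{
	double bins[NR_BINS] = { 0 };
	int i;

	/* 600 reaccesses; the generation's pages are 500ms..8000ms idle */
	apportion(10000, 2000, 9500, 600, bins);
	for (i = 0; i < NR_BINS; i++)
		printf("bin %d: %.1f pages\n", i, bins[i]);
	return 0;
}

With these numbers the 600 reaccesses land as 40.0, 320.0, 240.0 and 0.0 pages in the four bins. The kernel version does the same split in jiffies and folds any integer-division remainder into the last bin it reaches, via its error variable.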
From patchwork Wed Jun 21 18:04:54 2023
X-Patchwork-Submitter: Yuanchu Xie
X-Patchwork-Id: 111266
Date: Wed, 21 Jun 2023 18:04:54 +0000
In-Reply-To: <20230621180454.973862-1-yuanchu@google.com>
References: <20230621180454.973862-1-yuanchu@google.com>
Message-ID: <20230621180454.973862-7-yuanchu@google.com>
Subject: [RFC PATCH v2 6/6] virtio-balloon: Add Working Set reporting
From: Yuanchu Xie
To: Greg Kroah-Hartman, Rafael J. Wysocki, Michael S. Tsirkin, David Hildenbrand, Jason Wang, Andrew Morton, Johannes Weiner, Michal Hocko, Roman Gushchin, Shakeel Butt, Muchun Song, Yu Zhao, Kefeng Wang, Kairui Song, Yosry Ahmed, Yuanchu Xie, T.J. Alumbaugh
Cc: Wei Xu, SeongJae Park, Sudarshan Rajagopalan, kai.huang@intel.com, hch@lst.de, jon@nutanix.com, Aneesh Kumar K.V, Matthew Wilcox, Vasily Averin, linux-kernel@vger.kernel.org, virtualization@lists.linux-foundation.org, linux-mm@kvack.org, cgroups@vger.kernel.org

From: "T.J. Alumbaugh"

Add Working Set and notification virtqueues, along with a simple interface to the kernel WS functions. The driver receives configuration info over the notification virtqueue and sends Working Set reports when notified. A mutex guards the virtio_balloon state.

Signed-off-by: T.J. Alumbaugh
Signed-off-by: Yuanchu Xie
---
 drivers/virtio/virtio_balloon.c     | 288 ++++++++++++++++++++++++++++
 include/linux/balloon_compaction.h  |   3 +
 include/linux/wsr.h                 |  29 ++-
 include/uapi/linux/virtio_balloon.h |  33 ++++
 mm/vmscan.c                         | 106 ++++++++++
 5 files changed, 457 insertions(+), 2 deletions(-)
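[Editor's note: the notification payload that update_balloon_notification_func() parses below is a packed stream: a 16-bit tag, then, for VIRTIO_BALLOON_WS_CONFIG, working_set_num_bins - 1 interval edges followed by refresh and report thresholds, all as 64-bit values. The hypothetical host-side encoder below is only meant to make that layout visible; it assumes a little-endian host and invents the edge values.]

/*
 * Hypothetical host-side encoder for a WS_CONFIG notification, matching
 * the layout the guest driver parses: u16 tag, then (num_bins - 1) u64
 * interval edges, a u64 refresh threshold and a u64 report threshold,
 * packed with no padding. Little-endian host assumed for brevity.
 */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define WS_CONFIG 2	/* VIRTIO_BALLOON_WS_CONFIG */

static size_t encode_ws_config(uint8_t *buf, const uint64_t *edges_ms,
			       unsigned int num_bins, uint64_t refresh_ms,
			       uint64_t report_ms)
{
	uint16_t tag = WS_CONFIG;
	size_t off = 0;

	memcpy(buf + off, &tag, sizeof(tag));
	off += sizeof(tag);
	/* num_bins bins are described by num_bins - 1 interior edges */
	memcpy(buf + off, edges_ms, (num_bins - 1) * sizeof(*edges_ms));
	off += (num_bins - 1) * sizeof(*edges_ms);
	memcpy(buf + off, &refresh_ms, sizeof(refresh_ms));
	off += sizeof(refresh_ms);
	memcpy(buf + off, &report_ms, sizeof(report_ms));
	off += sizeof(report_ms);
	return off;
}

int main(void)
{
	const uint64_t edges_ms[] = { 1000, 5000, 30000 };
	uint8_t buf[64];
	size_t i, len = encode_ws_config(buf, edges_ms, 4, 750, 2000);

	for (i = 0; i < len; i++)
		printf("%02x%c", buf[i], (i % 8 == 7) ? '\n' : ' ');
	putchar('\n');
	return 0;
}

The 42 bytes produced here for four bins (2 + 8 * 5) match the notification_size the driver computes in virtballoon_probe(): sizeof(uint16_t) + sizeof(uint64_t) * (working_set_num_bins + 1).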
Tsirkin" , David Hildenbrand , Jason Wang , Andrew Morton , Johannes Weiner , Michal Hocko , Roman Gushchin , Shakeel Butt , Muchun Song , Yu Zhao , Kefeng Wang , Kairui Song , Yosry Ahmed , Yuanchu Xie , "T . J . Alumbaugh" Cc: Wei Xu , SeongJae Park , Sudarshan Rajagopalan , kai.huang@intel.com, hch@lst.de, jon@nutanix.com, Aneesh Kumar K V , Matthew Wilcox , Vasily Averin , linux-kernel@vger.kernel.org, virtualization@lists.linux-foundation.org, linux-mm@kvack.org, cgroups@vger.kernel.org X-Spam-Status: No, score=-9.6 required=5.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE,USER_IN_DEF_DKIM_WL autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1769339082650935368?= X-GMAIL-MSGID: =?utf-8?q?1769339082650935368?= From: "T.J. Alumbaugh" Add working set and notification vqueues, along with a simple interface to kernel WS functions. The driver receives config info and sends reports on notification. A mutex is used to guard virtio_balloon state. Signed-off-by: T.J. Alumbaugh Signed-off-by: Yuanchu Xie --- drivers/virtio/virtio_balloon.c | 288 ++++++++++++++++++++++++++++ include/linux/balloon_compaction.h | 3 + include/linux/wsr.h | 29 ++- include/uapi/linux/virtio_balloon.h | 33 ++++ mm/vmscan.c | 106 ++++++++++ 5 files changed, 457 insertions(+), 2 deletions(-) diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c index 3f78a3a1eb753..0cb6a46eb7e8a 100644 --- a/drivers/virtio/virtio_balloon.c +++ b/drivers/virtio/virtio_balloon.c @@ -11,6 +11,7 @@ #include #include #include +#include #include #include #include @@ -45,6 +46,8 @@ enum virtio_balloon_vq { VIRTIO_BALLOON_VQ_STATS, VIRTIO_BALLOON_VQ_FREE_PAGE, VIRTIO_BALLOON_VQ_REPORTING, + VIRTIO_BALLOON_VQ_WORKING_SET, + VIRTIO_BALLOON_VQ_NOTIFY, VIRTIO_BALLOON_VQ_MAX }; @@ -55,6 +58,9 @@ enum virtio_balloon_config_read { struct virtio_balloon { struct virtio_device *vdev; struct virtqueue *inflate_vq, *deflate_vq, *stats_vq, *free_page_vq; +#ifdef CONFIG_WSR + struct virtqueue *working_set_vq, *notification_vq; +#endif /* Balloon's own wq for cpu-intensive work items */ struct workqueue_struct *balloon_wq; @@ -64,6 +70,10 @@ struct virtio_balloon { /* The balloon servicing is delegated to a freezable workqueue. */ struct work_struct update_balloon_stats_work; struct work_struct update_balloon_size_work; +#ifdef CONFIG_WSR + struct work_struct update_balloon_working_set_work; + struct work_struct update_balloon_notification_work; +#endif /* Prevent updating balloon when it is being canceled. */ spinlock_t stop_update_lock; @@ -119,6 +129,16 @@ struct virtio_balloon { /* Free page reporting device */ struct virtqueue *reporting_vq; struct page_reporting_dev_info pr_dev_info; + +#ifdef CONFIG_WSR + /* Working Set reporting */ + u8 working_set_num_bins; + struct virtio_balloon_working_set *working_set; + + /* A buffer to hold incoming notification from the host. */ + unsigned int notification_size; + void *notification_buf; +#endif }; static const struct virtio_device_id id_table[] = { @@ -465,6 +485,211 @@ static void update_balloon_stats_func(struct work_struct *work) stats_handle_request(vb); } +#ifdef CONFIG_WSR +/* Must hold the balloon_lock while calling this function. 
*/ +static inline void reset_working_set(struct virtio_balloon *vb) +{ + int i; + + for (i = 0; i < vb->working_set_num_bins; ++i) { + vb->working_set[i].tag = cpu_to_virtio16(vb->vdev, -1); + vb->working_set[i].node_id = cpu_to_virtio16(vb->vdev, -1); + vb->working_set[i].idle_age_ms = cpu_to_virtio64(vb->vdev, 0); + vb->working_set[i].memory_size_bytes[0] = cpu_to_virtio64(vb->vdev, -1); + vb->working_set[i].memory_size_bytes[1] = cpu_to_virtio64(vb->vdev, -1); + } +} + +/* Must hold the balloon_lock while calling this function. */ +static inline void update_working_set(struct virtio_balloon *vb, int idx, + u64 idle_age, u64 bytes_anon, + u64 bytes_file, int node_id) +{ + vb->working_set[idx].tag = cpu_to_virtio16(vb->vdev, VIRTIO_BALLOON_WS_RECLAIMABLE); + vb->working_set[idx].node_id = cpu_to_virtio16(vb->vdev, node_id); + vb->working_set[idx].idle_age_ms = cpu_to_virtio64(vb->vdev, idle_age); + vb->working_set[idx].memory_size_bytes[0] = cpu_to_virtio64(vb->vdev, + bytes_anon); + vb->working_set[idx].memory_size_bytes[1] = cpu_to_virtio64(vb->vdev, + bytes_file); +} + +static bool working_set_is_init(struct virtio_balloon *vb) +{ + if (vb->working_set[0].idle_age_ms > 0) + return true; + return false; +} + +static void virtio_balloon_working_set_request(void) +{ + struct pglist_data *pgdat; + int nid = NUMA_NO_NODE; + + if (IS_ENABLED(CONFIG_NUMA)) { + for_each_online_node(nid) { + if (node_possible(nid)) { + pgdat = NODE_DATA(nid); + working_set_request(pgdat); + } + } + } else { + pgdat = NODE_DATA(nid); + working_set_request(pgdat); + } +} + +static void notification_receive(struct virtqueue *vq) +{ + struct virtio_balloon *vb = vq->vdev->priv; + + spin_lock(&vb->stop_update_lock); + if (!vb->stop_update) + queue_work(system_freezable_wq, &vb->update_balloon_notification_work); + spin_unlock(&vb->stop_update_lock); +} + +static int virtio_balloon_register_working_set_receiver(struct virtio_balloon *vb, + __virtio64 *intervals, unsigned long nr_bins, __virtio64 refresh_ms, + __virtio64 report_ms) +{ + struct pglist_data *pgdat; + unsigned long *bin_intervals = NULL; + int i, err; + int nid = NUMA_NO_NODE; + + if (intervals && nr_bins) { + /* TODO: keep values as 32-bits throughout. */ + bin_intervals = kzalloc(sizeof(unsigned long) * (nr_bins-1), + GFP_KERNEL); + if (!bin_intervals) + return -ENOMEM; + for (i = 0; i < nr_bins - 1; i++) + bin_intervals[i] = (unsigned long)intervals[i]; + + if (IS_ENABLED(CONFIG_NUMA)) { + for_each_online_node(nid) { + if (node_possible(nid)) { + pgdat = NODE_DATA(nid); + err = register_working_set_receiver(vb, + pgdat, &(bin_intervals[0]), + nr_bins, (unsigned long) refresh_ms, + (unsigned long) report_ms); + } + } + } else { + pgdat = NODE_DATA(nid); + err = register_working_set_receiver(vb, pgdat, + &(bin_intervals[0]), nr_bins, + (unsigned long) refresh_ms, + (unsigned long) report_ms); + } + kfree(bin_intervals); + return err; + } + return -EINVAL; +} + +void working_set_notify(void *ws_receiver, struct ws_bin *bins, int node_id) +{ + u64 bytes_nr_file, bytes_nr_anon; + struct virtio_balloon *vb = ws_receiver; + int idx = 0; + + if (!mutex_trylock(&vb->balloon_lock)) + return; + for (; idx < vb->working_set_num_bins; idx++) { + bytes_nr_anon = (u64)(bins[idx].nr_pages[0]) * PAGE_SIZE; + bytes_nr_file = (u64)(bins[idx].nr_pages[1]) * PAGE_SIZE; + update_working_set(vb, idx, jiffies_to_msecs(bins[idx].idle_age), + bytes_nr_anon, bytes_nr_file, node_id); + } + mutex_unlock(&vb->balloon_lock); + /* Send the working set report to the device.
*/ + spin_lock(&vb->stop_update_lock); + if (!vb->stop_update) + queue_work(system_freezable_wq, &vb->update_balloon_working_set_work); + spin_unlock(&vb->stop_update_lock); +} +EXPORT_SYMBOL(working_set_notify); + +static void update_balloon_notification_func(struct work_struct *work) +{ + struct virtio_balloon *vb; + struct scatterlist sg_in; + __virtio64 *bin_intervals; + __virtio64 refresh_ms, report_ms; + int16_t tag; + char *buf; + int len; + + vb = container_of(work, struct virtio_balloon, + update_balloon_notification_work); + + /* Read a Working Set notification from the device. */ + buf = (char *)vb->notification_buf; + tag = *((int16_t *)buf); + buf += sizeof(int16_t); + if (tag == VIRTIO_BALLOON_WS_REQUEST) { + virtio_balloon_working_set_request(); + } else if (tag == VIRTIO_BALLOON_WS_CONFIG) { + mutex_lock(&vb->balloon_lock); + reset_working_set(vb); + mutex_unlock(&vb->balloon_lock); + bin_intervals = (__virtio64 *) buf; + buf += sizeof(__virtio64) * (vb->working_set_num_bins - 1); + refresh_ms = *((__virtio64 *) buf); + buf += sizeof(__virtio64); + report_ms = *((__virtio64 *) buf); + virtio_balloon_register_working_set_receiver(vb, bin_intervals, + vb->working_set_num_bins, refresh_ms, report_ms); + } else { + dev_warn(&vb->vdev->dev, "Received invalid notification, %d\n", tag); + return; + } + + /* Detach all the used buffers from the vq */ + while (virtqueue_get_buf(vb->notification_vq, &len)) + ; + /* Add a new notification buffer for device to fill. */ + sg_init_one(&sg_in, vb->notification_buf, vb->notification_size); + virtqueue_add_inbuf(vb->notification_vq, &sg_in, 1, vb, GFP_KERNEL); + virtqueue_kick(vb->notification_vq); +} + +static void update_balloon_ws_func(struct work_struct *work) +{ + struct virtio_balloon *vb; + struct scatterlist sg_out; + int err = 0; + int unused; + + vb = container_of(work, struct virtio_balloon, + update_balloon_working_set_work); + + mutex_lock(&vb->balloon_lock); + if (working_set_is_init(vb)) { + /* Detach all the used buffers from the vq */ + while (virtqueue_get_buf(vb->working_set_vq, &unused)) + ; + sg_init_one(&sg_out, vb->working_set, + (sizeof(struct virtio_balloon_working_set) * + vb->working_set_num_bins)); + err = virtqueue_add_outbuf(vb->working_set_vq, &sg_out, 1, vb, GFP_KERNEL); + } else { + dev_warn(&vb->vdev->dev, "Working Set not initialized.\n"); + err = -EINVAL; + } + mutex_unlock(&vb->balloon_lock); + if (unlikely(err)) { + dev_err(&vb->vdev->dev, + "Failed to send working set report err = %d\n", err); + } else { + virtqueue_kick(vb->working_set_vq); + } +} +#endif /* CONFIG_WSR */ + static void update_balloon_size_func(struct work_struct *work) { struct virtio_balloon *vb; @@ -508,6 +733,10 @@ static int init_vqs(struct virtio_balloon *vb) callbacks[VIRTIO_BALLOON_VQ_FREE_PAGE] = NULL; names[VIRTIO_BALLOON_VQ_FREE_PAGE] = NULL; names[VIRTIO_BALLOON_VQ_REPORTING] = NULL; + callbacks[VIRTIO_BALLOON_VQ_WORKING_SET] = NULL; + names[VIRTIO_BALLOON_VQ_WORKING_SET] = NULL; + callbacks[VIRTIO_BALLOON_VQ_NOTIFY] = NULL; + names[VIRTIO_BALLOON_VQ_NOTIFY] = NULL; if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_STATS_VQ)) { names[VIRTIO_BALLOON_VQ_STATS] = "stats"; @@ -524,6 +753,15 @@ static int init_vqs(struct virtio_balloon *vb) callbacks[VIRTIO_BALLOON_VQ_REPORTING] = balloon_ack; } +#ifdef CONFIG_WSR + if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_WS_REPORTING)) { + names[VIRTIO_BALLOON_VQ_WORKING_SET] = "ws"; +
callbacks[VIRTIO_BALLOON_VQ_WORKING_SET] = NULL; + names[VIRTIO_BALLOON_VQ_NOTIFY] = "notify"; + callbacks[VIRTIO_BALLOON_VQ_NOTIFY] = notification_receive; + } +#endif + err = virtio_find_vqs(vb->vdev, VIRTIO_BALLOON_VQ_MAX, vqs, callbacks, names, NULL); if (err) @@ -534,6 +772,7 @@ static int init_vqs(struct virtio_balloon *vb) if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_STATS_VQ)) { struct scatterlist sg; unsigned int num_stats; + vb->stats_vq = vqs[VIRTIO_BALLOON_VQ_STATS]; /* @@ -553,6 +792,25 @@ static int init_vqs(struct virtio_balloon *vb) virtqueue_kick(vb->stats_vq); } +#ifdef CONFIG_WSR + if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_WS_REPORTING)) { + struct scatterlist sg; + + vb->working_set_vq = vqs[VIRTIO_BALLOON_VQ_WORKING_SET]; + vb->notification_vq = vqs[VIRTIO_BALLOON_VQ_NOTIFY]; + + /* Prime the notification virtqueue for the device to fill. */ + sg_init_one(&sg, vb->notification_buf, vb->notification_size); + err = virtqueue_add_inbuf(vb->notification_vq, &sg, 1, vb, GFP_KERNEL); + if (unlikely(err)) { + dev_err(&vb->vdev->dev, + "Failed to prepare notifications, err = %d\n", err); + } else { + virtqueue_kick(vb->notification_vq); + } + } +#endif + if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) vb->free_page_vq = vqs[VIRTIO_BALLOON_VQ_FREE_PAGE]; @@ -878,6 +1136,10 @@ static int virtballoon_probe(struct virtio_device *vdev) INIT_WORK(&vb->update_balloon_stats_work, update_balloon_stats_func); INIT_WORK(&vb->update_balloon_size_work, update_balloon_size_func); +#ifdef CONFIG_WSR + INIT_WORK(&vb->update_balloon_working_set_work, update_balloon_ws_func); + INIT_WORK(&vb->update_balloon_notification_work, update_balloon_notification_func); +#endif spin_lock_init(&vb->stop_update_lock); mutex_init(&vb->balloon_lock); init_waitqueue_head(&vb->acked); @@ -885,6 +1147,23 @@ static int virtballoon_probe(struct virtio_device *vdev) balloon_devinfo_init(&vb->vb_dev_info); +#ifdef CONFIG_WSR + if (virtio_has_feature(vdev, VIRTIO_BALLOON_F_WS_REPORTING)) { + virtio_cread_le(vdev, struct virtio_balloon_config, working_set_num_bins, + &vb->working_set_num_bins); + dev_dbg(&vb->vdev->dev, "probe: working_set_num_bins = %d\n", vb->working_set_num_bins); + /* Allocate space for a Working Set report. */ + vb->working_set = kcalloc(vb->working_set_num_bins, + sizeof(struct virtio_balloon_working_set), GFP_KERNEL); + /* Allocate space for host notifications.
*/ + vb->notification_size = + sizeof(uint16_t) + + sizeof(uint64_t) * (vb->working_set_num_bins + 1); + vb->notification_buf = kzalloc(vb->notification_size, GFP_KERNEL); + reset_working_set(vb); + } +#endif + err = init_vqs(vb); if (err) goto out_free_vb; @@ -1034,11 +1313,19 @@ static void virtballoon_remove(struct virtio_device *vdev) unregister_oom_notifier(&vb->oom_nb); if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) virtio_balloon_unregister_shrinker(vb); +#ifdef CONFIG_WSR + if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_WS_REPORTING)) + unregister_working_set_receiver(vb); +#endif spin_lock_irq(&vb->stop_update_lock); vb->stop_update = true; spin_unlock_irq(&vb->stop_update_lock); cancel_work_sync(&vb->update_balloon_size_work); cancel_work_sync(&vb->update_balloon_stats_work); +#ifdef CONFIG_WSR + cancel_work_sync(&vb->update_balloon_working_set_work); + cancel_work_sync(&vb->update_balloon_notification_work); +#endif if (virtio_has_feature(vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) { cancel_work_sync(&vb->report_free_page_work); @@ -1104,6 +1391,7 @@ static unsigned int features[] = { VIRTIO_BALLOON_F_FREE_PAGE_HINT, VIRTIO_BALLOON_F_PAGE_POISON, VIRTIO_BALLOON_F_REPORTING, + VIRTIO_BALLOON_F_WS_REPORTING, }; static struct virtio_driver virtio_balloon_driver = { diff --git a/include/linux/balloon_compaction.h b/include/linux/balloon_compaction.h index 5ca2d56996201..7bbf5281d84d3 100644 --- a/include/linux/balloon_compaction.h +++ b/include/linux/balloon_compaction.h @@ -43,6 +43,7 @@ #include #include #include +#include /* * Balloon device information descriptor. @@ -67,6 +68,8 @@ extern size_t balloon_page_list_enqueue(struct balloon_dev_info *b_dev_info, struct list_head *pages); extern size_t balloon_page_list_dequeue(struct balloon_dev_info *b_dev_info, struct list_head *pages, size_t n_req_pages); +extern void working_set_notify(void *ws_receiver, struct ws_bin *bins, + int node_id); static inline void balloon_devinfo_init(struct balloon_dev_info *balloon) { diff --git a/include/linux/wsr.h b/include/linux/wsr.h index 68246734679cd..671ca5426254d 100644 --- a/include/linux/wsr.h +++ b/include/linux/wsr.h @@ -53,6 +53,16 @@ void wsr_refresh(struct wsr *wsr, struct mem_cgroup *root, struct pglist_data *pgdat); void report_reaccess(struct lruvec *lruvec, struct lru_gen_mm_walk *walk); void report_ws(struct pglist_data *pgdat, struct scan_control *sc); +/* + * Function to send the working set report to a receiver (e.g. the balloon driver) + * TODO: Replace with a proper registration interface, similar to shrinkers. 
+ */ +int register_working_set_receiver(void *receiver, struct pglist_data *pgdat, + unsigned long *intervals, unsigned long nr_bins, + unsigned long report_threshold, + unsigned long refresh_threshold); +void unregister_working_set_receiver(void *receiver); +bool working_set_request(struct pglist_data *pgdat); #else struct ws_bin; struct wsr; @@ -84,6 +94,21 @@ static inline void report_reaccess(struct lruvec *lruvec, struct lru_gen_mm_walk static inline void report_ws(struct pglist_data *pgdat, struct scan_control *sc) { } -#endif /* CONFIG_WSR */ +static inline int +register_working_set_receiver(void *receiver, struct pglist_data *pgdat, + unsigned long *intervals, unsigned long nr_bins, + unsigned long report_threshold, + unsigned long refresh_threshold) +{ + return -EINVAL; +} +static inline void unregister_working_set_receiver(void *receiver) +{ +} +static inline bool working_set_request(struct pglist_data *pgdat) +{ + return false; +} +#endif /* CONFIG_WSR */ -#endif /* _LINUX_WSR_H */ +#endif /* _LINUX_WSR_H */ diff --git a/include/uapi/linux/virtio_balloon.h b/include/uapi/linux/virtio_balloon.h index ddaa45e723c4c..a682d917daca1 100644 --- a/include/uapi/linux/virtio_balloon.h +++ b/include/uapi/linux/virtio_balloon.h @@ -37,6 +37,7 @@ #define VIRTIO_BALLOON_F_FREE_PAGE_HINT 3 /* VQ to report free pages */ #define VIRTIO_BALLOON_F_PAGE_POISON 4 /* Guest is using page poisoning */ #define VIRTIO_BALLOON_F_REPORTING 5 /* Page reporting virtqueue */ +#define VIRTIO_BALLOON_F_WS_REPORTING 6 /* Working Set Size reporting */ /* Size of a PFN in the balloon interface. */ #define VIRTIO_BALLOON_PFN_SHIFT 12 @@ -59,6 +60,9 @@ struct virtio_balloon_config { }; /* Stores PAGE_POISON if page poisoning is in use */ __le32 poison_val; + /* Number of bins for Working Set report if in use. */ + __u8 working_set_num_bins; + __u8 padding[3]; }; #define VIRTIO_BALLOON_S_SWAP_IN 0 /* Amount of memory swapped in */ @@ -116,4 +120,33 @@ struct virtio_balloon_stat { __virtio64 val; } __attribute__((packed)); +/* Enumerate all possible message types from the device. */ +enum virtio_balloon_working_set_op { + VIRTIO_BALLOON_WS_REQUEST = 1, + VIRTIO_BALLOON_WS_CONFIG = 2, +}; + +/* The metadata values for Working Set Reports. */ +enum virtio_balloon_working_set_tags { + /* Memory is reclaimable by guest */ + VIRTIO_BALLOON_WS_RECLAIMABLE = 0, + /* Memory can only be discarded by guest */ + VIRTIO_BALLOON_WS_DISCARDABLE = 1, +}; + +/* + * Working Set Report structure. + */ +struct virtio_balloon_working_set { + /* A tag for additional metadata. */ + __le16 tag; + /* the NUMA node for this report. */ + __le16 node_id; + uint8_t reserved[4]; + /* The idle age (in ms) of this bin of memory */ + __virtio64 idle_age_ms; + /* A bin each for anonymous and file-backed memory. */ + __le64 memory_size_bytes[2]; +}; + #endif /* _LINUX_VIRTIO_BALLOON_H */ diff --git a/mm/vmscan.c b/mm/vmscan.c index bc8c026ceef0d..c89728f8f61ba 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -5911,6 +5911,57 @@ late_initcall(init_lru_gen); ******************************************************************************/ #ifdef CONFIG_WSR +static void *wsr_receiver; + +/* + * Register/unregister a receiver of working set notifications + * TODO: Replace with a proper registration interface, similar to shrinkers. 
+ */ +int register_working_set_receiver(void *receiver, struct pglist_data *pgdat, + unsigned long *intervals, + unsigned long nr_bins, + unsigned long refresh_threshold, + unsigned long report_threshold) +{ + struct wsr *wsr; + struct ws_bin *bins; + int i; + + wsr_receiver = receiver; + + if (!pgdat) + return 0; + + if (!intervals || !nr_bins) + return 0; + + bins = kzalloc(sizeof(wsr->bins), GFP_KERNEL); + if (!bins) + return -ENOMEM; + + for (i = 0; i < nr_bins - 1; i++) { + bins[i].idle_age = msecs_to_jiffies(*intervals); + intervals++; + } + bins[i].idle_age = -1; + + wsr = lruvec_wsr(mem_cgroup_lruvec(NULL, pgdat)); + + mutex_lock(&wsr->bins_lock); + memcpy(wsr->bins, bins, sizeof(wsr->bins)); + WRITE_ONCE(wsr->refresh_threshold, msecs_to_jiffies(refresh_threshold)); + WRITE_ONCE(wsr->report_threshold, msecs_to_jiffies(report_threshold)); + mutex_unlock(&wsr->bins_lock); + kfree(bins); + return 0; +} +EXPORT_SYMBOL(register_working_set_receiver); + +void unregister_working_set_receiver(void *receiver) +{ + wsr_receiver = NULL; +} +EXPORT_SYMBOL(unregister_working_set_receiver); + void wsr_refresh(struct wsr *wsr, struct mem_cgroup *root, struct pglist_data *pgdat) { @@ -5967,6 +6018,16 @@ void report_ws(struct pglist_data *pgdat, struct scan_control *sc) refresh_wsr(wsr, memcg, pgdat, sc, 0); WRITE_ONCE(wsr->timestamp, jiffies); + /* Balloon driver subscribes to global memory reclaim. + * This requires CONFIG_VIRTIO_BALLOON=y, not m, because + * it's calling a function defined in virtio_balloon.c. + * This is a hack to have balloon notifications work in a + * proof of concept, and a proper notification registration + * interface is on the TODO list. + */ + if (!cgroup_reclaim(sc) && wsr_receiver) + working_set_notify(wsr_receiver, wsr->bins, pgdat->node_id); + mutex_unlock(&wsr->bins_lock); if (wsr->notifier) @@ -5974,6 +6035,51 @@ void report_ws(struct pglist_data *pgdat, struct scan_control *sc) kernfs_notify(wsr->notifier); if (memcg && cmpxchg(&memcg->wsr_event, 0, 1) == 0) wake_up_interruptible(&memcg->wsr_wait_queue); } + +/* TODO: Replace with a proper registration interface, similar to shrinkers. */ +bool working_set_request(struct pglist_data *pgdat) +{ + unsigned int flags; + struct scan_control sc = { + .may_writepage = true, + .may_unmap = true, + .may_swap = true, + .reclaim_idx = MAX_NR_ZONES - 1, + .gfp_mask = GFP_KERNEL, + }; + struct wsr *wsr; + + if (!wsr_receiver) + return false; + + wsr = lruvec_wsr(mem_cgroup_lruvec(NULL, pgdat)); + + if (!mutex_trylock(&wsr->bins_lock)) + return false; + + if (wsr->bins->idle_age != -1) { + unsigned long timestamp = READ_ONCE(wsr->timestamp); + unsigned long threshold = READ_ONCE(wsr->refresh_threshold); + + if (time_is_before_jiffies(timestamp + threshold)) { + /* We might need to refresh the report. */ + set_task_reclaim_state(current, &sc.reclaim_state); + flags = memalloc_noreclaim_save(); + refresh_wsr(wsr, NULL, pgdat, &sc, threshold); + memalloc_noreclaim_restore(flags); + set_task_reclaim_state(current, NULL); + } + } + + if (wsr_receiver) + working_set_notify(wsr_receiver, wsr->bins, pgdat->node_id); + + mutex_unlock(&wsr->bins_lock); + return true; +} +EXPORT_SYMBOL(working_set_request); + #endif /* CONFIG_WSR */ #else /* !CONFIG_LRU_GEN */