From patchwork Wed Sep 20 19:02:39 2023
X-Patchwork-Submitter: Kairui Song
X-Patchwork-Id: 142723
From: Kairui Song
To: linux-mm@kvack.org
Cc: Andrew Morton, Yu Zhao, Roman Gushchin, Johannes Weiner, Michal Hocko, Hugh Dickins, Nhat Pham, Yuanchu Xie, Kalesh Singh, Suren Baghdasaryan, "T . J . 
Mercier", linux-kernel@vger.kernel.org, Kairui Song
Subject: [RFC PATCH v3 1/6] workingset: simplify and use a more intuitive model
Date: Thu, 21 Sep 2023 03:02:39 +0800
Message-ID: <20230920190244.16839-2-ryncsn@gmail.com>
In-Reply-To: <20230920190244.16839-1-ryncsn@gmail.com>
References: <20230920190244.16839-1-ryncsn@gmail.com>
Reply-To: Kairui Song

From: Kairui Song

This removes workingset_activation and reduces calls to
workingset_age_nonresident. The idea behind this change is a new way to
calculate the refault distance, which also prepares for adapting
refault-distance-based re-activation for multi-gen LRU.

Currently, refault distance re-activation is based on two assumptions:

1. Activating an inactive page left-shifts the LRU pages (considering
   the LRU starts from the right).
2. Evicting an inactive page left-shifts the LRU pages.

Assumption 2 is correct, but assumption 1 is not always true: an
activated page could be anywhere in the LRU list (through
mark_page_accessed), so it only left-shifts the pages on its right.
Besides, a page can be activated/deactivated multiple times. Multi-gen
LRU also does not fit this model well, since pages are constantly aged
and activated as the generation sliding window slides.
So instead, we introduce a simpler idea here: just presume the evicted
pages are still in memory, each with an eviction sequence like before.
Let the `nonresident_age` counter (NA) still get increased on each
eviction, so we get a "shadow LRU" for each evicted page:

Let SP = ((NA's reading @ current) - (NA's reading @ eviction))

                            +-memory available to cache-+
                            |                           |
  +-------------------------+===============+===========+
  | *   shadows  O O  O     |   INACTIVE    |   ACTIVE  |
  +-+-----------------------+===============+===========+
    |                       |
    +-----------------------+
    |         SP
  fault page          O -> Hole left by previously faulted in pages
                      * -> The page corresponding to SP

It is easy to see that SP stands for how far the current workload could
push a page out of the available memory. Since every evicted page was
once the head of the INACTIVE list, the page could have an access
distance of:

  SP + NR_INACTIVE

It *may* get re-activated before being evicted again if:

  SP + NR_INACTIVE < NR_INACTIVE + NR_ACTIVE

which can be simplified to:

  SP < NR_ACTIVE

Then the page is worth re-activating to start from the ACTIVE part,
since its access distance is shorter than the total memory needed to
make it stay.

Since this is only an estimation based on several hypotheses, and it
could break the ability of the LRU to distinguish a workingset out of
caches, throttle it by two factors:

1. Previously re-faulted pages may leave "holes" in the shadow part of
   the LRU. That part is left unhandled on purpose, to decrease the
   re-activation rate for pages that have a large SP value (the larger
   the SP value a page has, the more likely it will be affected by such
   holes).
2. When the ACTIVE part of the LRU is long enough, challenging ACTIVE
   pages by re-activating a one-time faulted, previously INACTIVE page
   may not be a good idea, so throttle the re-activation when
   ACTIVE > INACTIVE by comparing with INACTIVE instead.
Another effect of refault activation worth noticing: by throttling
re-activation when the ACTIVE part is large, this refault-distance-based
re-activation can help hold a portion of the caches in memory instead of
letting cached pages get evicted in turn when the cache size is larger
than total memory and hotness is similar among all cache pages. That is
because the established workingset (the ACTIVE part) tends to stay,
since we throttled re-activation, until the workingset itself starts to
go stale. This is actually similar to the algorithm before, which
introduced such an effect by increasing nonresident_age in many call
paths, throttling the re-activation when activation/re-activation is
happening massively.

Combining all of the above: upon refault, mark the page as active if
any of the following conditions is met:

- If the ACTIVE LRU is low (NR_ACTIVE < NR_INACTIVE), check if:
    SP < NR_ACTIVE
- If the ACTIVE LRU is high (NR_ACTIVE >= NR_INACTIVE), check if:
    SP < NR_INACTIVE

Code-wise, this is simpler than before, since there is no longer a need
to update lruvec statistics when activating a page. So far, a few
benchmarks show a similar or better result, and when combined with
multi-gen LRU (in later commits) it shows a measurable performance gain
for some workloads.
Using the memtier and fio tests from commit ac35a4902374, but scaled
down to fit in my test environment, plus some other tests:

memtier test (with a 16G ramdisk as swap and a 4G memcg limit on an
i7-9700):

  memcached -u nobody -m 16384 -s /tmp/memcached.socket \
    -a 0766 -t 12 -B binary &

  memtier_benchmark -S /tmp/memcached.socket -P memcache_binary -n allkeys \
    --key-minimum=1 --key-maximum=32000000 --key-pattern=P:P -c 1 \
    -t 12 --ratio 1:0 --pipeline 8 -d 2000 -x 6

fio test 1 (with a 16G ramdisk on a 28G VM on an i7-9700):

  fio -name=refault --numjobs=12 --directory=/mnt --size=1024m \
    --buffered=1 --ioengine=io_uring --iodepth=128 \
    --iodepth_batch_submit=32 --iodepth_batch_complete=32 \
    --rw=randread --random_distribution=random --norandommap \
    --time_based --ramp_time=5m --runtime=5m --group_reporting

fio test 2 (with a 16G ramdisk on a 28G VM on an i7-9700):

  fio -name=mglru --numjobs=10 --directory=/mnt --size=1536m \
    --buffered=1 --ioengine=io_uring --iodepth=128 \
    --iodepth_batch_submit=32 --iodepth_batch_complete=32 \
    --rw=randread --random_distribution=zipf:1.2 --norandommap \
    --time_based --ramp_time=10m --runtime=5m --group_reporting

mysql (using oltp_read_only from sysbench, with a 12G buffer pool in a
10G memcg):

  sysbench /usr/share/sysbench/oltp_read_only.lua \
    --tables=36 --table-size=2000000 --threads=12 --time=1800

Kernel build test done with a 3G memcg limit on an i7-9700.
Before (average of 6 test runs):
  fio: IOPS=5125.5k
  fio2: IOPS=7291.16k
  memcached: 57600.926 ops/s
  mysql: 6491.5 tps
  kernel-build: 1817.13499 seconds

After (average of 6 test runs):
  fio: IOPS=5137.5k
  fio2: IOPS=7300.67k
  memcached: 57878.422 ops/s
  mysql: 6491.1 tps
  kernel-build: 1813.66231 seconds

Signed-off-by: Kairui Song
---
 include/linux/swap.h |   2 -
 mm/swap.c            |   1 -
 mm/vmscan.c          |   2 -
 mm/workingset.c      | 155 ++++++++++++++++++------------------------
 4 files changed, 64 insertions(+), 96 deletions(-)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index 493487ed7c38..ca51d79842b7 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -344,10 +344,8 @@ static inline swp_entry_t page_swap_entry(struct page *page)
 
 /* linux/mm/workingset.c */
 bool workingset_test_recent(void *shadow, bool file, bool *workingset);
-void workingset_age_nonresident(struct lruvec *lruvec, unsigned long nr_pages);
 void *workingset_eviction(struct folio *folio, struct mem_cgroup *target_memcg);
 void workingset_refault(struct folio *folio, void *shadow);
-void workingset_activation(struct folio *folio);
 
 /* Only track the nodes of mappings with shadow entries */
 void workingset_update_node(struct xa_node *node);
diff --git a/mm/swap.c b/mm/swap.c
index cd8f0150ba3a..685b446fd4f9 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -482,7 +482,6 @@ void folio_mark_accessed(struct folio *folio)
 		else
 			__lru_cache_activate_folio(folio);
 		folio_clear_referenced(folio);
-		workingset_activation(folio);
 	}
 	if (folio_test_idle(folio))
 		folio_clear_idle(folio);
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 6f13394b112e..3f4de75e5186 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2539,8 +2539,6 @@ static unsigned int move_folios_to_lru(struct lruvec *lruvec,
 		lruvec_add_folio(lruvec, folio);
 		nr_pages = folio_nr_pages(folio);
 		nr_moved += nr_pages;
-		if (folio_test_active(folio))
-			workingset_age_nonresident(lruvec, nr_pages);
 	}
 
 	/*
diff --git a/mm/workingset.c b/mm/workingset.c
index da58a26d0d4d..8613945fc66e 100644
--- a/mm/workingset.c
+++ b/mm/workingset.c
@@ -64,74 +64,64 @@
 * thrashing on the inactive list, after which refaulting pages can be
 * activated optimistically to compete with the existing active pages.
 *
- * Approximating inactive page access frequency - Observations:
+ * For such approximation, we introduce a counter `nonresident_age` (NA)
+ * here. This counter increases each time a page is evicted, and each
+ * evicted page will have a shadow that stores the counter reading at the
+ * eviction time as a timestamp. So when an evicted page is faulted again,
+ * we have:
 *
- * 1. When a page is accessed for the first time, it is added to the
- *    head of the inactive list, slides every existing inactive page
- *    towards the tail by one slot, and pushes the current tail page
- *    out of memory.
+ *   Let SP = ((NA's reading @ current) - (NA's reading @ eviction))
 *
- * 2. When a page is accessed for the second time, it is promoted to
- *    the active list, shrinking the inactive list by one slot. This
- *    also slides all inactive pages that were faulted into the cache
- *    more recently than the activated page towards the tail of the
- *    inactive list.
+ *                             +-memory available to cache-+
+ *                             |                           |
+ *   +-------------------------+===============+===========+
+ *   | *   shadows  O O  O     |   INACTIVE    |   ACTIVE  |
+ *   +-+-----------------------+===============+===========+
+ *     |                       |
+ *     +-----------------------+
+ *     |         SP
+ *   fault page          O -> Hole left by previously faulted in pages
+ *                       * -> The page corresponding to SP
 *
- * Thus:
+ * Here SP stands for how far the current workload could push a page
+ * out of available memory. Since every evicted page was once the head
+ * of the INACTIVE list, the page could have such an access distance of:
 *
- * 1. The sum of evictions and activations between any two points in
- *    time indicate the minimum number of inactive pages accessed in
- *    between.
+ *   SP + NR_INACTIVE
 *
- * 2. Moving one inactive page N page slots towards the tail of the
- *    list requires at least N inactive page accesses.
+ * So if:
 *
- * Combining these:
+ *   SP + NR_INACTIVE < NR_INACTIVE + NR_ACTIVE
 *
- * 1. When a page is finally evicted from memory, the number of
- *    inactive pages accessed while the page was in cache is at least
- *    the number of page slots on the inactive list.
+ * which can be simplified to:
 *
- * 2. In addition, measuring the sum of evictions and activations (E)
- *    at the time of a page's eviction, and comparing it to another
- *    reading (R) at the time the page faults back into memory tells
- *    the minimum number of accesses while the page was not cached.
- *    This is called the refault distance.
+ *   SP < NR_ACTIVE
 *
- * Because the first access of the page was the fault and the second
- * access the refault, we combine the in-cache distance with the
- * out-of-cache distance to get the complete minimum access distance
- * of this page:
+ * Then the page is worth getting re-activated to start from the ACTIVE
+ * part, since the access distance is shorter than the total memory to
+ * make it stay.
 *
- *      NR_inactive + (R - E)
+ * And since this is only an estimation, based on several hypotheses, and
+ * it could break the ability of the LRU to distinguish a workingset out
+ * of caches, throttle this by two factors:
 *
- * And knowing the minimum access distance of a page, we can easily
- * tell if the page would be able to stay in cache assuming all page
- * slots in the cache were available:
+ * 1. Notice that re-faulted pages may leave "holes" on the shadow part
+ *    of the LRU; that part is left unhandled on purpose to decrease the
+ *    re-activation rate for pages that have a large SP value (the larger
+ *    the SP value a page has, the more likely it will be affected by
+ *    such holes).
+ * 2. When the ACTIVE part of the LRU is long enough, challenging ACTIVE
+ *    pages by re-activating a one-time faulted, previously INACTIVE page
+ *    may not be a good idea, so throttle the re-activation when
+ *    ACTIVE > INACTIVE by comparing with INACTIVE instead.
 *
- *      NR_inactive + (R - E) <= NR_inactive + NR_active
+ * Combined all above, we have:
+ * Upon refault, if any of the following conditions is met, mark the page
+ * as active:
 *
- * If we have swap we should consider about NR_inactive_anon and
- * NR_active_anon, so for page cache and anonymous respectively:
- *
- * NR_inactive_file + (R - E) <= NR_inactive_file + NR_active_file
- * + NR_inactive_anon + NR_active_anon
- *
- * NR_inactive_anon + (R - E) <= NR_inactive_anon + NR_active_anon
- * + NR_inactive_file + NR_active_file
- *
- * Which can be further simplified to:
- *
- * (R - E) <= NR_active_file + NR_inactive_anon + NR_active_anon
- *
- * (R - E) <= NR_active_anon + NR_inactive_file + NR_active_file
- *
- * Put into words, the refault distance (out-of-cache) can be seen as
- * a deficit in inactive list space (in-cache). If the inactive list
- * had (R - E) more page slots, the page would not have been evicted
- * in between accesses, but activated instead. And on a full system,
- * the only thing eating into inactive list space is active pages.
+ * - If the ACTIVE LRU is low (NR_ACTIVE < NR_INACTIVE), check if:
+ *     SP < NR_ACTIVE
 *
+ * - If the ACTIVE LRU is high (NR_ACTIVE >= NR_INACTIVE), check if:
+ *     SP < NR_INACTIVE
 *
 *		Refaulting inactive pages
 *
@@ -419,8 +409,10 @@ bool workingset_test_recent(void *shadow, bool file, bool *workingset)
 	struct mem_cgroup *eviction_memcg;
 	struct lruvec *eviction_lruvec;
 	unsigned long refault_distance;
-	unsigned long workingset_size;
+	unsigned long inactive_file;
+	unsigned long inactive_anon;
 	unsigned long refault;
+	unsigned long active;
 	int memcgid;
 	struct pglist_data *pgdat;
 	unsigned long eviction;
@@ -479,21 +471,27 @@ bool workingset_test_recent(void *shadow, bool file, bool *workingset)
 	 * workingset competition needs to consider anon or not depends
 	 * on having free swap space.
 	 */
-	workingset_size = lruvec_page_state(eviction_lruvec, NR_ACTIVE_FILE);
-	if (!file) {
-		workingset_size += lruvec_page_state(eviction_lruvec,
-						     NR_INACTIVE_FILE);
-	}
+	active = lruvec_page_state(eviction_lruvec, NR_ACTIVE_FILE);
+	inactive_file = lruvec_page_state(eviction_lruvec, NR_INACTIVE_FILE);
+
 	if (mem_cgroup_get_nr_swap_pages(eviction_memcg) > 0) {
-		workingset_size += lruvec_page_state(eviction_lruvec,
+		active += lruvec_page_state(eviction_lruvec,
					     NR_ACTIVE_ANON);
-		if (file) {
-			workingset_size += lruvec_page_state(eviction_lruvec,
-						     NR_INACTIVE_ANON);
-		}
+		inactive_anon = lruvec_page_state(eviction_lruvec,
+						  NR_INACTIVE_ANON);
+	} else {
+		inactive_anon = 0;
 	}
 
-	return refault_distance <= workingset_size;
+	/*
+	 * When there are already enough active pages, be less aggressive
+	 * on reactivating pages; challenging a large set of established
+	 * active pages with one-time refaulted pages may not be a good idea.
+	 */
+	if (active >= inactive_anon + inactive_file)
+		return refault_distance < inactive_anon + inactive_file;
+	else
+		return refault_distance < active +
+			(file ? inactive_anon : inactive_file);
 }
 
 /**
@@ -543,7 +541,6 @@ void workingset_refault(struct folio *folio, void *shadow)
 		goto out;
 
 	folio_set_active(folio);
-	workingset_age_nonresident(lruvec, nr);
 	mod_lruvec_state(lruvec, WORKINGSET_ACTIVATE_BASE + file, nr);
 
 	/* Folio was active prior to eviction */
@@ -560,30 +557,6 @@ void workingset_refault(struct folio *folio, void *shadow)
 	rcu_read_unlock();
 }
 
-/**
- * workingset_activation - note a page activation
- * @folio: Folio that is being activated.
- */
-void workingset_activation(struct folio *folio)
-{
-	struct mem_cgroup *memcg;
-
-	rcu_read_lock();
-	/*
-	 * Filter non-memcg pages here, e.g. unmap can call
-	 * mark_page_accessed() on VDSO pages.
-	 *
-	 * XXX: See workingset_refault() - this should return
-	 * root_mem_cgroup even for !CONFIG_MEMCG.
-	 */
-	memcg = folio_memcg_rcu(folio);
-	if (!mem_cgroup_disabled() && !memcg)
-		goto out;
-	workingset_age_nonresident(folio_lruvec(folio), folio_nr_pages(folio));
-out:
-	rcu_read_unlock();
-}
-
 /*
  * Shadow entries reflect the share of the working set that does not
  * fit into memory, so their number depends on the access pattern of

From patchwork Wed Sep 20 19:02:40 2023
X-Patchwork-Submitter: Kairui Song
X-Patchwork-Id: 142666
From: Kairui Song
To: linux-mm@kvack.org
Cc: Andrew Morton, Yu Zhao, Roman Gushchin, Johannes Weiner, Michal Hocko, Hugh Dickins, Nhat Pham, Yuanchu Xie, Kalesh Singh, Suren Baghdasaryan, "T . J . 
Mercier", linux-kernel@vger.kernel.org, Kairui Song
Subject: [RFC PATCH v3 2/6] workingset: move refault distance checking into a helper
Date: Thu, 21 Sep 2023 03:02:40 +0800
Message-ID: <20230920190244.16839-3-ryncsn@gmail.com>
In-Reply-To: <20230920190244.16839-1-ryncsn@gmail.com>
References: <20230920190244.16839-1-ryncsn@gmail.com>
Reply-To: Kairui Song

From: Kairui Song

No feature change; just move the refault distance checking logic into a
standalone helper so it can be reused later.
Signed-off-by: Kairui Song
---
 mm/workingset.c | 137 ++++++++++++++++++++++++++++--------------------
 1 file changed, 79 insertions(+), 58 deletions(-)

diff --git a/mm/workingset.c b/mm/workingset.c
index 8613945fc66e..b0704cbfc667 100644
--- a/mm/workingset.c
+++ b/mm/workingset.c
@@ -170,9 +170,10 @@
  */
 #define WORKINGSET_SHIFT	1
-#define EVICTION_SHIFT	((BITS_PER_LONG - BITS_PER_XA_VALUE) + \
+#define EVICTION_SHIFT	((BITS_PER_LONG - BITS_PER_XA_VALUE) +	\
			 WORKINGSET_SHIFT + NODES_SHIFT + \
			 MEM_CGROUP_ID_SHIFT)
+#define EVICTION_BITS	(BITS_PER_LONG - (EVICTION_SHIFT))
 #define EVICTION_MASK	(~0UL >> EVICTION_SHIFT)
 
 /*
@@ -216,6 +217,79 @@ static void unpack_shadow(void *shadow, int *memcgidp, pg_data_t **pgdat,
	*workingsetp = workingset;
 }
 
+/*
+ * Get the refault distance timestamp reading at eviction time.
+ */
+static inline unsigned long lru_eviction(struct lruvec *lruvec,
+					 int bits, int bucket_order)
+{
+	unsigned long eviction = atomic_long_read(&lruvec->nonresident_age);
+
+	eviction >>= bucket_order;
+	eviction &= ~0UL >> (BITS_PER_LONG - bits);
+
+	return eviction;
+}
+
+/*
+ * Calculate and test refault distance.
+ */
+static inline bool lru_test_refault(struct mem_cgroup *memcg,
+				    struct lruvec *lruvec,
+				    unsigned long eviction, bool file,
+				    int bits, int bucket_order)
+{
+	unsigned long refault, distance;
+	unsigned long active, inactive_file, inactive_anon;
+
+	eviction <<= bucket_order;
+	refault = atomic_long_read(&lruvec->nonresident_age);
+
+	/*
+	 * The unsigned subtraction here gives an accurate distance
+	 * across nonresident_age overflows in most cases. There is a
+	 * special case: usually, shadow entries have a short lifetime
+	 * and are either refaulted or reclaimed along with the inode
+	 * before they get too old. But it is not impossible for the
+	 * nonresident_age to lap a shadow entry in the field, which
+	 * can then result in a false small refault distance, leading
+	 * to a false activation should this old entry actually
+	 * refault again. However, earlier kernels used to deactivate
+	 * unconditionally with *every* reclaim invocation for the
+	 * longest time, so the occasional inappropriate activation
+	 * leading to pressure on the active list is not a problem.
+	 */
+	distance = (refault - eviction) & (~0UL >> (BITS_PER_LONG - bits));
+
+	/*
+	 * Compare the distance to the existing workingset size. We
+	 * don't activate pages that couldn't stay resident even if
+	 * all the memory was available to the workingset. Whether
+	 * workingset competition needs to consider anon or not depends
+	 * on having free swap space.
+	 */
+	active = lruvec_page_state(lruvec, NR_ACTIVE_FILE);
+	inactive_file = lruvec_page_state(lruvec, NR_INACTIVE_FILE);
+
+	if (mem_cgroup_get_nr_swap_pages(memcg) > 0) {
+		active += lruvec_page_state(lruvec, NR_ACTIVE_ANON);
+		inactive_anon = lruvec_page_state(lruvec, NR_INACTIVE_ANON);
+	} else {
+		inactive_anon = 0;
+	}
+
+	/*
+	 * When there are already enough active pages, be less aggressive
+	 * on reactivating pages; challenging a large set of established
+	 * active pages with one-time refaulted pages may not be a good idea.
+	 */
+	if (active >= inactive_anon + inactive_file)
+		return distance < inactive_anon + inactive_file;
+	else
+		return distance < active + \
+		       (file ? inactive_anon : inactive_file);
+}
+
 #ifdef CONFIG_LRU_GEN
 
 static void *lru_gen_eviction(struct folio *folio)
@@ -386,11 +460,10 @@ void *workingset_eviction(struct folio *folio, struct mem_cgroup *target_memcg)
	lruvec = mem_cgroup_lruvec(target_memcg, pgdat);
	/* XXX: target_memcg can be NULL, go through lruvec */
	memcgid = mem_cgroup_id(lruvec_memcg(lruvec));
-	eviction = atomic_long_read(&lruvec->nonresident_age);
-	eviction >>= bucket_order;
+	eviction = lru_eviction(lruvec, EVICTION_BITS, bucket_order);
	workingset_age_nonresident(lruvec, folio_nr_pages(folio));
	return pack_shadow(memcgid, pgdat, eviction,
-			folio_test_workingset(folio));
+			   folio_test_workingset(folio));
 }
 
 /**
@@ -408,11 +481,6 @@ bool workingset_test_recent(void *shadow, bool file, bool *workingset)
 {
	struct mem_cgroup *eviction_memcg;
	struct lruvec *eviction_lruvec;
-	unsigned long refault_distance;
-	unsigned long inactive_file;
-	unsigned long inactive_anon;
-	unsigned long refault;
-	unsigned long active;
	int memcgid;
	struct pglist_data *pgdat;
	unsigned long eviction;
@@ -421,7 +489,6 @@ bool workingset_test_recent(void *shadow, bool file, bool *workingset)
		return lru_gen_test_recent(shadow, file, &eviction_lruvec,
					   &eviction, workingset);
 
	unpack_shadow(shadow, &memcgid, &pgdat, &eviction, workingset);
-	eviction <<= bucket_order;
 
	/*
	 * Look up the memcg associated with the stored ID. It might
@@ -442,56 +509,10 @@ bool workingset_test_recent(void *shadow, bool file, bool *workingset)
	eviction_memcg = mem_cgroup_from_id(memcgid);
	if (!mem_cgroup_disabled() && !eviction_memcg)
		return false;
-	eviction_lruvec = mem_cgroup_lruvec(eviction_memcg, pgdat);
-	refault = atomic_long_read(&eviction_lruvec->nonresident_age);
-
-	/*
-	 * Calculate the refault distance
-	 *
-	 * The unsigned subtraction here gives an accurate distance
-	 * across nonresident_age overflows in most cases. There is a
-	 * special case: usually, shadow entries have a short lifetime
-	 * and are either refaulted or reclaimed along with the inode
-	 * before they get too old. But it is not impossible for the
-	 * nonresident_age to lap a shadow entry in the field, which
-	 * can then result in a false small refault distance, leading
-	 * to a false activation should this old entry actually
-	 * refault again. However, earlier kernels used to deactivate
-	 * unconditionally with *every* reclaim invocation for the
-	 * longest time, so the occasional inappropriate activation
-	 * leading to pressure on the active list is not a problem.
-	 */
-	refault_distance = (refault - eviction) & EVICTION_MASK;
-
-	/*
-	 * Compare the distance to the existing workingset size. We
-	 * don't activate pages that couldn't stay resident even if
-	 * all the memory was available to the workingset. Whether
-	 * workingset competition needs to consider anon or not depends
-	 * on having free swap space.
-	 */
-	active = lruvec_page_state(eviction_lruvec, NR_ACTIVE_FILE);
-	inactive_file = lruvec_page_state(eviction_lruvec, NR_INACTIVE_FILE);
-
-	if (mem_cgroup_get_nr_swap_pages(eviction_memcg) > 0) {
-		active += lruvec_page_state(eviction_lruvec,
-					    NR_ACTIVE_ANON);
-		inactive_anon = lruvec_page_state(eviction_lruvec,
-						  NR_INACTIVE_ANON);
-	} else {
-		inactive_anon = 0;
-	}
-
-	/*
-	 * When there are already enough active pages, be less aggressive
-	 * on reactivating pages; challenging a large set of established
-	 * active pages with one-time refaulted pages may not be a good idea.
-	 */
-	if (active >= inactive_anon + inactive_file)
-		return refault_distance < inactive_anon + inactive_file;
-	else
-		return refault_distance < active + (file ?
inactive_anon : inactive_file); + return lru_test_refault(eviction_memcg, eviction_lruvec, eviction, + file, EVICTION_BITS, bucket_order); } /**

From patchwork Wed Sep 20 19:02:41 2023
X-Patchwork-Submitter: Kairui Song
X-Patchwork-Id: 142741
From: Kairui Song
To: linux-mm@kvack.org
Cc: Andrew Morton, Yu Zhao, Roman Gushchin, Johannes Weiner,
    Michal Hocko, Hugh Dickins, Nhat Pham, Yuanchu Xie, Kalesh Singh,
    Suren Baghdasaryan, "T. J. Mercier", linux-kernel@vger.kernel.org,
    Kairui Song
Subject: [RFC PATCH v3 3/6] workingset: simplify the initialization code
Date: Thu, 21 Sep 2023 03:02:41 +0800
Message-ID: <20230920190244.16839-4-ryncsn@gmail.com>
X-Mailer: git-send-email 2.41.0
In-Reply-To: <20230920190244.16839-1-ryncsn@gmail.com>
References: <20230920190244.16839-1-ryncsn@gmail.com>

From: Kairui Song

Use the newly introduced EVICTION_BITS to replace timestamp_bits. The
compiler should be able to optimize the previous local variable out
either way; this just makes the code clearer and more unified.

Signed-off-by: Kairui Song
---
 mm/workingset.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/mm/workingset.c b/mm/workingset.c
index b0704cbfc667..278c3b9eb549 100644
--- a/mm/workingset.c
+++ b/mm/workingset.c
@@ -772,7 +772,6 @@ static struct lock_class_key shadow_nodes_key;
 
 static int __init workingset_init(void)
 {
-	unsigned int timestamp_bits;
 	unsigned int max_order;
 	int ret;
 
@@ -784,12 +783,11 @@ static int __init workingset_init(void)
	 * some more pages at runtime, so keep working with up to
	 * double the initial memory by using totalram_pages as-is.
	 */
-	timestamp_bits = BITS_PER_LONG - EVICTION_SHIFT;
 	max_order = fls_long(totalram_pages() - 1);
-	if (max_order > timestamp_bits)
-		bucket_order = max_order - timestamp_bits;
+	if (max_order > EVICTION_BITS)
+		bucket_order = max_order - EVICTION_BITS;
 	pr_info("workingset: timestamp_bits=%d max_order=%d bucket_order=%u\n",
-		timestamp_bits, max_order, bucket_order);
+		EVICTION_BITS, max_order, bucket_order);
 
 	ret = prealloc_shrinker(&workingset_shadow_shrinker, "mm-shadow");
 	if (ret)

From patchwork Wed Sep 20 19:02:42 2023
X-Patchwork-Submitter: Kairui Song
X-Patchwork-Id: 142649
From: Kairui Song
To: linux-mm@kvack.org
Cc: Andrew Morton, Yu Zhao, Roman Gushchin, Johannes Weiner, Michal Hocko,
    Hugh Dickins, Nhat Pham, Yuanchu Xie, Kalesh Singh, Suren Baghdasaryan,
    "T. J. Mercier", linux-kernel@vger.kernel.org, Kairui Song
Subject: [RFC PATCH v3 4/6] workingset: simplify lru_gen_test_recent
Date: Thu, 21 Sep 2023 03:02:42 +0800
Message-ID: <20230920190244.16839-5-ryncsn@gmail.com>
X-Mailer: git-send-email 2.41.0
In-Reply-To: <20230920190244.16839-1-ryncsn@gmail.com>
References: <20230920190244.16839-1-ryncsn@gmail.com>

From: Kairui Song

Simplify the code and move some common paths into the caller, in
preparation for the following commits.
Signed-off-by: Kairui Song --- mm/workingset.c | 30 +++++++++++++----------------- 1 file changed, 13 insertions(+), 17 deletions(-) diff --git a/mm/workingset.c b/mm/workingset.c index 278c3b9eb549..87a16b6158e5 100644 --- a/mm/workingset.c +++ b/mm/workingset.c @@ -323,42 +323,38 @@ static void *lru_gen_eviction(struct folio *folio) * Tests if the shadow entry is for a folio that was recently evicted. * Fills in @lruvec, @token, @workingset with the values unpacked from shadow. */ -static bool lru_gen_test_recent(void *shadow, bool file, struct lruvec **lruvec, - unsigned long *token, bool *workingset) +static bool lru_gen_test_recent(struct lruvec *lruvec, bool file, + unsigned long token) { - int memcg_id; unsigned long min_seq; - struct mem_cgroup *memcg; - struct pglist_data *pgdat; - unpack_shadow(shadow, &memcg_id, &pgdat, token, workingset); - - memcg = mem_cgroup_from_id(memcg_id); - *lruvec = mem_cgroup_lruvec(memcg, pgdat); - - min_seq = READ_ONCE((*lruvec)->lrugen.min_seq[file]); - return (*token >> LRU_REFS_WIDTH) == (min_seq & (EVICTION_MASK >> LRU_REFS_WIDTH)); + min_seq = READ_ONCE(lruvec->lrugen.min_seq[file]); + return (token >> LRU_REFS_WIDTH) == (min_seq & (EVICTION_MASK >> LRU_REFS_WIDTH)); } static void lru_gen_refault(struct folio *folio, void *shadow) { + int memcgid; bool recent; - int hist, tier, refs; bool workingset; unsigned long token; + int hist, tier, refs; struct lruvec *lruvec; + struct pglist_data *pgdat; struct lru_gen_folio *lrugen; int type = folio_is_file_lru(folio); int delta = folio_nr_pages(folio); rcu_read_lock(); - recent = lru_gen_test_recent(shadow, type, &lruvec, &token, &workingset); + unpack_shadow(shadow, &memcgid, &pgdat, &token, &workingset); + lruvec = mem_cgroup_lruvec(mem_cgroup_from_id(memcgid), pgdat); if (lruvec != folio_lruvec(folio)) goto unlock; mod_lruvec_state(lruvec, WORKINGSET_REFAULT_BASE + type, delta); + recent = lru_gen_test_recent(lruvec, type, token); if (!recent) goto unlock; @@ -485,9 +481,6 
@@ bool workingset_test_recent(void *shadow, bool file, bool *workingset) struct pglist_data *pgdat; unsigned long eviction; - if (lru_gen_enabled()) - return lru_gen_test_recent(shadow, file, &eviction_lruvec, &eviction, workingset); - unpack_shadow(shadow, &memcgid, &pgdat, &eviction, workingset); /* @@ -511,6 +504,9 @@ bool workingset_test_recent(void *shadow, bool file, bool *workingset) return false; eviction_lruvec = mem_cgroup_lruvec(eviction_memcg, pgdat); + if (lru_gen_enabled()) + return lru_gen_test_recent(eviction_lruvec, file, eviction); + return lru_test_refault(eviction_memcg, eviction_lruvec, eviction, file, EVICTION_BITS, bucket_order); }

From patchwork Wed Sep 20 19:02:43 2023
X-Patchwork-Submitter: Kairui Song
X-Patchwork-Id: 142721
From: Kairui Song
To: linux-mm@kvack.org
Cc: Andrew Morton, Yu Zhao, Roman Gushchin, Johannes Weiner, Michal Hocko,
    Hugh Dickins, Nhat Pham, Yuanchu Xie, Kalesh Singh, Suren Baghdasaryan,
    "T. J. Mercier", linux-kernel@vger.kernel.org, Kairui Song
Subject: [RFC PATCH v3 5/6] mm, lru_gen: convert avg_total and avg_refaulted to atomic
Date: Thu, 21 Sep 2023 03:02:43 +0800
Message-ID: <20230920190244.16839-6-ryncsn@gmail.com>
X-Mailer: git-send-email 2.41.0
In-Reply-To: <20230920190244.16839-1-ryncsn@gmail.com>
References: <20230920190244.16839-1-ryncsn@gmail.com>

From: Kairui Song

No functional change; this prepares for a later patch.
Signed-off-by: Kairui Song --- include/linux/mmzone.h | 4 ++-- mm/vmscan.c | 16 ++++++++-------- 2 files changed, 10 insertions(+), 10 deletions(-) diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index 4106fbc5b4b3..d944987b67d3 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -425,9 +425,9 @@ struct lru_gen_folio { /* the multi-gen LRU sizes, eventually consistent */ long nr_pages[MAX_NR_GENS][ANON_AND_FILE][MAX_NR_ZONES]; /* the exponential moving average of refaulted */ - unsigned long avg_refaulted[ANON_AND_FILE][MAX_NR_TIERS]; + atomic_long_t avg_refaulted[ANON_AND_FILE][MAX_NR_TIERS]; /* the exponential moving average of evicted+protected */ - unsigned long avg_total[ANON_AND_FILE][MAX_NR_TIERS]; + atomic_long_t avg_total[ANON_AND_FILE][MAX_NR_TIERS]; /* the first tier doesn't need protection, hence the minus one */ unsigned long protected[NR_HIST_GENS][ANON_AND_FILE][MAX_NR_TIERS - 1]; /* can be modified without holding the LRU lock */ diff --git a/mm/vmscan.c b/mm/vmscan.c index 3f4de75e5186..82acc1934c86 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -3705,9 +3705,9 @@ static void read_ctrl_pos(struct lruvec *lruvec, int type, int tier, int gain, struct lru_gen_folio *lrugen = &lruvec->lrugen; int hist = lru_hist_from_seq(lrugen->min_seq[type]); - pos->refaulted = lrugen->avg_refaulted[type][tier] + + pos->refaulted = atomic_long_read(&lrugen->avg_refaulted[type][tier]) + atomic_long_read(&lrugen->refaulted[hist][type][tier]); - pos->total = lrugen->avg_total[type][tier] + + pos->total = atomic_long_read(&lrugen->avg_total[type][tier]) + atomic_long_read(&lrugen->evicted[hist][type][tier]); if (tier) pos->total += lrugen->protected[hist][type][tier - 1]; @@ -3732,15 +3732,15 @@ static void reset_ctrl_pos(struct lruvec *lruvec, int type, bool carryover) if (carryover) { unsigned long sum; - sum = lrugen->avg_refaulted[type][tier] + + sum = atomic_long_read(&lrugen->avg_refaulted[type][tier]) + 
atomic_long_read(&lrugen->refaulted[hist][type][tier]); - WRITE_ONCE(lrugen->avg_refaulted[type][tier], sum / 2); + atomic_long_set(&lrugen->avg_refaulted[type][tier], sum / 2); - sum = lrugen->avg_total[type][tier] + + sum = atomic_long_read(&lrugen->avg_total[type][tier]) + atomic_long_read(&lrugen->evicted[hist][type][tier]); if (tier) sum += lrugen->protected[hist][type][tier - 1]; - WRITE_ONCE(lrugen->avg_total[type][tier], sum / 2); + atomic_long_set(&lrugen->avg_total[type][tier], sum / 2); } if (clear) { @@ -5885,8 +5885,8 @@ static void lru_gen_seq_show_full(struct seq_file *m, struct lruvec *lruvec, if (seq == max_seq) { s = "RT "; - n[0] = READ_ONCE(lrugen->avg_refaulted[type][tier]); - n[1] = READ_ONCE(lrugen->avg_total[type][tier]); + n[0] = atomic_long_read(&lrugen->avg_refaulted[type][tier]); + n[1] = atomic_long_read(&lrugen->avg_total[type][tier]); } else if (seq == min_seq[type] || NR_HIST_GENS > 1) { s = "rep"; n[0] = atomic_long_read(&lrugen->refaulted[hist][type][tier]);

From patchwork Wed Sep 20 19:02:44 2023
X-Patchwork-Submitter: Kairui Song
X-Patchwork-Id: 142655
From: Kairui Song
To: linux-mm@kvack.org
Cc: Andrew Morton, Yu Zhao, Roman Gushchin, Johannes Weiner, Michal Hocko, Hugh Dickins, Nhat Pham, Yuanchu Xie, Kalesh Singh, Suren Baghdasaryan, "T. J. Mercier", linux-kernel@vger.kernel.org, Kairui Song
Subject: [RFC PATCH v3 6/6] workingset, lru_gen: apply refault-distance based re-activation
Date: Thu, 21 Sep 2023 03:02:44 +0800
Message-ID: <20230920190244.16839-7-ryncsn@gmail.com>
In-Reply-To: <20230920190244.16839-1-ryncsn@gmail.com>
References: <20230920190244.16839-1-ryncsn@gmail.com>

From: Kairui Song

I noticed MGLRU not working very well on certain workloads, observed on some heavily stressed databases: when the file page workingset size exceeds total memory, and the access distance (the left-shift distance of a page before it gets activated, considering that the LRU starts from the right) of file pages is also larger than total memory, all file pages get stuck in the oldest generation, being read in and then evicted perpetually. Despite anon pages being idle, they never get aged. The PID controller doesn't kick in until there are some minor access-pattern changes, and file pages are never promoted or reused. Even though memory can't cover the whole workingset, refault-distance based re-activation can help hold part of the workingset in memory, which reduces the IO workload significantly. So apply it to MGLRU as well.
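As background for the change below, the refault-distance test this series applies to MGLRU can be modeled in a few lines. This is an illustrative sketch, not the kernel code: `refault_qualified` and its parameter names are made up for the example, standing in for the comparison done in mm/workingset.c.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Minimal model of the refault-distance test: the "refault distance" is
 * how many pages were aged out of the LRU between this page's eviction
 * and its refault. If that distance fits within the space the page
 * competes for, keeping the page resident would have avoided the
 * refault, so re-activation is justified.
 */
static bool refault_qualified(unsigned long nonresident_age,
			      unsigned long eviction_timestamp,
			      unsigned long workingset_size)
{
	/* unsigned subtraction handles counter wraparound naturally */
	unsigned long refault_distance = nonresident_age - eviction_timestamp;

	return refault_distance <= workingset_size;
}
```

The nonresident-age counter keeps ticking while the page is out of memory, so a small distance means the page was evicted only shortly (in LRU terms) before it was needed again.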
The updated refault-distance model fits MGLRU well in most cases, if we simply treat the last two generations as the inactive LRU and the first two generations as the active LRU. Some adjustments are made to fit the logic better, and the refault distance now also contributes to page tiering and to MGLRU's PID-based refault detection:

- If a tier-0 page has a qualified refault distance, promote it to a higher tier and send it to the second oldest generation.
- If a tier >= 1 page has a qualified refault distance, mark it active and send it to the youngest generation.
- Increase the reference count of every page that has a qualified refault distance, and increase the PID-controlled refault rate of the updated tier, in the hope that similar pages will be protected next time upon eviction.

NOTE: This also changes the meaning of the workingset_* fields in /proc/vmstat: workingset_activate_* now stands for pages reactivated or promoted by the refault-distance check, and workingset_restore_* now stands for all pages promoted for any reason.

The following benchmark shows a 5x improvement. To simulate the targeted workload, I set up a 3-replica mongodb cluster, each replica in a different cgroup, using 5 GB of WiredTiger cache and 10 GB of oplog, on a 32G VM with no memory limit set. The benchmark is done using https://github.com/apavlo/py-tpcc.git, modified to run the STOCK_LEVEL query only, to simulate slow queries and get a stable result.
Test is done on an EPYC 7K62 with 32G RAM and a SATA SSD:

- Before (with ZRAM enabled; the result doesn't change whether any kind of swap is on or not):

$ tpcc.py --config=mongodb.config mongodb --duration=900 --warehouses=500 --clients=30
==================================================================
Execution Results after 919 seconds
------------------------------------------------------------------
                  Executed        Time (µs)        Rate
  STOCK_LEVEL     577             27584645283.7    0.02 txn/s
------------------------------------------------------------------
  TOTAL           577             27584645283.7    0.02 txn/s

$ cat /proc/vmstat | grep workingset
workingset_nodes 47860
workingset_refault_anon 0
workingset_refault_file 23498953
workingset_activate_anon 0
workingset_activate_file 23487840
workingset_restore_anon 0
workingset_restore_file 18553646
workingset_nodereclaim 768

$ free -m
              total        used        free      shared  buff/cache   available
Mem:          31849        6829         790          23       24229       24542
Swap:         31848           0       31848

- Patched (with ZRAM enabled):

$ tpcc.py --config=mongodb.config mongodb --duration=900 --warehouses=500 --clients=30
==================================================================
Execution Results after 905 seconds
------------------------------------------------------------------
                  Executed        Time (µs)        Rate
  STOCK_LEVEL     2542            27121571486.2    0.09 txn/s
------------------------------------------------------------------
  TOTAL           2542            27121571486.2    0.09 txn/s

$ cat /proc/vmstat | grep working
workingset_nodes 70358
workingset_refault_anon 16853
workingset_refault_file 22693601
workingset_activate_anon 10099
workingset_activate_file 8565519
workingset_restore_anon 10127
workingset_restore_file 8566053
workingset_nodereclaim 9801

$ free -m
              total        used        free      shared  buff/cache   available
Mem:          31849        7093         283           4       24472       24289
Swap:         31848        1652       30196

The performance is 5x better than before, and idle anon pages can now get swapped out as expected.
Testing with lower stress also shows an improvement.

I also checked the benchmark with memtier/memcached and fio, using a setup similar to the one in commit ac35a4902374 but scaled down to fit my test environment:

memtier test (16G ramdisk as swap, 4G memcg limit, VM on an EPYC 7K62):
  memcached -u nobody -m 16384 -s /tmp/memcached.socket -a 0766 \
    -t 16 -B binary &
  memtier_benchmark -S /tmp/memcached.socket -P memcache_binary -n allkeys \
    --key-minimum=1 --key-maximum=36000000 --key-pattern=P:P -c 1 \
    -t 16 --ratio 1:0 --pipeline 8 -d 600 -x 6

fio test 1 (16G ramdisk, 4G memcg limit, VM on an EPYC 7K62):
  fio -name=mglru --numjobs=16 --directory=/mnt --size=1000m \
    --buffered=1 --ioengine=io_uring --iodepth=128 \
    --iodepth_batch_submit=32 --iodepth_batch_complete=32 \
    --rw=randread --random_distribution=zipf:1.2 --norandommap \
    --time_based --ramp_time=10m --runtime=5m --group_reporting

fio test 2 (16G ramdisk, 2G memcg limit, VM on an EPYC 7K62):
  fio -name=mglru --numjobs=16 --directory=/mnt --size=1000m \
    --buffered=1 --ioengine=io_uring --iodepth=128 \
    --iodepth_batch_submit=32 --iodepth_batch_complete=32 \
    --rw=randread --random_distribution=zipf:1.2 --norandommap \
    --time_based --ramp_time=10m --runtime=5m --group_reporting

mysql test (15G buffer pool with 16G memcg limit, VM on an EPYC 7K62):
  sysbench /usr/share/sysbench/oltp_read_only.lua \
    --tables=48 --table-size=2000000 --threads=16 --time=1800 run

Before this patch:
  memtier: 37794.71 op/s
  fio 1:   6327.3k iops
  fio 2:   5697.6k iops
  mysql:   146104.98 qps

After this patch:
  memtier: 37792.61 op/s
  fio 1:   6583.3k iops
  fio 2:   5929.2k iops
  mysql:   146055.88 qps

There is no regression in the other tests so far, and a performance gain is observed on file-page-heavy tasks.
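The tier adjustment performed on a qualified refault, as described in the bullet list earlier, can be sketched in isolation. This is a simplified model under stated assumptions: LRU_REFS_WIDTH is taken to be 2, `lru_tier_from_refs` mirrors the kernel's order_base_2(refs + 1) behavior, and `refault_new_tier` is a hypothetical helper invented for the example, not kernel API.

```c
#include <assert.h>
#include <stdbool.h>

#define LRU_REFS_WIDTH	2	/* assumed width of the per-folio refs field */

/* Mirrors the kernel's lru_tier_from_refs(): tier = order_base_2(refs + 1) */
static int lru_tier_from_refs(int refs)
{
	int tier = 0;

	while (refs) {
		tier++;
		refs >>= 1;
	}
	return tier;
}

/*
 * On a refault with a qualified refault distance: pages that already had
 * references before eviction are re-activated, and unless the refs count
 * is saturated, the page is accounted one reference higher so that
 * similar pages are protected before the next eviction.
 */
static int refault_new_tier(int refs, bool *activate)
{
	*activate = refs > 0;

	if (refs != (1 << LRU_REFS_WIDTH))
		refs += 1;
	return lru_tier_from_refs(refs);
}
```

A tier-0 page (refs == 0) thus lands in tier 1 without being activated, while any page with prior references is both activated and bumped a tier, matching the two promotion rules above.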
Signed-off-by: Kairui Song
---
 mm/vmscan.c     |  20 +++++---
 mm/workingset.c | 130 +++++++++++++++++++++++++++++++-----------------
 2 files changed, 95 insertions(+), 55 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 82acc1934c86..c7745b22cc0b 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -3730,17 +3730,21 @@ static void reset_ctrl_pos(struct lruvec *lruvec, int type, bool carryover)
 
 	for (tier = 0; tier < MAX_NR_TIERS; tier++) {
 		if (carryover) {
-			unsigned long sum;
+			unsigned long refaulted, total;
 
-			sum = atomic_long_read(&lrugen->avg_refaulted[type][tier]) +
-				atomic_long_read(&lrugen->refaulted[hist][type][tier]);
-			atomic_long_set(&lrugen->avg_refaulted[type][tier], sum / 2);
+			refaulted = atomic_long_read(&lrugen->avg_refaulted[type][tier]) +
+				atomic_long_read(&lrugen->refaulted[hist][type][tier]);
 
-			sum = atomic_long_read(&lrugen->avg_total[type][tier]) +
-				atomic_long_read(&lrugen->evicted[hist][type][tier]);
+			total = atomic_long_read(&lrugen->avg_total[type][tier]) +
+				atomic_long_read(&lrugen->evicted[hist][type][tier]);
 			if (tier)
-				sum += lrugen->protected[hist][type][tier - 1];
-			atomic_long_set(&lrugen->avg_total[type][tier], sum / 2);
+				total += lrugen->protected[hist][type][tier - 1];
+
+			/* total could be less than refaulted, see lru_gen_refault */
+			total = max(total, refaulted);
+
+			atomic_long_set(&lrugen->avg_refaulted[type][tier], refaulted / 2);
+			atomic_long_set(&lrugen->avg_total[type][tier], total / 2);
 		}
 
 		if (clear) {
diff --git a/mm/workingset.c b/mm/workingset.c
index 87a16b6158e5..e548c8cee9ad 100644
--- a/mm/workingset.c
+++ b/mm/workingset.c
@@ -175,6 +175,7 @@
 			 MEM_CGROUP_ID_SHIFT)
 #define EVICTION_BITS	(BITS_PER_LONG - (EVICTION_SHIFT))
 #define EVICTION_MASK	(~0UL >> EVICTION_SHIFT)
+#define LRU_GEN_EVICTION_BITS	(EVICTION_BITS - LRU_REFS_WIDTH - LRU_GEN_WIDTH)
 
 /*
  * Eviction timestamps need to be able to cover the full range of
@@ -185,6 +186,7 @@
  * evictions into coarser buckets by shaving off lower timestamp bits.
 */
 static unsigned int bucket_order __read_mostly;
+static unsigned int lru_gen_bucket_order __read_mostly;
 
 static void *pack_shadow(int memcgid, pg_data_t *pgdat, unsigned long eviction,
			 bool workingset)
@@ -290,6 +292,34 @@ static inline bool lru_test_refault(struct mem_cgroup *memcg,
 		       (file ? inactive_anon : inactive_file);
 }
 
+/**
+ * workingset_age_nonresident - age non-resident entries as LRU ages
+ * @lruvec: the lruvec that was aged
+ * @nr_pages: the number of pages to count
+ *
+ * As in-memory pages are aged, non-resident pages need to be aged as
+ * well, in order for the refault distances later on to be comparable
+ * to the in-memory dimensions. This function allows reclaim and LRU
+ * operations to drive the non-resident aging along in parallel.
+ */
+void workingset_age_nonresident(struct lruvec *lruvec, unsigned long nr_pages)
+{
+	/*
+	 * Reclaiming a cgroup means reclaiming all its children in a
+	 * round-robin fashion. That means that each cgroup has an LRU
+	 * order that is composed of the LRU orders of its child
+	 * cgroups; and every page has an LRU position not just in the
+	 * cgroup that owns it, but in all of that group's ancestors.
+	 *
+	 * So when the physical inactive list of a leaf cgroup ages,
+	 * the virtual inactive lists of all its parents, including
+	 * the root cgroup's, age as well.
+	 */
+	do {
+		atomic_long_add(nr_pages, &lruvec->nonresident_age);
+	} while ((lruvec = parent_lruvec(lruvec)));
+}
+
 #ifdef CONFIG_LRU_GEN
 
 static void *lru_gen_eviction(struct folio *folio)
@@ -311,10 +341,14 @@ static void *lru_gen_eviction(struct folio *folio)
 	lruvec = mem_cgroup_lruvec(memcg, pgdat);
 	lrugen = &lruvec->lrugen;
 	min_seq = READ_ONCE(lrugen->min_seq[type]);
+
 	token = (min_seq << LRU_REFS_WIDTH) | max(refs - 1, 0);
+	token <<= LRU_GEN_EVICTION_BITS;
+	token |= lru_eviction(lruvec, LRU_GEN_EVICTION_BITS, lru_gen_bucket_order);
 
 	hist = lru_hist_from_seq(min_seq);
 	atomic_long_add(delta, &lrugen->evicted[hist][type][tier]);
+	workingset_age_nonresident(lruvec, folio_nr_pages(folio));
 
 	return pack_shadow(mem_cgroup_id(memcg), pgdat, token, refs);
 }
@@ -329,15 +363,17 @@ static bool lru_gen_test_recent(struct lruvec *lruvec, bool file,
 	unsigned long min_seq;
 
 	min_seq = READ_ONCE(lruvec->lrugen.min_seq[file]);
+	token >>= LRU_GEN_EVICTION_BITS;
 	return (token >> LRU_REFS_WIDTH) == (min_seq & (EVICTION_MASK >> LRU_REFS_WIDTH));
 }
 
 static void lru_gen_refault(struct folio *folio, void *shadow)
 {
 	int memcgid;
-	bool recent;
+	bool refault;
 	bool workingset;
 	unsigned long token;
+	bool recent = false;
 	int hist, tier, refs;
 	struct lruvec *lruvec;
 	struct pglist_data *pgdat;
@@ -345,28 +381,36 @@ static void lru_gen_refault(struct folio *folio, void *shadow)
 	int type = folio_is_file_lru(folio);
 	int delta = folio_nr_pages(folio);
 
-	rcu_read_lock();
-
 	unpack_shadow(shadow, &memcgid, &pgdat, &token, &workingset);
 	lruvec = mem_cgroup_lruvec(mem_cgroup_from_id(memcgid), pgdat);
 	if (lruvec != folio_lruvec(folio))
-		goto unlock;
+		return;
 
 	mod_lruvec_state(lruvec, WORKINGSET_REFAULT_BASE + type, delta);
-
+	refault = lru_test_refault(lruvec_memcg(lruvec), lruvec, token, type,
+				   LRU_GEN_EVICTION_BITS, lru_gen_bucket_order);
 	recent = lru_gen_test_recent(lruvec, type, token);
-	if (!recent)
-		goto unlock;
+	if (!recent && !refault)
+		return;
 
 	lrugen = &lruvec->lrugen;
-
 	hist = lru_hist_from_seq(READ_ONCE(lrugen->min_seq[type]));
 	/* see the comment in folio_lru_refs() */
+	token >>= LRU_GEN_EVICTION_BITS;
 	refs = (token & (BIT(LRU_REFS_WIDTH) - 1)) + workingset;
 	tier = lru_tier_from_refs(refs);
 
-	atomic_long_add(delta, &lrugen->refaulted[hist][type][tier]);
-	mod_lruvec_state(lruvec, WORKINGSET_ACTIVATE_BASE + type, delta);
+	if (refault) {
+		if (refs)
+			folio_set_active(folio);
+		/*
+		 * Protect higher tier to make it easier
+		 * to stay in a stable workingset and prevent refault.
+		 */
+		if (refs != BIT(LRU_REFS_WIDTH))
+			tier = lru_tier_from_refs(refs + 1);
+		mod_lruvec_state(lruvec, WORKINGSET_ACTIVATE_BASE + type, delta);
+	}
 
 	/*
 	 * Count the following two cases as stalls:
@@ -375,12 +419,25 @@ static void lru_gen_refault(struct folio *folio, void *shadow)
 	 * 2. For pages accessed multiple times through file descriptors,
 	 *    numbers of accesses might have been out of the range.
 	 */
-	if (lru_gen_in_fault() || refs == BIT(LRU_REFS_WIDTH)) {
-		folio_set_workingset(folio);
+	if (refault || lru_gen_in_fault() || refs == BIT(LRU_REFS_WIDTH)) {
 		mod_lruvec_state(lruvec, WORKINGSET_RESTORE_BASE + type, delta);
+		folio_set_workingset(folio);
+	}
+
+	/*
+	 * If recent is false, add to global PID counters since the gen which
+	 * the page evicted is gone already.
+	 */
+	if (recent) {
+		/*
+		 * tier may get increased upon refault, which makes refaulted larger
+		 * than evicted, this will be reset and accounted by reset_ctrl_pos
+		 */
+		atomic_long_add(delta, &lrugen->refaulted[hist][type][tier]);
+	} else {
+		atomic_long_add(delta, &lrugen->avg_total[type][tier]);
+		atomic_long_add(delta, &lrugen->avg_refaulted[type][tier]);
 	}
-unlock:
-	rcu_read_unlock();
 }
 
 #else /* !CONFIG_LRU_GEN */
@@ -402,34 +459,6 @@ static void lru_gen_refault(struct folio *folio, void *shadow)
 
 #endif /* CONFIG_LRU_GEN */
 
-/**
- * workingset_age_nonresident - age non-resident entries as LRU ages
- * @lruvec: the lruvec that was aged
- * @nr_pages: the number of pages to count
- *
- * As in-memory pages are aged, non-resident pages need to be aged as
- * well, in order for the refault distances later on to be comparable
- * to the in-memory dimensions. This function allows reclaim and LRU
- * operations to drive the non-resident aging along in parallel.
- */
-void workingset_age_nonresident(struct lruvec *lruvec, unsigned long nr_pages)
-{
-	/*
-	 * Reclaiming a cgroup means reclaiming all its children in a
-	 * round-robin fashion. That means that each cgroup has an LRU
-	 * order that is composed of the LRU orders of its child
-	 * cgroups; and every page has an LRU position not just in the
-	 * cgroup that owns it, but in all of that group's ancestors.
-	 *
-	 * So when the physical inactive list of a leaf cgroup ages,
-	 * the virtual inactive lists of all its parents, including
-	 * the root cgroup's, age as well.
-	 */
-	do {
-		atomic_long_add(nr_pages, &lruvec->nonresident_age);
-	} while ((lruvec = parent_lruvec(lruvec)));
-}
-
 /**
  * workingset_eviction - note the eviction of a folio from memory
  * @target_memcg: the cgroup that is causing the reclaim
@@ -529,16 +558,16 @@ void workingset_refault(struct folio *folio, void *shadow)
 	bool workingset;
 	long nr;
 
-	if (lru_gen_enabled()) {
-		lru_gen_refault(folio, shadow);
-		return;
-	}
-
 	/* Flush stats (and potentially sleep) before holding RCU read lock */
 	mem_cgroup_flush_stats_ratelimited();
 
 	rcu_read_lock();
 
+	if (lru_gen_enabled()) {
+		lru_gen_refault(folio, shadow);
+		goto out;
+	}
+
 	/*
 	 * The activation decision for this folio is made at the level
 	 * where the eviction occurred, as that is where the LRU order
@@ -785,6 +814,13 @@ static int __init workingset_init(void)
 	pr_info("workingset: timestamp_bits=%d max_order=%d bucket_order=%u\n",
 		EVICTION_BITS, max_order, bucket_order);
 
+#ifdef CONFIG_LRU_GEN
+	if (max_order > LRU_GEN_EVICTION_BITS)
+		lru_gen_bucket_order = max_order - LRU_GEN_EVICTION_BITS;
+	pr_info("workingset: lru_gen_timestamp_bits=%d lru_gen_bucket_order=%u\n",
+		LRU_GEN_EVICTION_BITS, lru_gen_bucket_order);
+#endif
+
 	ret = prealloc_shrinker(&workingset_shadow_shrinker, "mm-shadow");
 	if (ret)
 		goto err;