Message ID | 202303161723055514455@zte.com.cn |
---|---|
State | New |
Headers | Date: Thu, 16 Mar 2023 17:23:05 +0800 (CST); Message-ID: <202303161723055514455@zte.com.cn>; From: <yang.yang29@zte.com.cn>; To: <akpm@linux-foundation.org>, <hannes@cmpxchg.org>; Cc: <linux-kernel@vger.kernel.org>, <linux-mm@kvack.org>, <iamjoonsoo.kim@lge.com>, <yang.yang29@zte.com.cn>, <willy@infradead.org>; Subject: [PATCH linux-next] mm: workingset: simplify the calculation of workingset size; List-ID: <linux-kernel.vger.kernel.org> |
Series | [linux-next] mm: workingset: simplify the calculation of workingset size |
Commit Message
Yang Yang
March 16, 2023, 9:23 a.m. UTC
From: Yang Yang <yang.yang29@zte.com.cn>

After we implemented workingset detection for anonymous LRU[1],
the calculation of workingset size is a little complex. Actually there is
no need to call mem_cgroup_get_nr_swap_pages() if refault page is
anonymous page, since we are doing swapping then should always
give pressure to NR_ACTIVE_ANON. So avoid using
mem_cgroup_get_nr_swap_pages() when handling swapin in
workingset_refault(). This also gives us a chance to refactor the code
to make it simpler and more understandable.

[1] commit aae466b0052e ("mm/swap: implement workingset detection for anonymous LRU")

Signed-off-by: Yang Yang <yang.yang29@zte.com.cn>
Reviewed-by: Wang Yong <wang.yong12@zte.com.cn>
Reviewed-by: Xiaokai Ran <ran.xiaokai@zte.com.cn>
---
 mm/workingset.c | 17 +++++++++--------
 1 file changed, 9 insertions(+), 8 deletions(-)
Comments
On Thu, Mar 16, 2023 at 05:23:05PM +0800, yang.yang29@zte.com.cn wrote:
>  	 * Compare the distance to the existing workingset size. We
>  	 * don't activate pages that couldn't stay resident even if
> -	 * all the memory was available to the workingset. Whether
> -	 * workingset competition needs to consider anon or not depends
> -	 * on having swap.
> +	 * all the memory was available to the workingset. For page
> +	 * cache whether workingset competition needs to consider
> +	 * anon or not depends on having swap.

I don't mind this change

>  	 */
>  	workingset_size = lruvec_page_state(eviction_lruvec, NR_ACTIVE_FILE);
> +	/* For anonymous page */

This comment adds no value

>  	if (!file) {
> +		workingset_size += lruvec_page_state(eviction_lruvec,
> +						     NR_ACTIVE_ANON);
>  		workingset_size += lruvec_page_state(eviction_lruvec,
>  						     NR_INACTIVE_FILE);
> -	}
> -	if (mem_cgroup_get_nr_swap_pages(eviction_memcg) > 0) {
> +	/* For page cache */

Nor this one

> +	} else if (mem_cgroup_get_nr_swap_pages(eviction_memcg) > 0) {
>  		workingset_size += lruvec_page_state(eviction_lruvec,
>  						     NR_ACTIVE_ANON);
> -		if (file) {
> -			workingset_size += lruvec_page_state(eviction_lruvec,
> +		workingset_size += lruvec_page_state(eviction_lruvec,
>  						     NR_INACTIVE_ANON);
> -		}
>  	}

I don't have an opinion on the actual code changes.
On Thu, Mar 16, 2023 at 05:23:05PM +0800, yang.yang29@zte.com.cn wrote:
> From: Yang Yang <yang.yang29@zte.com.cn>
>
> After we implemented workingset detection for anonymous LRU[1],
> the calculation of workingset size is a little complex. Actually there is
> no need to call mem_cgroup_get_nr_swap_pages() if refault page is
> anonymous page, since we are doing swapping then should always
> give pressure to NR_ACTIVE_ANON.

This is false.

(mem_cgroup_)get_nr_swap_pages() returns the *free swap slots*. There
might be swap, but if it's full, reclaim stops scanning anonymous
pages altogether. That means that refaults of either type can no
longer displace existing anonymous pages, only cache.

So yes, all refaults need to check free swap to determine how to act
on the reuse frequency.

> @@ -466,22 +466,23 @@ void workingset_refault(struct folio *folio, void *shadow)
>  	/*
>  	 * Compare the distance to the existing workingset size. We
>  	 * don't activate pages that couldn't stay resident even if
> -	 * all the memory was available to the workingset. Whether
> -	 * workingset competition needs to consider anon or not depends
> -	 * on having swap.
> +	 * all the memory was available to the workingset. For page
> +	 * cache whether workingset competition needs to consider
> +	 * anon or not depends on having swap.

No, it applies to all refaults, not just cache. What could help is
changing the comment to "having free swap space".
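For readers following the argument, here is a minimal user-space sketch of the pre-patch calculation that the reply above defends. It is an illustration, not kernel code: `struct lru_counts`, `workingset_size()` and `should_activate()` are hypothetical names, the counters stand in for the `lruvec_page_state()` reads, and `free_swap` stands in for the return value of `mem_cgroup_get_nr_swap_pages(eviction_memcg)`.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the per-lruvec LRU counters (in pages). */
struct lru_counts {
	unsigned long active_file;
	unsigned long inactive_file;
	unsigned long active_anon;
	unsigned long inactive_anon;
};

/*
 * Models the pre-patch calculation in workingset_refault().
 * free_swap is an assumption standing in for
 * mem_cgroup_get_nr_swap_pages(eviction_memcg): the number of
 * *free* swap slots, not the total amount of swap.
 */
static unsigned long workingset_size(const struct lru_counts *c, bool file,
				     long free_swap)
{
	unsigned long size = c->active_file;

	/* An anon refault also competes with inactive page cache. */
	if (!file)
		size += c->inactive_file;

	/* Anon pages only count while free swap slots remain. */
	if (free_swap > 0) {
		size += c->active_anon;
		if (file)
			size += c->inactive_anon;
	}
	return size;
}

/* The page is activated unless refault_distance > workingset_size. */
static bool should_activate(const struct lru_counts *c, bool file,
			    long free_swap, unsigned long refault_distance)
{
	return refault_distance <= workingset_size(c, file, free_swap);
}
```

With counters of 100, 200, 300 and 400 pages (in struct order), an anonymous refault at distance 400 is activated while free swap remains and is not activated once `free_swap` drops to 0, which is the dependence on free swap that the reply describes for refaults of either type.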
>On Thu, Mar 16, 2023 at 05:23:05PM +0800, yang.yang29@zte.com.cn wrote:
>> From: Yang Yang <yang.yang29@zte.com.cn>
>>
>> After we implemented workingset detection for anonymous LRU[1],
>> the calculation of workingset size is a little complex. Actually there is
>> no need to call mem_cgroup_get_nr_swap_pages() if refault page is
>> anonymous page, since we are doing swapping then should always
>> give pressure to NR_ACTIVE_ANON.
>
> This is false.
>
> (mem_cgroup_)get_nr_swap_pages() returns the *free swap slots*. There
> might be swap, but if it's full, reclaim stops scanning anonymous
> pages altogether. That means that refaults of either type can no
> longer displace existing anonymous pages, only cache.

I see that with the patch "mm: vmscan: enforce inactive:active ratio at the
reclaim root", reclaim is done on the combined workingset of
different workloads in different cgroups.

So if the current cgroup reaches its swap limit
(mem_cgroup_get_nr_swap_pages(memcg) == 0) but another cgroup still has
free swap slots, should we allow the refaulting page to be activated and
put pressure on the other cgroup?
On Fri, Mar 17, 2023 at 01:59:03AM +0000, Yang Yang wrote:
> >On Thu, Mar 16, 2023 at 05:23:05PM +0800, yang.yang29@zte.com.cn wrote:
> >> From: Yang Yang <yang.yang29@zte.com.cn>
> >>
> >> After we implemented workingset detection for anonymous LRU[1],
> >> the calculation of workingset size is a little complex. Actually there is
> >> no need to call mem_cgroup_get_nr_swap_pages() if refault page is
> >> anonymous page, since we are doing swapping then should always
> >> give pressure to NR_ACTIVE_ANON.
> >
> > This is false.
> >
> > (mem_cgroup_)get_nr_swap_pages() returns the *free swap slots*. There
> > might be swap, but if it's full, reclaim stops scanning anonymous
> > pages altogether. That means that refaults of either type can no
> > longer displace existing anonymous pages, only cache.
>
> I see in this patch "mm: vmscan: enforce inactive:active ratio at the
> reclaim root", reclaim will be done in the combined workingset of
> different workloads in different cgroups.
>
> So if current cgroup reach it's swap limit(mem_cgroup_get_nr_swap_pages(memcg) == 0),
> but other cgroup still has swap slot, should we allow the refaulting page
> to active and give pressure to other cgroup?

That's what we do today.

The shadow entry remembers the reclaim root, so that refaults can
later be evaluated at the same level. So, say you have:

	root - A - A1
	         `- A2

and A1 and A2 are reclaimed due to a limit in A. The shadow entries of
evictions from A1 and A2 will actually refer to A.

When they refault later on, the distance is interpreted based on
whether A has swap (eviction_lruvec).
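The reclaim-root behavior described in this reply can be sketched with a toy model. Everything here is hypothetical and heavily simplified: `struct cgroup_model`, `struct shadow_model`, `evict()` and `refault_counts_anon()` are illustration-only names, and `free_swap` is a plain number per cgroup; how the kernel actually derives that value is not modeled.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified cgroup: a name, a parent, and free swap. */
struct cgroup_model {
	const char *name;
	const struct cgroup_model *parent;
	long free_swap;	/* assumed: free swap slots seen at this level */
};

/* Hypothetical shadow entry: remembers where reclaim was triggered. */
struct shadow_model {
	const struct cgroup_model *reclaim_root;
};

/*
 * On eviction, record the reclaim root rather than the cgroup that
 * owned the page. The owner is accepted only to make that explicit.
 */
static struct shadow_model evict(const struct cgroup_model *owner,
				 const struct cgroup_model *reclaim_root)
{
	struct shadow_model s = { .reclaim_root = reclaim_root };

	(void)owner;
	return s;
}

/* On refault, the swap check is made against the recorded root. */
static bool refault_counts_anon(const struct shadow_model *s)
{
	return s->reclaim_root->free_swap > 0;
}
```

In the `root - A - A1/A2` example, a page evicted from A1 because of a limit in A gets a shadow entry that refers to A, so whether A1 on its own has free swap does not enter the check in this model; only A's value does.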
> On Fri, Mar 17, 2023 at 01:59:03AM +0000, Yang Yang wrote:
>> I see in this patch "mm: vmscan: enforce inactive:active ratio at the
>> reclaim root", reclaim will be done in the combined workingset of
>> different workloads in different cgroups.
>>
>> So if current cgroup reach it's swap limit(mem_cgroup_get_nr_swap_pages(memcg) == 0),
>> but other cgroup still has swap slot, should we allow the refaulting page
>> to active and give pressure to other cgroup?
>
> That's what we do today.
>
> The shadow entry remembers the reclaim root, so that refaults can
> later evaluated at the same level. So, say you have:
>
> 	root - A - A1
> 	         `- A2
>
> and A1 and A2 are reclaimed due to a limit in A. The shadow entries of
> evictions from A1 and A2 will actually refer to A.
>
> When they refault later on, the distance is interpreted based on
> whether A has swap (eviction_lruvec).

Thank you very much for your patient explanation. I still have a question:

In the example above, if
(NR_ACTIVE_FILE + NR_INACTIVE_FILE) < refault_distance <
(NR_ACTIVE_FILE + NR_INACTIVE_FILE + NR_ACTIVE_ANON)
and the swap slots are full, the refaulting page is not set active. If
some swap slots are then freed, the newly refaulted page might be
reclaimed early because it is inactive.

If we instead set the refaulting page active even when the swap slots
are full, then once swap slots are freed the page is protected from
being reclaimed early.
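The scenario in this question can be made concrete with assumed numbers. The sketch below compares the two policies for an anonymous refault only; `struct anon_case`, `anon_workingset_size()` and `anon_activates()` are hypothetical names, and the page counts are made up for illustration. It shows where the two policies differ and does not argue for either.

```c
#include <stdbool.h>

/* Hypothetical LRU sizes (in pages) at the reclaim root. */
struct anon_case {
	unsigned long active_file;
	unsigned long inactive_file;
	unsigned long active_anon;
};

/*
 * Workingset size used for an anonymous refault.
 * always_count_anon == false: existing behavior, NR_ACTIVE_ANON is
 *                             counted only while free swap remains.
 * always_count_anon == true:  behavior proposed by the patch.
 */
static unsigned long anon_workingset_size(const struct anon_case *c,
					  long free_swap,
					  bool always_count_anon)
{
	unsigned long size = c->active_file + c->inactive_file;

	if (always_count_anon || free_swap > 0)
		size += c->active_anon;
	return size;
}

/* The page is activated unless the distance exceeds the size. */
static bool anon_activates(const struct anon_case *c, long free_swap,
			   bool always_count_anon,
			   unsigned long refault_distance)
{
	return refault_distance <=
	       anon_workingset_size(c, free_swap, always_count_anon);
}
```

Assuming 100 active file pages, 200 inactive file pages, 300 active anon pages, a refault distance of 400 and no free swap, the existing calculation yields a workingset size of 300 and leaves the page inactive, while the proposed one yields 600 and activates it.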
diff --git a/mm/workingset.c b/mm/workingset.c
index 00c6f4d9d9be..a304e8571d54 100644
--- a/mm/workingset.c
+++ b/mm/workingset.c
@@ -466,22 +466,23 @@ void workingset_refault(struct folio *folio, void *shadow)
 	/*
 	 * Compare the distance to the existing workingset size. We
 	 * don't activate pages that couldn't stay resident even if
-	 * all the memory was available to the workingset. Whether
-	 * workingset competition needs to consider anon or not depends
-	 * on having swap.
+	 * all the memory was available to the workingset. For page
+	 * cache whether workingset competition needs to consider
+	 * anon or not depends on having swap.
 	 */
 	workingset_size = lruvec_page_state(eviction_lruvec, NR_ACTIVE_FILE);
+	/* For anonymous page */
 	if (!file) {
+		workingset_size += lruvec_page_state(eviction_lruvec,
+						     NR_ACTIVE_ANON);
 		workingset_size += lruvec_page_state(eviction_lruvec,
 						     NR_INACTIVE_FILE);
-	}
-	if (mem_cgroup_get_nr_swap_pages(eviction_memcg) > 0) {
+	/* For page cache */
+	} else if (mem_cgroup_get_nr_swap_pages(eviction_memcg) > 0) {
 		workingset_size += lruvec_page_state(eviction_lruvec,
 						     NR_ACTIVE_ANON);
-		if (file) {
-			workingset_size += lruvec_page_state(eviction_lruvec,
+		workingset_size += lruvec_page_state(eviction_lruvec,
 						     NR_INACTIVE_ANON);
-		}
 	}
 	if (refault_distance > workingset_size)
 		goto out;