Message ID | 20230117231632.2734737-3-minchan@kernel.org |
---|---|
State | New |
Headers |
Sender: Minchan Kim <minchan.kim@gmail.com>
From: Minchan Kim <minchan@kernel.org>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Suren Baghdasaryan <surenb@google.com>, Matthew Wilcox <willy@infradead.org>, linux-mm <linux-mm@kvack.org>, LKML <linux-kernel@vger.kernel.org>, Michal Hocko <mhocko@suse.com>, SeongJae Park <sj@kernel.org>, Minchan Kim <minchan@kernel.org>
Subject: [PATCH 3/3] mm: add vmstat statistics for madvise_[cold|pageout]
Date: Tue, 17 Jan 2023 15:16:32 -0800
Message-Id: <20230117231632.2734737-3-minchan@kernel.org>
In-Reply-To: <20230117231632.2734737-1-minchan@kernel.org>
References: <20230117231632.2734737-1-minchan@kernel.org>
X-Mailer: git-send-email 2.39.0.314.g84b9a713c41-goog
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
List-ID: <linux-kernel.vger.kernel.org> |
Series |
[1/3] mm: return the number of pages successfully paged out
|
|
Commit Message
Minchan Kim
Jan. 17, 2023, 11:16 p.m. UTC
madvise LRU manipulation APIs need to scan address ranges to find
present pages in the page table and provide advice hints for them.
Like the pg[scan/steal] counts in vmstat, madvise_pg[scanned/hinted]
shows the proactive reclaim efficiency, so this patch adds those
two statistics to vmstat.
Signed-off-by: Minchan Kim <minchan@kernel.org>
---
include/linux/vm_event_item.h | 2 ++
mm/madvise.c | 19 +++++++++++++++----
mm/vmstat.c | 2 ++
3 files changed, 19 insertions(+), 4 deletions(-)
Comments
On Tue 17-01-23 15:16:32, Minchan Kim wrote:
> madvise LRU manipulation APIs need to scan address ranges to find
> present pages in the page table and provide advice hints for them.
>
> Like the pg[scan/steal] counts in vmstat, madvise_pg[scanned/hinted]
> shows the proactive reclaim efficiency, so this patch adds those
> two statistics to vmstat.

Please describe the use case for those new counters.
On Wed, Jan 18, 2023 at 10:11:46AM +0100, Michal Hocko wrote:
> Please describe the use case for those new counters.

I wanted to know the proactive reclaim efficiency of MADV_COLD/MADV_PAGEOUT.
Userspace has several policies that decide when and which VMAs should be
hinted by the call, and they are evolving. I needed to know how effectively
those policies work, since the VMA ranges are huge (i.e., nr_hinted/nr_scanned).
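The nr_hinted/nr_scanned ratio mentioned above could be derived in userspace roughly as follows. This is only a sketch: it assumes the two counters appear in /proc/vmstat under the names the patch proposes (madvise_pgscanned, madvise_pghinted), and the helper names vmstat_lookup() and madvise_hint_ratio() are made up for illustration. It parses vmstat-style text that the caller has already read into a buffer.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Look up one "name value" line in /proc/vmstat-style text.
 * Returns 0 and stores the value on success, -1 if the name is absent
 * or its value cannot be parsed. (Hypothetical helper, not a kernel API.)
 */
int vmstat_lookup(const char *text, const char *name,
		  unsigned long long *value)
{
	size_t len = strlen(name);
	const char *line = text;

	while (line && *line) {
		/* Match the whole field name, which is followed by a space. */
		if (strncmp(line, name, len) == 0 && line[len] == ' ')
			return sscanf(line + len, "%llu", value) == 1 ? 0 : -1;

		/* Advance to the start of the next line, if any. */
		line = strchr(line, '\n');
		if (line)
			line++;
	}
	return -1;
}

/* Fraction of scanned page-table slots that ended up hinted. */
double madvise_hint_ratio(unsigned long long scanned,
			  unsigned long long hinted)
{
	if (scanned == 0)
		return 0.0;
	return (double)hinted / (double)scanned;
}
```

For telemetry, the ratio would normally be computed on the difference between two snapshots rather than on the raw values, since the counters only ever grow from boot.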
On Wed 18-01-23 09:15:34, Minchan Kim wrote:
> I wanted to know the proactive reclaim efficiency of MADV_COLD/MADV_PAGEOUT.
> Userspace has several policies that decide when and which VMAs should be
> hinted by the call, and they are evolving. I needed to know how effectively
> those policies work, since the VMA ranges are huge (i.e., nr_hinted/nr_scanned).

I can see how that can be interesting information, but is there
anything actionable about it beyond debugging purposes? In other words,
isn't this something that could be done by tracing instead?

Also, how are you going to identify specific madvise calls when they can
interleave arbitrarily?
On Wed, Jan 18, 2023 at 06:27:02PM +0100, Michal Hocko wrote:
> I can see how that can be interesting information, but is there
> anything actionable about it beyond debugging purposes? In other words,
> isn't this something that could be done by tracing instead?

These are statistics for telemetry. We are collecting various vmstat
fields (e.g., pgsteal/pgscan) from real field devices, and thought these
two stats would be a good fit alongside the other reclaim statistics in
vmstat, since they let us see how much a proactive madvise policy makes
the system healthier (e.g., less kswapd scanning, fewer allocstalls,
and so on).

> Also, how are you going to identify specific madvise calls when they can
> interleave arbitrarily?

I guess you are talking about how we could separate MADV_PAGEOUT and
MADV_COLD in vmstat. That's a valid question. I thought that, for a start,
we would add just an umbrella stat like this, and if we want to break it
down, we would need to introduce sysfs entries, as slab does.
On Wed 18-01-23 09:55:38, Minchan Kim wrote:
> I guess you are talking about how we could separate MADV_PAGEOUT and
> MADV_COLD in vmstat. That's a valid question. I thought that, for a start,
> we would add just an umbrella stat like this, and if we want to break it
> down, we would need to introduce sysfs entries, as slab does.

No, not really. MADV_COLD is about aging. There is no actual reclaim
going on, so pgscan/steal metrics do not make any sense. I am asking
about potentially different concurrent MADV_PAGEOUT calls. From what
you've said earlier (how effectively the policy works), I understood that
you want to find out how effective a specific MADV_PAGEOUT is. But there
may be different callers of this, applied to all sorts of different memory
mappings, and therefore the efficiency might be really different. As
there is no clear way to tell one from the other, I am really questioning
whether this global stat is actually useful.
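The concern about concurrent callers can be illustrated with a small userspace model. It is a sketch only: struct hint_stats, hint_stats_sum() and hint_percent() are hypothetical names, and the numbers in the usage note are invented for illustration, not measurements. The point is that once per-caller counts are summed into one system-wide pair, the individual efficiencies can no longer be recovered from the total.

```c
#include <assert.h>

/* Scanned/hinted counts as one caller (or the whole system) would see them. */
struct hint_stats {
	unsigned long scanned;
	unsigned long hinted;
};

/* Sum per-caller counts the way a single global counter pair would. */
struct hint_stats hint_stats_sum(const struct hint_stats *callers, int n)
{
	struct hint_stats total = { 0, 0 };

	for (int i = 0; i < n; i++) {
		total.scanned += callers[i].scanned;
		total.hinted += callers[i].hinted;
	}
	return total;
}

/* Hinted pages per 100 scanned slots, using integer arithmetic. */
unsigned long hint_percent(struct hint_stats s)
{
	return s.scanned ? s.hinted * 100 / s.scanned : 0;
}
```

With one caller at 900 hinted out of 1000 scanned and another at 100 out of 9000, the summed pair reports 10%, which describes neither caller. That is acceptable if the goal is a system-wide figure, and uninformative if the goal is to judge one particular policy.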
On Wed, Jan 18, 2023 at 10:13:38PM +0100, Michal Hocko wrote:
> No, not really. MADV_COLD is about aging. There is no actual reclaim
> going on, so pgscan/steal metrics do not make any sense. I am asking
> about potentially different concurrent MADV_PAGEOUT calls. From what
> you've said earlier (how effectively the policy works), I understood that
> you want to find out how effective a specific MADV_PAGEOUT is. But there

No, it's not a specific MADV_PAGEOUT but a system-global policy. Android
has used the ActivityManagerService to control proactive memory compaction
of apps, since it can control the lifetime of apps. You can think of it
as a userspace kswapd.

> may be different callers of this, applied to all sorts of different memory
> mappings, and therefore the efficiency might be really different. As
> there is no clear way to tell one from the other, I am really questioning
> whether this global stat is actually useful.

The purpose is a global stat.
diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h
index 3518dba1e02f..8b9fb2e151eb 100644
--- a/include/linux/vm_event_item.h
+++ b/include/linux/vm_event_item.h
@@ -49,6 +49,8 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT,
 		PGSCAN_FILE,
 		PGSTEAL_ANON,
 		PGSTEAL_FILE,
+		MADVISE_PGSCANNED,
+		MADVISE_PGHINTED,
 #ifdef CONFIG_NUMA
 		PGSCAN_ZONE_RECLAIM_FAILED,
 #endif
diff --git a/mm/madvise.c b/mm/madvise.c
index a4a03054ab6b..0e58545ff6e9 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -334,6 +334,8 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd,
 	spinlock_t *ptl;
 	struct page *page = NULL;
 	LIST_HEAD(page_list);
+	unsigned int nr_scanned = 0;
+	unsigned int nr_hinted = 0;
 
 	if (fatal_signal_pending(current))
 		return -EINTR;
@@ -343,6 +345,7 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd,
 		pmd_t orig_pmd;
 		unsigned long next = pmd_addr_end(addr, end);
 
+		nr_scanned += HPAGE_PMD_NR;
 		tlb_change_page_size(tlb, HPAGE_PMD_SIZE);
 		ptl = pmd_trans_huge_lock(pmd, vma);
 		if (!ptl)
@@ -396,11 +399,15 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd,
 				list_add(&page->lru, &page_list);
 			}
 		} else
-			deactivate_page(page);
+			if (deactivate_page(page))
+				nr_hinted += HPAGE_PMD_NR;
 huge_unlock:
 		spin_unlock(ptl);
 		if (pageout)
-			paging_out(&page_list);
+			nr_hinted += paging_out(&page_list);
+
+		count_vm_events(MADVISE_PGSCANNED, nr_scanned);
+		count_vm_events(MADVISE_PGHINTED, nr_hinted);
 		return 0;
 	}
 
@@ -414,6 +421,7 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd,
 	arch_enter_lazy_mmu_mode();
 	for (; addr < end; pte++, addr += PAGE_SIZE) {
 		ptent = *pte;
+		nr_scanned++;
 
 		if (pte_none(ptent))
 			continue;
@@ -485,14 +493,17 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd,
 				list_add(&page->lru, &page_list);
 			}
 		} else
-			deactivate_page(page);
+			if (deactivate_page(page))
+				nr_hinted++;
 	}
 
 	arch_leave_lazy_mmu_mode();
 	pte_unmap_unlock(orig_pte, ptl);
 	if (pageout)
-		paging_out(&page_list);
+		nr_hinted += paging_out(&page_list);
 	cond_resched();
 
+	count_vm_events(MADVISE_PGSCANNED, nr_scanned);
+	count_vm_events(MADVISE_PGHINTED, nr_hinted);
 	return 0;
 }
diff --git a/mm/vmstat.c b/mm/vmstat.c
index b2371d745e00..0139feade854 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1280,6 +1280,8 @@ const char * const vmstat_text[] = {
 	"pgscan_file",
 	"pgsteal_anon",
 	"pgsteal_file",
+	"madvise_pgscanned",
+	"madvise_pghinted",
 #ifdef CONFIG_NUMA
 	"zone_reclaim_failed",
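The accounting pattern in the mm/madvise.c hunks (count every visited slot, count a hint only when it succeeds, publish both totals once per walk) can be modelled in userspace as below. This is a sketch under stated assumptions, not kernel code: enum slot_state, try_hint() and walk_range() are hypothetical stand-ins for the PTE walk and deactivate_page(), and the two globals stand in for the MADVISE_PGSCANNED and MADVISE_PGHINTED events.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified view of one page-table slot (hypothetical model). */
enum slot_state { SLOT_NONE, SLOT_HINTABLE, SLOT_BUSY };

/* Stand-ins for the MADVISE_PGSCANNED / MADVISE_PGHINTED event counters. */
unsigned long total_scanned;
unsigned long total_hinted;

/* Stand-in for a hint operation that reports whether it took effect. */
int try_hint(enum slot_state s)
{
	return s == SLOT_HINTABLE;
}

/* Walk a range, mirroring how the patch accumulates and publishes counts. */
void walk_range(const enum slot_state *slots, size_t n)
{
	unsigned int nr_scanned = 0;
	unsigned int nr_hinted = 0;

	for (size_t i = 0; i < n; i++) {
		/* Every slot counts as scanned, including empty ones. */
		nr_scanned++;

		if (slots[i] == SLOT_NONE)
			continue;

		/* Only successful hints are counted. */
		if (try_hint(slots[i]))
			nr_hinted++;
	}

	/* Publish once per walk instead of once per slot. */
	total_scanned += nr_scanned;
	total_hinted += nr_hinted;
}
```

Because empty slots are included in the scanned count, a sparse mapping lowers the hinted/scanned ratio even when every present page is hinted successfully, which matters when interpreting the ratio as policy efficiency.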