Message ID: 202302100915227721315@zte.com.cn
Series: ksm: support tracking KSM-placed zero-pages
Message
Yang Yang
Feb. 10, 2023, 1:15 a.m. UTC
From: xu xin <xu.xin16@zte.com.cn>
The core idea of this patch set is to enable users to perceive the number of
pages merged by KSM, regardless of whether the use_zero_pages switch has been
turned on, so that users can know how much of a free-memory increase is really
due to their madvise(MERGEABLE) actions. The problem is that when
use_zero_pages is enabled, all empty pages are merged with the kernel zero
pages instead of with each other, as they would be when use_zero_pages is
disabled, and these zero pages are then no longer tracked by KSM.
My motivation for doing this has three points:
1) MADV_UNMERGEABLE and other ways to trigger unsharing will *not*
unshare the shared zeropage placed by KSM (which contradicts the
MADV_UNMERGEABLE documentation, at least); see the link:
https://lore.kernel.org/lkml/4a3daba6-18f9-d252-697c-197f65578c44@redhat.com/
(A minimal userspace sketch of the madvise() calls involved is shown after
this list.)
2) We cannot know how many of the merged pages are zero pages placed by KSM
when use_zero_pages is enabled, which hides the critical information of how
much memory is actually saved by KSM. Knowing how many KSM-placed zero pages
there are helps users apply the madvise(MERGEABLE) policy better, because
they can see the actual profit brought by KSM.
3) The zero pages placed by KSM are different from the initial empty pages
(filled with zeros) which are never touched by applications: the former are
actively merged by KSM, while the latter never consume actual memory.
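To make points 1) and 2) concrete, here is a minimal userspace sketch
(illustrative only, not part of the series) that hands a zero-filled region
to ksmd and then asks for it to be unmerged; with use_zero_pages enabled, the
merged PTEs point at the kernel shared zeropage, which MADV_UNMERGEABLE left
in place before these patches:

/* Sketch: exercise MADV_MERGEABLE / MADV_UNMERGEABLE on zero-filled pages.
 * Requires CONFIG_KSM and a running ksmd (echo 1 > /sys/kernel/mm/ksm/run). */
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
	const size_t len = 64 * 4096;
	char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (buf == MAP_FAILED)
		return 1;
	memset(buf, 0, len);			/* populate zero-filled anon pages */
	if (madvise(buf, len, MADV_MERGEABLE))	/* hand the range to ksmd */
		perror("MADV_MERGEABLE");
	sleep(10);	/* crude: give ksmd time to scan and merge */
	/*
	 * With use_zero_pages enabled, the merged PTEs now point at the
	 * kernel shared zeropage. Before this series, the following call
	 * left those mappings in place, contrary to its documentation.
	 */
	if (madvise(buf, len, MADV_UNMERGEABLE))
		perror("MADV_UNMERGEABLE");
	munmap(buf, len);
	return 0;
}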
use_zero_pages is useful not only because of cache colouring, as described in
the documentation, but also because it can accelerate the merging of empty
pages: when there are plenty of empty pages (full of zeros), the time for
page-by-page comparisons (unstable_tree_search_insert) is saved. So we hope
to implement support for tracking KSM zero pages without affecting the
use_zero_pages feature.
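For reference, use_zero_pages (like ksmd itself) is toggled through the
documented sysfs interface; a minimal sketch of flipping the knobs (requires
root and CONFIG_KSM):

/* Sketch: turn on ksmd and its use_zero_pages behaviour via sysfs. */
#include <stdio.h>
#include <stdlib.h>

static void write_knob(const char *path, const char *val)
{
	FILE *f = fopen(path, "w");

	if (!f || fputs(val, f) == EOF) {
		perror(path);
		exit(1);
	}
	fclose(f);
}

int main(void)
{
	write_knob("/sys/kernel/mm/ksm/use_zero_pages", "1");
	write_knob("/sys/kernel/mm/ksm/run", "1");	/* start ksmd */
	return 0;
}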
Zero pages may be the most common merged pages in a real environment (not
only in VMs but also in other applications such as containers), so enabling
use_zero_pages in an environment with plenty of empty pages (full of zeros)
is very useful. Users and app developers can also benefit from knowing the
proportion of zero pages among all merged pages, to optimize their
applications.
With this patch series, we can both unshare KSM-placed zero pages accurately
and count KSM zero pages while use_zero_pages is enabled.
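A minimal sketch of how a user could then read the new counters; the
interface names (/sys/kernel/mm/ksm/zero_pages_sharing from patches 3/6 and
5/6, and a per-process field surfaced via /proc/<pid>/ksm_stat from patch
4/6) are taken from this series and should be treated as assumptions of the
sketch:

/* Sketch, assuming the interfaces added by this series: a global
 * zero_pages_sharing count under /sys/kernel/mm/ksm/ and a per-process
 * counter in /proc/<pid>/ksm_stat. */
#include <stdio.h>

static void dump_file(const char *path)
{
	char line[256];
	FILE *f = fopen(path, "r");

	if (!f) {
		perror(path);
		return;
	}
	while (fgets(line, sizeof(line), f))
		printf("%s: %s", path, line);
	fclose(f);
}

int main(void)
{
	dump_file("/sys/kernel/mm/ksm/zero_pages_sharing");	/* global */
	dump_file("/proc/self/ksm_stat");			/* per-process */
	return 0;
}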
v5->v6:
1) In [PATCH 1/6], fix some comments as per David's suggestions [1].
2) In [PATCH 6/6], modify the patch as per David's suggestions [2].
[1]: https://lore.kernel.org/lkml/0e0c90a2-d12c-f965-9cce-ecd5d28c09dd@redhat.com/
[2]: https://lore.kernel.org/lkml/3704dcf0-bd0a-8ab2-7f4f-045fc7c73171@redhat.com/
v4->v5:
---
1) Fix warning: mm/ksm.c:3238:9: warning: no previous prototype for 'zero_pages_sharing_show'
[-Wmissing-prototypes].
In [PATCH 3/6] (ksm: count all zero pages placed by KSM), declare the function
ssize_t zero_pages_sharing_show(struct kobject *kobj...) as 'static'.
2) In [PATCH 6/6], fix the loop condition "} while (end_scans < start_scans + 20);" to
"} while (end_scans < start_scans + 2);" in wait_two_full_scans(); a standalone
sketch of this helper follows.
---
v3->v4:
1) The patch series is readjusted to adapt to the recent changes to break_ksm()
from David Hildenbrand's commits:
https://lore.kernel.org/all/20221021101141.84170-9-david@redhat.com/T/#u
2) Some changes to the patches themselves:
In [patch 2/6], add a check for the mm exiting in unshare_zero_pages, to avoid
unsharing the zero pages of a process that is exiting; factor out a new function
clean_rmap_item_zero_flag(), and call it after stable_tree_search() as the fix.
In [patch 3/6], all decrements of the zero-page count are placed in
clean_rmap_item_zero_flag(), which is more accurate.
3) Add a selftest for unsharing and counting KSM-placed zero pages.
---
v2->v3:
1) Add more descriptive information in the cover letter.
2) In [patch 2/5], add more commit log explaining the reasons for the change.
3) In [patch 2/5], fix the misuse of break_ksm() in unmerge_ksm_pages():
break_ksm(vma, addr, NULL) -> break_ksm(vma, addr, false);
---
v1->v2:
[patch 4/5] Fix build warning, mm/ksm.c:550, misleading indentation; the
statement 'rmap_item->mm->ksm_zero_pages_sharing--;' is not part of the
previous 'if'.
xu xin (6):
ksm: abstract the function try_to_get_old_rmap_item
ksm: support unsharing zero pages placed by KSM
ksm: count all zero pages placed by KSM
ksm: count zero pages for each process
ksm: add zero_pages_sharing documentation
selftest: add testing unsharing and counting ksm zero page
Documentation/admin-guide/mm/ksm.rst | 7 +
fs/proc/base.c | 1 +
include/linux/mm_types.h | 7 +-
mm/ksm.c | 185 +++++++++++++++++-----
tools/testing/selftests/vm/ksm_functional_tests.c | 96 ++++++++++-
5 files changed, 255 insertions(+), 41 deletions(-)
Comments
On 10.02.23 02:15, yang.yang29@zte.com.cn wrote:
> From: xu xin <xu.xin16@zte.com.cn>

Hi,

sorry for the late follow-up. Still wrapping my head around this and
possible alternatives. I hope we'll get some comments from others as
well about the basic approach.

> The core idea of this patch set is to enable users to perceive the number of
> pages merged by KSM, regardless of whether the use_zero_pages switch has been
> turned on, so that users can know how much of a free-memory increase is really
> due to their madvise(MERGEABLE) actions. The problem is that when
> use_zero_pages is enabled, all empty pages are merged with the kernel zero
> pages instead of with each other, as they would be when use_zero_pages is
> disabled, and these zero pages are then no longer tracked by KSM.
>
> My motivation for doing this has three points:
>
> 1) MADV_UNMERGEABLE and other ways to trigger unsharing will *not*
> unshare the shared zeropage placed by KSM (which contradicts the
> MADV_UNMERGEABLE documentation, at least); see the link:
> https://lore.kernel.org/lkml/4a3daba6-18f9-d252-697c-197f65578c44@redhat.com/
>
> 2) We cannot know how many of the merged pages are zero pages placed by KSM
> when use_zero_pages is enabled, which hides the critical information of how
> much memory is actually saved by KSM. Knowing how many KSM-placed zero pages
> there are helps users apply the madvise(MERGEABLE) policy better, because
> they can see the actual profit brought by KSM.
>
> 3) The zero pages placed by KSM are different from the initial empty pages
> (filled with zeros) which are never touched by applications: the former are
> actively merged by KSM, while the latter never consume actual memory.

I agree with all of the above, but it's still unclear to me if there is
a real downside to a simpler approach:

(1) Tracking the shared zeropages. That would be fairly easy: whenever
we map/unmap a shared zeropage, we simply update the counter.

(2) Unmerging all shared zeropages inside the VMAs during
MADV_UNMERGEABLE.

(3) Documenting that MADV_UNMERGEABLE will also unmerge the shared
zeropage when toggle xy is flipped.

It's certainly simpler and doesn't rely on the rmap item. See below.

> use_zero_pages is useful not only because of cache colouring, as described in
> the documentation, but also because it can accelerate the merging of empty
> pages: when there are plenty of empty pages (full of zeros), the time for
> page-by-page comparisons (unstable_tree_search_insert) is saved. So we hope
> to implement support for tracking KSM zero pages without affecting the
> use_zero_pages feature.
>
> Zero pages may be the most common merged pages in a real environment (not
> only in VMs but also in other applications such as containers), so enabling
> use_zero_pages in an environment with plenty of empty pages (full of zeros)
> is very useful. Users and app developers can also benefit from knowing the
> proportion of zero pages among all merged pages, to optimize their
> applications.

I agree with that point, especially after I read in a paper that KSM
applied to some applications mainly deduplicates pages filled with 0s.
So it seems like a reasonable thing to optimize for.

> With this patch series, we can both unshare KSM-placed zero pages accurately
> and count KSM zero pages while use_zero_pages is enabled.

The problem with this approach I see is that it fundamentally relies on
the rmap/stable-tree to detect whether a zeropage was placed or not.

I was wondering why we even need an rmap item *at all* anymore. Why
can't we place the shared zeropage and call it a day (remove the rmap
item)? Once we placed a shared zeropage, the next KSM scan should better
just ignore it; it's already deduplicated.

So if most pages we deduplicate are shared zeropages, it would be quite
interesting to reduce the memory overhead and avoid rmap items, instead
of building new functionality on top of it?

If we'd really want to identify whether a zeropage was deduplicated by
KSM, we could try storing that information inside the PTE instead of
inside the RMAP. Then, we could directly adjust the counter when zapping
the shared zeropage or during MADV_DONTNEED/when unmerging.

Eventually, we could simply say that
* !pte_dirty(): zeropage placed during fault
* pte_dirty(): zeropage placed by KSM

Then it would also be easy to adjust counters and unmerge. We'd limit
this handling to known-working architectures initially (spec64 still has
the issue that pte_mkdirty() will set a pte writable ... and my patch to
fix that was not merged yet). We'd have to double-check all
pte_mkdirty/pte_mkclean() callsites.
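[Editorial aside: a rough kernel-style sketch of the two sides of the
encoding proposed above. This is not code from the series; the helper names
are hypothetical, and all primitives used (is_zero_pfn, pte_pfn, pte_dirty,
pte_mkdirty, pte_mkspecial, pfn_pte, my_zero_pfn) are existing kernel
helpers.]

/* Illustrative sketch only: the shared zeropage is otherwise always mapped
 * clean, so the dirty bit can encode "placed by KSM":
 *   !pte_dirty(): zeropage installed by the fault handler
 *    pte_dirty(): zeropage installed by KSM
 */
#include <linux/mm.h>
#include <linux/pgtable.h>

static inline bool is_ksm_zero_pte(pte_t pte)
{
	return is_zero_pfn(pte_pfn(pte)) && pte_dirty(pte);
}

/* What KSM would install when merging a page into the shared zeropage
 * (hypothetical helper name). */
static inline pte_t ksm_mk_zero_pte(struct vm_area_struct *vma,
				    unsigned long addr)
{
	return pte_mkdirty(pte_mkspecial(pfn_pte(my_zero_pfn(addr),
						 vma->vm_page_prot)));
}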
On Mon, 13 Mar 2023 14:03:33 +0100
David Hildenbrand <david@redhat.com> wrote:

> On 10.02.23 02:15, yang.yang29@zte.com.cn wrote:
> > From: xu xin <xu.xin16@zte.com.cn>
>
> Hi,
>
> sorry for the late follow-up. Still wrapping my head around this and
> possible alternatives. I hope we'll get some comments from others as
> well about the basic approach.
>
> > The core idea of this patch set is to enable users to perceive the number of
> > pages merged by KSM, regardless of whether the use_zero_pages switch has been
> > turned on, so that users can know how much of a free-memory increase is really
> > due to their madvise(MERGEABLE) actions. The problem is that when
> > use_zero_pages is enabled, all empty pages are merged with the kernel zero
> > pages instead of with each other, as they would be when use_zero_pages is
> > disabled, and these zero pages are then no longer tracked by KSM.
> >
> > My motivation for doing this has three points:
> >
> > 1) MADV_UNMERGEABLE and other ways to trigger unsharing will *not*
> > unshare the shared zeropage placed by KSM (which contradicts the
> > MADV_UNMERGEABLE documentation, at least); see the link:
> > https://lore.kernel.org/lkml/4a3daba6-18f9-d252-697c-197f65578c44@redhat.com/
> >
> > 2) We cannot know how many of the merged pages are zero pages placed by KSM
> > when use_zero_pages is enabled, which hides the critical information of how
> > much memory is actually saved by KSM. Knowing how many KSM-placed zero pages
> > there are helps users apply the madvise(MERGEABLE) policy better, because
> > they can see the actual profit brought by KSM.
> >
> > 3) The zero pages placed by KSM are different from the initial empty pages
> > (filled with zeros) which are never touched by applications: the former are
> > actively merged by KSM, while the latter never consume actual memory.
>
> I agree with all of the above, but it's still unclear to me if there is
> a real downside to a simpler approach:
>
> (1) Tracking the shared zeropages. That would be fairly easy: whenever
> we map/unmap a shared zeropage, we simply update the counter.
>
> (2) Unmerging all shared zeropages inside the VMAs during
> MADV_UNMERGEABLE.
>
> (3) Documenting that MADV_UNMERGEABLE will also unmerge the shared
> zeropage when toggle xy is flipped.
>
> It's certainly simpler and doesn't rely on the rmap item. See below.

I would surely prefer a simpler approach

> > use_zero_pages is useful not only because of cache colouring, as described in
> > the documentation, but also because it can accelerate the merging of empty
> > pages: when there are plenty of empty pages (full of zeros), the time for
> > page-by-page comparisons (unstable_tree_search_insert) is saved. So we hope
> > to implement support for tracking KSM zero pages without affecting the
> > use_zero_pages feature.
> >
> > Zero pages may be the most common merged pages in a real environment (not
> > only in VMs but also in other applications such as containers), so enabling
> > use_zero_pages in an environment with plenty of empty pages (full of zeros)
> > is very useful. Users and app developers can also benefit from knowing the
> > proportion of zero pages among all merged pages, to optimize their
> > applications.
>
> I agree with that point, especially after I read in a paper that KSM
> applied to some applications mainly deduplicates pages filled with 0s.
> So it seems like a reasonable thing to optimize for.
>
> > With this patch series, we can both unshare KSM-placed zero pages accurately
> > and count KSM zero pages while use_zero_pages is enabled.
>
> The problem with this approach I see is that it fundamentally relies on
> the rmap/stable-tree to detect whether a zeropage was placed or not.
>
> I was wondering why we even need an rmap item *at all* anymore. Why
> can't we place the shared zeropage and call it a day (remove the rmap
> item)? Once we placed a shared zeropage, the next KSM scan should better
> just ignore it; it's already deduplicated.
>
> So if most pages we deduplicate are shared zeropages, it would be quite
> interesting to reduce the memory overhead and avoid rmap items, instead
> of building new functionality on top of it?
>
> If we'd really want to identify whether a zeropage was deduplicated by
> KSM, we could try storing that information inside the PTE instead of

this is interesting, but needs caution, for the reason you mention below

> inside the RMAP. Then, we could directly adjust the counter when zapping
> the shared zeropage or during MADV_DONTNEED/when unmerging.
>
> Eventually, we could simply say that
> * !pte_dirty(): zeropage placed during fault
> * pte_dirty(): zeropage placed by KSM
>
> Then it would also be easy to adjust counters and unmerge. We'd limit
> this handling to known-working architectures initially (spec64 still has
> the issue that pte_mkdirty() will set a pte writable ... and my patch to
> fix that was not merged yet). We'd have to double-check all
> pte_mkdirty/pte_mkclean() callsites.

this will be... interesting
>> If we'd really want to identify whether a zeropage was deduplicated by
>> KSM, we could try storing that information inside the PTE instead of
>
> this is interesting, but needs caution, for the reason you mention below
>
>> inside the RMAP. Then, we could directly adjust the counter when zapping
>> the shared zeropage or during MADV_DONTNEED/when unmerging.
>>
>> Eventually, we could simply say that
>> * !pte_dirty(): zeropage placed during fault
>> * pte_dirty(): zeropage placed by KSM
>>
>> Then it would also be easy to adjust counters and unmerge. We'd limit
>> this handling to known-working architectures initially (spec64 still has

^ I meant sparc64 here.

We can (and should) have a testcase that deduplicates the shared
zeropage using KSM and makes sure that writes properly lead to a write
fault.

>> the issue that pte_mkdirty() will set a pte writable ... and my patch to
>> fix that was not merged yet). We'd have to double-check all
>> pte_mkdirty/pte_mkclean() callsites.
>
> this will be... interesting

IIRC, most code that touches the dirty bit makes sure that it operates
on a proper struct page (via vm_normal_folio()).
madvise_free_pte_range() is one such user. But we have to double-check.
On Fri, 10 Feb 2023 09:15:22 +0800 (CST) <yang.yang29@zte.com.cn> wrote:

> The core idea of this patch set is to enable users to perceive the number of
> pages merged by KSM, regardless of whether the use_zero_pages switch has been
> turned on, so that users can know how much of a free-memory increase is really
> due to their madvise(MERGEABLE) actions. The problem is that when
> use_zero_pages is enabled, all empty pages are merged with the kernel zero
> pages instead of with each other, as they would be when use_zero_pages is
> disabled, and these zero pages are then no longer tracked by KSM.

We appear to have some outstanding activity on this quite old patchset.
From my notes:

- An unresponded-to question from Claudia:
  https://lkml.kernel.org/r/20230307192421.30ab869c@p-imbrenda

- Hoping for overall review from David

- Another query from Claudia, and a response indicating that a v7 is
  in the works.
  https://lore.kernel.org/all/20230307194320.79373a26@p-imbrenda/T/#u

- Another unresponded-to review query:
  https://lkml.kernel.org/r/20230307195119.745d0b46@p-imbrenda

- Another response indicating that a v7 is coming
  https://lkml.kernel.org/r/20230307195313.2e21245a@p-imbrenda

So I think I'll drop the v6 series. Please address all these things in
the next version and let's start over.
On 29.03.23 00:38, Andrew Morton wrote:
> On Fri, 10 Feb 2023 09:15:22 +0800 (CST) <yang.yang29@zte.com.cn> wrote:
>
>> The core idea of this patch set is to enable users to perceive the number of
>> pages merged by KSM, regardless of whether the use_zero_pages switch has been
>> turned on, so that users can know how much of a free-memory increase is really
>> due to their madvise(MERGEABLE) actions. The problem is that when
>> use_zero_pages is enabled, all empty pages are merged with the kernel zero
>> pages instead of with each other, as they would be when use_zero_pages is
>> disabled, and these zero pages are then no longer tracked by KSM.
>
> We appear to have some outstanding activity on this quite old patchset.
> From my notes:
>
> - An unresponded-to question from Claudia:
>   https://lkml.kernel.org/r/20230307192421.30ab869c@p-imbrenda
>
> - Hoping for overall review from David

I already shared some feedback in [1]. I think we should try to simplify
this handling, as proposed in that mail. Still waiting for a reply.

[1] https://lore.kernel.org/all/9d7a8be3-ee9e-3492-841b-a0af9952ef36@redhat.com/
Hi, I'm sorry to reply so late because I was so busy with my job matters recently.

I appreciate David's idea of simplifying the implementation of tracking
KSM-placed zero pages, but I'm confused about how to implement that via
pte_mkdirty/pte_dirty without affecting other functions, now and in the future.

> I already shared some feedback in [1]. I think we should try to simplify
> this handling, as proposed in that mail. Still waiting for a reply.
>
> [1]
> https://lore.kernel.org/all/9d7a8be3-ee9e-3492-841b-a0af9952ef36@redhat.com/

I have some questions about using pte_mkdirty to mark KSM-placed zero pages.

(1) Will KSM using pte_mkdirty to mark KSM-placed zero pages collide with the
existing handling of the same pte in other features? And in the future, what
if there is new code also using pte_mkdirty for other goals?

(2) Can the literal meaning of pte_mkdirty represent a pte that points to a
ksm zero page?

(3) Suppose we use the pte_mkdirty approach; how do we update/decrease the
count of ksm_zero_pages when an upper app writing to the page triggers COW
(Copy on Write)? In *mm_fault outside mm/ksm.c?

Moving the previous message here to reply together:

> The problem with this approach I see is that it fundamentally relies on
> the rmap/stable-tree to detect whether a zeropage was placed or not.
>
> I was wondering why we even need an rmap item *at all* anymore. Why
> can't we place the shared zeropage and call it a day (remove the rmap
> item)? Once we placed a shared zeropage, the next KSM scan should better
> just ignore it; it's already deduplicated.

The reason is as follows. Initially, every page scanned by ksmd is assigned
an rmap_item storing the page information and ksm information, which lets
ksmd know every scanned page's status and update all counts, especially when
COW happens. But since the use_zero_pages feature was merged, the situation
changed: ksm zero pages are the only exception of ksm-scanned pages that do
not own an rmap_item in KSM. As a result, ksmd does not even know of the
existence of the KSM-placed zero pages, which causes the problem our patches
aim to solve.
On 30.03.23 14:06, xu xin wrote:
> Hi, I'm sorry to reply so late because I was so busy with my job matters recently.
>
> I appreciate David's idea of simplifying the implementation of tracking
> KSM-placed zero pages, but I'm confused about how to implement that via
> pte_mkdirty/pte_dirty without affecting other functions, now and in the future.

No need to worry too much about the future here :)

>> I already shared some feedback in [1]. I think we should try to simplify
>> this handling, as proposed in that mail. Still waiting for a reply.
>>
>> [1]
>> https://lore.kernel.org/all/9d7a8be3-ee9e-3492-841b-a0af9952ef36@redhat.com/
>
> I have some questions about using pte_mkdirty to mark KSM-placed zero pages.
>
> (1) Will KSM using pte_mkdirty to mark KSM-placed zero pages collide with the
> existing handling of the same pte in other features? And in the future, what
> if there is new code also using pte_mkdirty for other goals?

So far I am not aware of other users of the dirty bit for the shared
zeropage. If ever required (why?) we could try finding another PTE bit. Or
use a completely separate set of zeropages, if ever really running out of
PTE bits.

I selected pte_dirty() because it's available on all architectures and
should be unused on the shared zeropage (always clean).

Until then, we only have to worry about architectures that treat R/O dirty
PTEs as writable (I only know sparc64), maybe a good justification to
finally fix sparc64 and identify others. Again, happy to help here. [1]

> (2) Can the literal meaning of pte_mkdirty represent a pte that points to a
> ksm zero page?

I briefly scanned the code. pte_dirty() should mostly not matter for the
shared zeropage. We have to double check (will do as well).

> (3) Suppose we use the pte_mkdirty approach; how do we update/decrease the
> count of ksm_zero_pages when an upper app writing to the page triggers COW
> (Copy on Write)? In *mm_fault outside mm/ksm.c?

Yes. Do it synchronously when unmapping the shared zeropage.

diff --git a/mm/memory.c b/mm/memory.c
index f456f3b5049c..78b6c60602dd 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1351,6 +1351,8 @@ zap_install_uffd_wp_if_needed(struct vm_area_struct *vma,
 		pte_install_uffd_wp_if_needed(vma, addr, pte, pteval);
 }
 
+#define is_ksm_zero_pte(pte) (is_zero_pfn(pte_pfn(pte)) && pte_dirty(pte))
+
 static unsigned long zap_pte_range(struct mmu_gather *tlb,
 				struct vm_area_struct *vma, pmd_t *pmd,
 				unsigned long addr, unsigned long end,
@@ -1392,8 +1394,11 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
 			tlb_remove_tlb_entry(tlb, pte, addr);
 			zap_install_uffd_wp_if_needed(vma, addr, pte, details, ptent);
-			if (unlikely(!page))
+			if (unlikely(!page)) {
+				if (is_ksm_zero_pte(ptent))
+					/* TODO: adjust counter */
 				continue;
+			}
 
 			delay_rmap = 0;
 			if (!PageAnon(page)) {
@@ -3111,6 +3116,8 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf)
 			inc_mm_counter(mm, MM_ANONPAGES);
 		}
 	} else {
+		if (is_ksm_zero_pte(orig_pte))
+			/* TODO: adjust counter */
 		inc_mm_counter(mm, MM_ANONPAGES);
 	}
 	flush_cache_page(vma, vmf->address, pte_pfn(vmf->orig_pte));

The nice thing is, if we get it wrong we "only" get wrong counters. A
prototype for that should be fairly simple -- to see what we're missing.

> Moving the previous message here to reply together:
>
>> The problem with this approach I see is that it fundamentally relies on
>> the rmap/stable-tree to detect whether a zeropage was placed or not.
>>
>> I was wondering why we even need an rmap item *at all* anymore. Why
>> can't we place the shared zeropage and call it a day (remove the rmap
>> item)? Once we placed a shared zeropage, the next KSM scan should better
>> just ignore it; it's already deduplicated.
>
> The reason is as follows. Initially, every page scanned by ksmd is assigned
> an rmap_item storing the page information and ksm information, which lets
> ksmd know every scanned page's status and update all counts, especially when
> COW happens. But since the use_zero_pages feature was merged, the situation
> changed: ksm zero pages are the only exception of ksm-scanned pages that do
> not own an rmap_item in KSM. As a result, ksmd does not even know of the
> existence of the KSM-placed zero pages, which causes the problem our patches
> aim to solve.

Understood, so per-PTE information would similarly work.

[1] https://lkml.kernel.org/r/20221212130213.136267-1-david@redhat.com
>> The core idea of this patch set is to enable users to perceive the number of
>> pages merged by KSM, regardless of whether the use_zero_pages switch has been
>> turned on, so that users can know how much of a free-memory increase is really
>> due to their madvise(MERGEABLE) actions. The problem is that when
>> use_zero_pages is enabled, all empty pages are merged with the kernel zero
>> pages instead of with each other, as they would be when use_zero_pages is
>> disabled, and these zero pages are then no longer tracked by KSM.

We have sent the v7 series, so some of the old code has changed and there is
no need to discuss it further.

> We appear to have some outstanding activity on this quite old patchset.
> From my notes:
>
> - An unresponded-to question from Claudia:
>   https://lkml.kernel.org/r/20230307192421.30ab869c@p-imbrenda

Claudia is right, but the v7 patches don't rely on rmap_items now, so we can
skip this comment.

> - Hoping for overall review from David

David's review is great, and we accepted his advice on the basic approach in
the v7 patches to track the ksm zero pages: the v7 patches now use pte_dirty
to mark the KSM-placed zero pages. But since pte_dirty is
architecture-dependent, and some architectures may treat a dirty pte as
writable, we restrict the feature of tracking ksm zero pages to the tested
and well-working architectures only.

> - Another query from Claudia, and a response indicating that a v7 is
>   in the works.
>   https://lore.kernel.org/all/20230307194320.79373a26@p-imbrenda/T/#u

The v7 series didn't change that part, so we can skip it.

> - Another unresponded-to review query:
>   https://lkml.kernel.org/r/20230307195119.745d0b46@p-imbrenda

The v7 series took Claudia's review into account but made a few
modifications.

> - Another response indicating that a v7 is coming
>   https://lkml.kernel.org/r/20230307195313.2e21245a@p-imbrenda
>
> So I think I'll drop the v6 series. Please address all these things in
> the next version and let's start over.

Yes, the next version is here:
https://lore.kernel.org/lkml/202304131346489021903@zte.com.cn/
On Fri, 14 Apr 2023 09:35:07 +0800 xu xin <xu.xin.sc@gmail.com> wrote:

> >> The core idea of this patch set is to enable users to perceive the number of
> >> pages merged by KSM, regardless of whether the use_zero_pages switch has been
> >> turned on, so that users can know how much of a free-memory increase is really
> >> due to their madvise(MERGEABLE) actions. The problem is that when
> >> use_zero_pages is enabled, all empty pages are merged with the kernel zero
> >> pages instead of with each other, as they would be when use_zero_pages is
> >> disabled, and these zero pages are then no longer tracked by KSM.
>
> We have sent the v7 series, so some of the old code has changed and there is
> no need to discuss it further.
>
> > We appear to have some outstanding activity on this quite old patchset.
> > From my notes:
> >
> > - An unresponded-to question from Claudia:
> >   https://lkml.kernel.org/r/20230307192421.30ab869c@p-imbrenda
>
> Claudia is right, but the v7 patches don't rely on rmap_items now, so we can
> skip this comment.
>
> > - Hoping for overall review from David
>
> David's review is great, and we accepted his advice on the basic approach in
> the v7 patches to track the ksm zero pages: the v7 patches now use pte_dirty
> to mark the KSM-placed zero pages. But since pte_dirty is
> architecture-dependent, and some architectures may treat a dirty pte as
> writable, we restrict the feature of tracking ksm zero pages to the tested
> and well-working architectures only.
>
> > - Another query from Claudia, and a response indicating that a v7 is
> >   in the works.
> >   https://lore.kernel.org/all/20230307194320.79373a26@p-imbrenda/T/#u
>
> The v7 series didn't change that part, so we can skip it.
>
> > - Another unresponded-to review query:
> >   https://lkml.kernel.org/r/20230307195119.745d0b46@p-imbrenda
>
> The v7 series took Claudia's review into account but made a few
> modifications.

I would just like to point out that my name is Claudio, not Claudia.
(this goes to Andrew as well)

thanks

> > - Another response indicating that a v7 is coming
> >   https://lkml.kernel.org/r/20230307195313.2e21245a@p-imbrenda
> >
> > So I think I'll drop the v6 series. Please address all these things in
> > the next version and let's start over.
>
> Yes, the next version is here:
> https://lore.kernel.org/lkml/202304131346489021903@zte.com.cn/