Message ID | 20230522104908.3999-1-yang.yang29@zte.com.cn
State      | New
Series     | ksm: support tracking KSM-placed zero-pages
Headers    | From: Yang Yang <yang.yang29@zte.com.cn>
           | To: akpm@linux-foundation.org, david@redhat.com
           | Cc: yang.yang29@zte.com.cn, imbrenda@linux.ibm.com, jiang.xuexin@zte.com.cn,
           |     linux-kernel@vger.kernel.org, linux-mm@kvack.org, ran.xiaokai@zte.com.cn,
           |     xu.xin.sc@gmail.com, xu.xin16@zte.com.cn
           | Subject: [PATCH v8 1/6] ksm: support unsharing KSM-placed zero pages
           | Date: Mon, 22 May 2023 18:49:08 +0800
           | Message-Id: <20230522104908.3999-1-yang.yang29@zte.com.cn>
           | In-Reply-To: <202305221842587200002@zte.com.cn>
           | References: <202305221842587200002@zte.com.cn>
Commit Message
Yang Yang
May 22, 2023, 10:49 a.m. UTC
From: xu xin <xu.xin16@zte.com.cn>

When use_zero_pages of KSM is enabled, madvise(addr, len, MADV_UNMERGEABLE)
and other ways of triggering unsharing (like writing 2 to
/sys/kernel/mm/ksm/run) will *not* actually unshare the shared zeropages
placed by KSM, which is against the MADV_UNMERGEABLE documentation. Because
these KSM-placed zero pages are out of the control of KSM, the related counts
of KSM pages don't expose how many zero pages are placed by KSM. (These
special zero pages are different from the initially mapped zero pages:
the zero pages mapped into MADV_UNMERGEABLE areas are expected to be
complete, unshared pages.)

To avoid blindly unsharing all shared zero pages in applicable VMAs, this
patch uses pte_mkdirty (which is architecture-dependent) to mark KSM-placed
zero pages. Thus, MADV_UNMERGEABLE will only unshare those KSM-placed zero
pages.

This patch does not degrade the performance of use_zero_pages, as it does
not change how empty pages are merged under that feature.

Signed-off-by: xu xin <xu.xin16@zte.com.cn>
Suggested-by: David Hildenbrand <david@redhat.com>
Cc: Claudio Imbrenda <imbrenda@linux.ibm.com>
Cc: Xuexin Jiang <jiang.xuexin@zte.com.cn>
Reviewed-by: Xiaokai Ran <ran.xiaokai@zte.com.cn>
Reviewed-by: Yang Yang <yang.yang29@zte.com.cn>
---
 include/linux/ksm.h | 6 ++++++
 mm/ksm.c            | 5 +++--
 2 files changed, 9 insertions(+), 2 deletions(-)
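[Editorial note] The behaviour being fixed can be exercised from userspace roughly as in the sketch below. This is a minimal illustration only, not part of the patch: it assumes CONFIG_KSM=y, root privileges to write the ksm sysfs knobs, and that ksmd gets enough time to scan and merge the region before MADV_UNMERGEABLE is issued; write_sysfs() is a hypothetical helper for the demo.

#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/mman.h>

/* Hypothetical helper for this demo: write a value to a ksm sysfs file. */
static void write_sysfs(const char *path, const char *val)
{
	FILE *f = fopen(path, "w");

	if (!f) {
		perror(path);
		exit(EXIT_FAILURE);
	}
	fputs(val, f);
	fclose(f);
}

int main(void)
{
	const size_t len = 64 * 4096;
	char *buf;

	/* Enable zeropage deduplication and start ksmd (requires root). */
	write_sysfs("/sys/kernel/mm/ksm/use_zero_pages", "1");
	write_sysfs("/sys/kernel/mm/ksm/run", "1");

	buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED) {
		perror("mmap");
		return EXIT_FAILURE;
	}

	/* Fault in all-zero pages so ksmd can map them to the shared zeropage. */
	memset(buf, 0, len);
	if (madvise(buf, len, MADV_MERGEABLE))
		perror("MADV_MERGEABLE");
	sleep(5);	/* crude: give ksmd a chance to scan the area */

	/*
	 * Without this patch, zeropages placed here by KSM are *not*
	 * unshared; with it, they are recognized via the dirty bit.
	 */
	if (madvise(buf, len, MADV_UNMERGEABLE))
		perror("MADV_UNMERGEABLE");

	munmap(buf, len);
	return 0;
}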
Comments
On 22.05.23 12:49, Yang Yang wrote:
> From: xu xin <xu.xin16@zte.com.cn>
>
> When use_zero_pages of ksm is enabled, madvise(addr, len, MADV_UNMERGEABLE)
> and other ways (like write 2 to /sys/kernel/mm/ksm/run) to trigger
> unsharing will *not* actually unshare the shared zeropage as placed by KSM
> (which is against the MADV_UNMERGEABLE documentation). As these KSM-placed
> zero pages are out of the control of KSM, the related counts of ksm pages
> don't expose how many zero pages are placed by KSM (these special zero
> pages are different from those initially mapped zero pages, because the
> zero pages mapped to MADV_UNMERGEABLE areas are expected to be a complete
> and unshared page)
>
> To not blindly unshare all shared zero_pages in applicable VMAs, the patch
> use pte_mkdirty (related with architecture) to mark KSM-placed zero pages.
> Thus, MADV_UNMERGEABLE will only unshare those KSM-placed zero pages.
>
> The patch will not degrade the performance of use_zero_pages as it doesn't
> change the way of merging empty pages in use_zero_pages's feature.
>

Maybe add: "We'll reuse this mechanism to reliably identify KSM-placed
zeropages to properly account for them (e.g., calculating the KSM profit
that includes zeropages) next."

> Signed-off-by: xu xin <xu.xin16@zte.com.cn>
> Suggested-by: David Hildenbrand <david@redhat.com>
> Cc: Claudio Imbrenda <imbrenda@linux.ibm.com>
> Cc: Xuexin Jiang <jiang.xuexin@zte.com.cn>
> Reviewed-by: Xiaokai Ran <ran.xiaokai@zte.com.cn>
> Reviewed-by: Yang Yang <yang.yang29@zte.com.cn>
> ---
>  include/linux/ksm.h | 6 ++++++
>  mm/ksm.c | 5 +++--
>  2 files changed, 9 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/ksm.h b/include/linux/ksm.h
> index 899a314bc487..7989200cdbb7 100644
> --- a/include/linux/ksm.h
> +++ b/include/linux/ksm.h
> @@ -26,6 +26,9 @@ int ksm_disable(struct mm_struct *mm);
>
>  int __ksm_enter(struct mm_struct *mm);
>  void __ksm_exit(struct mm_struct *mm);
> +/* use pte_mkdirty to track a KSM-placed zero page */
> +#define set_pte_ksm_zero(pte)	pte_mkdirty(pte_mkspecial(pte))

If there is only a single user (which I assume), please inline it instead.

Let's add some more documentation:

/*
 * To identify zeropages that were mapped by KSM, we reuse the dirty bit
 * in the PTE. If the PTE is dirty, the zeropage was mapped by KSM when
 * deduplicating memory.
 */

> +#define is_ksm_zero_pte(pte)	(is_zero_pfn(pte_pfn(pte)) && pte_dirty(pte))
>
>  static inline int ksm_fork(struct mm_struct *mm, struct mm_struct *oldmm)
>  {
> @@ -95,6 +98,9 @@ static inline void ksm_exit(struct mm_struct *mm)
>  {
>  }
>
> +#define set_pte_ksm_zero(pte)	pte_mkspecial(pte)
> +#define is_ksm_zero_pte(pte)	0
> +
>  #ifdef CONFIG_MEMORY_FAILURE
>  static inline void collect_procs_ksm(struct page *page,
>  		struct list_head *to_kill, int force_early)
> diff --git a/mm/ksm.c b/mm/ksm.c
> index 0156bded3a66..9962f5962afd 100644
> --- a/mm/ksm.c
> +++ b/mm/ksm.c
> @@ -447,7 +447,8 @@ static int break_ksm_pmd_entry(pmd_t *pmd, unsigned long addr, unsigned long nex
>  		if (is_migration_entry(entry))
>  			page = pfn_swap_entry_to_page(entry);
>  	}
> -	ret = page && PageKsm(page);
> +	/* return 1 if the page is an normal ksm page or KSM-placed zero page */
> +	ret = (page && PageKsm(page)) || is_ksm_zero_pte(*pte);
>  	pte_unmap_unlock(pte, ptl);
>  	return ret;
>  }
> @@ -1220,7 +1221,7 @@ static int replace_page(struct vm_area_struct *vma, struct page *page,
>  		page_add_anon_rmap(kpage, vma, addr, RMAP_NONE);
>  		newpte = mk_pte(kpage, vma->vm_page_prot);
>  	} else {
> -		newpte = pte_mkspecial(pfn_pte(page_to_pfn(kpage),
> +		newpte = set_pte_ksm_zero(pfn_pte(page_to_pfn(kpage),
>  					       vma->vm_page_prot));
>  		/*
>  		 * We're replacing an anonymous page with a zero page, which is

Apart from that LGTM.
>> ---
>>  include/linux/ksm.h | 6 ++++++
>>  mm/ksm.c | 5 +++--
>>  2 files changed, 9 insertions(+), 2 deletions(-)
>>
>> diff --git a/include/linux/ksm.h b/include/linux/ksm.h
>> index 899a314bc487..7989200cdbb7 100644
>> --- a/include/linux/ksm.h
>> +++ b/include/linux/ksm.h
>> @@ -26,6 +26,9 @@ int ksm_disable(struct mm_struct *mm);
>>
>>  int __ksm_enter(struct mm_struct *mm);
>>  void __ksm_exit(struct mm_struct *mm);
>> +/* use pte_mkdirty to track a KSM-placed zero page */
>> +#define set_pte_ksm_zero(pte)	pte_mkdirty(pte_mkspecial(pte))
>
> If there is only a single user (which I assume), please inline it instead.

Excuse me, I'm wondering why using inline here instead of macro is better.
Thanks! :)

Thanks for reviews.
On 23.05.23 15:57, xu xin wrote:
>>> ---
>>>  include/linux/ksm.h | 6 ++++++
>>>  mm/ksm.c | 5 +++--
>>>  2 files changed, 9 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/include/linux/ksm.h b/include/linux/ksm.h
>>> index 899a314bc487..7989200cdbb7 100644
>>> --- a/include/linux/ksm.h
>>> +++ b/include/linux/ksm.h
>>> @@ -26,6 +26,9 @@ int ksm_disable(struct mm_struct *mm);
>>>
>>>  int __ksm_enter(struct mm_struct *mm);
>>>  void __ksm_exit(struct mm_struct *mm);
>>> +/* use pte_mkdirty to track a KSM-placed zero page */
>>> +#define set_pte_ksm_zero(pte)	pte_mkdirty(pte_mkspecial(pte))
>>
>> If there is only a single user (which I assume), please inline it instead.
>
> Excuse me, I'm wondering why using inline here instead of macro is better.
> Thanks! :)

Just to clarify: not an inline function but removing the macro
completely and just place that code directly into the single caller.

Single user, no need to put that into ksm.h -- and I'm not super happy
about the set_pte_ksm_zero() name ;) because we get the zero-pte already
passed in from the caller ...
>>>> ---
>>>>  include/linux/ksm.h | 6 ++++++
>>>>  mm/ksm.c | 5 +++--
>>>>  2 files changed, 9 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/include/linux/ksm.h b/include/linux/ksm.h
>>>> index 899a314bc487..7989200cdbb7 100644
>>>> --- a/include/linux/ksm.h
>>>> +++ b/include/linux/ksm.h
>>>> @@ -26,6 +26,9 @@ int ksm_disable(struct mm_struct *mm);
>>>>
>>>>  int __ksm_enter(struct mm_struct *mm);
>>>>  void __ksm_exit(struct mm_struct *mm);
>>>> +/* use pte_mkdirty to track a KSM-placed zero page */
>>>> +#define set_pte_ksm_zero(pte)	pte_mkdirty(pte_mkspecial(pte))
>>>
>>> If there is only a single user (which I assume), please inline it instead.
>>
>> Excuse me, I'm wondering why using inline here instead of macro is better.
>> Thanks! :)
>
> Just to clarify: not an inline function but removing the macro
> completely and just place that code directly into the single caller.
>
> Single user, no need to put that into ksm.h -- and I'm not super happy
> about the set_pte_ksm_zero() name ;) because we get the zero-pte already
> passed in from the caller ...

Oh, I see. Thanks
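[Editorial note] To make the review feedback concrete, here is a sketch of what inlining the marking at the single call site in replace_page() could look like. It is an illustration of the suggestion only, not the actual follow-up revision of this series.

diff --git a/mm/ksm.c b/mm/ksm.c
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ static int replace_page(struct vm_area_struct *vma, struct page *page,
 	} else {
-		newpte = set_pte_ksm_zero(pfn_pte(page_to_pfn(kpage),
-						  vma->vm_page_prot));
+		/*
+		 * Mark the zeropage mapped by KSM via pte_mkdirty directly
+		 * here; no separate set_pte_ksm_zero() helper in ksm.h.
+		 */
+		newpte = pte_mkdirty(pte_mkspecial(pfn_pte(page_to_pfn(kpage),
+							   vma->vm_page_prot)));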
diff --git a/include/linux/ksm.h b/include/linux/ksm.h
index 899a314bc487..7989200cdbb7 100644
--- a/include/linux/ksm.h
+++ b/include/linux/ksm.h
@@ -26,6 +26,9 @@ int ksm_disable(struct mm_struct *mm);
 
 int __ksm_enter(struct mm_struct *mm);
 void __ksm_exit(struct mm_struct *mm);
+/* use pte_mkdirty to track a KSM-placed zero page */
+#define set_pte_ksm_zero(pte)	pte_mkdirty(pte_mkspecial(pte))
+#define is_ksm_zero_pte(pte)	(is_zero_pfn(pte_pfn(pte)) && pte_dirty(pte))
 
 static inline int ksm_fork(struct mm_struct *mm, struct mm_struct *oldmm)
 {
@@ -95,6 +98,9 @@ static inline void ksm_exit(struct mm_struct *mm)
 {
 }
 
+#define set_pte_ksm_zero(pte)	pte_mkspecial(pte)
+#define is_ksm_zero_pte(pte)	0
+
 #ifdef CONFIG_MEMORY_FAILURE
 static inline void collect_procs_ksm(struct page *page,
 		struct list_head *to_kill, int force_early)
diff --git a/mm/ksm.c b/mm/ksm.c
index 0156bded3a66..9962f5962afd 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -447,7 +447,8 @@ static int break_ksm_pmd_entry(pmd_t *pmd, unsigned long addr, unsigned long nex
 		if (is_migration_entry(entry))
 			page = pfn_swap_entry_to_page(entry);
 	}
-	ret = page && PageKsm(page);
+	/* return 1 if the page is a normal KSM page or a KSM-placed zero page */
+	ret = (page && PageKsm(page)) || is_ksm_zero_pte(*pte);
 	pte_unmap_unlock(pte, ptl);
 	return ret;
 }
@@ -1220,7 +1221,7 @@ static int replace_page(struct vm_area_struct *vma, struct page *page,
 		page_add_anon_rmap(kpage, vma, addr, RMAP_NONE);
 		newpte = mk_pte(kpage, vma->vm_page_prot);
 	} else {
-		newpte = pte_mkspecial(pfn_pte(page_to_pfn(kpage),
+		newpte = set_pte_ksm_zero(pfn_pte(page_to_pfn(kpage),
 					       vma->vm_page_prot));
 		/*
 		 * We're replacing an anonymous page with a zero page, which is