From patchwork Thu Apr 13 05:50:38 2023
X-Patchwork-Submitter: Yang Yang
X-Patchwork-Id: 82809
From: Yang Yang
To: akpm@linux-foundation.org, david@redhat.com
Cc: yang.yang29@zte.com.cn, imbrenda@linux.ibm.com,
    linux-kernel@vger.kernel.org, linux-mm@kvack.org,
    ran.xiaokai@zte.com.cn, xu.xin.sc@gmail.com, xu.xin16@zte.com.cn,
    Xuexin Jiang
Subject: [PATCH v7 1/6] ksm: support unsharing KSM-placed zero pages
Date: Thu, 13 Apr 2023 13:50:38 +0800
Message-Id: <20230413055038.180952-1-yang.yang29@zte.com.cn>
In-Reply-To: <202304131346489021903@zte.com.cn>
References: <202304131346489021903@zte.com.cn>
X-Mailing-List: linux-kernel@vger.kernel.org

From: xu xin

When use_zero_pages of KSM is enabled, madvise(addr, len, MADV_UNMERGEABLE)
and the other ways of triggering unsharing (such as writing 2 to
/sys/kernel/mm/ksm/run) will *not* actually unshare the shared zeropages
placed by KSM, which contradicts the MADV_UNMERGEABLE documentation. Because
these KSM-placed zero pages are outside KSM's control, the KSM page counters
also do not reveal how many zero pages were placed by KSM. (These special
zero pages differ from those initially mapped zero pages: a zero page mapped
into a MADV_UNMERGEABLE area is expected to be a complete, unshared page.)

Rather than blindly unsharing all shared zero pages in the applicable VMAs,
this patch uses pte_mkdirty (which is architecture-dependent) to mark
KSM-placed zero pages, so that MADV_UNMERGEABLE will unshare only those.
The architecture must guarantee that pte_mkdirty won't make the pte
writable; otherwise it would break the write-protected state of KSM pages
and thus KSM itself. For safety, the feature is restricted to the tested
and known-working architectures for now.

The patch does not degrade the performance of use_zero_pages, since it does
not change the way empty pages are merged under that feature.
Signed-off-by: xu xin
Suggested-by: David Hildenbrand
Cc: Claudio Imbrenda
Cc: Xuexin Jiang
Reviewed-by: Xiaokai Ran
Reviewed-by: Yang Yang
---
 include/linux/ksm.h |  9 +++++++++
 mm/Kconfig          | 24 +++++++++++++++++++++++-
 mm/ksm.c            |  5 +++--
 3 files changed, 35 insertions(+), 3 deletions(-)

diff --git a/include/linux/ksm.h b/include/linux/ksm.h
index d5f69f18ee5a..f0cc085be42a 100644
--- a/include/linux/ksm.h
+++ b/include/linux/ksm.h
@@ -95,4 +95,13 @@ static inline void folio_migrate_ksm(struct folio *newfolio, struct folio *old)
 #endif /* CONFIG_MMU */
 #endif /* !CONFIG_KSM */
 
+#ifdef CONFIG_KSM_ZERO_PAGES_TRACK
+/* use pte_mkdirty to track a KSM-placed zero page */
+#define set_pte_ksm_zero(pte)	pte_mkdirty(pte_mkspecial(pte))
+#define is_ksm_zero_pte(pte)	(is_zero_pfn(pte_pfn(pte)) && pte_dirty(pte))
+#else /* !CONFIG_KSM_ZERO_PAGES_TRACK */
+#define set_pte_ksm_zero(pte)	pte_mkspecial(pte)
+#define is_ksm_zero_pte(pte)	0
+#endif /* CONFIG_KSM_ZERO_PAGES_TRACK */
+
 #endif /* __LINUX_KSM_H */
diff --git a/mm/Kconfig b/mm/Kconfig
index 3894a6309c41..42f69f421a03 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -666,7 +666,7 @@ config MMU_NOTIFIER
 	bool
 	select INTERVAL_TREE
 
-config KSM
+menuconfig KSM
 	bool "Enable KSM for page merging"
 	depends on MMU
 	select XXHASH
@@ -681,6 +681,28 @@ config KSM
 	  until a program has madvised that an area is MADV_MERGEABLE, and
 	  root has set /sys/kernel/mm/ksm/run to 1 (if CONFIG_SYSFS is set).
 
+if KSM
+
+config KSM_ZERO_PAGES_TRACK
+	bool "support tracking KSM-placed zero pages"
+	depends on KSM
+	depends on ARM || ARM64 || X86
+	default y
+	help
+	  This allows KSM to track KSM-placed zero pages, including support
+	  for unsharing and counting the KSM-placed zero pages. If you say N,
+	  madvise(,,UNMERGEABLE) can't unshare the KSM-placed zero pages, and
+	  users can't know how many zero pages are placed by KSM. This feature
+	  depends on pte_mkdirty (which is architecture-dependent) to mark
+	  KSM-placed zero pages.
+
+	  The architecture must guarantee that pte_mkdirty won't make the pte
+	  writable. Otherwise, it would break the write-protected state of KSM
+	  pages and thus KSM itself. For safety, this feature is restricted to
+	  the tested and known-working architectures.
+
+endif # KSM
+
 config DEFAULT_MMAP_MIN_ADDR
 	int "Low address space to protect from user allocation"
 	depends on MMU
diff --git a/mm/ksm.c b/mm/ksm.c
index 7cd7e12cd3df..1d1771a6b3fe 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -447,7 +447,8 @@ static int break_ksm_pmd_entry(pmd_t *pmd, unsigned long addr, unsigned long nex
 		if (is_migration_entry(entry))
 			page = pfn_swap_entry_to_page(entry);
 	}
-	ret = page && PageKsm(page);
+	/* return 1 if the page is a normal KSM page or a KSM-placed zero page */
+	ret = (page && PageKsm(page)) || is_ksm_zero_pte(*pte);
 	pte_unmap_unlock(pte, ptl);
 	return ret;
 }
@@ -1240,7 +1241,7 @@ static int replace_page(struct vm_area_struct *vma, struct page *page,
 		page_add_anon_rmap(kpage, vma, addr, RMAP_NONE);
 		newpte = mk_pte(kpage, vma->vm_page_prot);
 	} else {
-		newpte = pte_mkspecial(pfn_pte(page_to_pfn(kpage),
+		newpte = set_pte_ksm_zero(pfn_pte(page_to_pfn(kpage),
					       vma->vm_page_prot));
		/*
		 * We're replacing an anonymous page with a zero page, which is