From patchwork Mon Dec 18 10:50:45 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ryan Roberts X-Patchwork-Id: 180261 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a05:7300:24d3:b0:fb:cd0c:d3e with SMTP id r19csp1154295dyi; Mon, 18 Dec 2023 02:51:48 -0800 (PST) X-Google-Smtp-Source: AGHT+IFRPj73IKFs66Nx3aKAiJBJYKw28WIqHg+EkBHqQ5fax8+A3zol4p4pJ4KAUHr+S22VNxdS X-Received: by 2002:a17:902:f543:b0:1d0:700b:3f88 with SMTP id h3-20020a170902f54300b001d0700b3f88mr22490745plf.66.1702896708446; Mon, 18 Dec 2023 02:51:48 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1702896708; cv=none; d=google.com; s=arc-20160816; b=BTtMXHdhta0G+UkDDZqn+BigZPudLp3YMSW7mv2+Jf1Tu0RN3vtYLIjCWjKjxt/Uu4 FTnzc2X1pwD7upW+svNwYrJ2Xeh1gPS3er8CLxI7evZ8Wrs1Q+MY2xpyThVruvpqWY24 DaCffiAAz399QtTCDHcHUy69i5sUykB9FxdPyDw7JxuyDLTEIZpXTZ+w1rJmt7EZ1JXv 4x2VdTY13gblxIF+O+5dp1NUTiMZO3hYzyDqs2W38F0XunW8c2ZdDuYgG61QJFVadMbZ aJ6TV0ooctf2rso0Qp8lJkNdetzBCrhmZ1IRlIl12o/j8UFWzA+45n4K/Iidz6dHidOI yxKA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=content-transfer-encoding:mime-version:list-unsubscribe :list-subscribe:list-id:precedence:references:in-reply-to:message-id :date:subject:cc:to:from; bh=nHYd6pL2au0ylISSnRYYRDrN6i/JAMSJRS7tvhKFDUA=; fh=a3gZESXuO1ptbCtzvzEnZL5xUl+WC4Vy5hoF2yaOXSs=; b=xJwXaQwOOV4g1Xw3Y+41TlS+e/4dFg5QHHAhQBaZK1jsqUh69SbGtqasj45eNdYG/i vTf3L30d80sFf5MBcGPJllPVfOdQ+rY3pxDiETIDU8AeG3yxbDOdxMn/nikelr8CQv3h in5z5dCWR7mm3jVlm/TIuYZdTqgLN67cSklrJEYAQP9B77begNCYRqBjsB40N/YIgzzy 4bf+2+d2ULJX1AT8YdpnCjD5KCK+ADIvfn8SvfqRuB9QsqymNKOrh/QrO8uEQgVTulh2 rDHawRV5483Gd4KICPolDRhPPCcGSKLdsq6QRLWu0OrXiSfWS6sgaZ7Mx73DYzW100ST 35SA== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel+bounces-3370-ouuuleilei=gmail.com@vger.kernel.org designates 139.178.88.99 as permitted sender) smtp.mailfrom="linux-kernel+bounces-3370-ouuuleilei=gmail.com@vger.kernel.org"; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=arm.com Received: from sv.mirrors.kernel.org (sv.mirrors.kernel.org. [139.178.88.99]) by mx.google.com with ESMTPS id by4-20020a056a02058400b005cd8aaec136si3048107pgb.307.2023.12.18.02.51.48 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 18 Dec 2023 02:51:48 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel+bounces-3370-ouuuleilei=gmail.com@vger.kernel.org designates 139.178.88.99 as permitted sender) client-ip=139.178.88.99; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel+bounces-3370-ouuuleilei=gmail.com@vger.kernel.org designates 139.178.88.99 as permitted sender) smtp.mailfrom="linux-kernel+bounces-3370-ouuuleilei=gmail.com@vger.kernel.org"; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=arm.com Received: from smtp.subspace.kernel.org (wormhole.subspace.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by sv.mirrors.kernel.org (Postfix) with ESMTPS id 3A10E283FDE for ; Mon, 18 Dec 2023 10:51:48 +0000 (UTC) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 9D5CA1A284; Mon, 18 Dec 2023 10:51:25 +0000 (UTC) X-Original-To: linux-kernel@vger.kernel.org Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 909BB199CC for ; Mon, 18 Dec 2023 10:51:22 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=arm.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=arm.com Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 754E92F4; Mon, 18 Dec 2023 02:52:06 -0800 (PST) Received: from e125769.cambridge.arm.com (e125769.cambridge.arm.com [10.1.196.26]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 9BE873F738; Mon, 18 Dec 2023 02:51:18 -0800 (PST) From: Ryan Roberts To: Catalin Marinas , Will Deacon , Ard Biesheuvel , Marc Zyngier , Oliver Upton , James Morse , Suzuki K Poulose , Zenghui Yu , Andrey Ryabinin , Alexander Potapenko , Andrey Konovalov , Dmitry Vyukov , Vincenzo Frascino , Andrew Morton , Anshuman Khandual , Matthew Wilcox , Yu Zhao , Mark Rutland , David Hildenbrand , Kefeng Wang , John Hubbard , Zi Yan , Barry Song <21cnbao@gmail.com>, Alistair Popple , Yang Shi Cc: Ryan Roberts , linux-arm-kernel@lists.infradead.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: [PATCH v4 01/16] mm: thp: Batch-collapse PMD with set_ptes() Date: Mon, 18 Dec 2023 10:50:45 +0000 Message-Id: <20231218105100.172635-2-ryan.roberts@arm.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20231218105100.172635-1-ryan.roberts@arm.com> References: <20231218105100.172635-1-ryan.roberts@arm.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1785616619009231823 X-GMAIL-MSGID: 1785616619009231823 Refactor __split_huge_pmd_locked() so that a present PMD can be collapsed to PTEs in a single batch using set_ptes(). It also provides a future opportunity to batch-add the folio to the rmap using David's new batched rmap APIs. This should improve performance a little bit, but the real motivation is to remove the need for the arm64 backend to have to fold the contpte entries. Instead, since the ptes are set as a batch, the contpte blocks can be initially set up pre-folded (once the arm64 contpte support is added in the next few patches). This leads to noticeable performance improvement during split. Signed-off-by: Ryan Roberts Acked-by: David Hildenbrand --- mm/huge_memory.c | 59 ++++++++++++++++++++++++++++-------------------- 1 file changed, 34 insertions(+), 25 deletions(-) diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 6be1a380a298..fbf7e95ea983 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -2535,15 +2535,16 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd, pte = pte_offset_map(&_pmd, haddr); VM_BUG_ON(!pte); - for (i = 0, addr = haddr; i < HPAGE_PMD_NR; i++, addr += PAGE_SIZE) { - pte_t entry; - /* - * Note that NUMA hinting access restrictions are not - * transferred to avoid any possibility of altering - * permissions across VMAs. - */ - if (freeze || pmd_migration) { + + /* + * Note that NUMA hinting access restrictions are not transferred to + * avoid any possibility of altering permissions across VMAs. + */ + if (freeze || pmd_migration) { + for (i = 0, addr = haddr; i < HPAGE_PMD_NR; i++, addr += PAGE_SIZE) { + pte_t entry; swp_entry_t swp_entry; + if (write) swp_entry = make_writable_migration_entry( page_to_pfn(page + i)); @@ -2562,28 +2563,36 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd, entry = pte_swp_mksoft_dirty(entry); if (uffd_wp) entry = pte_swp_mkuffd_wp(entry); - } else { - entry = mk_pte(page + i, READ_ONCE(vma->vm_page_prot)); - if (write) - entry = pte_mkwrite(entry, vma); + + VM_WARN_ON(!pte_none(ptep_get(pte + i))); + set_pte_at(mm, addr, pte + i, entry); + } + } else { + pte_t entry; + + entry = mk_pte(page, READ_ONCE(vma->vm_page_prot)); + if (write) + entry = pte_mkwrite(entry, vma); + if (!young) + entry = pte_mkold(entry); + /* NOTE: this may set soft-dirty too on some archs */ + if (dirty) + entry = pte_mkdirty(entry); + if (soft_dirty) + entry = pte_mksoft_dirty(entry); + if (uffd_wp) + entry = pte_mkuffd_wp(entry); + + for (i = 0, addr = haddr; i < HPAGE_PMD_NR; i++, addr += PAGE_SIZE) { if (anon_exclusive) SetPageAnonExclusive(page + i); - if (!young) - entry = pte_mkold(entry); - /* NOTE: this may set soft-dirty too on some archs */ - if (dirty) - entry = pte_mkdirty(entry); - if (soft_dirty) - entry = pte_mksoft_dirty(entry); - if (uffd_wp) - entry = pte_mkuffd_wp(entry); page_add_anon_rmap(page + i, vma, addr, RMAP_NONE); + VM_WARN_ON(!pte_none(ptep_get(pte + i))); } - VM_BUG_ON(!pte_none(ptep_get(pte))); - set_pte_at(mm, addr, pte, entry); - pte++; + + set_ptes(mm, haddr, pte, entry, HPAGE_PMD_NR); } - pte_unmap(pte - 1); + pte_unmap(pte); if (!pmd_migration) page_remove_rmap(page, vma, true); From patchwork Mon Dec 18 10:50:46 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ryan Roberts X-Patchwork-Id: 180262 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a05:7300:24d3:b0:fb:cd0c:d3e with SMTP id r19csp1154407dyi; Mon, 18 Dec 2023 02:52:02 -0800 (PST) X-Google-Smtp-Source: AGHT+IGPgaQnCq5JNdhSV6REvefelH3K1r+4CKmW3wOVkhHj7LwQzlr4byFbyGfReiuqtmtASa77 X-Received: by 2002:ac8:7c46:0:b0:425:8f81:6c8b with SMTP id o6-20020ac87c46000000b004258f816c8bmr25217683qtv.53.1702896722616; Mon, 18 Dec 2023 02:52:02 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1702896722; cv=none; d=google.com; s=arc-20160816; b=wBeWReTERiq3ctt0HWckXzKLmSeL9ttn2VBO+ZCgBMe1VsUVvA0QstuXMXmftgzNiP fyTDHbcAxqAKQ0JPMVh3wtHzGfhJB82bzUkwShZr6pjrhf3CpxUZ1ZOA6nSu2If+XbHh ZzF3GYZT6iq2In3ZdDhGY0Az+psoMc61DpFGK8AeMCEsez/jebLSL1P0SMX6NONsNPAd BhGZYlsmuvdIiWnG/R+PdZPVytqFKdYPuhHCpKC6KdTfMJyJXDd9PIO/whqNXq5Q5qs0 jh8Iv6A9YRi2posFN56uh8iSTqm/T9Eqcg9BrMwQXiWMD7SCh5H5Ytx+kQ4Qfx0T0z/q D3Bw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=content-transfer-encoding:mime-version:list-unsubscribe :list-subscribe:list-id:precedence:references:in-reply-to:message-id :date:subject:cc:to:from; bh=HMNkpYltWDvuMPO5V4A9nNfVvksmNsk6NYV9JnDoT3k=; fh=a3gZESXuO1ptbCtzvzEnZL5xUl+WC4Vy5hoF2yaOXSs=; b=a5DKxpnoR6CLlWA8vh5YZJRtDjlCGM1IT1OAZg79/AZktZbJ+FnQKE2aOa/fvuUpju Be8LguIoeyJ1/9opjsfxpzR2MH/3yR6ZzFV72NtJTU5ZXTclZK4SoUytL1iiyjSgzI4p l+p0go8TrhsvYrWz/mjtUyq6gr8HuKnXkOeitpp6Noktogu0QTJNiNFnXQcGh889gbvi bIPbpdBin5PzWyHaFBp2EvjwHizC/zuEDABDteCZzUmcHgcsLxOVNOtW/5lUfKf//FIC kjWmbLY4I/gv8BSNdTSE2634P426BL2mjkltyY00utTEMReKW+2yqg4oYXcKnDjcSgaG 1oQA== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel+bounces-3371-ouuuleilei=gmail.com@vger.kernel.org designates 147.75.199.223 as permitted sender) smtp.mailfrom="linux-kernel+bounces-3371-ouuuleilei=gmail.com@vger.kernel.org"; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=arm.com Received: from ny.mirrors.kernel.org (ny.mirrors.kernel.org. [147.75.199.223]) by mx.google.com with ESMTPS id fd3-20020a05622a4d0300b00423ecada9fasi25279800qtb.736.2023.12.18.02.52.02 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 18 Dec 2023 02:52:02 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel+bounces-3371-ouuuleilei=gmail.com@vger.kernel.org designates 147.75.199.223 as permitted sender) client-ip=147.75.199.223; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel+bounces-3371-ouuuleilei=gmail.com@vger.kernel.org designates 147.75.199.223 as permitted sender) smtp.mailfrom="linux-kernel+bounces-3371-ouuuleilei=gmail.com@vger.kernel.org"; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=arm.com Received: from smtp.subspace.kernel.org (wormhole.subspace.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ny.mirrors.kernel.org (Postfix) with ESMTPS id 4E3461C22B83 for ; Mon, 18 Dec 2023 10:52:02 +0000 (UTC) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 7E1CB1A70B; Mon, 18 Dec 2023 10:51:29 +0000 (UTC) X-Original-To: linux-kernel@vger.kernel.org Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 4CB5A1A293 for ; Mon, 18 Dec 2023 10:51:26 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=arm.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=arm.com Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 31EB713D5; Mon, 18 Dec 2023 02:52:10 -0800 (PST) Received: from e125769.cambridge.arm.com (e125769.cambridge.arm.com [10.1.196.26]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 3E0243F738; Mon, 18 Dec 2023 02:51:22 -0800 (PST) From: Ryan Roberts To: Catalin Marinas , Will Deacon , Ard Biesheuvel , Marc Zyngier , Oliver Upton , James Morse , Suzuki K Poulose , Zenghui Yu , Andrey Ryabinin , Alexander Potapenko , Andrey Konovalov , Dmitry Vyukov , Vincenzo Frascino , Andrew Morton , Anshuman Khandual , Matthew Wilcox , Yu Zhao , Mark Rutland , David Hildenbrand , Kefeng Wang , John Hubbard , Zi Yan , Barry Song <21cnbao@gmail.com>, Alistair Popple , Yang Shi Cc: Ryan Roberts , linux-arm-kernel@lists.infradead.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: [PATCH v4 02/16] mm: Batch-copy PTE ranges during fork() Date: Mon, 18 Dec 2023 10:50:46 +0000 Message-Id: <20231218105100.172635-3-ryan.roberts@arm.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20231218105100.172635-1-ryan.roberts@arm.com> References: <20231218105100.172635-1-ryan.roberts@arm.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1785616634127592977 X-GMAIL-MSGID: 1785616634127592977 Convert copy_pte_range() to copy a batch of ptes in one go. A given batch is determined by the architecture with the new helper, pte_batch_remaining(), and maps a physically contiguous block of memory, all belonging to the same folio. A pte batch is then write-protected in one go in the parent using the new helper, ptep_set_wrprotects() and is set in one go in the child using the new helper, set_ptes_full(). The primary motivation for this change is to reduce the number of tlb maintenance operations that the arm64 backend has to perform during fork, as it is about to add transparent support for the "contiguous bit" in its ptes. By write-protecting the parent using the new ptep_set_wrprotects() (note the 's' at the end) function, the backend can avoid having to unfold contig ranges of PTEs, which is expensive, when all ptes in the range are being write-protected. Similarly, by using set_ptes_full() rather than set_pte_at() to set up ptes in the child, the backend does not need to fold a contiguous range once they are all populated - they can be initially populated as a contiguous range in the first place. This code is very performance sensitive, and a significant amount of effort has been put into not regressing performance for the order-0 folio case. By default, pte_batch_remaining() is compile constant 1, which enables the compiler to simplify the extra loops that are added for batching and produce code that is equivalent (and equally performant) as the previous implementation. This change addresses the core-mm refactoring only and a separate change will implement pte_batch_remaining(), ptep_set_wrprotects() and set_ptes_full() in the arm64 backend to realize the performance improvement as part of the work to enable contpte mappings. To ensure the arm64 is performant once implemented, this change is very careful to only call ptep_get() once per pte batch. The following microbenchmark results demonstate that there is no significant performance change after this patch. Fork is called in a tight loop in a process with 1G of populated memory and the time for the function to execute is measured. 100 iterations per run, 8 runs performed on both Apple M2 (VM) and Ampere Altra (bare metal). Tests performed for case where 1G memory is comprised of order-0 folios and case where comprised of pte-mapped order-9 folios. Negative is faster, positive is slower, compared to baseline upon which the series is based: | Apple M2 VM | order-0 (pte-map) | order-9 (pte-map) | | fork |-------------------|-------------------| | microbench | mean | stdev | mean | stdev | |---------------|---------|---------|---------|---------| | baseline | 0.0% | 1.1% | 0.0% | 1.2% | | after-change | -1.0% | 2.0% | -0.1% | 1.1% | | Ampere Altra | order-0 (pte-map) | order-9 (pte-map) | | fork |-------------------|-------------------| | microbench | mean | stdev | mean | stdev | |---------------|---------|---------|---------|---------| | baseline | 0.0% | 1.0% | 0.0% | 0.1% | | after-change | -0.1% | 1.2% | -0.1% | 0.1% | Tested-by: John Hubbard Reviewed-by: Alistair Popple Signed-off-by: Ryan Roberts --- include/linux/pgtable.h | 80 +++++++++++++++++++++++++++++++++++ mm/memory.c | 92 ++++++++++++++++++++++++++--------------- 2 files changed, 139 insertions(+), 33 deletions(-) diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h index af7639c3b0a3..db93fb81465a 100644 --- a/include/linux/pgtable.h +++ b/include/linux/pgtable.h @@ -205,6 +205,27 @@ static inline int pmd_young(pmd_t pmd) #define arch_flush_lazy_mmu_mode() do {} while (0) #endif +#ifndef pte_batch_remaining +/** + * pte_batch_remaining - Number of pages from addr to next batch boundary. + * @pte: Page table entry for the first page. + * @addr: Address of the first page. + * @end: Batch ceiling (e.g. end of vma). + * + * Some architectures (arm64) can efficiently modify a contiguous batch of ptes. + * In such cases, this function returns the remaining number of pages to the end + * of the current batch, as defined by addr. This can be useful when iterating + * over ptes. + * + * May be overridden by the architecture, else batch size is always 1. + */ +static inline unsigned int pte_batch_remaining(pte_t pte, unsigned long addr, + unsigned long end) +{ + return 1; +} +#endif + #ifndef set_ptes #ifndef pte_next_pfn @@ -246,6 +267,34 @@ static inline void set_ptes(struct mm_struct *mm, unsigned long addr, #endif #define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1) +#ifndef set_ptes_full +/** + * set_ptes_full - Map consecutive pages to a contiguous range of addresses. + * @mm: Address space to map the pages into. + * @addr: Address to map the first page at. + * @ptep: Page table pointer for the first entry. + * @pte: Page table entry for the first page. + * @nr: Number of pages to map. + * @full: True if systematically setting all ptes and their previous values + * were known to be none (e.g. part of fork). + * + * Some architectures (arm64) can optimize the implementation if copying ptes + * batach-by-batch from the parent, where a batch is defined by + * pte_batch_remaining(). + * + * May be overridden by the architecture, else full is ignored and call is + * forwarded to set_ptes(). + * + * Context: The caller holds the page table lock. The pages all belong to the + * same folio. The PTEs are all in the same PMD. + */ +static inline void set_ptes_full(struct mm_struct *mm, unsigned long addr, + pte_t *ptep, pte_t pte, unsigned int nr, int full) +{ + set_ptes(mm, addr, ptep, pte, nr); +} +#endif + #ifndef __HAVE_ARCH_PTEP_SET_ACCESS_FLAGS extern int ptep_set_access_flags(struct vm_area_struct *vma, unsigned long address, pte_t *ptep, @@ -622,6 +671,37 @@ static inline void ptep_set_wrprotect(struct mm_struct *mm, unsigned long addres } #endif +#ifndef ptep_set_wrprotects +struct mm_struct; +/** + * ptep_set_wrprotects - Write protect a consecutive set of pages. + * @mm: Address space that the pages are mapped into. + * @address: Address of first page to write protect. + * @ptep: Page table pointer for the first entry. + * @nr: Number of pages to write protect. + * @full: True if systematically wite protecting all ptes (e.g. part of fork). + * + * Some architectures (arm64) can optimize the implementation if + * write-protecting ptes batach-by-batch, where a batch is defined by + * pte_batch_remaining(). + * + * May be overridden by the architecture, else implemented as a loop over + * ptep_set_wrprotect(). + * + * Context: The caller holds the page table lock. The PTEs are all in the same + * PMD. + */ +static inline void ptep_set_wrprotects(struct mm_struct *mm, + unsigned long address, pte_t *ptep, + unsigned int nr, int full) +{ + unsigned int i; + + for (i = 0; i < nr; i++, address += PAGE_SIZE, ptep++) + ptep_set_wrprotect(mm, address, ptep); +} +#endif + /* * On some architectures hardware does not set page access bit when accessing * memory page, it is responsibility of software setting this bit. It brings diff --git a/mm/memory.c b/mm/memory.c index 809746555827..111f8feeb56e 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -929,42 +929,60 @@ copy_present_page(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma } /* - * Copy one pte. Returns 0 if succeeded, or -EAGAIN if one preallocated page - * is required to copy this pte. + * Copy set of contiguous ptes. Returns number of ptes copied if succeeded + * (always gte 1), or -EAGAIN if one preallocated page is required to copy the + * first pte. */ static inline int -copy_present_pte(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma, - pte_t *dst_pte, pte_t *src_pte, unsigned long addr, int *rss, - struct folio **prealloc) +copy_present_ptes(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma, + pte_t *dst_pte, pte_t *src_pte, pte_t pte, + unsigned long addr, unsigned long end, + int *rss, struct folio **prealloc) { struct mm_struct *src_mm = src_vma->vm_mm; unsigned long vm_flags = src_vma->vm_flags; - pte_t pte = ptep_get(src_pte); struct page *page; struct folio *folio; + int nr, i, ret; + + nr = pte_batch_remaining(pte, addr, end); page = vm_normal_page(src_vma, addr, pte); - if (page) + if (page) { folio = page_folio(page); + folio_ref_add(folio, nr); + } if (page && folio_test_anon(folio)) { - /* - * If this page may have been pinned by the parent process, - * copy the page immediately for the child so that we'll always - * guarantee the pinned page won't be randomly replaced in the - * future. - */ - folio_get(folio); - if (unlikely(page_try_dup_anon_rmap(page, false, src_vma))) { - /* Page may be pinned, we have to copy. */ - folio_put(folio); - return copy_present_page(dst_vma, src_vma, dst_pte, src_pte, - addr, rss, prealloc, page); + for (i = 0; i < nr; i++, page++) { + /* + * If this page may have been pinned by the parent + * process, copy the page immediately for the child so + * that we'll always guarantee the pinned page won't be + * randomly replaced in the future. + */ + if (unlikely(page_try_dup_anon_rmap(page, false, src_vma))) { + if (i != 0) + break; + /* Page may be pinned, we have to copy. */ + folio_ref_sub(folio, nr); + ret = copy_present_page(dst_vma, src_vma, + dst_pte, src_pte, addr, + rss, prealloc, page); + return ret == 0 ? 1 : ret; + } + VM_BUG_ON(PageAnonExclusive(page)); } - rss[MM_ANONPAGES]++; + + if (unlikely(i < nr)) { + folio_ref_sub(folio, nr - i); + nr = i; + } + + rss[MM_ANONPAGES] += nr; } else if (page) { - folio_get(folio); - page_dup_file_rmap(page, false); - rss[mm_counter_file(page)]++; + for (i = 0; i < nr; i++) + page_dup_file_rmap(page + i, false); + rss[mm_counter_file(page)] += nr; } /* @@ -972,10 +990,9 @@ copy_present_pte(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma, * in the parent and the child */ if (is_cow_mapping(vm_flags) && pte_write(pte)) { - ptep_set_wrprotect(src_mm, addr, src_pte); + ptep_set_wrprotects(src_mm, addr, src_pte, nr, true); pte = pte_wrprotect(pte); } - VM_BUG_ON(page && folio_test_anon(folio) && PageAnonExclusive(page)); /* * If it's a shared mapping, mark it clean in @@ -988,8 +1005,8 @@ copy_present_pte(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma, if (!userfaultfd_wp(dst_vma)) pte = pte_clear_uffd_wp(pte); - set_pte_at(dst_vma->vm_mm, addr, dst_pte, pte); - return 0; + set_ptes_full(dst_vma->vm_mm, addr, dst_pte, pte, nr, true); + return nr; } static inline struct folio *folio_prealloc(struct mm_struct *src_mm, @@ -1030,6 +1047,7 @@ copy_pte_range(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma, int rss[NR_MM_COUNTERS]; swp_entry_t entry = (swp_entry_t){0}; struct folio *prealloc = NULL; + int nr_ptes; again: progress = 0; @@ -1060,6 +1078,8 @@ copy_pte_range(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma, arch_enter_lazy_mmu_mode(); do { + nr_ptes = 1; + /* * We are holding two locks at this point - either of them * could generate latencies in another task on another CPU. @@ -1095,16 +1115,21 @@ copy_pte_range(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma, * the now present pte. */ WARN_ON_ONCE(ret != -ENOENT); + ret = 0; } - /* copy_present_pte() will clear `*prealloc' if consumed */ - ret = copy_present_pte(dst_vma, src_vma, dst_pte, src_pte, - addr, rss, &prealloc); + /* copy_present_ptes() will clear `*prealloc' if consumed */ + nr_ptes = copy_present_ptes(dst_vma, src_vma, dst_pte, src_pte, + ptent, addr, end, rss, &prealloc); + /* * If we need a pre-allocated page for this pte, drop the * locks, allocate, and try again. */ - if (unlikely(ret == -EAGAIN)) + if (unlikely(nr_ptes == -EAGAIN)) { + ret = -EAGAIN; break; + } + if (unlikely(prealloc)) { /* * pre-alloc page cannot be reused by next time so as @@ -1115,8 +1140,9 @@ copy_pte_range(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma, folio_put(prealloc); prealloc = NULL; } - progress += 8; - } while (dst_pte++, src_pte++, addr += PAGE_SIZE, addr != end); + progress += 8 * nr_ptes; + } while (dst_pte += nr_ptes, src_pte += nr_ptes, + addr += PAGE_SIZE * nr_ptes, addr != end); arch_leave_lazy_mmu_mode(); pte_unmap_unlock(orig_src_pte, src_ptl); From patchwork Mon Dec 18 10:50:47 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ryan Roberts X-Patchwork-Id: 180263 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a05:7300:24d3:b0:fb:cd0c:d3e with SMTP id r19csp1154510dyi; Mon, 18 Dec 2023 02:52:18 -0800 (PST) X-Google-Smtp-Source: AGHT+IHN+UMRezTWNTZaIqVMyjAIK7Kj0WaUPxXp27LS0cxq7BdxOsCS7SDOOKHFjwnGI0GT5+tO X-Received: by 2002:a17:906:188:b0:a1d:e5a:3034 with SMTP id 8-20020a170906018800b00a1d0e5a3034mr7085416ejb.31.1702896738663; Mon, 18 Dec 2023 02:52:18 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1702896738; cv=none; d=google.com; s=arc-20160816; b=cQkxvicwVIIxHlrGB9JLECMAkLyxuT4f/PQE64TkdQQTS1eNa8c1jw0muzsWzBaXyV 8MZIVOzxYfrlwlLftMyigWZelhUN/XSmpxE29fQehEIb+kT6lDxIaL7rDG901BPyEe1X ioxA4k5h30w/yqb3O7HodmxujddSEmkNMeFqmvpqLBCC76LkJuVHm6fc/C8zLpXoPw+M EDyepEMK/mn9T3bcoEqx+eih0W7da+UwT3RaYGQEEzEPBiYt1z4WE/EVNLAz+fZmX3Fz yuqli2HF4uaa/C8r4eibmcs+I25qZVAhKwZ0kwsKoGm7nWdLdDwR3bwF2sMI2jseP2Cp OYbw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=content-transfer-encoding:mime-version:list-unsubscribe :list-subscribe:list-id:precedence:references:in-reply-to:message-id :date:subject:cc:to:from; bh=bwKlKTVosArOXLPMMU2ssUe/7Xt/n6Zbrde3/LmyRvQ=; fh=a3gZESXuO1ptbCtzvzEnZL5xUl+WC4Vy5hoF2yaOXSs=; b=0kxoho4O5fxL2eZ7HDICArNNw5JnJ6AWC02xGN07eM8p/SZ4aFKVpT3eKAAR6JYkyO nFannW/xY/jyBbBQo9/j6RU5aiczK7mHwnyhnJmU8ric/9l3saUi82I1NFACn1t6fAPG Vxe5gIZvO5BPYlA9gvrPkyLbrmUs3uLfDhkXJ402RpUTMJIqT9FXp753ALVZ7an4nIR2 g3FJE67TEMeZpy2F023egp80fEUnk0AgK8YVCBar4ju8AICuXuB8Sw27skWPnBdm2CPQ tK4Ii8fmuQl9VaOKp6xZW+QBiqTGgJWtS3WKaIXbKuvLXlpM4E48sGE7MQHUP0EwrTq8 yhkg== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel+bounces-3372-ouuuleilei=gmail.com@vger.kernel.org designates 147.75.80.249 as permitted sender) smtp.mailfrom="linux-kernel+bounces-3372-ouuuleilei=gmail.com@vger.kernel.org"; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=arm.com Received: from am.mirrors.kernel.org (am.mirrors.kernel.org. [147.75.80.249]) by mx.google.com with ESMTPS id g3-20020a170906198300b00a1de299f229si10000935ejd.439.2023.12.18.02.52.18 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 18 Dec 2023 02:52:18 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel+bounces-3372-ouuuleilei=gmail.com@vger.kernel.org designates 147.75.80.249 as permitted sender) client-ip=147.75.80.249; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel+bounces-3372-ouuuleilei=gmail.com@vger.kernel.org designates 147.75.80.249 as permitted sender) smtp.mailfrom="linux-kernel+bounces-3372-ouuuleilei=gmail.com@vger.kernel.org"; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=arm.com Received: from smtp.subspace.kernel.org (wormhole.subspace.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by am.mirrors.kernel.org (Postfix) with ESMTPS id 199271F23281 for ; Mon, 18 Dec 2023 10:52:18 +0000 (UTC) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 3BF001B286; Mon, 18 Dec 2023 10:51:33 +0000 (UTC) X-Original-To: linux-kernel@vger.kernel.org Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by smtp.subspace.kernel.org (Postfix) with ESMTP id DA8B61A72E for ; Mon, 18 Dec 2023 10:51:29 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=arm.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=arm.com Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id CA9C81424; Mon, 18 Dec 2023 02:52:13 -0800 (PST) Received: from e125769.cambridge.arm.com (e125769.cambridge.arm.com [10.1.196.26]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id EE4DF3F738; Mon, 18 Dec 2023 02:51:25 -0800 (PST) From: Ryan Roberts To: Catalin Marinas , Will Deacon , Ard Biesheuvel , Marc Zyngier , Oliver Upton , James Morse , Suzuki K Poulose , Zenghui Yu , Andrey Ryabinin , Alexander Potapenko , Andrey Konovalov , Dmitry Vyukov , Vincenzo Frascino , Andrew Morton , Anshuman Khandual , Matthew Wilcox , Yu Zhao , Mark Rutland , David Hildenbrand , Kefeng Wang , John Hubbard , Zi Yan , Barry Song <21cnbao@gmail.com>, Alistair Popple , Yang Shi Cc: Ryan Roberts , linux-arm-kernel@lists.infradead.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: [PATCH v4 03/16] mm: Batch-clear PTE ranges during zap_pte_range() Date: Mon, 18 Dec 2023 10:50:47 +0000 Message-Id: <20231218105100.172635-4-ryan.roberts@arm.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20231218105100.172635-1-ryan.roberts@arm.com> References: <20231218105100.172635-1-ryan.roberts@arm.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1785616650482582646 X-GMAIL-MSGID: 1785616650482582646 Convert zap_pte_range() to copy a batch of ptes in one go. A given batch is determined by the architecture (see pte_batch_remaining()), and maps a physically contiguous block of memory, all belonging to the same folio. A pte batch is cleared using the new helper, clear_ptes(). The primary motivation for this change is to reduce the number of tlb maintenance operations that the arm64 backend has to perform during exit and other syscalls that cause zap_pte_range() (e.g. munmap, madvise(DONTNEED), etc.), as it is about to add transparent support for the "contiguous bit" in its ptes. By clearing ptes using the new clear_ptes() API, the backend doesn't have to perform an expensive unfold operation when a PTE being cleared is part of a contpte block. Instead it can just clear the whole block immediately. This code is very performance sensitive, and a significant amount of effort has been put into not regressing performance for the order-0 folio case. By default, pte_batch_remaining() always returns 1, which enables the compiler to simplify the extra loops that are added for batching and produce code that is equivalent (and equally performant) as the previous implementation. This change addresses the core-mm refactoring only, and introduces clear_ptes() with a default implementation that calls ptep_get_and_clear_full() for each pte in the range. Note that this API returns the pte at the beginning of the batch, but with the dirty and young bits set if ANY of the ptes in the cleared batch had those bits set; this information is applied to the folio by the core-mm. Given the batch is guaranteed to cover only a single folio, collapsing this state does not lose any useful information. A separate change will implement clear_ptes() in the arm64 backend to realize the performance improvement as part of the work to enable contpte mappings. The following microbenchmark results demonstate that there is no madvise(dontneed) performance regression (and actually an improvement in some cases) after this patch. madvise(dontneed) is called for each page of a 1G populated mapping and the total time is measured. 100 iterations per run, 8 runs performed on both Apple M2 (VM) and Ampere Altra (bare metal). Tests performed for case where 1G memory is comprised of order-0 folios and case where comprised of pte-mapped order-9 folios. Negative is faster, positive is slower, compared to baseline upon which the series is based: | Apple M2 VM | order-0 (pte-map) | order-9 (pte-map) | | dontneed |-------------------|-------------------| | microbench | mean | stdev | mean | stdev | |---------------|---------|---------|---------|---------| | baseline | 0.0% | 7.5% | 0.0% | 7.9% | | after-change | -9.6% | 3.1% | -4.9% | 8.1% | | Ampere Altra | order-0 (pte-map) | order-9 (pte-map) | | dontneed |-------------------|-------------------| | microbench | mean | stdev | mean | stdev | |---------------|---------|---------|---------|---------| | baseline | 0.0% | 0.1% | 0.0% | 0.0% | | after-change | -0.2% | 0.1% | -0.1% | 0.0% | The following microbenchmark results demonstate that there is no munmap performance regression (and actually an improvement in some cases) after this patch. munmap is called for a 1G populated mapping and the time to execute the function is measured. 100 iterations per run, 8 runs performed on both Apple M2 (VM) and Ampere Altra (bare metal). Tests performed for case where 1G memory is comprised of order-0 folios and case where comprised of pte-mapped order-9 folios. Negative is faster, positive is slower, compared to baseline upon which the series is based: | Apple M2 VM | order-0 (pte-map) | order-9 (pte-map) | | munmap |-------------------|-------------------| | microbench | mean | stdev | mean | stdev | |---------------|---------|---------|---------|---------| | baseline | 0.0% | 3.8% | 0.0% | 6.4% | | after-change | -1.9% | 0.2% | -4.7% | 0.8% | | Ampere Altra | order-0 (pte-map) | order-9 (pte-map) | | munmap |-------------------|-------------------| | microbench | mean | stdev | mean | stdev | |---------------|---------|---------|---------|---------| | baseline | 0.0% | 0.9% | 0.0% | 0.1% | | after-change | -0.2% | 0.6% | -3.2% | 0.2% | Tested-by: John Hubbard Signed-off-by: Ryan Roberts Reviewed-by: Alistair Popple --- include/asm-generic/tlb.h | 11 +++++++ include/linux/pgtable.h | 43 ++++++++++++++++++++++++++ mm/memory.c | 64 ++++++++++++++++++++++++++++----------- mm/mmu_gather.c | 15 +++++++++ 4 files changed, 115 insertions(+), 18 deletions(-) diff --git a/include/asm-generic/tlb.h b/include/asm-generic/tlb.h index 129a3a759976..1b25929d2000 100644 --- a/include/asm-generic/tlb.h +++ b/include/asm-generic/tlb.h @@ -75,6 +75,10 @@ * boolean indicating if the queue is (now) full and a call to * tlb_flush_mmu() is required. * + * tlb_reserve_space() attempts to preallocate space for nr pages and returns + * the minimum garanteed number of pages that can be queued without overflow, + * which may be more or less than requested. + * * tlb_remove_page() and tlb_remove_page_size() imply the call to * tlb_flush_mmu() when required and has no return value. * @@ -263,6 +267,7 @@ struct mmu_gather_batch { extern bool __tlb_remove_page_size(struct mmu_gather *tlb, struct encoded_page *page, int page_size); +extern unsigned int tlb_reserve_space(struct mmu_gather *tlb, unsigned int nr); #ifdef CONFIG_SMP /* @@ -273,6 +278,12 @@ extern bool __tlb_remove_page_size(struct mmu_gather *tlb, extern void tlb_flush_rmaps(struct mmu_gather *tlb, struct vm_area_struct *vma); #endif +#else +static inline unsigned int tlb_reserve_space(struct mmu_gather *tlb, + unsigned int nr) +{ + return 1; +} #endif /* diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h index db93fb81465a..170925379534 100644 --- a/include/linux/pgtable.h +++ b/include/linux/pgtable.h @@ -601,6 +601,49 @@ static inline pte_t ptep_get_and_clear_full(struct mm_struct *mm, } #endif +#ifndef clear_ptes +struct mm_struct; +/** + * clear_ptes - Clear a consecutive range of ptes and return the previous value. + * @mm: Address space that the ptes map. + * @address: Address corresponding to the first pte to clear. + * @ptep: Page table pointer for the first entry. + * @nr: Number of ptes to clear. + * @full: True if systematically clearing all ptes for the address space. + * + * A batched version of ptep_get_and_clear_full(), which returns the old pte + * value for the first pte in the range, but with young and/or dirty set if any + * of the ptes in the range were young or dirty. + * + * May be overridden by the architecture, else implemented as a loop over + * ptep_get_and_clear_full(). + * + * Context: The caller holds the page table lock. The PTEs are all in the same + * PMD. + */ +static inline pte_t clear_ptes(struct mm_struct *mm, + unsigned long address, pte_t *ptep, + unsigned int nr, int full) +{ + unsigned int i; + pte_t pte; + pte_t orig_pte = ptep_get_and_clear_full(mm, address, ptep, full); + + for (i = 1; i < nr; i++) { + address += PAGE_SIZE; + ptep++; + pte = ptep_get_and_clear_full(mm, address, ptep, full); + + if (pte_dirty(pte)) + orig_pte = pte_mkdirty(orig_pte); + + if (pte_young(pte)) + orig_pte = pte_mkyoung(orig_pte); + } + + return orig_pte; +} +#endif /* * If two threads concurrently fault at the same page, the thread that diff --git a/mm/memory.c b/mm/memory.c index 111f8feeb56e..81b023cf3182 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -1447,6 +1447,7 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb, pte_t *start_pte; pte_t *pte; swp_entry_t entry; + int nr; tlb_change_page_size(tlb, PAGE_SIZE); init_rss_vec(rss); @@ -1459,6 +1460,9 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb, do { pte_t ptent = ptep_get(pte); struct page *page; + int i; + + nr = 1; if (pte_none(ptent)) continue; @@ -1467,43 +1471,67 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb, break; if (pte_present(ptent)) { - unsigned int delay_rmap; + unsigned int delay_rmap = 0; + struct folio *folio; + bool full = false; + + /* + * tlb_gather always has at least one slot so avoid call + * to tlb_reserve_space() when pte_batch_remaining() is + * a compile-time constant 1 (default). + */ + nr = pte_batch_remaining(ptent, addr, end); + if (unlikely(nr > 1)) + nr = min_t(int, nr, tlb_reserve_space(tlb, nr)); page = vm_normal_page(vma, addr, ptent); if (unlikely(!should_zap_page(details, page))) continue; - ptent = ptep_get_and_clear_full(mm, addr, pte, - tlb->fullmm); + ptent = clear_ptes(mm, addr, pte, nr, tlb->fullmm); arch_check_zapped_pte(vma, ptent); - tlb_remove_tlb_entry(tlb, pte, addr); - zap_install_uffd_wp_if_needed(vma, addr, pte, details, - ptent); + + for (i = 0; i < nr; i++) { + unsigned long subaddr = addr + PAGE_SIZE * i; + + tlb_remove_tlb_entry(tlb, &pte[i], subaddr); + zap_install_uffd_wp_if_needed(vma, subaddr, + &pte[i], details, ptent); + } if (unlikely(!page)) { ksm_might_unmap_zero_page(mm, ptent); continue; } - delay_rmap = 0; - if (!PageAnon(page)) { + folio = page_folio(page); + if (!folio_test_anon(folio)) { if (pte_dirty(ptent)) { - set_page_dirty(page); + folio_mark_dirty(folio); if (tlb_delay_rmap(tlb)) { delay_rmap = 1; force_flush = 1; } } if (pte_young(ptent) && likely(vma_has_recency(vma))) - mark_page_accessed(page); + folio_mark_accessed(folio); } - rss[mm_counter(page)]--; - if (!delay_rmap) { - page_remove_rmap(page, vma, false); - if (unlikely(page_mapcount(page) < 0)) - print_bad_pte(vma, addr, ptent, page); + rss[mm_counter(page)] -= nr; + for (i = 0; i < nr; i++, page++) { + if (!delay_rmap) { + page_remove_rmap(page, vma, false); + if (unlikely(page_mapcount(page) < 0)) + print_bad_pte(vma, addr, ptent, page); + } + + /* + * nr calculated based on available space, so + * can only be full on final iteration. + */ + VM_WARN_ON(full); + full = __tlb_remove_page(tlb, page, delay_rmap); } - if (unlikely(__tlb_remove_page(tlb, page, delay_rmap))) { + if (unlikely(full)) { force_flush = 1; - addr += PAGE_SIZE; + addr += PAGE_SIZE * nr; break; } continue; @@ -1557,7 +1585,7 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb, } pte_clear_not_present_full(mm, addr, pte, tlb->fullmm); zap_install_uffd_wp_if_needed(vma, addr, pte, details, ptent); - } while (pte++, addr += PAGE_SIZE, addr != end); + } while (pte += nr, addr += PAGE_SIZE * nr, addr != end); add_mm_rss_vec(mm, rss); arch_leave_lazy_mmu_mode(); diff --git a/mm/mmu_gather.c b/mm/mmu_gather.c index 4f559f4ddd21..39725756e6bf 100644 --- a/mm/mmu_gather.c +++ b/mm/mmu_gather.c @@ -47,6 +47,21 @@ static bool tlb_next_batch(struct mmu_gather *tlb) return true; } +unsigned int tlb_reserve_space(struct mmu_gather *tlb, unsigned int nr) +{ + struct mmu_gather_batch *batch = tlb->active; + unsigned int nr_alloc = batch->max - batch->nr; + + while (nr_alloc < nr) { + if (!tlb_next_batch(tlb)) + break; + nr_alloc += tlb->active->max; + } + + tlb->active = batch; + return nr_alloc; +} + #ifdef CONFIG_SMP static void tlb_flush_rmap_batch(struct mmu_gather_batch *batch, struct vm_area_struct *vma) { From patchwork Mon Dec 18 10:50:50 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ryan Roberts X-Patchwork-Id: 180264 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a05:7300:24d3:b0:fb:cd0c:d3e with SMTP id r19csp1154777dyi; Mon, 18 Dec 2023 02:53:01 -0800 (PST) X-Google-Smtp-Source: AGHT+IF/AzjYT8ZKLmC3XtWUaG3GvVh+r+se5OD87ESDhs8w51e48DuBxg0nOGGTgsmD9swG7a+6 X-Received: by 2002:a05:622a:3d1:b0:41e:a62b:3d18 with SMTP id k17-20020a05622a03d100b0041ea62b3d18mr22973003qtx.59.1702896781242; Mon, 18 Dec 2023 02:53:01 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1702896781; cv=none; d=google.com; s=arc-20160816; b=quiz3F5Ae00GN00Bz7GwywdU4l41FjE2ze4Md98pi0Aypejp5dCHaQwLgIHFwE+xgT PtwDdtyXliRp47QWTdutarJ3dphSh4/uzHIveapumH8dbs3KhfXMEAkrqQk24X1AwokR HBIg+e/7zmq3/6GpDKwiQIyAc63T4SQJgbyYI9cKTI/VrJpNtgdd1y/eihf8i919TIs4 T+5GkTAwv5WdKX2vxo8l1oblmeoIune2JbDLtFqtbxJo+H4PVIu/Cpa6m8M+5HMr4DMZ 0OBXqgeLjW+BpSlv6Hu+UYmy0RrSGk97xIIP9OufacTCkLXRtvq9sPQSbiLWZDEGCZMa WLqg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=content-transfer-encoding:mime-version:list-unsubscribe :list-subscribe:list-id:precedence:references:in-reply-to:message-id :date:subject:cc:to:from; bh=43xS6Y4SE6bF0+VyPS5TuDh+DNzmqC+J1CqyywgWfDk=; fh=a3gZESXuO1ptbCtzvzEnZL5xUl+WC4Vy5hoF2yaOXSs=; b=Qh6lgxBxa7zHs+MwgADjn2gcp/6lu0kw6T6dDQa8hm1ujvqDz98lm/yPhXnS2Pgy99 iQ31gYUMlwcor69Gp11b/yFKfKZJWzpUVwO087DkDQ5Zu2M+nPeJ5cWblpsFd9kxf2Hb hlzeuJ/Bz9fddlnhGVsa4t+pS4HMq1h+k3mo5TdQ1ZWvTO4e8Bpz09XEzo0rB6ER3VFg SLzOsPsiHtFmyXVSo3vgNgQtZcu6j3PiyTEdmsWGDK+aejjBLyrde0QpmMxcWoL2HeRI JjaO577lXgNnGznWC6SkbnI2z4yeveiB+mAdfVI4XzEHuOnDYIlO46JhPC5Gd8I63VRR 3fhQ== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel+bounces-3375-ouuuleilei=gmail.com@vger.kernel.org designates 147.75.199.223 as permitted sender) smtp.mailfrom="linux-kernel+bounces-3375-ouuuleilei=gmail.com@vger.kernel.org"; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=arm.com Received: from ny.mirrors.kernel.org (ny.mirrors.kernel.org. [147.75.199.223]) by mx.google.com with ESMTPS id fd3-20020a05622a4d0300b00423ecada9fasi25279800qtb.736.2023.12.18.02.53.01 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 18 Dec 2023 02:53:01 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel+bounces-3375-ouuuleilei=gmail.com@vger.kernel.org designates 147.75.199.223 as permitted sender) client-ip=147.75.199.223; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel+bounces-3375-ouuuleilei=gmail.com@vger.kernel.org designates 147.75.199.223 as permitted sender) smtp.mailfrom="linux-kernel+bounces-3375-ouuuleilei=gmail.com@vger.kernel.org"; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=arm.com Received: from smtp.subspace.kernel.org (wormhole.subspace.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ny.mirrors.kernel.org (Postfix) with ESMTPS id 0D5031C22BEC for ; Mon, 18 Dec 2023 10:53:01 +0000 (UTC) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by smtp.subspace.kernel.org (Postfix) with ESMTP id C8F52200C0; Mon, 18 Dec 2023 10:51:44 +0000 (UTC) X-Original-To: linux-kernel@vger.kernel.org Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by smtp.subspace.kernel.org (Postfix) with ESMTP id CD2171C691 for ; Mon, 18 Dec 2023 10:51:40 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=arm.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=arm.com Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id A9CE72F4; Mon, 18 Dec 2023 02:52:24 -0800 (PST) Received: from e125769.cambridge.arm.com (e125769.cambridge.arm.com [10.1.196.26]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id D00713F738; Mon, 18 Dec 2023 02:51:36 -0800 (PST) From: Ryan Roberts To: Catalin Marinas , Will Deacon , Ard Biesheuvel , Marc Zyngier , Oliver Upton , James Morse , Suzuki K Poulose , Zenghui Yu , Andrey Ryabinin , Alexander Potapenko , Andrey Konovalov , Dmitry Vyukov , Vincenzo Frascino , Andrew Morton , Anshuman Khandual , Matthew Wilcox , Yu Zhao , Mark Rutland , David Hildenbrand , Kefeng Wang , John Hubbard , Zi Yan , Barry Song <21cnbao@gmail.com>, Alistair Popple , Yang Shi Cc: Ryan Roberts , linux-arm-kernel@lists.infradead.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: [PATCH v4 06/16] arm64/mm: pte_clear(): New layer to manage contig bit Date: Mon, 18 Dec 2023 10:50:50 +0000 Message-Id: <20231218105100.172635-7-ryan.roberts@arm.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20231218105100.172635-1-ryan.roberts@arm.com> References: <20231218105100.172635-1-ryan.roberts@arm.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1785616695180035516 X-GMAIL-MSGID: 1785616695180035516 Create a new layer for the in-table PTE manipulation APIs. For now, The existing API is prefixed with double underscore to become the arch-private API and the public API is just a simple wrapper that calls the private API. The public API implementation will subsequently be used to transparently manipulate the contiguous bit where appropriate. But since there are already some contig-aware users (e.g. hugetlb, kernel mapper), we must first ensure those users use the private API directly so that the future contig-bit manipulations in the public API do not interfere with those existing uses. Tested-by: John Hubbard Signed-off-by: Ryan Roberts --- arch/arm64/include/asm/pgtable.h | 3 ++- arch/arm64/mm/fixmap.c | 2 +- arch/arm64/mm/hugetlbpage.c | 2 +- arch/arm64/mm/mmu.c | 2 +- 4 files changed, 5 insertions(+), 4 deletions(-) diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h index 323ec91add60..1464e990580a 100644 --- a/arch/arm64/include/asm/pgtable.h +++ b/arch/arm64/include/asm/pgtable.h @@ -93,7 +93,7 @@ static inline pteval_t __phys_to_pte_val(phys_addr_t phys) __pte(__phys_to_pte_val((phys_addr_t)(pfn) << PAGE_SHIFT) | pgprot_val(prot)) #define pte_none(pte) (!pte_val(pte)) -#define pte_clear(mm, addr, ptep) \ +#define __pte_clear(mm, addr, ptep) \ __set_pte(ptep, __pte(0)) #define pte_page(pte) (pfn_to_page(pte_pfn(pte))) @@ -1121,6 +1121,7 @@ extern void ptep_modify_prot_commit(struct vm_area_struct *vma, #define set_pte __set_pte #define set_ptes __set_ptes +#define pte_clear __pte_clear #endif /* !__ASSEMBLY__ */ diff --git a/arch/arm64/mm/fixmap.c b/arch/arm64/mm/fixmap.c index 51cd4501816d..bfc02568805a 100644 --- a/arch/arm64/mm/fixmap.c +++ b/arch/arm64/mm/fixmap.c @@ -123,7 +123,7 @@ void __set_fixmap(enum fixed_addresses idx, if (pgprot_val(flags)) { __set_pte(ptep, pfn_pte(phys >> PAGE_SHIFT, flags)); } else { - pte_clear(&init_mm, addr, ptep); + __pte_clear(&init_mm, addr, ptep); flush_tlb_kernel_range(addr, addr+PAGE_SIZE); } } diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c index 741cb53672fd..510b2d4b89a9 100644 --- a/arch/arm64/mm/hugetlbpage.c +++ b/arch/arm64/mm/hugetlbpage.c @@ -400,7 +400,7 @@ void huge_pte_clear(struct mm_struct *mm, unsigned long addr, ncontig = num_contig_ptes(sz, &pgsize); for (i = 0; i < ncontig; i++, addr += pgsize, ptep++) - pte_clear(mm, addr, ptep); + __pte_clear(mm, addr, ptep); } pte_t huge_ptep_get_and_clear(struct mm_struct *mm, diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c index e884279b268e..080e9b50f595 100644 --- a/arch/arm64/mm/mmu.c +++ b/arch/arm64/mm/mmu.c @@ -859,7 +859,7 @@ static void unmap_hotplug_pte_range(pmd_t *pmdp, unsigned long addr, continue; WARN_ON(!pte_present(pte)); - pte_clear(&init_mm, addr, ptep); + __pte_clear(&init_mm, addr, ptep); flush_tlb_kernel_range(addr, addr + PAGE_SIZE); if (free_mapped) free_hotplug_page_range(pte_page(pte), From patchwork Mon Dec 18 10:50:51 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ryan Roberts X-Patchwork-Id: 180265 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a05:7300:24d3:b0:fb:cd0c:d3e with SMTP id r19csp1154868dyi; Mon, 18 Dec 2023 02:53:14 -0800 (PST) X-Google-Smtp-Source: AGHT+IGVoBAQpHdRv7/BVKgLkgJ6r/I3SpePAEdkJR13dTa/hvaGhsxo9a4WLYXKoK5Z0/NCJYdB X-Received: by 2002:ac8:4e89:0:b0:425:4043:18a8 with SMTP id 9-20020ac84e89000000b00425404318a8mr23872625qtp.91.1702896794144; Mon, 18 Dec 2023 02:53:14 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1702896794; cv=none; d=google.com; s=arc-20160816; b=rDnHchqVpPTgftNqEY50IaEk4Iop+5RSILZ2eDZuTBshAVko0cylodKG1xHz0TaEzA NVttfWyEzq3hNvqHJ3RgwwzSJawEuIZ5Ih5tAszjMq6dVUMtieN5iWGI0E+9qd//8lxY u40M4YbmXeIfPq2OwwYPh2GrmKcM+TfotxPvDIvxDXEIf+PvoWTH0plNQg8XaZw2nkie mQ10ngt2Iq5zdMC+BrBUTYIdkei3uyijDVky+5msf1A1sYdTm9PklDE9IQ7ydf06tBqN CVcyu+/aZiBCLmmUXHW9nMlhogCDuWKvwwUh5bRdwkF0Dk0u/yjBdmVhyzwv5v4dqvfh iOCw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=content-transfer-encoding:mime-version:list-unsubscribe :list-subscribe:list-id:precedence:references:in-reply-to:message-id :date:subject:cc:to:from; bh=p8TO3jSXlQIO2M/vTLXz8jOzA91p4UDOv06l4gwoTwE=; fh=a3gZESXuO1ptbCtzvzEnZL5xUl+WC4Vy5hoF2yaOXSs=; b=ho1N8vUCqS3LtuH3H1hWEsuvmF9d41JVrMmneeH7LdfUFuBV0dn1QpE0485ifRnyfM duRwztqPBhWI71TQIGoBbY1UoaI68sjM/vPnbxv8wUibLrWMFOiCc9V0+RMFXZ6VV7px pxfzV1Hk/5DeLMJ7td78dC+ZksR25vCBdwkhmCbcRmUTp9kjooWfPgV9VqLXx/u/wwlv Lu5UnT3XNj0n0a7uWNXCUySiwwYi1JfYPxpAzH7fAvnIy/dNPunvUMR39HAAPIxMjI8b deFi0o3wkLAye/OP9R+672YHKiy14Mpi5bS8HrHrwkz9N5Kj690iG6GcRNGfQyEkRNIR BzVQ== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel+bounces-3376-ouuuleilei=gmail.com@vger.kernel.org designates 147.75.199.223 as permitted sender) smtp.mailfrom="linux-kernel+bounces-3376-ouuuleilei=gmail.com@vger.kernel.org"; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=arm.com Received: from ny.mirrors.kernel.org (ny.mirrors.kernel.org. [147.75.199.223]) by mx.google.com with ESMTPS id b10-20020ac8754a000000b0042561ddacc0si22531866qtr.532.2023.12.18.02.53.14 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 18 Dec 2023 02:53:14 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel+bounces-3376-ouuuleilei=gmail.com@vger.kernel.org designates 147.75.199.223 as permitted sender) client-ip=147.75.199.223; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel+bounces-3376-ouuuleilei=gmail.com@vger.kernel.org designates 147.75.199.223 as permitted sender) smtp.mailfrom="linux-kernel+bounces-3376-ouuuleilei=gmail.com@vger.kernel.org"; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=arm.com Received: from smtp.subspace.kernel.org (wormhole.subspace.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ny.mirrors.kernel.org (Postfix) with ESMTPS id ECDBA1C22BC7 for ; Mon, 18 Dec 2023 10:53:13 +0000 (UTC) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by smtp.subspace.kernel.org (Postfix) with ESMTP id DEAD5208B5; Mon, 18 Dec 2023 10:51:47 +0000 (UTC) X-Original-To: linux-kernel@vger.kernel.org Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 4120F1CA9E for ; Mon, 18 Dec 2023 10:51:44 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=arm.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=arm.com Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 4AD001FB; Mon, 18 Dec 2023 02:52:28 -0800 (PST) Received: from e125769.cambridge.arm.com (e125769.cambridge.arm.com [10.1.196.26]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 7132F3F738; Mon, 18 Dec 2023 02:51:40 -0800 (PST) From: Ryan Roberts To: Catalin Marinas , Will Deacon , Ard Biesheuvel , Marc Zyngier , Oliver Upton , James Morse , Suzuki K Poulose , Zenghui Yu , Andrey Ryabinin , Alexander Potapenko , Andrey Konovalov , Dmitry Vyukov , Vincenzo Frascino , Andrew Morton , Anshuman Khandual , Matthew Wilcox , Yu Zhao , Mark Rutland , David Hildenbrand , Kefeng Wang , John Hubbard , Zi Yan , Barry Song <21cnbao@gmail.com>, Alistair Popple , Yang Shi Cc: Ryan Roberts , linux-arm-kernel@lists.infradead.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: [PATCH v4 07/16] arm64/mm: ptep_get_and_clear(): New layer to manage contig bit Date: Mon, 18 Dec 2023 10:50:51 +0000 Message-Id: <20231218105100.172635-8-ryan.roberts@arm.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20231218105100.172635-1-ryan.roberts@arm.com> References: <20231218105100.172635-1-ryan.roberts@arm.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1785616709125547549 X-GMAIL-MSGID: 1785616709125547549 Create a new layer for the in-table PTE manipulation APIs. For now, The existing API is prefixed with double underscore to become the arch-private API and the public API is just a simple wrapper that calls the private API. The public API implementation will subsequently be used to transparently manipulate the contiguous bit where appropriate. But since there are already some contig-aware users (e.g. hugetlb, kernel mapper), we must first ensure those users use the private API directly so that the future contig-bit manipulations in the public API do not interfere with those existing uses. Tested-by: John Hubbard Signed-off-by: Ryan Roberts --- arch/arm64/include/asm/pgtable.h | 5 +++-- arch/arm64/mm/hugetlbpage.c | 6 +++--- 2 files changed, 6 insertions(+), 5 deletions(-) diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h index 1464e990580a..994597a0bb0f 100644 --- a/arch/arm64/include/asm/pgtable.h +++ b/arch/arm64/include/asm/pgtable.h @@ -941,8 +941,7 @@ static inline int pmdp_test_and_clear_young(struct vm_area_struct *vma, } #endif /* CONFIG_TRANSPARENT_HUGEPAGE */ -#define __HAVE_ARCH_PTEP_GET_AND_CLEAR -static inline pte_t ptep_get_and_clear(struct mm_struct *mm, +static inline pte_t __ptep_get_and_clear(struct mm_struct *mm, unsigned long address, pte_t *ptep) { pte_t pte = __pte(xchg_relaxed(&pte_val(*ptep), 0)); @@ -1122,6 +1121,8 @@ extern void ptep_modify_prot_commit(struct vm_area_struct *vma, #define set_pte __set_pte #define set_ptes __set_ptes #define pte_clear __pte_clear +#define __HAVE_ARCH_PTEP_GET_AND_CLEAR +#define ptep_get_and_clear __ptep_get_and_clear #endif /* !__ASSEMBLY__ */ diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c index 510b2d4b89a9..c2a753541d13 100644 --- a/arch/arm64/mm/hugetlbpage.c +++ b/arch/arm64/mm/hugetlbpage.c @@ -188,7 +188,7 @@ static pte_t get_clear_contig(struct mm_struct *mm, unsigned long i; for (i = 0; i < ncontig; i++, addr += pgsize, ptep++) { - pte_t pte = ptep_get_and_clear(mm, addr, ptep); + pte_t pte = __ptep_get_and_clear(mm, addr, ptep); /* * If HW_AFDBM is enabled, then the HW could turn on @@ -236,7 +236,7 @@ static void clear_flush(struct mm_struct *mm, unsigned long i, saddr = addr; for (i = 0; i < ncontig; i++, addr += pgsize, ptep++) - ptep_clear(mm, addr, ptep); + __ptep_get_and_clear(mm, addr, ptep); flush_tlb_range(&vma, saddr, addr); } @@ -411,7 +411,7 @@ pte_t huge_ptep_get_and_clear(struct mm_struct *mm, pte_t orig_pte = ptep_get(ptep); if (!pte_cont(orig_pte)) - return ptep_get_and_clear(mm, addr, ptep); + return __ptep_get_and_clear(mm, addr, ptep); ncontig = find_num_contig(mm, addr, ptep, &pgsize); From patchwork Mon Dec 18 10:50:52 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ryan Roberts X-Patchwork-Id: 180266 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a05:7300:24d3:b0:fb:cd0c:d3e with SMTP id r19csp1154920dyi; Mon, 18 Dec 2023 02:53:28 -0800 (PST) X-Google-Smtp-Source: AGHT+IEdbGPBvvKxWS4y648EPgkgQhSzQ+Uy4cKnAfIHgajcAH9cz8qtz0DC8bsIlvNwT9mAIPiZ X-Received: by 2002:a17:906:d512:b0:a23:35c0:239f with SMTP id cq18-20020a170906d51200b00a2335c0239fmr445853ejc.117.1702896808289; Mon, 18 Dec 2023 02:53:28 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1702896808; cv=none; d=google.com; s=arc-20160816; b=KdwTHBzUzE24ldSee4Ki9wuqvVcTr1jnnfp+Lr+WYHmO7RNL2JLPvoOVjnguPA93jA gdqVayAjtgHQ43mxjHNmBtdRd98vNfOAJEimJUXm5qa9PGDUeTjjPNGoy5s9HQNcLIdR ciy67gIrFMulodtNJ/M1VYPubAGG55Nr7llLS6Cg3mLOclTZpNbPdtTJ+BFRex9VV3dl buEY1BwETFLQRL/EgdTQlnflWT9rvuA4SUxtXVFWrzhN2hKfEcc7LI/MX47byyq4FMzc HSIN2FaRhm5fpFOCaQOTiQxsNPAbuz0pFWhdEBharKLK4feBY/FogAPCgY+wj3j72jDv 0nGg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=content-transfer-encoding:mime-version:list-unsubscribe :list-subscribe:list-id:precedence:references:in-reply-to:message-id :date:subject:cc:to:from; bh=OjvLCzI/KlO/iQQJYXpNOpyg2OgTuVIw6IDag87PmTA=; fh=a3gZESXuO1ptbCtzvzEnZL5xUl+WC4Vy5hoF2yaOXSs=; b=0LNyGlQ0neer1PVH/OsjvB3L+CCxNSTjJ6ahAqqTp5pKd++G/Ov3aH43b6haDMAL3p t1eYQ0Q5AB5qYN/bgO6wCRsR9MD3BQMRAEmnpP0glu+ruO0Bi65mx8rDu0+RS0a0+4MX i1HfBtIQOaApZBTXQsanCX6C+d+NxFll6WX0rHhJ7xZLnep+UBaBXclEVpYrhTcB6eJH 0ClL6aMXXDSYyHCnrP4E72o0AO057lvORx7t7uqG9dZo3NRkdryeWQOzVsQIYgMwhgHN 6zDrvuVpEKexSv8jn7+4Qo5RQvDBzJXi2GCByyq0QxSkZRiE+u0Fq1IxL3dWATFhrxCO gB5A== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel+bounces-3377-ouuuleilei=gmail.com@vger.kernel.org designates 147.75.80.249 as permitted sender) smtp.mailfrom="linux-kernel+bounces-3377-ouuuleilei=gmail.com@vger.kernel.org"; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=arm.com Received: from am.mirrors.kernel.org (am.mirrors.kernel.org. [147.75.80.249]) by mx.google.com with ESMTPS id g3-20020a170906198300b00a1de299f229si10000935ejd.439.2023.12.18.02.53.28 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 18 Dec 2023 02:53:28 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel+bounces-3377-ouuuleilei=gmail.com@vger.kernel.org designates 147.75.80.249 as permitted sender) client-ip=147.75.80.249; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel+bounces-3377-ouuuleilei=gmail.com@vger.kernel.org designates 147.75.80.249 as permitted sender) smtp.mailfrom="linux-kernel+bounces-3377-ouuuleilei=gmail.com@vger.kernel.org"; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=arm.com Received: from smtp.subspace.kernel.org (wormhole.subspace.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by am.mirrors.kernel.org (Postfix) with ESMTPS id E3DAD1F235A5 for ; Mon, 18 Dec 2023 10:53:27 +0000 (UTC) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 81DD521113; Mon, 18 Dec 2023 10:51:51 +0000 (UTC) X-Original-To: linux-kernel@vger.kernel.org Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 0A864208B8 for ; Mon, 18 Dec 2023 10:51:47 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=arm.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=arm.com Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id E02011FB; Mon, 18 Dec 2023 02:52:31 -0800 (PST) Received: from e125769.cambridge.arm.com (e125769.cambridge.arm.com [10.1.196.26]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 12C8F3F738; Mon, 18 Dec 2023 02:51:43 -0800 (PST) From: Ryan Roberts To: Catalin Marinas , Will Deacon , Ard Biesheuvel , Marc Zyngier , Oliver Upton , James Morse , Suzuki K Poulose , Zenghui Yu , Andrey Ryabinin , Alexander Potapenko , Andrey Konovalov , Dmitry Vyukov , Vincenzo Frascino , Andrew Morton , Anshuman Khandual , Matthew Wilcox , Yu Zhao , Mark Rutland , David Hildenbrand , Kefeng Wang , John Hubbard , Zi Yan , Barry Song <21cnbao@gmail.com>, Alistair Popple , Yang Shi Cc: Ryan Roberts , linux-arm-kernel@lists.infradead.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: [PATCH v4 08/16] arm64/mm: ptep_test_and_clear_young(): New layer to manage contig bit Date: Mon, 18 Dec 2023 10:50:52 +0000 Message-Id: <20231218105100.172635-9-ryan.roberts@arm.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20231218105100.172635-1-ryan.roberts@arm.com> References: <20231218105100.172635-1-ryan.roberts@arm.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1785616723579233178 X-GMAIL-MSGID: 1785616723579233178 Create a new layer for the in-table PTE manipulation APIs. For now, The existing API is prefixed with double underscore to become the arch-private API and the public API is just a simple wrapper that calls the private API. The public API implementation will subsequently be used to transparently manipulate the contiguous bit where appropriate. But since there are already some contig-aware users (e.g. hugetlb, kernel mapper), we must first ensure those users use the private API directly so that the future contig-bit manipulations in the public API do not interfere with those existing uses. Tested-by: John Hubbard Signed-off-by: Ryan Roberts --- arch/arm64/include/asm/pgtable.h | 18 +++++++----------- 1 file changed, 7 insertions(+), 11 deletions(-) diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h index 994597a0bb0f..9b4a9909fd5b 100644 --- a/arch/arm64/include/asm/pgtable.h +++ b/arch/arm64/include/asm/pgtable.h @@ -887,8 +887,9 @@ static inline bool pud_user_accessible_page(pud_t pud) /* * Atomic pte/pmd modifications. */ -#define __HAVE_ARCH_PTEP_TEST_AND_CLEAR_YOUNG -static inline int __ptep_test_and_clear_young(pte_t *ptep) +static inline int __ptep_test_and_clear_young(struct vm_area_struct *vma, + unsigned long address, + pte_t *ptep) { pte_t old_pte, pte; @@ -903,18 +904,11 @@ static inline int __ptep_test_and_clear_young(pte_t *ptep) return pte_young(pte); } -static inline int ptep_test_and_clear_young(struct vm_area_struct *vma, - unsigned long address, - pte_t *ptep) -{ - return __ptep_test_and_clear_young(ptep); -} - #define __HAVE_ARCH_PTEP_CLEAR_YOUNG_FLUSH static inline int ptep_clear_flush_young(struct vm_area_struct *vma, unsigned long address, pte_t *ptep) { - int young = ptep_test_and_clear_young(vma, address, ptep); + int young = __ptep_test_and_clear_young(vma, address, ptep); if (young) { /* @@ -937,7 +931,7 @@ static inline int pmdp_test_and_clear_young(struct vm_area_struct *vma, unsigned long address, pmd_t *pmdp) { - return ptep_test_and_clear_young(vma, address, (pte_t *)pmdp); + return __ptep_test_and_clear_young(vma, address, (pte_t *)pmdp); } #endif /* CONFIG_TRANSPARENT_HUGEPAGE */ @@ -1123,6 +1117,8 @@ extern void ptep_modify_prot_commit(struct vm_area_struct *vma, #define pte_clear __pte_clear #define __HAVE_ARCH_PTEP_GET_AND_CLEAR #define ptep_get_and_clear __ptep_get_and_clear +#define __HAVE_ARCH_PTEP_TEST_AND_CLEAR_YOUNG +#define ptep_test_and_clear_young __ptep_test_and_clear_young #endif /* !__ASSEMBLY__ */ From patchwork Mon Dec 18 10:50:53 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ryan Roberts X-Patchwork-Id: 180267 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a05:7300:24d3:b0:fb:cd0c:d3e with SMTP id r19csp1155020dyi; Mon, 18 Dec 2023 02:53:40 -0800 (PST) X-Google-Smtp-Source: AGHT+IGSaifn8HVI43Z4U4no1HykOXyTFhPGJmYCjGxW7ulIMbN+ii3LyQSyDUzcvi0+9TpS+eHV X-Received: by 2002:a17:902:f548:b0:1d3:cd50:c715 with SMTP id h8-20020a170902f54800b001d3cd50c715mr213815plf.42.1702896820283; Mon, 18 Dec 2023 02:53:40 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1702896820; cv=none; d=google.com; s=arc-20160816; b=pbZrIwu2VqMvp1FPQXxPcLUTFijRQxD70yyikEd7kk4JU38TRZxeEUe+/h3YxCw+Ve MxbbDlHpGrgvSxyB2sVKBXTwSDhqAcmrvQJdMrLuVBGPUjqBg2yFSxYn4Vh4YBmG/+VZ frVEgR4vWx477+4ulP+TqtBbZ/p4fPp5YJjPmsMzFildK1SIOyF/mWE5AhS6nUUW/YOS nyxYmFMhwjHcj8bK6jE8M59DsztsyH4T4EZZDqZVmE5oHEvKZ9mkhf9UxrwCaSRBL//V vyOoj2JN4+GovRYlxal6HSyGELkW/s7MP/zlJFI1aLgdFUGiq61iEuVLmqKwa3jxxgy5 r/5w== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=content-transfer-encoding:mime-version:list-unsubscribe :list-subscribe:list-id:precedence:references:in-reply-to:message-id :date:subject:cc:to:from; bh=80eWkD1e3ScDQ5SjUCye7wOJgIG+FP4dY1bE4B1pVFU=; fh=a3gZESXuO1ptbCtzvzEnZL5xUl+WC4Vy5hoF2yaOXSs=; b=ivRkr0Y2h/4SXMeX0ECFkzXWbhjYg0M/8C/0qOXjcTdj4XJwKa3sBYDrI2Lp/xBFcZ WiRVsu4DqqAK4/bjKs/6u2V2qWHgMihh/fbGHQ25EDlF+DhoI9ApUvqcFNVxIT/dVCrX TDGujFcxQ/bylWgh9ha2uHqKn5I9QzCwYiA2V91CuOtt/veIwGzycdH/N1cnPjHf8kZK vowAXYdyC0BxA2Yzi1XoLijOvRXOHBVNMIk2T4y/Ac7q6/7pvU1wetw6uSCkQpN8gaiP wHR5CSW/Q9jr39y4KnPIIWT3qGMclVEIU49/ghhSDNytoAICj1woZyMUj4NfIbksUv2x gRKw== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel+bounces-3378-ouuuleilei=gmail.com@vger.kernel.org designates 139.178.88.99 as permitted sender) smtp.mailfrom="linux-kernel+bounces-3378-ouuuleilei=gmail.com@vger.kernel.org"; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=arm.com Received: from sv.mirrors.kernel.org (sv.mirrors.kernel.org. [139.178.88.99]) by mx.google.com with ESMTPS id b2-20020a170903228200b001cfc170c11asi17810909plh.475.2023.12.18.02.53.40 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 18 Dec 2023 02:53:40 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel+bounces-3378-ouuuleilei=gmail.com@vger.kernel.org designates 139.178.88.99 as permitted sender) client-ip=139.178.88.99; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel+bounces-3378-ouuuleilei=gmail.com@vger.kernel.org designates 139.178.88.99 as permitted sender) smtp.mailfrom="linux-kernel+bounces-3378-ouuuleilei=gmail.com@vger.kernel.org"; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=arm.com Received: from smtp.subspace.kernel.org (wormhole.subspace.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by sv.mirrors.kernel.org (Postfix) with ESMTPS id 12D0228408C for ; Mon, 18 Dec 2023 10:53:40 +0000 (UTC) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 71693249F8; Mon, 18 Dec 2023 10:51:55 +0000 (UTC) X-Original-To: linux-kernel@vger.kernel.org Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by smtp.subspace.kernel.org (Postfix) with ESMTP id B28562136B for ; Mon, 18 Dec 2023 10:51:51 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=arm.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=arm.com Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 843712F4; Mon, 18 Dec 2023 02:52:35 -0800 (PST) Received: from e125769.cambridge.arm.com (e125769.cambridge.arm.com [10.1.196.26]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id A89773F738; Mon, 18 Dec 2023 02:51:47 -0800 (PST) From: Ryan Roberts To: Catalin Marinas , Will Deacon , Ard Biesheuvel , Marc Zyngier , Oliver Upton , James Morse , Suzuki K Poulose , Zenghui Yu , Andrey Ryabinin , Alexander Potapenko , Andrey Konovalov , Dmitry Vyukov , Vincenzo Frascino , Andrew Morton , Anshuman Khandual , Matthew Wilcox , Yu Zhao , Mark Rutland , David Hildenbrand , Kefeng Wang , John Hubbard , Zi Yan , Barry Song <21cnbao@gmail.com>, Alistair Popple , Yang Shi Cc: Ryan Roberts , linux-arm-kernel@lists.infradead.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: [PATCH v4 09/16] arm64/mm: ptep_clear_flush_young(): New layer to manage contig bit Date: Mon, 18 Dec 2023 10:50:53 +0000 Message-Id: <20231218105100.172635-10-ryan.roberts@arm.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20231218105100.172635-1-ryan.roberts@arm.com> References: <20231218105100.172635-1-ryan.roberts@arm.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1785616736347881310 X-GMAIL-MSGID: 1785616736347881310 Create a new layer for the in-table PTE manipulation APIs. For now, The existing API is prefixed with double underscore to become the arch-private API and the public API is just a simple wrapper that calls the private API. The public API implementation will subsequently be used to transparently manipulate the contiguous bit where appropriate. But since there are already some contig-aware users (e.g. hugetlb, kernel mapper), we must first ensure those users use the private API directly so that the future contig-bit manipulations in the public API do not interfere with those existing uses. Tested-by: John Hubbard Signed-off-by: Ryan Roberts --- arch/arm64/include/asm/pgtable.h | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h index 9b4a9909fd5b..fc1005222ee4 100644 --- a/arch/arm64/include/asm/pgtable.h +++ b/arch/arm64/include/asm/pgtable.h @@ -138,7 +138,7 @@ static inline pteval_t __phys_to_pte_val(phys_addr_t phys) * so that we don't erroneously return false for pages that have been * remapped as PROT_NONE but are yet to be flushed from the TLB. * Note that we can't make any assumptions based on the state of the access - * flag, since ptep_clear_flush_young() elides a DSB when invalidating the + * flag, since __ptep_clear_flush_young() elides a DSB when invalidating the * TLB. */ #define pte_accessible(mm, pte) \ @@ -904,8 +904,7 @@ static inline int __ptep_test_and_clear_young(struct vm_area_struct *vma, return pte_young(pte); } -#define __HAVE_ARCH_PTEP_CLEAR_YOUNG_FLUSH -static inline int ptep_clear_flush_young(struct vm_area_struct *vma, +static inline int __ptep_clear_flush_young(struct vm_area_struct *vma, unsigned long address, pte_t *ptep) { int young = __ptep_test_and_clear_young(vma, address, ptep); @@ -1119,6 +1118,8 @@ extern void ptep_modify_prot_commit(struct vm_area_struct *vma, #define ptep_get_and_clear __ptep_get_and_clear #define __HAVE_ARCH_PTEP_TEST_AND_CLEAR_YOUNG #define ptep_test_and_clear_young __ptep_test_and_clear_young +#define __HAVE_ARCH_PTEP_CLEAR_YOUNG_FLUSH +#define ptep_clear_flush_young __ptep_clear_flush_young #endif /* !__ASSEMBLY__ */ From patchwork Mon Dec 18 10:50:54 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ryan Roberts X-Patchwork-Id: 180268 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a05:7300:24d3:b0:fb:cd0c:d3e with SMTP id r19csp1155112dyi; Mon, 18 Dec 2023 02:53:53 -0800 (PST) X-Google-Smtp-Source: AGHT+IEz8Yjdi7C3ztWAbm41DhdvNJ3qu88mww3AdeBPmueS2fpkiwml3RsuSafk1SJ38KfjVmMk X-Received: by 2002:a05:6808:124a:b0:3b8:b063:6bb1 with SMTP id o10-20020a056808124a00b003b8b0636bb1mr19999871oiv.96.1702896833578; Mon, 18 Dec 2023 02:53:53 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1702896833; cv=none; d=google.com; s=arc-20160816; b=jV6vBeAJwGnO2mGxSmfwl1e9LBrVrqGnyyv+Hj8EKI1PLABqZa+nukGdC+oChOR4AX MnwmNkvx1IGWatZ2uz/58H+OwgFj8Ll43oxeCYa1Gu24jnIAcDrqnWKKaO6v2FHwhtey OttCBFlRHPYVq0pVf5YvtQ2RrwIF5l0UPgW4B5L1X8aw2h/bJLnnAEDeXNeYbF8+VGIJ gwEUYPf4ItOhaE9kU3eb1qRYDKS69ooLXc+uMawFLR3rN5Yi95SlPzWIj4DvTelPZKJd EPEOynGpG/ZK1kyts9xzK2q6IPzKTYPy3rK2jVGFv5Zt8VsrhAZsrqqAFp8y1tfHJJ66 XXdQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=content-transfer-encoding:mime-version:list-unsubscribe :list-subscribe:list-id:precedence:references:in-reply-to:message-id :date:subject:cc:to:from; bh=L96YjzXNEj/7992DdZl7vC+6Lpqmu+YvM9dQCrXn2YQ=; fh=a3gZESXuO1ptbCtzvzEnZL5xUl+WC4Vy5hoF2yaOXSs=; b=SHQ6IeFlWmeh8hghGDrHbv0UQEij6/iqzJB9h3Pg8OukPjGjhqoZeY3J4r9kG747uQ U3qADOhrcNPGW+y2CKcXa0REechVUEL018v7e4nl26sn9UrDAVPGpDKuRxIetiCyR6Ow 8RfZ8lpz/tuvysd103avIQ5rnGr5F0Y8PUQB/lavYzjByWgFPB0JuuqqsNEwW8+sawps VQIBo3B/KKwoQTWBRyRH+MYs4crWanKe38ynSiM1KOmPXCLT9R6RJZ76rUNb1yRSssfh IX+kdytKBye2mWcWGI4ufAYYWZi8pPnTTs1SxhNlXEYMDSyY8TZydlearh+jql1oyVAR s0qg== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel+bounces-3379-ouuuleilei=gmail.com@vger.kernel.org designates 147.75.199.223 as permitted sender) smtp.mailfrom="linux-kernel+bounces-3379-ouuuleilei=gmail.com@vger.kernel.org"; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=arm.com Received: from ny.mirrors.kernel.org (ny.mirrors.kernel.org. [147.75.199.223]) by mx.google.com with ESMTPS id x18-20020a0ce0d2000000b0067f2a430aa6si4845522qvk.582.2023.12.18.02.53.53 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 18 Dec 2023 02:53:53 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel+bounces-3379-ouuuleilei=gmail.com@vger.kernel.org designates 147.75.199.223 as permitted sender) client-ip=147.75.199.223; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel+bounces-3379-ouuuleilei=gmail.com@vger.kernel.org designates 147.75.199.223 as permitted sender) smtp.mailfrom="linux-kernel+bounces-3379-ouuuleilei=gmail.com@vger.kernel.org"; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=arm.com Received: from smtp.subspace.kernel.org (wormhole.subspace.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ny.mirrors.kernel.org (Postfix) with ESMTPS id 5B6951C22B5F for ; Mon, 18 Dec 2023 10:53:53 +0000 (UTC) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by smtp.subspace.kernel.org (Postfix) with ESMTP id BED5C32C9B; Mon, 18 Dec 2023 10:52:02 +0000 (UTC) X-Original-To: linux-kernel@vger.kernel.org Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 1719422EF2 for ; Mon, 18 Dec 2023 10:51:55 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=arm.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=arm.com Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 267FD13D5; Mon, 18 Dec 2023 02:52:39 -0800 (PST) Received: from e125769.cambridge.arm.com (e125769.cambridge.arm.com [10.1.196.26]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 4C8F23F738; Mon, 18 Dec 2023 02:51:51 -0800 (PST) From: Ryan Roberts To: Catalin Marinas , Will Deacon , Ard Biesheuvel , Marc Zyngier , Oliver Upton , James Morse , Suzuki K Poulose , Zenghui Yu , Andrey Ryabinin , Alexander Potapenko , Andrey Konovalov , Dmitry Vyukov , Vincenzo Frascino , Andrew Morton , Anshuman Khandual , Matthew Wilcox , Yu Zhao , Mark Rutland , David Hildenbrand , Kefeng Wang , John Hubbard , Zi Yan , Barry Song <21cnbao@gmail.com>, Alistair Popple , Yang Shi Cc: Ryan Roberts , linux-arm-kernel@lists.infradead.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: [PATCH v4 10/16] arm64/mm: ptep_set_wrprotect(): New layer to manage contig bit Date: Mon, 18 Dec 2023 10:50:54 +0000 Message-Id: <20231218105100.172635-11-ryan.roberts@arm.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20231218105100.172635-1-ryan.roberts@arm.com> References: <20231218105100.172635-1-ryan.roberts@arm.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1785616750353630426 X-GMAIL-MSGID: 1785616750353630426 Create a new layer for the in-table PTE manipulation APIs. For now, The existing API is prefixed with double underscore to become the arch-private API and the public API is just a simple wrapper that calls the private API. The public API implementation will subsequently be used to transparently manipulate the contiguous bit where appropriate. But since there are already some contig-aware users (e.g. hugetlb, kernel mapper), we must first ensure those users use the private API directly so that the future contig-bit manipulations in the public API do not interfere with those existing uses. Tested-by: John Hubbard Signed-off-by: Ryan Roberts --- arch/arm64/include/asm/pgtable.h | 10 ++++++---- arch/arm64/mm/hugetlbpage.c | 2 +- 2 files changed, 7 insertions(+), 5 deletions(-) diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h index fc1005222ee4..423cc32b2777 100644 --- a/arch/arm64/include/asm/pgtable.h +++ b/arch/arm64/include/asm/pgtable.h @@ -958,11 +958,11 @@ static inline pmd_t pmdp_huge_get_and_clear(struct mm_struct *mm, #endif /* CONFIG_TRANSPARENT_HUGEPAGE */ /* - * ptep_set_wrprotect - mark read-only while trasferring potential hardware + * __ptep_set_wrprotect - mark read-only while trasferring potential hardware * dirty status (PTE_DBM && !PTE_RDONLY) to the software PTE_DIRTY bit. */ -#define __HAVE_ARCH_PTEP_SET_WRPROTECT -static inline void ptep_set_wrprotect(struct mm_struct *mm, unsigned long address, pte_t *ptep) +static inline void __ptep_set_wrprotect(struct mm_struct *mm, + unsigned long address, pte_t *ptep) { pte_t old_pte, pte; @@ -980,7 +980,7 @@ static inline void ptep_set_wrprotect(struct mm_struct *mm, unsigned long addres static inline void pmdp_set_wrprotect(struct mm_struct *mm, unsigned long address, pmd_t *pmdp) { - ptep_set_wrprotect(mm, address, (pte_t *)pmdp); + __ptep_set_wrprotect(mm, address, (pte_t *)pmdp); } #define pmdp_establish pmdp_establish @@ -1120,6 +1120,8 @@ extern void ptep_modify_prot_commit(struct vm_area_struct *vma, #define ptep_test_and_clear_young __ptep_test_and_clear_young #define __HAVE_ARCH_PTEP_CLEAR_YOUNG_FLUSH #define ptep_clear_flush_young __ptep_clear_flush_young +#define __HAVE_ARCH_PTEP_SET_WRPROTECT +#define ptep_set_wrprotect __ptep_set_wrprotect #endif /* !__ASSEMBLY__ */ diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c index c2a753541d13..952462820d9d 100644 --- a/arch/arm64/mm/hugetlbpage.c +++ b/arch/arm64/mm/hugetlbpage.c @@ -493,7 +493,7 @@ void huge_ptep_set_wrprotect(struct mm_struct *mm, pte_t pte; if (!pte_cont(READ_ONCE(*ptep))) { - ptep_set_wrprotect(mm, addr, ptep); + __ptep_set_wrprotect(mm, addr, ptep); return; } From patchwork Mon Dec 18 10:50:56 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ryan Roberts X-Patchwork-Id: 180269 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a05:7300:24d3:b0:fb:cd0c:d3e with SMTP id r19csp1155378dyi; Mon, 18 Dec 2023 02:54:35 -0800 (PST) X-Google-Smtp-Source: AGHT+IFYSGr6/0H0Irpny13z2V9Ilrc+ZVLJpLbwTACup6Luyugxeuk+9mk0/0JnkYcSXTZmkaEy X-Received: by 2002:a05:6808:640f:b0:3b9:f10f:b69f with SMTP id fg15-20020a056808640f00b003b9f10fb69fmr18796464oib.11.1702896875429; Mon, 18 Dec 2023 02:54:35 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1702896875; cv=none; d=google.com; s=arc-20160816; b=G5usDIWsuEQ7zTOJovEChut88qzqHQcxbt5AStvfVlSSqfDyfO/7o3JkgYya3pGqQf v6kf0Rp0UkNO10rfNIRM2OtYqqrV+29Sm2ISNoU/PJ4XnOQ0QgdC+5BBnOsYi9fK7qQh /xD48lihsdMLMFGzXzXxJHJBjpJy70rioZDRyrRj6bzi/mNpAM5+1wnVd9NsNq3Yz7V5 PUN2ACtzdk+ToIKWFAEtaHjObmW4dr3JYmKLTvpWXbcRD7JS0ToRGfnSOJNAKoMvcHm7 peWQN6nrMROTqyELS91F+EjHAUFEml1v8txgHcxWyYjKKgOnFR2/DUEdHYF669l4jCVb 7QvA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=content-transfer-encoding:mime-version:list-unsubscribe :list-subscribe:list-id:precedence:references:in-reply-to:message-id :date:subject:cc:to:from; bh=aK2GgN2wmseeTtTSFeJsHcj3Zu5KeNbIGeBRlv4JN2k=; fh=a3gZESXuO1ptbCtzvzEnZL5xUl+WC4Vy5hoF2yaOXSs=; b=NH0/3Hr9ZkKOTHK0zmy+fq8btLxhA1EdTU2gd/Q+cYspbcrCyoOEXZPWCUshiHDcxq DqKBe5oiCVM+nen1lcZ4gQwP5gbDg54LcPEoK0PqSHkJWf9H5HSaMigyLW0PbPjdUa2u CS+URoyuzHkHB3pwhdUzxHCanCHN5Ho9aHkT5sRdFv+WgKrgmr/2Ix7Yld72SohJUFrq kNLSGIGc+myeNezu3+Xi+S7AZaK2OB9y4DhtTdltpQ9e7yjkYN+du9Wd7aigBJuzmPEm VEMjxHhWSArDBQzDnMVt9uBn049utDKUCq9G73bCiWaWPPVQ/I50UzmRULnT1w1mqwmX yRZw== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel+bounces-3381-ouuuleilei=gmail.com@vger.kernel.org designates 2604:1380:45e3:2400::1 as permitted sender) smtp.mailfrom="linux-kernel+bounces-3381-ouuuleilei=gmail.com@vger.kernel.org"; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=arm.com Received: from sv.mirrors.kernel.org (sv.mirrors.kernel.org. [2604:1380:45e3:2400::1]) by mx.google.com with ESMTPS id t6-20020a63dd06000000b005cd9bc22be5si1568439pgg.283.2023.12.18.02.54.35 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 18 Dec 2023 02:54:35 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel+bounces-3381-ouuuleilei=gmail.com@vger.kernel.org designates 2604:1380:45e3:2400::1 as permitted sender) client-ip=2604:1380:45e3:2400::1; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel+bounces-3381-ouuuleilei=gmail.com@vger.kernel.org designates 2604:1380:45e3:2400::1 as permitted sender) smtp.mailfrom="linux-kernel+bounces-3381-ouuuleilei=gmail.com@vger.kernel.org"; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=arm.com Received: from smtp.subspace.kernel.org (wormhole.subspace.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by sv.mirrors.kernel.org (Postfix) with ESMTPS id 7647C2841C8 for ; Mon, 18 Dec 2023 10:54:23 +0000 (UTC) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 28845374E2; Mon, 18 Dec 2023 10:52:10 +0000 (UTC) X-Original-To: linux-kernel@vger.kernel.org Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 8EF0732C95 for ; Mon, 18 Dec 2023 10:52:02 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=arm.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=arm.com Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 5EA192F4; Mon, 18 Dec 2023 02:52:46 -0800 (PST) Received: from e125769.cambridge.arm.com (e125769.cambridge.arm.com [10.1.196.26]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 852853F738; Mon, 18 Dec 2023 02:51:58 -0800 (PST) From: Ryan Roberts To: Catalin Marinas , Will Deacon , Ard Biesheuvel , Marc Zyngier , Oliver Upton , James Morse , Suzuki K Poulose , Zenghui Yu , Andrey Ryabinin , Alexander Potapenko , Andrey Konovalov , Dmitry Vyukov , Vincenzo Frascino , Andrew Morton , Anshuman Khandual , Matthew Wilcox , Yu Zhao , Mark Rutland , David Hildenbrand , Kefeng Wang , John Hubbard , Zi Yan , Barry Song <21cnbao@gmail.com>, Alistair Popple , Yang Shi Cc: Ryan Roberts , linux-arm-kernel@lists.infradead.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: [PATCH v4 12/16] arm64/mm: ptep_get(): New layer to manage contig bit Date: Mon, 18 Dec 2023 10:50:56 +0000 Message-Id: <20231218105100.172635-13-ryan.roberts@arm.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20231218105100.172635-1-ryan.roberts@arm.com> References: <20231218105100.172635-1-ryan.roberts@arm.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1785616793713614073 X-GMAIL-MSGID: 1785616793713614073 Create a new layer for the in-table PTE manipulation APIs. For now, The existing API is prefixed with double underscore to become the arch-private API and the public API is just a simple wrapper that calls the private API. The public API implementation will subsequently be used to transparently manipulate the contiguous bit where appropriate. But since there are already some contig-aware users (e.g. hugetlb, kernel mapper), we must first ensure those users use the private API directly so that the future contig-bit manipulations in the public API do not interfere with those existing uses. arm64 did not previously define an arch-specific ptep_get(), so override the default version in the arch code, and also define the private __ptep_get() version. Currently they both do the same thing that the default version does (READ_ONCE()). Some arch users (hugetlb) were already using ptep_get() so convert those to the private API. While other callsites were doing direct READ_ONCE(), so convert those to use the appropriate (public/private) API too. Tested-by: John Hubbard Signed-off-by: Ryan Roberts --- arch/arm64/include/asm/pgtable.h | 12 +++++++++--- arch/arm64/kernel/efi.c | 2 +- arch/arm64/mm/fault.c | 4 ++-- arch/arm64/mm/hugetlbpage.c | 18 +++++++++--------- arch/arm64/mm/kasan_init.c | 2 +- arch/arm64/mm/mmu.c | 12 ++++++------ arch/arm64/mm/pageattr.c | 4 ++-- arch/arm64/mm/trans_pgd.c | 2 +- 8 files changed, 31 insertions(+), 25 deletions(-) diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h index 85010c2d4dfa..6930c14f062f 100644 --- a/arch/arm64/include/asm/pgtable.h +++ b/arch/arm64/include/asm/pgtable.h @@ -276,6 +276,11 @@ static inline void __set_pte(pte_t *ptep, pte_t pte) } } +static inline pte_t __ptep_get(pte_t *ptep) +{ + return READ_ONCE(*ptep); +} + extern void __sync_icache_dcache(pte_t pteval); bool pgattr_change_is_safe(u64 old, u64 new); @@ -303,7 +308,7 @@ static inline void __check_safe_pte_update(struct mm_struct *mm, pte_t *ptep, if (!IS_ENABLED(CONFIG_DEBUG_VM)) return; - old_pte = READ_ONCE(*ptep); + old_pte = __ptep_get(ptep); if (!pte_valid(old_pte) || !pte_valid(pte)) return; @@ -893,7 +898,7 @@ static inline int __ptep_test_and_clear_young(struct vm_area_struct *vma, { pte_t old_pte, pte; - pte = READ_ONCE(*ptep); + pte = __ptep_get(ptep); do { old_pte = pte; pte = pte_mkold(pte); @@ -966,7 +971,7 @@ static inline void __ptep_set_wrprotect(struct mm_struct *mm, { pte_t old_pte, pte; - pte = READ_ONCE(*ptep); + pte = __ptep_get(ptep); do { old_pte = pte; pte = pte_wrprotect(pte); @@ -1111,6 +1116,7 @@ extern void ptep_modify_prot_commit(struct vm_area_struct *vma, unsigned long addr, pte_t *ptep, pte_t old_pte, pte_t new_pte); +#define ptep_get __ptep_get #define set_pte __set_pte #define set_ptes __set_ptes #define pte_clear __pte_clear diff --git a/arch/arm64/kernel/efi.c b/arch/arm64/kernel/efi.c index 44288a12fc6c..9afcc690fe73 100644 --- a/arch/arm64/kernel/efi.c +++ b/arch/arm64/kernel/efi.c @@ -103,7 +103,7 @@ static int __init set_permissions(pte_t *ptep, unsigned long addr, void *data) { struct set_perm_data *spd = data; const efi_memory_desc_t *md = spd->md; - pte_t pte = READ_ONCE(*ptep); + pte_t pte = __ptep_get(ptep); if (md->attribute & EFI_MEMORY_RO) pte = set_pte_bit(pte, __pgprot(PTE_RDONLY)); diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c index 7cebd9847aae..d63f3a0a7251 100644 --- a/arch/arm64/mm/fault.c +++ b/arch/arm64/mm/fault.c @@ -191,7 +191,7 @@ static void show_pte(unsigned long addr) if (!ptep) break; - pte = READ_ONCE(*ptep); + pte = __ptep_get(ptep); pr_cont(", pte=%016llx", pte_val(pte)); pte_unmap(ptep); } while(0); @@ -214,7 +214,7 @@ int __ptep_set_access_flags(struct vm_area_struct *vma, pte_t entry, int dirty) { pteval_t old_pteval, pteval; - pte_t pte = READ_ONCE(*ptep); + pte_t pte = __ptep_get(ptep); if (pte_same(pte, entry)) return 0; diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c index 627a9717e98c..52fb767607e0 100644 --- a/arch/arm64/mm/hugetlbpage.c +++ b/arch/arm64/mm/hugetlbpage.c @@ -152,14 +152,14 @@ pte_t huge_ptep_get(pte_t *ptep) { int ncontig, i; size_t pgsize; - pte_t orig_pte = ptep_get(ptep); + pte_t orig_pte = __ptep_get(ptep); if (!pte_present(orig_pte) || !pte_cont(orig_pte)) return orig_pte; ncontig = num_contig_ptes(page_size(pte_page(orig_pte)), &pgsize); for (i = 0; i < ncontig; i++, ptep++) { - pte_t pte = ptep_get(ptep); + pte_t pte = __ptep_get(ptep); if (pte_dirty(pte)) orig_pte = pte_mkdirty(orig_pte); @@ -184,7 +184,7 @@ static pte_t get_clear_contig(struct mm_struct *mm, unsigned long pgsize, unsigned long ncontig) { - pte_t orig_pte = ptep_get(ptep); + pte_t orig_pte = __ptep_get(ptep); unsigned long i; for (i = 0; i < ncontig; i++, addr += pgsize, ptep++) { @@ -408,7 +408,7 @@ pte_t huge_ptep_get_and_clear(struct mm_struct *mm, { int ncontig; size_t pgsize; - pte_t orig_pte = ptep_get(ptep); + pte_t orig_pte = __ptep_get(ptep); if (!pte_cont(orig_pte)) return __ptep_get_and_clear(mm, addr, ptep); @@ -431,11 +431,11 @@ static int __cont_access_flags_changed(pte_t *ptep, pte_t pte, int ncontig) { int i; - if (pte_write(pte) != pte_write(ptep_get(ptep))) + if (pte_write(pte) != pte_write(__ptep_get(ptep))) return 1; for (i = 0; i < ncontig; i++) { - pte_t orig_pte = ptep_get(ptep + i); + pte_t orig_pte = __ptep_get(ptep + i); if (pte_dirty(pte) != pte_dirty(orig_pte)) return 1; @@ -492,7 +492,7 @@ void huge_ptep_set_wrprotect(struct mm_struct *mm, size_t pgsize; pte_t pte; - if (!pte_cont(READ_ONCE(*ptep))) { + if (!pte_cont(__ptep_get(ptep))) { __ptep_set_wrprotect(mm, addr, ptep); return; } @@ -517,7 +517,7 @@ pte_t huge_ptep_clear_flush(struct vm_area_struct *vma, size_t pgsize; int ncontig; - if (!pte_cont(READ_ONCE(*ptep))) + if (!pte_cont(__ptep_get(ptep))) return ptep_clear_flush(vma, addr, ptep); ncontig = find_num_contig(mm, addr, ptep, &pgsize); @@ -550,7 +550,7 @@ pte_t huge_ptep_modify_prot_start(struct vm_area_struct *vma, unsigned long addr * when the permission changes from executable to non-executable * in cases where cpu is affected with errata #2645198. */ - if (pte_user_exec(READ_ONCE(*ptep))) + if (pte_user_exec(__ptep_get(ptep))) return huge_ptep_clear_flush(vma, addr, ptep); } return huge_ptep_get_and_clear(vma->vm_mm, addr, ptep); diff --git a/arch/arm64/mm/kasan_init.c b/arch/arm64/mm/kasan_init.c index 5eade712e9e5..5274c317d775 100644 --- a/arch/arm64/mm/kasan_init.c +++ b/arch/arm64/mm/kasan_init.c @@ -113,7 +113,7 @@ static void __init kasan_pte_populate(pmd_t *pmdp, unsigned long addr, memset(__va(page_phys), KASAN_SHADOW_INIT, PAGE_SIZE); next = addr + PAGE_SIZE; __set_pte(ptep, pfn_pte(__phys_to_pfn(page_phys), PAGE_KERNEL)); - } while (ptep++, addr = next, addr != end && pte_none(READ_ONCE(*ptep))); + } while (ptep++, addr = next, addr != end && pte_none(__ptep_get(ptep))); } static void __init kasan_pmd_populate(pud_t *pudp, unsigned long addr, diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c index 080e9b50f595..784f1e312447 100644 --- a/arch/arm64/mm/mmu.c +++ b/arch/arm64/mm/mmu.c @@ -176,7 +176,7 @@ static void init_pte(pmd_t *pmdp, unsigned long addr, unsigned long end, ptep = pte_set_fixmap_offset(pmdp, addr); do { - pte_t old_pte = READ_ONCE(*ptep); + pte_t old_pte = __ptep_get(ptep); __set_pte(ptep, pfn_pte(__phys_to_pfn(phys), prot)); @@ -185,7 +185,7 @@ static void init_pte(pmd_t *pmdp, unsigned long addr, unsigned long end, * only allow updates to the permission attributes. */ BUG_ON(!pgattr_change_is_safe(pte_val(old_pte), - READ_ONCE(pte_val(*ptep)))); + pte_val(__ptep_get(ptep)))); phys += PAGE_SIZE; } while (ptep++, addr += PAGE_SIZE, addr != end); @@ -854,7 +854,7 @@ static void unmap_hotplug_pte_range(pmd_t *pmdp, unsigned long addr, do { ptep = pte_offset_kernel(pmdp, addr); - pte = READ_ONCE(*ptep); + pte = __ptep_get(ptep); if (pte_none(pte)) continue; @@ -987,7 +987,7 @@ static void free_empty_pte_table(pmd_t *pmdp, unsigned long addr, do { ptep = pte_offset_kernel(pmdp, addr); - pte = READ_ONCE(*ptep); + pte = __ptep_get(ptep); /* * This is just a sanity check here which verifies that @@ -1006,7 +1006,7 @@ static void free_empty_pte_table(pmd_t *pmdp, unsigned long addr, */ ptep = pte_offset_kernel(pmdp, 0UL); for (i = 0; i < PTRS_PER_PTE; i++) { - if (!pte_none(READ_ONCE(ptep[i]))) + if (!pte_none(__ptep_get(&ptep[i]))) return; } @@ -1475,7 +1475,7 @@ pte_t ptep_modify_prot_start(struct vm_area_struct *vma, unsigned long addr, pte * when the permission changes from executable to non-executable * in cases where cpu is affected with errata #2645198. */ - if (pte_user_exec(READ_ONCE(*ptep))) + if (pte_user_exec(ptep_get(ptep))) return ptep_clear_flush(vma, addr, ptep); } return ptep_get_and_clear(vma->vm_mm, addr, ptep); diff --git a/arch/arm64/mm/pageattr.c b/arch/arm64/mm/pageattr.c index a7996d8edf0a..0c4e3ecf989d 100644 --- a/arch/arm64/mm/pageattr.c +++ b/arch/arm64/mm/pageattr.c @@ -36,7 +36,7 @@ bool can_set_direct_map(void) static int change_page_range(pte_t *ptep, unsigned long addr, void *data) { struct page_change_data *cdata = data; - pte_t pte = READ_ONCE(*ptep); + pte_t pte = __ptep_get(ptep); pte = clear_pte_bit(pte, cdata->clear_mask); pte = set_pte_bit(pte, cdata->set_mask); @@ -245,5 +245,5 @@ bool kernel_page_present(struct page *page) return true; ptep = pte_offset_kernel(pmdp, addr); - return pte_valid(READ_ONCE(*ptep)); + return pte_valid(__ptep_get(ptep)); } diff --git a/arch/arm64/mm/trans_pgd.c b/arch/arm64/mm/trans_pgd.c index 230b607cf881..5139a28130c0 100644 --- a/arch/arm64/mm/trans_pgd.c +++ b/arch/arm64/mm/trans_pgd.c @@ -33,7 +33,7 @@ static void *trans_alloc(struct trans_pgd_info *info) static void _copy_pte(pte_t *dst_ptep, pte_t *src_ptep, unsigned long addr) { - pte_t pte = READ_ONCE(*src_ptep); + pte_t pte = __ptep_get(src_ptep); if (pte_valid(pte)) { /* From patchwork Mon Dec 18 10:50:57 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ryan Roberts X-Patchwork-Id: 180270 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a05:7300:24d3:b0:fb:cd0c:d3e with SMTP id r19csp1155425dyi; Mon, 18 Dec 2023 02:54:44 -0800 (PST) X-Google-Smtp-Source: AGHT+IG91X3GHEO4A6M5WpNIBQ3qQzzarCoAyfc3c7IIP3frAJg3b7BGlJ7koGddSPRs4eThRjnY X-Received: by 2002:a05:6a20:3504:b0:191:60c0:7cfd with SMTP id d4-20020a056a20350400b0019160c07cfdmr4814508pze.29.1702896884284; Mon, 18 Dec 2023 02:54:44 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1702896884; cv=none; d=google.com; s=arc-20160816; b=gg7yf44GnLhmq+s8g21rqxIH+XGu94/KZntt8bBhAAqgjWVfeYDgtV27rbQzBHDGPH usJWTOKOQZOIWrF8fQFt6bLDF7vWSHc7mejOxWpmdMhVx15KVbkUHE/H203GMLKv1V+7 jCMXuVDaTTWfg8NzhQn39F8isA9eT/M1QfzI+2TpMidMYDZapsTaq+aYObg8bYaDDOfb YzffveLVn3cZWJhKp29NnlpElMFbGaCvBvcfH57Nn8cbfcCvCCyqsN3bdK4Mh21g/nVC uyM8vbwqiRbpSHDougxO/HaQPt9N3l6d1RycIakift1mKBUW69PHblaCKfTtEpj3t7tm WURg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=content-transfer-encoding:mime-version:list-unsubscribe :list-subscribe:list-id:precedence:references:in-reply-to:message-id :date:subject:cc:to:from; bh=3XWNUf0y+LoNhJb/OvSX6jZDQxYAWNT/5GVeo+DL1UM=; fh=a3gZESXuO1ptbCtzvzEnZL5xUl+WC4Vy5hoF2yaOXSs=; b=SPHzaOJoM/TzVKTy/iJ6uELc+1tp9lFY4Mu2rSlD56vhgayu/7PlcgsdNOQFC0c/D7 YwKe4dBZoLrwrIC8RqclXw5YmOqbUGVMoOMNegbSIc111EGXyB1FwEJwNRyxW92v6cF0 DSl3kH3pGtTZorMzt86Y7S0Oysk0mre6Kxcy+mhHiuP3mb/+use/NEFAEjLq8xig56vO 8CTJvCg0zHG9uteHDobMoZHNM5L6AAd1GTK4AB2OELTB3vdl8ZoCVZ6wkJ3p9xRdYEbD rF5UsYie1LBLIa/B/bGw6/RkzJfT4CIKheeb0AcYN9pxUpJK8exY9gg96ztE4bfkQnOe Y2Kg== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel+bounces-3382-ouuuleilei=gmail.com@vger.kernel.org designates 139.178.88.99 as permitted sender) smtp.mailfrom="linux-kernel+bounces-3382-ouuuleilei=gmail.com@vger.kernel.org"; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=arm.com Received: from sv.mirrors.kernel.org (sv.mirrors.kernel.org. [139.178.88.99]) by mx.google.com with ESMTPS id l12-20020a17090a660c00b0028add4281a1si8159901pjj.82.2023.12.18.02.54.44 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 18 Dec 2023 02:54:44 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel+bounces-3382-ouuuleilei=gmail.com@vger.kernel.org designates 139.178.88.99 as permitted sender) client-ip=139.178.88.99; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel+bounces-3382-ouuuleilei=gmail.com@vger.kernel.org designates 139.178.88.99 as permitted sender) smtp.mailfrom="linux-kernel+bounces-3382-ouuuleilei=gmail.com@vger.kernel.org"; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=arm.com Received: from smtp.subspace.kernel.org (wormhole.subspace.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by sv.mirrors.kernel.org (Postfix) with ESMTPS id 5BC782823CA for ; Mon, 18 Dec 2023 10:54:33 +0000 (UTC) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by smtp.subspace.kernel.org (Postfix) with ESMTP id D6B5E37D17; Mon, 18 Dec 2023 10:52:12 +0000 (UTC) X-Original-To: linux-kernel@vger.kernel.org Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by smtp.subspace.kernel.org (Postfix) with ESMTP id EFB4A34CEB for ; Mon, 18 Dec 2023 10:52:05 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=arm.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=arm.com Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 000181FB; Mon, 18 Dec 2023 02:52:49 -0800 (PST) Received: from e125769.cambridge.arm.com (e125769.cambridge.arm.com [10.1.196.26]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 26B0F3F738; Mon, 18 Dec 2023 02:52:02 -0800 (PST) From: Ryan Roberts To: Catalin Marinas , Will Deacon , Ard Biesheuvel , Marc Zyngier , Oliver Upton , James Morse , Suzuki K Poulose , Zenghui Yu , Andrey Ryabinin , Alexander Potapenko , Andrey Konovalov , Dmitry Vyukov , Vincenzo Frascino , Andrew Morton , Anshuman Khandual , Matthew Wilcox , Yu Zhao , Mark Rutland , David Hildenbrand , Kefeng Wang , John Hubbard , Zi Yan , Barry Song <21cnbao@gmail.com>, Alistair Popple , Yang Shi Cc: Ryan Roberts , linux-arm-kernel@lists.infradead.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: [PATCH v4 13/16] arm64/mm: Split __flush_tlb_range() to elide trailing DSB Date: Mon, 18 Dec 2023 10:50:57 +0000 Message-Id: <20231218105100.172635-14-ryan.roberts@arm.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20231218105100.172635-1-ryan.roberts@arm.com> References: <20231218105100.172635-1-ryan.roberts@arm.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1785616803455313817 X-GMAIL-MSGID: 1785616803455313817 Split __flush_tlb_range() into __flush_tlb_range_nosync() + __flush_tlb_range(), in the same way as the existing flush_tlb_page() arrangement. This allows calling __flush_tlb_range_nosync() to elide the trailing DSB. Forthcoming "contpte" code will take advantage of this when clearing the young bit from a contiguous range of ptes. Tested-by: John Hubbard Signed-off-by: Ryan Roberts --- arch/arm64/include/asm/tlbflush.h | 13 +++++++++++-- 1 file changed, 11 insertions(+), 2 deletions(-) diff --git a/arch/arm64/include/asm/tlbflush.h b/arch/arm64/include/asm/tlbflush.h index bb2c2833a987..925ef3bdf9ed 100644 --- a/arch/arm64/include/asm/tlbflush.h +++ b/arch/arm64/include/asm/tlbflush.h @@ -399,7 +399,7 @@ do { \ #define __flush_s2_tlb_range_op(op, start, pages, stride, tlb_level) \ __flush_tlb_range_op(op, start, pages, stride, 0, tlb_level, false) -static inline void __flush_tlb_range(struct vm_area_struct *vma, +static inline void __flush_tlb_range_nosync(struct vm_area_struct *vma, unsigned long start, unsigned long end, unsigned long stride, bool last_level, int tlb_level) @@ -431,10 +431,19 @@ static inline void __flush_tlb_range(struct vm_area_struct *vma, else __flush_tlb_range_op(vae1is, start, pages, stride, asid, tlb_level, true); - dsb(ish); mmu_notifier_arch_invalidate_secondary_tlbs(vma->vm_mm, start, end); } +static inline void __flush_tlb_range(struct vm_area_struct *vma, + unsigned long start, unsigned long end, + unsigned long stride, bool last_level, + int tlb_level) +{ + __flush_tlb_range_nosync(vma, start, end, stride, + last_level, tlb_level); + dsb(ish); +} + static inline void flush_tlb_range(struct vm_area_struct *vma, unsigned long start, unsigned long end) { From patchwork Mon Dec 18 10:50:58 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ryan Roberts X-Patchwork-Id: 180271 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a05:7300:24d3:b0:fb:cd0c:d3e with SMTP id r19csp1155554dyi; Mon, 18 Dec 2023 02:55:08 -0800 (PST) X-Google-Smtp-Source: AGHT+IEZx5kxobjanA54D9aZbKzbTR9T5rWvnR8DchCkKwDJF9MVilZrEwo4V4o3aZmm2PH/opH0 X-Received: by 2002:a05:6a00:4c0d:b0:6d8:47ff:606e with SMTP id ea13-20020a056a004c0d00b006d847ff606emr1073155pfb.15.1702896908386; Mon, 18 Dec 2023 02:55:08 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1702896908; cv=none; d=google.com; s=arc-20160816; b=F+SfQqwFS/mjpJRsnB4ny0vzrcMdMR4EnmAqIhBQvV+rqD1lJRSgv40ikO6tu4mzvT 5N1IOZK/tcXG0O2hqF+8eaT0kQmgIP9ZUwgvsUDQ6qWwqnHrUURz8jr6ShyXArGe5I0E W4K/t9/mir6GP9/FKYh0lJICb021qLQVgoI7o5xMXysssTigZaBH0sYIjq2SSZ+Kt0xm La2c6rLfmGm+r9U4+Pcbh29oIOQKbK5oLQ91R7PIQYuZZezoxc/Gcir2ysux84sFxH/U wHEzFf1R1QZ006QwQqI0QoiITOZk8qjI6Kyco36LXxQ5ht2t2XdIqrbcg0TasDRdqQay YfCg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=content-transfer-encoding:mime-version:list-unsubscribe :list-subscribe:list-id:precedence:references:in-reply-to:message-id :date:subject:cc:to:from; bh=j8NHhbxBUceulF2OYwyIFDA/Hi188nG8B9JKA9Sk7RA=; fh=a3gZESXuO1ptbCtzvzEnZL5xUl+WC4Vy5hoF2yaOXSs=; b=XqcOtMEUdYMBXKuAFYP2GryBHUckbe4evvDL+6AgiFgIFCcjM15HxO+QQTccCId315 PBdt08JSFWCfzqAt0zIi50SFYyJ8VfvZvHdxcJTBsNzkXz3Q8WioX3a9wPR71KTE2hWD NJLc74UeRMR/jrhuVU01wHtckkApx1patk1/+DcYkcvsSirRqTYkkBM2UT4Tr943fpZS Eh9SO9rXz5YGKSHBMdz9/dkyiKVTxDOXIiNVsN3lArUfgFoRqdtVSA5XVOXueMmkU1Fr +EqhkjhG7AzvpG+dTRIMLD3Sr2ZVbHmbQISoa9OY/x041cevh4UHTLIlEsapfLoD4ZwS Ghxw== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel+bounces-3383-ouuuleilei=gmail.com@vger.kernel.org designates 2604:1380:45e3:2400::1 as permitted sender) smtp.mailfrom="linux-kernel+bounces-3383-ouuuleilei=gmail.com@vger.kernel.org"; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=arm.com Received: from sv.mirrors.kernel.org (sv.mirrors.kernel.org. [2604:1380:45e3:2400::1]) by mx.google.com with ESMTPS id e16-20020a056a001a9000b006ce51b7827esi17852395pfv.294.2023.12.18.02.55.08 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 18 Dec 2023 02:55:08 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel+bounces-3383-ouuuleilei=gmail.com@vger.kernel.org designates 2604:1380:45e3:2400::1 as permitted sender) client-ip=2604:1380:45e3:2400::1; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel+bounces-3383-ouuuleilei=gmail.com@vger.kernel.org designates 2604:1380:45e3:2400::1 as permitted sender) smtp.mailfrom="linux-kernel+bounces-3383-ouuuleilei=gmail.com@vger.kernel.org"; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=arm.com Received: from smtp.subspace.kernel.org (wormhole.subspace.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by sv.mirrors.kernel.org (Postfix) with ESMTPS id 0EC77282508 for ; Mon, 18 Dec 2023 10:54:59 +0000 (UTC) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 6C7B639842; Mon, 18 Dec 2023 10:52:22 +0000 (UTC) X-Original-To: linux-kernel@vger.kernel.org Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 735CC374F3 for ; Mon, 18 Dec 2023 10:52:10 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=arm.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=arm.com Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id D7FF213D5; Mon, 18 Dec 2023 02:52:53 -0800 (PST) Received: from e125769.cambridge.arm.com (e125769.cambridge.arm.com [10.1.196.26]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id BD3703F738; Mon, 18 Dec 2023 02:52:05 -0800 (PST) From: Ryan Roberts To: Catalin Marinas , Will Deacon , Ard Biesheuvel , Marc Zyngier , Oliver Upton , James Morse , Suzuki K Poulose , Zenghui Yu , Andrey Ryabinin , Alexander Potapenko , Andrey Konovalov , Dmitry Vyukov , Vincenzo Frascino , Andrew Morton , Anshuman Khandual , Matthew Wilcox , Yu Zhao , Mark Rutland , David Hildenbrand , Kefeng Wang , John Hubbard , Zi Yan , Barry Song <21cnbao@gmail.com>, Alistair Popple , Yang Shi Cc: Ryan Roberts , linux-arm-kernel@lists.infradead.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: [PATCH v4 14/16] arm64/mm: Wire up PTE_CONT for user mappings Date: Mon, 18 Dec 2023 10:50:58 +0000 Message-Id: <20231218105100.172635-15-ryan.roberts@arm.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20231218105100.172635-1-ryan.roberts@arm.com> References: <20231218105100.172635-1-ryan.roberts@arm.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1785616828720579120 X-GMAIL-MSGID: 1785616828720579120 With the ptep API sufficiently refactored, we can now introduce a new "contpte" API layer, which transparently manages the PTE_CONT bit for user mappings. Whenever it detects a set of PTEs that meet the requirements for a contiguous range, the PTEs are re-painted with the PTE_CONT bit. Use of contpte mappings is intended to be transparent to the core-mm, which continues to interact with individual ptes. Since a contpte block only has a single access and dirty bit, the semantic here changes slightly; when getting a pte (e.g. ptep_get()) that is part of a contpte mapping, the access and dirty information are pulled from the block (so all ptes in the block return the same access/dirty info). When changing the access/dirty info on a pte (e.g. ptep_set_access_flags()) that is part of a contpte mapping, this change will affect the whole contpte block. This is works fine in practice since we guarantee that only a single folio is mapped by a contpte block, and the core-mm tracks access/dirty information per folio. This initial change provides a baseline that can be optimized in future commits. That said, fold/unfold operations (which imply tlb invalidation) are avoided where possible with a few tricks for access/dirty bit management. Write-protect modifications for contpte mappings are currently non-optimal, and incure a regression in fork() performance. This will be addressed in follow-up changes. In order for the public functions, which used to be pure inline, to continue to be callable by modules, export all the contpte_* symbols that are now called by those public inline functions. The feature is enabled/disabled with the ARM64_CONTPTE Kconfig parameter at build time. It defaults to enabled as long as its dependency, TRANSPARENT_HUGEPAGE is also enabled. The core-mm depends upon TRANSPARENT_HUGEPAGE to be able to allocate large folios, so if its not enabled, then there is no chance of meeting the physical contiguity requirement for contpte mappings. Tested-by: John Hubbard Signed-off-by: Ryan Roberts --- arch/arm64/Kconfig | 10 +- arch/arm64/include/asm/pgtable.h | 184 +++++++++++++++ arch/arm64/mm/Makefile | 1 + arch/arm64/mm/contpte.c | 388 +++++++++++++++++++++++++++++++ 4 files changed, 582 insertions(+), 1 deletion(-) create mode 100644 arch/arm64/mm/contpte.c diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig index 7b071a00425d..de76e484ff3a 100644 --- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig @@ -2209,6 +2209,15 @@ config UNWIND_PATCH_PAC_INTO_SCS select UNWIND_TABLES select DYNAMIC_SCS +config ARM64_CONTPTE + bool "Contiguous PTE mappings for user memory" if EXPERT + depends on TRANSPARENT_HUGEPAGE + default y + help + When enabled, user mappings are configured using the PTE contiguous + bit, for any mappings that meet the size and alignment requirements. + This reduces TLB pressure and improves performance. + endmenu # "Kernel Features" menu "Boot options" @@ -2318,4 +2327,3 @@ endmenu # "CPU Power Management" source "drivers/acpi/Kconfig" source "arch/arm64/kvm/Kconfig" - diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h index 6930c14f062f..e64120452301 100644 --- a/arch/arm64/include/asm/pgtable.h +++ b/arch/arm64/include/asm/pgtable.h @@ -133,6 +133,10 @@ static inline pteval_t __phys_to_pte_val(phys_addr_t phys) */ #define pte_valid_not_user(pte) \ ((pte_val(pte) & (PTE_VALID | PTE_USER | PTE_UXN)) == (PTE_VALID | PTE_UXN)) +/* + * Returns true if the pte is valid and has the contiguous bit set. + */ +#define pte_valid_cont(pte) (pte_valid(pte) && pte_cont(pte)) /* * Could the pte be present in the TLB? We must check mm_tlb_flush_pending * so that we don't erroneously return false for pages that have been @@ -1116,6 +1120,184 @@ extern void ptep_modify_prot_commit(struct vm_area_struct *vma, unsigned long addr, pte_t *ptep, pte_t old_pte, pte_t new_pte); +#ifdef CONFIG_ARM64_CONTPTE + +/* + * The contpte APIs are used to transparently manage the contiguous bit in ptes + * where it is possible and makes sense to do so. The PTE_CONT bit is considered + * a private implementation detail of the public ptep API (see below). + */ +extern void __contpte_try_fold(struct mm_struct *mm, unsigned long addr, + pte_t *ptep, pte_t pte); +extern void __contpte_try_unfold(struct mm_struct *mm, unsigned long addr, + pte_t *ptep, pte_t pte); +extern pte_t contpte_ptep_get(pte_t *ptep, pte_t orig_pte); +extern pte_t contpte_ptep_get_lockless(pte_t *orig_ptep); +extern void contpte_set_ptes(struct mm_struct *mm, unsigned long addr, + pte_t *ptep, pte_t pte, unsigned int nr); +extern int contpte_ptep_test_and_clear_young(struct vm_area_struct *vma, + unsigned long addr, pte_t *ptep); +extern int contpte_ptep_clear_flush_young(struct vm_area_struct *vma, + unsigned long addr, pte_t *ptep); +extern int contpte_ptep_set_access_flags(struct vm_area_struct *vma, + unsigned long addr, pte_t *ptep, + pte_t entry, int dirty); + +static inline void contpte_try_fold(struct mm_struct *mm, unsigned long addr, + pte_t *ptep, pte_t pte) +{ + /* + * Only bother trying if both the virtual and physical addresses are + * aligned and correspond to the last entry in a contig range. The core + * code mostly modifies ranges from low to high, so this is the likely + * the last modification in the contig range, so a good time to fold. + * We can't fold special mappings, because there is no associated folio. + */ + + const unsigned long contmask = CONT_PTES - 1; + bool valign = (((unsigned long)ptep >> 3) & contmask) == contmask; + bool palign = (pte_pfn(pte) & contmask) == contmask; + + if (valign && palign && + pte_valid(pte) && !pte_cont(pte) && !pte_special(pte)) + __contpte_try_fold(mm, addr, ptep, pte); +} + +static inline void contpte_try_unfold(struct mm_struct *mm, unsigned long addr, + pte_t *ptep, pte_t pte) +{ + if (pte_valid_cont(pte)) + __contpte_try_unfold(mm, addr, ptep, pte); +} + +/* + * The below functions constitute the public API that arm64 presents to the + * core-mm to manipulate PTE entries within their page tables (or at least this + * is the subset of the API that arm64 needs to implement). These public + * versions will automatically and transparently apply the contiguous bit where + * it makes sense to do so. Therefore any users that are contig-aware (e.g. + * hugetlb, kernel mapper) should NOT use these APIs, but instead use the + * private versions, which are prefixed with double underscore. All of these + * APIs except for ptep_get_lockless() are expected to be called with the PTL + * held. + */ + +#define ptep_get ptep_get +static inline pte_t ptep_get(pte_t *ptep) +{ + pte_t pte = __ptep_get(ptep); + + if (!pte_valid_cont(pte)) + return pte; + + return contpte_ptep_get(ptep, pte); +} + +#define ptep_get_lockless ptep_get_lockless +static inline pte_t ptep_get_lockless(pte_t *ptep) +{ + pte_t pte = __ptep_get(ptep); + + if (!pte_valid_cont(pte)) + return pte; + + return contpte_ptep_get_lockless(ptep); +} + +static inline void set_pte(pte_t *ptep, pte_t pte) +{ + /* + * We don't have the mm or vaddr so cannot unfold or fold contig entries + * (since it requires tlb maintenance). set_pte() is not used in core + * code, so this should never even be called. Regardless do our best to + * service any call and emit a warning if there is any attempt to set a + * pte on top of an existing contig range. + */ + pte_t orig_pte = __ptep_get(ptep); + + WARN_ON_ONCE(pte_valid_cont(orig_pte)); + __set_pte(ptep, pte_mknoncont(pte)); +} + +#define set_ptes set_ptes +static inline void set_ptes(struct mm_struct *mm, unsigned long addr, + pte_t *ptep, pte_t pte, unsigned int nr) +{ + pte = pte_mknoncont(pte); + + if (nr == 1) { + contpte_try_unfold(mm, addr, ptep, __ptep_get(ptep)); + __set_ptes(mm, addr, ptep, pte, 1); + contpte_try_fold(mm, addr, ptep, pte); + } else + contpte_set_ptes(mm, addr, ptep, pte, nr); +} + +static inline void pte_clear(struct mm_struct *mm, + unsigned long addr, pte_t *ptep) +{ + contpte_try_unfold(mm, addr, ptep, __ptep_get(ptep)); + __pte_clear(mm, addr, ptep); +} + +#define __HAVE_ARCH_PTEP_GET_AND_CLEAR +static inline pte_t ptep_get_and_clear(struct mm_struct *mm, + unsigned long addr, pte_t *ptep) +{ + contpte_try_unfold(mm, addr, ptep, __ptep_get(ptep)); + return __ptep_get_and_clear(mm, addr, ptep); +} + +#define __HAVE_ARCH_PTEP_TEST_AND_CLEAR_YOUNG +static inline int ptep_test_and_clear_young(struct vm_area_struct *vma, + unsigned long addr, pte_t *ptep) +{ + pte_t orig_pte = __ptep_get(ptep); + + if (!pte_valid_cont(orig_pte)) + return __ptep_test_and_clear_young(vma, addr, ptep); + + return contpte_ptep_test_and_clear_young(vma, addr, ptep); +} + +#define __HAVE_ARCH_PTEP_CLEAR_YOUNG_FLUSH +static inline int ptep_clear_flush_young(struct vm_area_struct *vma, + unsigned long addr, pte_t *ptep) +{ + pte_t orig_pte = __ptep_get(ptep); + + if (!pte_valid_cont(orig_pte)) + return __ptep_clear_flush_young(vma, addr, ptep); + + return contpte_ptep_clear_flush_young(vma, addr, ptep); +} + +#define __HAVE_ARCH_PTEP_SET_WRPROTECT +static inline void ptep_set_wrprotect(struct mm_struct *mm, + unsigned long addr, pte_t *ptep) +{ + contpte_try_unfold(mm, addr, ptep, __ptep_get(ptep)); + __ptep_set_wrprotect(mm, addr, ptep); + contpte_try_fold(mm, addr, ptep, __ptep_get(ptep)); +} + +#define __HAVE_ARCH_PTEP_SET_ACCESS_FLAGS +static inline int ptep_set_access_flags(struct vm_area_struct *vma, + unsigned long addr, pte_t *ptep, + pte_t entry, int dirty) +{ + pte_t orig_pte = __ptep_get(ptep); + + entry = pte_mknoncont(entry); + + if (!pte_valid_cont(orig_pte)) + return __ptep_set_access_flags(vma, addr, ptep, entry, dirty); + + return contpte_ptep_set_access_flags(vma, addr, ptep, entry, dirty); +} + +#else /* CONFIG_ARM64_CONTPTE */ + #define ptep_get __ptep_get #define set_pte __set_pte #define set_ptes __set_ptes @@ -1131,6 +1313,8 @@ extern void ptep_modify_prot_commit(struct vm_area_struct *vma, #define __HAVE_ARCH_PTEP_SET_ACCESS_FLAGS #define ptep_set_access_flags __ptep_set_access_flags +#endif /* CONFIG_ARM64_CONTPTE */ + #endif /* !__ASSEMBLY__ */ #endif /* __ASM_PGTABLE_H */ diff --git a/arch/arm64/mm/Makefile b/arch/arm64/mm/Makefile index dbd1bc95967d..60454256945b 100644 --- a/arch/arm64/mm/Makefile +++ b/arch/arm64/mm/Makefile @@ -3,6 +3,7 @@ obj-y := dma-mapping.o extable.o fault.o init.o \ cache.o copypage.o flush.o \ ioremap.o mmap.o pgd.o mmu.o \ context.o proc.o pageattr.o fixmap.o +obj-$(CONFIG_ARM64_CONTPTE) += contpte.o obj-$(CONFIG_HUGETLB_PAGE) += hugetlbpage.o obj-$(CONFIG_PTDUMP_CORE) += ptdump.o obj-$(CONFIG_PTDUMP_DEBUGFS) += ptdump_debugfs.o diff --git a/arch/arm64/mm/contpte.c b/arch/arm64/mm/contpte.c new file mode 100644 index 000000000000..69c36749dd98 --- /dev/null +++ b/arch/arm64/mm/contpte.c @@ -0,0 +1,388 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * Copyright (C) 2023 ARM Ltd. + */ + +#include +#include +#include + +static inline bool mm_is_user(struct mm_struct *mm) +{ + /* + * Don't attempt to apply the contig bit to kernel mappings, because + * dynamically adding/removing the contig bit can cause page faults. + * These racing faults are ok for user space, since they get serialized + * on the PTL. But kernel mappings can't tolerate faults. + */ + return mm != &init_mm; +} + +static inline pte_t *contpte_align_down(pte_t *ptep) +{ + return (pte_t *)(ALIGN_DOWN((unsigned long)ptep >> 3, CONT_PTES) << 3); +} + +static void ptep_clear_flush_range(struct mm_struct *mm, unsigned long addr, + pte_t *ptep, int nr) +{ + struct vm_area_struct vma = TLB_FLUSH_VMA(mm, 0); + unsigned long start_addr = addr; + int i; + + for (i = 0; i < nr; i++, ptep++, addr += PAGE_SIZE) + __pte_clear(mm, addr, ptep); + + __flush_tlb_range(&vma, start_addr, addr, PAGE_SIZE, true, 3); +} + +static bool ptep_any_valid(pte_t *ptep, int nr) +{ + int i; + + for (i = 0; i < nr; i++, ptep++) { + if (pte_valid(__ptep_get(ptep))) + return true; + } + + return false; +} + +static void contpte_convert(struct mm_struct *mm, unsigned long addr, + pte_t *ptep, pte_t pte) +{ + struct vm_area_struct vma = TLB_FLUSH_VMA(mm, 0); + unsigned long start_addr; + pte_t *start_ptep; + int i; + + start_ptep = ptep = contpte_align_down(ptep); + start_addr = addr = ALIGN_DOWN(addr, CONT_PTE_SIZE); + pte = pfn_pte(ALIGN_DOWN(pte_pfn(pte), CONT_PTES), pte_pgprot(pte)); + + for (i = 0; i < CONT_PTES; i++, ptep++, addr += PAGE_SIZE) { + pte_t ptent = __ptep_get_and_clear(mm, addr, ptep); + + if (pte_dirty(ptent)) + pte = pte_mkdirty(pte); + + if (pte_young(ptent)) + pte = pte_mkyoung(pte); + } + + __flush_tlb_range(&vma, start_addr, addr, PAGE_SIZE, true, 3); + + __set_ptes(mm, start_addr, start_ptep, pte, CONT_PTES); +} + +void __contpte_try_fold(struct mm_struct *mm, unsigned long addr, + pte_t *ptep, pte_t pte) +{ + /* + * We have already checked that the virtual and pysical addresses are + * correctly aligned for a contpte mapping in contpte_try_fold() so the + * remaining checks are to ensure that the contpte range is fully + * covered by a single folio, and ensure that all the ptes are valid + * with contiguous PFNs and matching prots. We ignore the state of the + * access and dirty bits for the purpose of deciding if its a contiguous + * range; the folding process will generate a single contpte entry which + * has a single access and dirty bit. Those 2 bits are the logical OR of + * their respective bits in the constituent pte entries. In order to + * ensure the contpte range is covered by a single folio, we must + * recover the folio from the pfn, but special mappings don't have a + * folio backing them. Fortunately contpte_try_fold() already checked + * that the pte is not special - we never try to fold special mappings. + * Note we can't use vm_normal_page() for this since we don't have the + * vma. + */ + + unsigned long folio_saddr; + unsigned long folio_eaddr; + unsigned long cont_saddr; + unsigned long cont_eaddr; + struct folio *folio; + struct page *page; + unsigned long pfn; + pte_t *orig_ptep; + pgprot_t prot; + pte_t subpte; + int i; + + if (!mm_is_user(mm)) + return; + + page = pte_page(pte); + folio = page_folio(page); + folio_saddr = addr - (page - &folio->page) * PAGE_SIZE; + folio_eaddr = folio_saddr + folio_nr_pages(folio) * PAGE_SIZE; + cont_saddr = ALIGN_DOWN(addr, CONT_PTE_SIZE); + cont_eaddr = cont_saddr + CONT_PTE_SIZE; + + if (folio_saddr > cont_saddr || folio_eaddr < cont_eaddr) + return; + + pfn = pte_pfn(pte) - ((addr - cont_saddr) >> PAGE_SHIFT); + prot = pte_pgprot(pte_mkold(pte_mkclean(pte))); + orig_ptep = ptep; + ptep = contpte_align_down(ptep); + + for (i = 0; i < CONT_PTES; i++, ptep++, pfn++) { + subpte = __ptep_get(ptep); + subpte = pte_mkold(pte_mkclean(subpte)); + + if (!pte_valid(subpte) || + pte_pfn(subpte) != pfn || + pgprot_val(pte_pgprot(subpte)) != pgprot_val(prot)) + return; + } + + pte = pte_mkcont(pte); + contpte_convert(mm, addr, orig_ptep, pte); +} +EXPORT_SYMBOL(__contpte_try_fold); + +void __contpte_try_unfold(struct mm_struct *mm, unsigned long addr, + pte_t *ptep, pte_t pte) +{ + /* + * We have already checked that the ptes are contiguous in + * contpte_try_unfold(), so just check that the mm is user space. + */ + + if (!mm_is_user(mm)) + return; + + pte = pte_mknoncont(pte); + contpte_convert(mm, addr, ptep, pte); +} +EXPORT_SYMBOL(__contpte_try_unfold); + +pte_t contpte_ptep_get(pte_t *ptep, pte_t orig_pte) +{ + /* + * Gather access/dirty bits, which may be populated in any of the ptes + * of the contig range. We are guarranteed to be holding the PTL, so any + * contiguous range cannot be unfolded or otherwise modified under our + * feet. + */ + + pte_t pte; + int i; + + ptep = contpte_align_down(ptep); + + for (i = 0; i < CONT_PTES; i++, ptep++) { + pte = __ptep_get(ptep); + + if (pte_dirty(pte)) + orig_pte = pte_mkdirty(orig_pte); + + if (pte_young(pte)) + orig_pte = pte_mkyoung(orig_pte); + } + + return orig_pte; +} +EXPORT_SYMBOL(contpte_ptep_get); + +pte_t contpte_ptep_get_lockless(pte_t *orig_ptep) +{ + /* + * Gather access/dirty bits, which may be populated in any of the ptes + * of the contig range. We may not be holding the PTL, so any contiguous + * range may be unfolded/modified/refolded under our feet. Therefore we + * ensure we read a _consistent_ contpte range by checking that all ptes + * in the range are valid and have CONT_PTE set, that all pfns are + * contiguous and that all pgprots are the same (ignoring access/dirty). + * If we find a pte that is not consistent, then we must be racing with + * an update so start again. If the target pte does not have CONT_PTE + * set then that is considered consistent on its own because it is not + * part of a contpte range. + */ + + pgprot_t orig_prot; + unsigned long pfn; + pte_t orig_pte; + pgprot_t prot; + pte_t *ptep; + pte_t pte; + int i; + +retry: + orig_pte = __ptep_get(orig_ptep); + + if (!pte_valid_cont(orig_pte)) + return orig_pte; + + orig_prot = pte_pgprot(pte_mkold(pte_mkclean(orig_pte))); + ptep = contpte_align_down(orig_ptep); + pfn = pte_pfn(orig_pte) - (orig_ptep - ptep); + + for (i = 0; i < CONT_PTES; i++, ptep++, pfn++) { + pte = __ptep_get(ptep); + prot = pte_pgprot(pte_mkold(pte_mkclean(pte))); + + if (!pte_valid_cont(pte) || + pte_pfn(pte) != pfn || + pgprot_val(prot) != pgprot_val(orig_prot)) + goto retry; + + if (pte_dirty(pte)) + orig_pte = pte_mkdirty(orig_pte); + + if (pte_young(pte)) + orig_pte = pte_mkyoung(orig_pte); + } + + return orig_pte; +} +EXPORT_SYMBOL(contpte_ptep_get_lockless); + +void contpte_set_ptes(struct mm_struct *mm, unsigned long addr, + pte_t *ptep, pte_t pte, unsigned int nr) +{ + unsigned long next; + unsigned long end; + unsigned long pfn; + pgprot_t prot; + pte_t orig_pte; + + if (!mm_is_user(mm)) + return __set_ptes(mm, addr, ptep, pte, nr); + + end = addr + (nr << PAGE_SHIFT); + pfn = pte_pfn(pte); + prot = pte_pgprot(pte); + + do { + next = pte_cont_addr_end(addr, end); + nr = (next - addr) >> PAGE_SHIFT; + pte = pfn_pte(pfn, prot); + + if (((addr | next | (pfn << PAGE_SHIFT)) & ~CONT_PTE_MASK) == 0) + pte = pte_mkcont(pte); + else + pte = pte_mknoncont(pte); + + /* + * If operating on a partial contiguous range then we must first + * unfold the contiguous range if it was previously folded. + * Otherwise we could end up with overlapping tlb entries. + */ + if (nr != CONT_PTES) + contpte_try_unfold(mm, addr, ptep, __ptep_get(ptep)); + + /* + * If we are replacing ptes that were contiguous or if the new + * ptes are contiguous and any of the ptes being replaced are + * valid, we need to clear and flush the range to prevent + * overlapping tlb entries. + */ + orig_pte = __ptep_get(ptep); + if (pte_valid_cont(orig_pte) || + (pte_cont(pte) && ptep_any_valid(ptep, nr))) + ptep_clear_flush_range(mm, addr, ptep, nr); + + __set_ptes(mm, addr, ptep, pte, nr); + + addr = next; + ptep += nr; + pfn += nr; + + } while (addr != end); +} +EXPORT_SYMBOL(contpte_set_ptes); + +int contpte_ptep_test_and_clear_young(struct vm_area_struct *vma, + unsigned long addr, pte_t *ptep) +{ + /* + * ptep_clear_flush_young() technically requires us to clear the access + * flag for a _single_ pte. However, the core-mm code actually tracks + * access/dirty per folio, not per page. And since we only create a + * contig range when the range is covered by a single folio, we can get + * away with clearing young for the whole contig range here, so we avoid + * having to unfold. + */ + + int young = 0; + int i; + + ptep = contpte_align_down(ptep); + addr = ALIGN_DOWN(addr, CONT_PTE_SIZE); + + for (i = 0; i < CONT_PTES; i++, ptep++, addr += PAGE_SIZE) + young |= __ptep_test_and_clear_young(vma, addr, ptep); + + return young; +} +EXPORT_SYMBOL(contpte_ptep_test_and_clear_young); + +int contpte_ptep_clear_flush_young(struct vm_area_struct *vma, + unsigned long addr, pte_t *ptep) +{ + int young; + + young = contpte_ptep_test_and_clear_young(vma, addr, ptep); + + if (young) { + /* + * See comment in __ptep_clear_flush_young(); same rationale for + * eliding the trailing DSB applies here. + */ + addr = ALIGN_DOWN(addr, CONT_PTE_SIZE); + __flush_tlb_range_nosync(vma, addr, addr + CONT_PTE_SIZE, + PAGE_SIZE, true, 3); + } + + return young; +} +EXPORT_SYMBOL(contpte_ptep_clear_flush_young); + +int contpte_ptep_set_access_flags(struct vm_area_struct *vma, + unsigned long addr, pte_t *ptep, + pte_t entry, int dirty) +{ + unsigned long start_addr; + pte_t orig_pte; + int i; + + /* + * Gather the access/dirty bits for the contiguous range. If nothing has + * changed, its a noop. + */ + orig_pte = pte_mknoncont(ptep_get(ptep)); + if (pte_val(orig_pte) == pte_val(entry)) + return 0; + + /* + * We can fix up access/dirty bits without having to unfold/fold the + * contig range. But if the write bit is changing, we need to go through + * the full unfold/fold cycle. + */ + if (pte_write(orig_pte) == pte_write(entry)) { + /* + * For HW access management, we technically only need to update + * the flag on a single pte in the range. But for SW access + * management, we need to update all the ptes to prevent extra + * faults. Avoid per-page tlb flush in __ptep_set_access_flags() + * and instead flush the whole range at the end. + */ + ptep = contpte_align_down(ptep); + start_addr = addr = ALIGN_DOWN(addr, CONT_PTE_SIZE); + + for (i = 0; i < CONT_PTES; i++, ptep++, addr += PAGE_SIZE) + __ptep_set_access_flags(vma, addr, ptep, entry, 0); + + if (dirty) + __flush_tlb_range(vma, start_addr, addr, + PAGE_SIZE, true, 3); + } else { + __contpte_try_unfold(vma->vm_mm, addr, ptep, orig_pte); + __ptep_set_access_flags(vma, addr, ptep, entry, dirty); + contpte_try_fold(vma->vm_mm, addr, ptep, entry); + } + + return 1; +} +EXPORT_SYMBOL(contpte_ptep_set_access_flags); From patchwork Mon Dec 18 10:50:59 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ryan Roberts X-Patchwork-Id: 180272 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a05:7300:24d3:b0:fb:cd0c:d3e with SMTP id r19csp1155595dyi; Mon, 18 Dec 2023 02:55:14 -0800 (PST) X-Google-Smtp-Source: AGHT+IGQjT9utkh7eiyrq4IhD2/09fzX7YWjvfDASJcQhe29+fkGg7saQWD58Jywu9zyL23w/ElS X-Received: by 2002:a17:90a:69e1:b0:28b:7c54:8154 with SMTP id s88-20020a17090a69e100b0028b7c548154mr349937pjj.10.1702896914230; Mon, 18 Dec 2023 02:55:14 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1702896914; cv=none; d=google.com; s=arc-20160816; b=YHGkLh819Lv8mMSB8VUIxAdw0HxulaimVpZSQjtTLLBsXe8KqzLN3orIrY1ruq7L4e 5IRE5MWINysjEQcKOa3R1bB6HF8YrTjfimPXaGea7hTZkKMQl3XGBxlRxGMADl/legAG s8E3tSj51yDRV69geidhNocOkBReQn6MejtjaUgXvENLYT9NUiYqBzY9tEkBbl6IZ+b4 HsaSdyHZSD9Bp/TXCVn1g2B3esxuwQwOmv9TsrCAhrmOquWiwnmzMwJ0yFom1Op6oNoE hM9BUrmK/UOCgPe1oo1KR4Tf1ZaY+7m6RHIb1WCLHxoM+1KXf0meuTCxPWj2P74O2Z+q Maxg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=content-transfer-encoding:mime-version:list-unsubscribe :list-subscribe:list-id:precedence:references:in-reply-to:message-id :date:subject:cc:to:from; bh=B2PjP7HqUpeEkMhkbIFHBWS4ku1Qcu2bCx3FQTTUYsI=; fh=a3gZESXuO1ptbCtzvzEnZL5xUl+WC4Vy5hoF2yaOXSs=; b=VR1Xu/idR80TupuHLgXQFMsdCJZ36jxky/M88ySahW2Eo6MH98X9GJHQ5a2Myi6NZs G4ttkzjgrARuUf0fTk/FMUcsQIK2n7WkEsQyxOvvGHZ2yTq7w54FU/qERev5HXtWOSGQ LJGiuWMGYrKQ58eFycqogWRZTfe5SNGuxeYjsx3yqgzPVy294TEtS9Yf/9+aG68nv/8r VvX6LyzQotNAn7GIVd4jk6CqLnj415hWRMjplVxvPD6lNLH2gz4lcGPldHZarp2DCqRA Lr2BvBKhIiQi2hZ1yM5ZEzt0RCnbeqMjH0Hz3XXmcOfT5Mi9uVHRZb/QQvhhzGNnLWu8 j6TQ== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel+bounces-3384-ouuuleilei=gmail.com@vger.kernel.org designates 2604:1380:45e3:2400::1 as permitted sender) smtp.mailfrom="linux-kernel+bounces-3384-ouuuleilei=gmail.com@vger.kernel.org"; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=arm.com Received: from sv.mirrors.kernel.org (sv.mirrors.kernel.org. [2604:1380:45e3:2400::1]) by mx.google.com with ESMTPS id o5-20020a17090ac70500b0028b5ce8ac50si3303772pjt.121.2023.12.18.02.55.14 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 18 Dec 2023 02:55:14 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel+bounces-3384-ouuuleilei=gmail.com@vger.kernel.org designates 2604:1380:45e3:2400::1 as permitted sender) client-ip=2604:1380:45e3:2400::1; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel+bounces-3384-ouuuleilei=gmail.com@vger.kernel.org designates 2604:1380:45e3:2400::1 as permitted sender) smtp.mailfrom="linux-kernel+bounces-3384-ouuuleilei=gmail.com@vger.kernel.org"; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=arm.com Received: from smtp.subspace.kernel.org (wormhole.subspace.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by sv.mirrors.kernel.org (Postfix) with ESMTPS id 960AA28102D for ; Mon, 18 Dec 2023 10:55:05 +0000 (UTC) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 8565D39860; Mon, 18 Dec 2023 10:52:23 +0000 (UTC) X-Original-To: linux-kernel@vger.kernel.org Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 48D1F37D20 for ; Mon, 18 Dec 2023 10:52:13 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=arm.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=arm.com Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 384F71FB; Mon, 18 Dec 2023 02:52:57 -0800 (PST) Received: from e125769.cambridge.arm.com (e125769.cambridge.arm.com [10.1.196.26]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 5EF4F3F7C5; Mon, 18 Dec 2023 02:52:09 -0800 (PST) From: Ryan Roberts To: Catalin Marinas , Will Deacon , Ard Biesheuvel , Marc Zyngier , Oliver Upton , James Morse , Suzuki K Poulose , Zenghui Yu , Andrey Ryabinin , Alexander Potapenko , Andrey Konovalov , Dmitry Vyukov , Vincenzo Frascino , Andrew Morton , Anshuman Khandual , Matthew Wilcox , Yu Zhao , Mark Rutland , David Hildenbrand , Kefeng Wang , John Hubbard , Zi Yan , Barry Song <21cnbao@gmail.com>, Alistair Popple , Yang Shi Cc: Ryan Roberts , linux-arm-kernel@lists.infradead.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: [PATCH v4 15/16] arm64/mm: Implement new helpers to optimize fork() Date: Mon, 18 Dec 2023 10:50:59 +0000 Message-Id: <20231218105100.172635-16-ryan.roberts@arm.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20231218105100.172635-1-ryan.roberts@arm.com> References: <20231218105100.172635-1-ryan.roberts@arm.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1785616834672388227 X-GMAIL-MSGID: 1785616834672388227 With the core-mm changes in place to batch-copy ptes during fork, we can take advantage of this in arm64 to greatly reduce the number of tlbis we have to issue, and recover the lost fork performance incured when adding support for transparent contiguous ptes. This optimization covers 2 cases: 2) The memory being CoWed is contpte-sized (or bigger) folios. We set wrprotect in the parent and set the ptes in the child for a whole contpte block in one hit. This means we can operate on the whole block and don't need to unfold/fold. 1) The memory being CoWed is all order-0 folios. No folding or unfolding occurs here, but the added cost of checking if we need to fold on every pte adds up. Given we are forking, we are just copying the ptes already in the parent, so we should be maintaining the single/contpte state into the child anyway, and any check for folding will always be false. Therefore, we can elide the fold check in set_ptes_full() and ptep_set_wrprotects() when full=1. The optimization to wrprotect a whole contpte block without unfolding is possible thanks to the tightening of the Arm ARM in respect to the definition and behaviour when 'Misprogramming the Contiguous bit'. See section D21194 at https://developer.arm.com/documentation/102105/latest/ The following microbenchmark results demonstate the recovered (and overall improved) fork performance for large pte-mapped folios once this patch is applied. Fork is called in a tight loop in a process with 1G of populated memory and the time for the function to execute is measured. 100 iterations per run, 8 runs performed on both Apple M2 (VM) and Ampere Altra (bare metal). Tests performed for case where 1G memory is comprised of pte-mapped order-9 folios. Negative is faster, positive is slower, compared to baseline upon which the series is based: | fork | Apple M2 VM | Ampere Altra | | order-9 |-------------------|-------------------| | (pte-map) | mean | stdev | mean | stdev | |---------------|---------|---------|---------|---------| | baseline | 0.0% | 1.2% | 0.0% | 0.1% | | before-change | 541.5% | 2.8% | 3654.4% | 0.0% | | after-change | -25.4% | 1.9% | -6.7% | 0.1% | Tested-by: John Hubbard Signed-off-by: Ryan Roberts --- arch/arm64/include/asm/pgtable.h | 97 ++++++++++++++++++++++++++------ arch/arm64/mm/contpte.c | 47 ++++++++++++++++ 2 files changed, 128 insertions(+), 16 deletions(-) diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h index e64120452301..d4805f73b9db 100644 --- a/arch/arm64/include/asm/pgtable.h +++ b/arch/arm64/include/asm/pgtable.h @@ -966,16 +966,12 @@ static inline pmd_t pmdp_huge_get_and_clear(struct mm_struct *mm, } #endif /* CONFIG_TRANSPARENT_HUGEPAGE */ -/* - * __ptep_set_wrprotect - mark read-only while trasferring potential hardware - * dirty status (PTE_DBM && !PTE_RDONLY) to the software PTE_DIRTY bit. - */ -static inline void __ptep_set_wrprotect(struct mm_struct *mm, - unsigned long address, pte_t *ptep) +static inline void ___ptep_set_wrprotect(struct mm_struct *mm, + unsigned long address, pte_t *ptep, + pte_t pte) { - pte_t old_pte, pte; + pte_t old_pte; - pte = __ptep_get(ptep); do { old_pte = pte; pte = pte_wrprotect(pte); @@ -984,6 +980,26 @@ static inline void __ptep_set_wrprotect(struct mm_struct *mm, } while (pte_val(pte) != pte_val(old_pte)); } +/* + * __ptep_set_wrprotect - mark read-only while trasferring potential hardware + * dirty status (PTE_DBM && !PTE_RDONLY) to the software PTE_DIRTY bit. + */ +static inline void __ptep_set_wrprotect(struct mm_struct *mm, + unsigned long address, pte_t *ptep) +{ + ___ptep_set_wrprotect(mm, address, ptep, __ptep_get(ptep)); +} + +static inline void __ptep_set_wrprotects(struct mm_struct *mm, + unsigned long address, pte_t *ptep, + unsigned int nr, int full) +{ + unsigned int i; + + for (i = 0; i < nr; i++, address += PAGE_SIZE, ptep++) + __ptep_set_wrprotect(mm, address, ptep); +} + #ifdef CONFIG_TRANSPARENT_HUGEPAGE #define __HAVE_ARCH_PMDP_SET_WRPROTECT static inline void pmdp_set_wrprotect(struct mm_struct *mm, @@ -1139,6 +1155,8 @@ extern int contpte_ptep_test_and_clear_young(struct vm_area_struct *vma, unsigned long addr, pte_t *ptep); extern int contpte_ptep_clear_flush_young(struct vm_area_struct *vma, unsigned long addr, pte_t *ptep); +extern void contpte_set_wrprotects(struct mm_struct *mm, unsigned long addr, + pte_t *ptep, unsigned int nr, int full); extern int contpte_ptep_set_access_flags(struct vm_area_struct *vma, unsigned long addr, pte_t *ptep, pte_t entry, int dirty); @@ -1170,6 +1188,17 @@ static inline void contpte_try_unfold(struct mm_struct *mm, unsigned long addr, __contpte_try_unfold(mm, addr, ptep, pte); } +#define pte_batch_remaining pte_batch_remaining +static inline unsigned int pte_batch_remaining(pte_t pte, unsigned long addr, + unsigned long end) +{ + if (!pte_valid_cont(pte)) + return 1; + + return min(CONT_PTES - ((addr >> PAGE_SHIFT) & (CONT_PTES - 1)), + (end - addr) >> PAGE_SHIFT); +} + /* * The below functions constitute the public API that arm64 presents to the * core-mm to manipulate PTE entries within their page tables (or at least this @@ -1219,20 +1248,30 @@ static inline void set_pte(pte_t *ptep, pte_t pte) __set_pte(ptep, pte_mknoncont(pte)); } -#define set_ptes set_ptes -static inline void set_ptes(struct mm_struct *mm, unsigned long addr, - pte_t *ptep, pte_t pte, unsigned int nr) +#define set_ptes_full set_ptes_full +static inline void set_ptes_full(struct mm_struct *mm, unsigned long addr, + pte_t *ptep, pte_t pte, unsigned int nr, + int full) { pte = pte_mknoncont(pte); if (nr == 1) { - contpte_try_unfold(mm, addr, ptep, __ptep_get(ptep)); + if (!full) + contpte_try_unfold(mm, addr, ptep, __ptep_get(ptep)); __set_ptes(mm, addr, ptep, pte, 1); - contpte_try_fold(mm, addr, ptep, pte); + if (!full) + contpte_try_fold(mm, addr, ptep, pte); } else contpte_set_ptes(mm, addr, ptep, pte, nr); } +#define set_ptes set_ptes +static inline void set_ptes(struct mm_struct *mm, unsigned long addr, + pte_t *ptep, pte_t pte, unsigned int nr) +{ + set_ptes_full(mm, addr, ptep, pte, nr, false); +} + static inline void pte_clear(struct mm_struct *mm, unsigned long addr, pte_t *ptep) { @@ -1272,13 +1311,38 @@ static inline int ptep_clear_flush_young(struct vm_area_struct *vma, return contpte_ptep_clear_flush_young(vma, addr, ptep); } +#define ptep_set_wrprotects ptep_set_wrprotects +static inline void ptep_set_wrprotects(struct mm_struct *mm, + unsigned long addr, pte_t *ptep, + unsigned int nr, int full) +{ + if (nr == 1) { + /* + * Optimization: ptep_set_wrprotects() can only be called for + * present ptes so we only need to check contig bit as condition + * for unfold, and we can remove the contig bit from the pte we + * read to avoid re-reading. This speeds up fork() with is very + * sensitive for order-0 folios. Should be equivalent to + * contpte_try_unfold() for this case. + */ + pte_t orig_pte = __ptep_get(ptep); + + if (unlikely(pte_cont(orig_pte))) { + __contpte_try_unfold(mm, addr, ptep, orig_pte); + orig_pte = pte_mknoncont(orig_pte); + } + ___ptep_set_wrprotect(mm, addr, ptep, orig_pte); + if (!full) + contpte_try_fold(mm, addr, ptep, __ptep_get(ptep)); + } else + contpte_set_wrprotects(mm, addr, ptep, nr, full); +} + #define __HAVE_ARCH_PTEP_SET_WRPROTECT static inline void ptep_set_wrprotect(struct mm_struct *mm, unsigned long addr, pte_t *ptep) { - contpte_try_unfold(mm, addr, ptep, __ptep_get(ptep)); - __ptep_set_wrprotect(mm, addr, ptep); - contpte_try_fold(mm, addr, ptep, __ptep_get(ptep)); + ptep_set_wrprotects(mm, addr, ptep, 1, false); } #define __HAVE_ARCH_PTEP_SET_ACCESS_FLAGS @@ -1310,6 +1374,7 @@ static inline int ptep_set_access_flags(struct vm_area_struct *vma, #define ptep_clear_flush_young __ptep_clear_flush_young #define __HAVE_ARCH_PTEP_SET_WRPROTECT #define ptep_set_wrprotect __ptep_set_wrprotect +#define ptep_set_wrprotects __ptep_set_wrprotects #define __HAVE_ARCH_PTEP_SET_ACCESS_FLAGS #define ptep_set_access_flags __ptep_set_access_flags diff --git a/arch/arm64/mm/contpte.c b/arch/arm64/mm/contpte.c index 69c36749dd98..72e672024785 100644 --- a/arch/arm64/mm/contpte.c +++ b/arch/arm64/mm/contpte.c @@ -339,6 +339,53 @@ int contpte_ptep_clear_flush_young(struct vm_area_struct *vma, } EXPORT_SYMBOL(contpte_ptep_clear_flush_young); +void contpte_set_wrprotects(struct mm_struct *mm, unsigned long addr, + pte_t *ptep, unsigned int nr, int full) +{ + unsigned long next; + unsigned long end; + + if (!mm_is_user(mm)) + return __ptep_set_wrprotects(mm, addr, ptep, nr, full); + + end = addr + (nr << PAGE_SHIFT); + + do { + next = pte_cont_addr_end(addr, end); + nr = (next - addr) >> PAGE_SHIFT; + + /* + * If wrprotecting an entire contig range, we can avoid + * unfolding. Just set wrprotect and wait for the later + * mmu_gather flush to invalidate the tlb. Until the flush, the + * page may or may not be wrprotected. After the flush, it is + * guarranteed wrprotected. If its a partial range though, we + * must unfold, because we can't have a case where CONT_PTE is + * set but wrprotect applies to a subset of the PTEs; this would + * cause it to continue to be unpredictable after the flush. + */ + if (nr != CONT_PTES) + contpte_try_unfold(mm, addr, ptep, __ptep_get(ptep)); + + __ptep_set_wrprotects(mm, addr, ptep, nr, full); + + addr = next; + ptep += nr; + + /* + * If applying to a partial contig range, the change could have + * made the range foldable. Use the last pte in the range we + * just set for comparison, since contpte_try_fold() only + * triggers when acting on the last pte in the contig range. + */ + if (nr != CONT_PTES) + contpte_try_fold(mm, addr - PAGE_SIZE, ptep - 1, + __ptep_get(ptep - 1)); + + } while (addr != end); +} +EXPORT_SYMBOL(contpte_set_wrprotects); + int contpte_ptep_set_access_flags(struct vm_area_struct *vma, unsigned long addr, pte_t *ptep, pte_t entry, int dirty) From patchwork Mon Dec 18 10:51:00 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ryan Roberts X-Patchwork-Id: 180273 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a05:7300:24d3:b0:fb:cd0c:d3e with SMTP id r19csp1155636dyi; Mon, 18 Dec 2023 02:55:23 -0800 (PST) X-Google-Smtp-Source: AGHT+IH4upYCGjPkPH9qHuHVAGWygxjYUbal+pcQLLMn3TcSfdD/HOClKdhs+eip+LaYqu3Qmlbq X-Received: by 2002:a05:6a20:8f1e:b0:18c:7b7:8630 with SMTP id b30-20020a056a208f1e00b0018c07b78630mr8949531pzk.13.1702896922965; Mon, 18 Dec 2023 02:55:22 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1702896922; cv=none; d=google.com; s=arc-20160816; b=nOT5vRIww8G/glbvxuoICShukAk50i1iANNfue2GvKmvw1uX5Y82+cxKXKlLf5SG6t qKNw200npiRE4yrvLreYlauSvSZU6cDZ2UJ/o84Ze5kYtZtsySwwteEWG2tkcamQrgQ5 f5nVaWJAvCzdr4ekI+mBzFAHsZE7Pi7wVXh9eG/XuBOo/4MEPCIz7DQiJ+C/l55xFyrZ EffGx4x1bq/Cnczu6TeWZ/b6+CwWXUdx69LcKD+Pd0/ucGTifkQnWRff4CHijGOeRoLb Vc6suEZkjCSwtcaR9w4Z1LFLKm7+exHOr6iV5eZdPqErTds0gQxQFjgXHJGtq3SXTcRW Gtaw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=content-transfer-encoding:mime-version:list-unsubscribe :list-subscribe:list-id:precedence:references:in-reply-to:message-id :date:subject:cc:to:from; bh=Pl80kIjTjpL3O0EXepdffRGD/m7SMioiLS6gBNRB6iM=; fh=a3gZESXuO1ptbCtzvzEnZL5xUl+WC4Vy5hoF2yaOXSs=; b=SGmR6V652p5pPkqC3pQIw5trzQyTkH7egxtkAaEi26POEKemibqEGNsVABHkzQuWMD B3TpF2n7MWFR6ZtsedXRMFj30FkdWWiSp+Sybxaf2OXz79fnfzZmEAjNsFsGHYZtzKDC 4eZjseXd8Qd8E6NLx53ekqY/S9PbUr9CaZpoVUp4xL+PEOpmUWrG3Qiik7Uey1nmCuTy o41T70qs+vri5wuUm8dwl93UWqWFF5ZhqRWIxJsE67ifH45g2oETWth2Fzd4T7VoJqes Ap/9QbA9tki74s460aYVHVcgSFEN60B9RSXQNph4vIkZe1tFe7rVVu2ssUKoyGq8zsWK 9XTg== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel+bounces-3385-ouuuleilei=gmail.com@vger.kernel.org designates 2604:1380:45e3:2400::1 as permitted sender) smtp.mailfrom="linux-kernel+bounces-3385-ouuuleilei=gmail.com@vger.kernel.org"; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=arm.com Received: from sv.mirrors.kernel.org (sv.mirrors.kernel.org. [2604:1380:45e3:2400::1]) by mx.google.com with ESMTPS id p14-20020a17090a2d8e00b0028b4ff3c37bsi3843273pjd.155.2023.12.18.02.55.22 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 18 Dec 2023 02:55:22 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel+bounces-3385-ouuuleilei=gmail.com@vger.kernel.org designates 2604:1380:45e3:2400::1 as permitted sender) client-ip=2604:1380:45e3:2400::1; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel+bounces-3385-ouuuleilei=gmail.com@vger.kernel.org designates 2604:1380:45e3:2400::1 as permitted sender) smtp.mailfrom="linux-kernel+bounces-3385-ouuuleilei=gmail.com@vger.kernel.org"; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=arm.com Received: from smtp.subspace.kernel.org (wormhole.subspace.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by sv.mirrors.kernel.org (Postfix) with ESMTPS id EA2372830AB for ; Mon, 18 Dec 2023 10:55:15 +0000 (UTC) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by smtp.subspace.kernel.org (Postfix) with ESMTP id EAD7F39ADF; Mon, 18 Dec 2023 10:52:26 +0000 (UTC) X-Original-To: linux-kernel@vger.kernel.org Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 16B1B37D10 for ; Mon, 18 Dec 2023 10:52:16 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=arm.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=arm.com Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id D081E1424; Mon, 18 Dec 2023 02:53:00 -0800 (PST) Received: from e125769.cambridge.arm.com (e125769.cambridge.arm.com [10.1.196.26]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 00DE33F738; Mon, 18 Dec 2023 02:52:12 -0800 (PST) From: Ryan Roberts To: Catalin Marinas , Will Deacon , Ard Biesheuvel , Marc Zyngier , Oliver Upton , James Morse , Suzuki K Poulose , Zenghui Yu , Andrey Ryabinin , Alexander Potapenko , Andrey Konovalov , Dmitry Vyukov , Vincenzo Frascino , Andrew Morton , Anshuman Khandual , Matthew Wilcox , Yu Zhao , Mark Rutland , David Hildenbrand , Kefeng Wang , John Hubbard , Zi Yan , Barry Song <21cnbao@gmail.com>, Alistair Popple , Yang Shi Cc: Ryan Roberts , linux-arm-kernel@lists.infradead.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: [PATCH v4 16/16] arm64/mm: Implement clear_ptes() to optimize exit, munmap, dontneed Date: Mon, 18 Dec 2023 10:51:00 +0000 Message-Id: <20231218105100.172635-17-ryan.roberts@arm.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20231218105100.172635-1-ryan.roberts@arm.com> References: <20231218105100.172635-1-ryan.roberts@arm.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1785616843582647568 X-GMAIL-MSGID: 1785616843582647568 With the core-mm changes in place to batch-clear ptes during zap_pte_range(), we can take advantage of this in arm64 to greatly reduce the number of tlbis we have to issue, and recover the lost performance in exit, munmap and madvise(DONTNEED) incured when adding support for transparent contiguous ptes. If we are clearing a whole contpte range, we can elide first unfolding that range and save the tlbis. We just clear the whole range. The following microbenchmark results demonstate the effect of this change on madvise(DONTNEED) performance for large pte-mapped folios. madvise(dontneed) is called for each page of a 1G populated mapping and the total time is measured. 100 iterations per run, 8 runs performed on both Apple M2 (VM) and Ampere Altra (bare metal). Tests performed for case where 1G memory is comprised of pte-mapped order-9 folios. Negative is faster, positive is slower, compared to baseline upon which the series is based: | dontneed | Apple M2 VM | Ampere Altra | | order-9 |-------------------|-------------------| | (pte-map) | mean | stdev | mean | stdev | |---------------|---------|---------|---------|---------| | baseline | 0.0% | 7.9% | 0.0% | 0.0% | | before-change | -1.3% | 7.0% | 13.0% | 0.0% | | after-change | -9.9% | 0.9% | 14.1% | 0.0% | The memory is initially all contpte-mapped and has to be unfolded (which requires tlbi for the whole block) when the first page is touched (since the test is madvise-ing 1 page at a time). Ampere Altra has high cost for tlbi; this is why cost increases there. The following microbenchmark results demonstate the recovery (and overall improvement) of munmap performance for large pte-mapped folios. munmap is called for a 1G populated mapping and the function runtime is measured. 100 iterations per run, 8 runs performed on both Apple M2 (VM) and Ampere Altra (bare metal). Tests performed for case where 1G memory is comprised of pte-mapped order-9 folios. Negative is faster, positive is slower, compared to baseline upon which the series is based: | munmap | Apple M2 VM | Ampere Altra | | order-9 |-------------------|-------------------| | (pte-map) | mean | stdev | mean | stdev | |---------------|---------|---------|---------|---------| | baseline | 0.0% | 6.4% | 0.0% | 0.1% | | before-change | 43.3% | 1.9% | 375.2% | 0.0% | | after-change | -6.0% | 1.4% | -0.6% | 0.2% | Tested-by: John Hubbard Signed-off-by: Ryan Roberts --- arch/arm64/include/asm/pgtable.h | 42 +++++++++++++++++++++++++++++ arch/arm64/mm/contpte.c | 45 ++++++++++++++++++++++++++++++++ 2 files changed, 87 insertions(+) diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h index d4805f73b9db..f5bf059291c3 100644 --- a/arch/arm64/include/asm/pgtable.h +++ b/arch/arm64/include/asm/pgtable.h @@ -953,6 +953,29 @@ static inline pte_t __ptep_get_and_clear(struct mm_struct *mm, return pte; } +static inline pte_t __clear_ptes(struct mm_struct *mm, + unsigned long address, pte_t *ptep, + unsigned int nr, int full) +{ + pte_t orig_pte = __ptep_get_and_clear(mm, address, ptep); + unsigned int i; + pte_t pte; + + for (i = 1; i < nr; i++) { + address += PAGE_SIZE; + ptep++; + pte = __ptep_get_and_clear(mm, address, ptep); + + if (pte_dirty(pte)) + orig_pte = pte_mkdirty(orig_pte); + + if (pte_young(pte)) + orig_pte = pte_mkyoung(orig_pte); + } + + return orig_pte; +} + #ifdef CONFIG_TRANSPARENT_HUGEPAGE #define __HAVE_ARCH_PMDP_HUGE_GET_AND_CLEAR static inline pmd_t pmdp_huge_get_and_clear(struct mm_struct *mm, @@ -1151,6 +1174,8 @@ extern pte_t contpte_ptep_get(pte_t *ptep, pte_t orig_pte); extern pte_t contpte_ptep_get_lockless(pte_t *orig_ptep); extern void contpte_set_ptes(struct mm_struct *mm, unsigned long addr, pte_t *ptep, pte_t pte, unsigned int nr); +extern pte_t contpte_clear_ptes(struct mm_struct *mm, unsigned long addr, + pte_t *ptep, unsigned int nr, int full); extern int contpte_ptep_test_and_clear_young(struct vm_area_struct *vma, unsigned long addr, pte_t *ptep); extern int contpte_ptep_clear_flush_young(struct vm_area_struct *vma, @@ -1279,6 +1304,22 @@ static inline void pte_clear(struct mm_struct *mm, __pte_clear(mm, addr, ptep); } +#define clear_ptes clear_ptes +static inline pte_t clear_ptes(struct mm_struct *mm, + unsigned long addr, pte_t *ptep, + unsigned int nr, int full) +{ + pte_t pte; + + if (nr == 1) { + contpte_try_unfold(mm, addr, ptep, __ptep_get(ptep)); + pte = __ptep_get_and_clear(mm, addr, ptep); + } else + pte = contpte_clear_ptes(mm, addr, ptep, nr, full); + + return pte; +} + #define __HAVE_ARCH_PTEP_GET_AND_CLEAR static inline pte_t ptep_get_and_clear(struct mm_struct *mm, unsigned long addr, pte_t *ptep) @@ -1366,6 +1407,7 @@ static inline int ptep_set_access_flags(struct vm_area_struct *vma, #define set_pte __set_pte #define set_ptes __set_ptes #define pte_clear __pte_clear +#define clear_ptes __clear_ptes #define __HAVE_ARCH_PTEP_GET_AND_CLEAR #define ptep_get_and_clear __ptep_get_and_clear #define __HAVE_ARCH_PTEP_TEST_AND_CLEAR_YOUNG diff --git a/arch/arm64/mm/contpte.c b/arch/arm64/mm/contpte.c index 72e672024785..6f2a15ac5163 100644 --- a/arch/arm64/mm/contpte.c +++ b/arch/arm64/mm/contpte.c @@ -293,6 +293,51 @@ void contpte_set_ptes(struct mm_struct *mm, unsigned long addr, } EXPORT_SYMBOL(contpte_set_ptes); +pte_t contpte_clear_ptes(struct mm_struct *mm, unsigned long addr, pte_t *ptep, + unsigned int nr, int full) +{ + /* + * If we cover a partial contpte block at the beginning or end of the + * batch, unfold if currently folded. This makes it safe to clear some + * of the entries while keeping others. contpte blocks in the middle of + * the range, which are fully covered don't need to be unfolded because + * we will clear the full block. + */ + + unsigned int i; + pte_t pte; + pte_t tail; + + if (!mm_is_user(mm)) + return __clear_ptes(mm, addr, ptep, nr, full); + + if (ptep != contpte_align_down(ptep) || nr < CONT_PTES) + contpte_try_unfold(mm, addr, ptep, __ptep_get(ptep)); + + if (ptep + nr != contpte_align_down(ptep + nr)) + contpte_try_unfold(mm, addr + PAGE_SIZE * (nr - 1), + ptep + nr - 1, + __ptep_get(ptep + nr - 1)); + + pte = __ptep_get_and_clear(mm, addr, ptep); + + for (i = 1; i < nr; i++) { + addr += PAGE_SIZE; + ptep++; + + tail = __ptep_get_and_clear(mm, addr, ptep); + + if (pte_dirty(tail)) + pte = pte_mkdirty(pte); + + if (pte_young(tail)) + pte = pte_mkyoung(pte); + } + + return pte; +} +EXPORT_SYMBOL(contpte_clear_ptes); + int contpte_ptep_test_and_clear_young(struct vm_area_struct *vma, unsigned long addr, pte_t *ptep) {