Message ID: 20231204105440.61448-1-ryan.roberts@arm.com
Headers:
  From: Ryan Roberts <ryan.roberts@arm.com>
  To: Catalin Marinas <catalin.marinas@arm.com>, Will Deacon <will@kernel.org>, Ard Biesheuvel <ardb@kernel.org>, Marc Zyngier <maz@kernel.org>, Oliver Upton <oliver.upton@linux.dev>, James Morse <james.morse@arm.com>, Suzuki K Poulose <suzuki.poulose@arm.com>, Zenghui Yu <yuzenghui@huawei.com>, Andrey Ryabinin <ryabinin.a.a@gmail.com>, Alexander Potapenko <glider@google.com>, Andrey Konovalov <andreyknvl@gmail.com>, Dmitry Vyukov <dvyukov@google.com>, Vincenzo Frascino <vincenzo.frascino@arm.com>, Andrew Morton <akpm@linux-foundation.org>, Anshuman Khandual <anshuman.khandual@arm.com>, Matthew Wilcox <willy@infradead.org>, Yu Zhao <yuzhao@google.com>, Mark Rutland <mark.rutland@arm.com>, David Hildenbrand <david@redhat.com>, Kefeng Wang <wangkefeng.wang@huawei.com>, John Hubbard <jhubbard@nvidia.com>, Zi Yan <ziy@nvidia.com>, Barry Song <21cnbao@gmail.com>, Alistair Popple <apopple@nvidia.com>, Yang Shi <shy828301@gmail.com>
  Cc: Ryan Roberts <ryan.roberts@arm.com>, linux-arm-kernel@lists.infradead.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org
  Subject: [PATCH v3 00/15] Transparent Contiguous PTEs for User Mappings
  Date: Mon, 4 Dec 2023 10:54:25 +0000
  Message-Id: <20231204105440.61448-1-ryan.roberts@arm.com>
  X-Mailer: git-send-email 2.25.1
Series: Transparent Contiguous PTEs for User Mappings
Message
Ryan Roberts
Dec. 4, 2023, 10:54 a.m. UTC
Hi All,

This is v3 of a series to opportunistically and transparently use contpte
mappings (set the contiguous bit in ptes) for user memory when those mappings
meet the requirements. It is part of a wider effort to improve performance by
allocating and mapping variable-sized blocks of memory (folios). One aim is for
the 4K kernel to approach the performance of the 16K kernel, but without
breaking compatibility and without the associated increase in memory. Another
aim is to benefit the 16K and 64K kernels by enabling 2M THP, since this is the
contpte size for those kernels. We have good performance data that demonstrates
both aims are being met (see below).

Of course this is only one half of the change. We require the mapped physical
memory to be the correct size and alignment for this to actually be useful
(i.e. 64K for 4K pages, or 2M for 16K/64K pages). Fortunately folios are
solving this problem for us. Filesystems that support it (XFS, AFS, EROFS,
tmpfs, ...) will allocate large folios up to the PMD size today, and more
filesystems are coming. And the other half of my work, to enable "multi-size
THP" (large folios) for anonymous memory, makes contpte sized folios prevalent
for anonymous memory too [3].

Optimistically, I would really like to get this series merged for v6.8; there
is a chance that the multi-size THP series will also get merged for that
version (although at this point the chance is pretty small). But even if it
doesn't, this series still benefits file-backed memory from the filesystems
that support large folios, so it shouldn't be held up. Additionally, I have
data showing that this series adds no regression when the system has no
appropriate large folios.

All dependencies listed against v1 are now resolved; this series applies
cleanly against v6.7-rc1.

Note that the first two patches are for core-mm and provide the refactoring
that makes some crucial optimizations possible - these are then implemented in
patches 14 and 15. The remaining patches are arm64-specific.

Testing
=======

I've tested this series together with multi-size THP [3] on both Ampere Altra
(bare metal) and Apple M2 (VM):
  - mm selftests (inc new tests written for multi-size THP); no regressions
  - Speedometer JavaScript benchmark in Chromium web browser; no issues
  - Kernel compilation; no issues
  - Various tests under high memory pressure with swap enabled; no issues

Performance
===========

John Hubbard at Nvidia has indicated dramatic 10x performance improvements for
some workloads at [4], when using a 64K base page kernel. You can also see the
original performance results I posted against v1 [1], which are still valid.

I've additionally run the kernel compilation and Speedometer benchmarks on a
system with multi-size THP disabled and large folio support for file-backed
memory intentionally disabled; I see no change in performance in this case
(i.e. no regression when this change is "present but not useful").
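For readers less familiar with the feature: the arm64 contiguous bit lets a
naturally aligned block of page table entries (16 x 4K entries, i.e. 64K, on a
4K granule kernel) that map physically contiguous memory with identical
attributes share a single TLB entry. The stand-alone C model below sketches
only that eligibility check; block_is_foldable(), struct model_pte and the
sample pfn values are invented for illustration, and the real check in
arch/arm64/mm/contpte.c additionally has to compare pte attributes and cope
with concurrent updates and TLB maintenance.

#include <stdbool.h>
#include <stdio.h>

#define CONT_PTES 16            /* 16 x 4K pages -> one 64K contpte block */

struct model_pte {
	bool valid;
	unsigned long pfn;      /* physical frame number */
};

/* idx is the index of any pte within the candidate block. */
static bool block_is_foldable(const struct model_pte *ptes, unsigned long idx)
{
	unsigned long start = idx & ~(CONT_PTES - 1UL);   /* virtual alignment */
	unsigned long pfn0;
	unsigned long i;

	if (!ptes[start].valid)
		return false;

	pfn0 = ptes[start].pfn;
	if (pfn0 & (CONT_PTES - 1))                       /* physical alignment */
		return false;

	/* Every entry must be valid and physically contiguous with the first. */
	for (i = 0; i < CONT_PTES; i++) {
		if (!ptes[start + i].valid || ptes[start + i].pfn != pfn0 + i)
			return false;
	}
	return true;
}

int main(void)
{
	struct model_pte ptes[CONT_PTES];
	unsigned long i;

	for (i = 0; i < CONT_PTES; i++)
		ptes[i] = (struct model_pte){ .valid = true, .pfn = 0x1000 + i };

	printf("foldable: %d\n", block_is_foldable(ptes, 3));   /* prints 1 */

	ptes[7].pfn++;                  /* break physical contiguity */
	printf("foldable: %d\n", block_is_foldable(ptes, 3));   /* prints 0 */

	return 0;
}

Compiled as ordinary user-space C, the first call reports the block as
foldable and the second does not, because bumping one pfn breaks the physical
contiguity requirement.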
Changes since v2 [2]
====================

  - Removed contpte_ptep_get_and_clear_full() optimisation for exit() (v2#14),
    and replaced with a batch-clearing approach using a new arch helper,
    clear_ptes() (v3#2 and v3#15) (Alistair and Barry)
  - (v2#1 / v3#1)
      - Fixed folio refcounting so that refcount >= mapcount always (DavidH)
      - Reworked batch demarcation to avoid pte_pgprot() (DavidH)
      - Reverted return semantic of copy_present_page() and instead fix it up
        in copy_present_ptes() (Alistair)
      - Removed page_cont_mapped_vaddr() and replaced with simpler logic
        (Alistair)
      - Made batch accounting clearer in copy_pte_range() (Alistair)
  - (v2#12 / v3#13)
      - Renamed contpte_fold() -> contpte_convert() and hoisted setting/
        clearing CONT_PTE bit to higher level (Alistair)

Changes since v1 [1]
====================

  - Export contpte_* symbols so that modules can continue to call inline
    functions (e.g. ptep_get) which may now call the contpte_* functions
    (thanks to JohnH)
  - Use pte_valid() instead of pte_present() where sensible (thanks to Catalin)
  - Factor out (pte_valid() && pte_cont()) into new pte_valid_cont() helper
    (thanks to Catalin)
  - Fixed bug in contpte_ptep_set_access_flags() where TLBIs were missed
    (thanks to Catalin)
  - Added ARM64_CONTPTE expert Kconfig (enabled by default) (thanks to
    Anshuman)
  - Simplified contpte_ptep_get_and_clear_full()
  - Improved various code comments

[1] https://lore.kernel.org/linux-arm-kernel/20230622144210.2623299-1-ryan.roberts@arm.com/
[2] https://lore.kernel.org/linux-arm-kernel/20231115163018.1303287-1-ryan.roberts@arm.com/
[3] https://lore.kernel.org/linux-arm-kernel/20231204102027.57185-1-ryan.roberts@arm.com/
[4] https://lore.kernel.org/linux-mm/c507308d-bdd4-5f9e-d4ff-e96e4520be85@nvidia.com/

Thanks,
Ryan

Ryan Roberts (15):
  mm: Batch-copy PTE ranges during fork()
  mm: Batch-clear PTE ranges during zap_pte_range()
  arm64/mm: set_pte(): New layer to manage contig bit
  arm64/mm: set_ptes()/set_pte_at(): New layer to manage contig bit
  arm64/mm: pte_clear(): New layer to manage contig bit
  arm64/mm: ptep_get_and_clear(): New layer to manage contig bit
  arm64/mm: ptep_test_and_clear_young(): New layer to manage contig bit
  arm64/mm: ptep_clear_flush_young(): New layer to manage contig bit
  arm64/mm: ptep_set_wrprotect(): New layer to manage contig bit
  arm64/mm: ptep_set_access_flags(): New layer to manage contig bit
  arm64/mm: ptep_get(): New layer to manage contig bit
  arm64/mm: Split __flush_tlb_range() to elide trailing DSB
  arm64/mm: Wire up PTE_CONT for user mappings
  arm64/mm: Implement ptep_set_wrprotects() to optimize fork()
  arm64/mm: Implement clear_ptes() to optimize exit()

 arch/arm64/Kconfig                |  10 +-
 arch/arm64/include/asm/pgtable.h  | 343 ++++++++++++++++++++---
 arch/arm64/include/asm/tlbflush.h |  13 +-
 arch/arm64/kernel/efi.c           |   4 +-
 arch/arm64/kernel/mte.c           |   2 +-
 arch/arm64/kvm/guest.c            |   2 +-
 arch/arm64/mm/Makefile            |   1 +
 arch/arm64/mm/contpte.c           | 436 ++++++++++++++++++++++++++++++
 arch/arm64/mm/fault.c             |  12 +-
 arch/arm64/mm/fixmap.c            |   4 +-
 arch/arm64/mm/hugetlbpage.c       |  40 +--
 arch/arm64/mm/kasan_init.c        |   6 +-
 arch/arm64/mm/mmu.c               |  16 +-
 arch/arm64/mm/pageattr.c          |   6 +-
 arch/arm64/mm/trans_pgd.c         |   6 +-
 include/asm-generic/tlb.h         |   9 +
 include/linux/pgtable.h           |  39 +++
 mm/memory.c                       | 258 +++++++++++++-----
 mm/mmu_gather.c                   |  14 +
 19 files changed, 1067 insertions(+), 154 deletions(-)
 create mode 100644 arch/arm64/mm/contpte.c

--
2.25.1
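To give a feel for what the per-helper "New layer to manage contig bit"
patches amount to, here is a rough sketch of the shape of one such wrapper,
using ptep_get() as the example. It is based on the helper names visible in
the hunks quoted later in this thread (__ptep_get(), contpte_ptep_get()) and
the pte_valid_cont() helper mentioned in the v1 changelog; the actual patch
may differ in detail.

/*
 * Illustrative sketch only (not a verbatim hunk from the series): each public
 * helper becomes a thin wrapper; the previous arm64 implementation survives
 * under a __ prefix, and the contpte-aware path is taken only when the pte is
 * valid and has the contiguous bit set.
 */
#define ptep_get ptep_get
static inline pte_t ptep_get(pte_t *ptep)
{
	pte_t pte = __ptep_get(ptep);	/* the old arm64 ptep_get() */

	if (!pte_valid_cont(pte))
		return pte;

	/* Accumulate access/dirty state across the whole contpte block. */
	return contpte_ptep_get(ptep, pte);
}

The same layering repeats for set_ptes(), pte_clear(), ptep_get_and_clear()
and the other helpers listed above: a __-prefixed non-contpte implementation
plus a thin public wrapper that dispatches to a contpte_* routine when the
contiguous bit is involved.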
Comments
On 12/4/23 02:54, Ryan Roberts wrote:
> Hi All,
>
> This is v3 of a series to opportunistically and transparently use contpte
> mappings (set the contiguous bit in ptes) for user memory when those mappings
> meet the requirements. It is part of a wider effort to improve performance by
> allocating and mapping variable-sized blocks of memory (folios). One aim is for
> the 4K kernel to approach the performance of the 16K kernel, but without
> breaking compatibility and without the associated increase in memory. Another
> aim is to benefit the 16K and 64K kernels by enabling 2M THP, since this is the
> contpte size for those kernels. We have good performance data that demonstrates
> both aims are being met (see below).
>
> Of course this is only one half of the change. We require the mapped physical
> memory to be the correct size and alignment for this to actually be useful (i.e.
> 64K for 4K pages, or 2M for 16K/64K pages). Fortunately folios are solving this
> problem for us. Filesystems that support it (XFS, AFS, EROFS, tmpfs, ...) will
> allocate large folios up to the PMD size today, and more filesystems are coming.
> And the other half of my work, to enable "multi-size THP" (large folios) for
> anonymous memory, makes contpte sized folios prevalent for anonymous memory too
> [3].
>

Hi Ryan,

Using a couple of Armv8 systems, I've tested this patchset. Details are in my
reply to the mTHP patchset [1]. So for this patchset, please feel free to add:

Tested-by: John Hubbard <jhubbard@nvidia.com>

[1] https://lore.kernel.org/all/2be046e1-ef95-4244-ae23-e56071ae1218@nvidia.com/

thanks,
Ryan Roberts <ryan.roberts@arm.com> writes:

> Convert zap_pte_range() to clear a set of ptes in a batch. A given batch
> maps a physically contiguous block of memory, all belonging to the same
> folio. This will likely improve performance by a tiny amount due to
> removing duplicate calls to mark the folio dirty and accessed. And also
> provides us with a future opportunity to batch the rmap removal.
>
> However, the primary motivation for this change is to reduce the number
> of tlb maintenance operations that the arm64 backend has to perform
> during exit and other syscalls that cause zap_pte_range() (e.g. munmap,
> madvise(DONTNEED), etc.), as it is about to add transparent support for
> the "contiguous bit" in its ptes. By clearing ptes using the new
> clear_ptes() API, the backend doesn't have to perform an expensive
> unfold operation when a PTE being cleared is part of a contpte block.
> Instead it can just clear the whole block immediately.
>
> This change addresses the core-mm refactoring only, and introduces
> clear_ptes() with a default implementation that calls
> ptep_get_and_clear_full() for each pte in the range. Note that this API
> returns the pte at the beginning of the batch, but with the dirty and
> young bits set if ANY of the ptes in the cleared batch had those bits
> set; this information is applied to the folio by the core-mm. Given the
> batch is garranteed to cover only a single folio, collapsing this state

Nit: s/garranteed/guaranteed/

> does not lose any useful information.
>
> A separate change will implement clear_ptes() in the arm64 backend to
> realize the performance improvement as part of the work to enable
> contpte mappings.
>
> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
> ---
>  include/asm-generic/tlb.h |  9 ++++++
>  include/linux/pgtable.h   | 26 ++++++++++++++++
>  mm/memory.c               | 63 ++++++++++++++++++++++++++-------------
>  mm/mmu_gather.c           | 14 +++++++++
>  4 files changed, 92 insertions(+), 20 deletions(-)

<snip>

> diff --git a/mm/mmu_gather.c b/mm/mmu_gather.c
> index 4f559f4ddd21..57b4d5f0dfa4 100644
> --- a/mm/mmu_gather.c
> +++ b/mm/mmu_gather.c
> @@ -47,6 +47,20 @@ static bool tlb_next_batch(struct mmu_gather *tlb)
>  	return true;
>  }
>
> +unsigned int tlb_get_guaranteed_space(struct mmu_gather *tlb)
> +{
> +	struct mmu_gather_batch *batch = tlb->active;
> +	unsigned int nr_next = 0;
> +
> +	/* Allocate next batch so we can guarrantee at least one batch. */
> +	if (tlb_next_batch(tlb)) {
> +		tlb->active = batch;

Rather than calling tlb_next_batch(tlb) and then undoing some of what it
does I think it would be clearer to factor out the allocation part of
tlb_next_batch(tlb) into a separate function (eg. tlb_alloc_batch) that
you can call from both here and tlb_next_batch().

Otherwise I think this overall direction looks better than trying to
play funny games in the arch layer as it's much clearer what's going on
to core-mm code.

 - Alistair

> +		nr_next = batch->next->max;
> +	}
> +
> +	return batch->max - batch->nr + nr_next;
> +}
> +
>  #ifdef CONFIG_SMP
>  static void tlb_flush_rmap_batch(struct mmu_gather_batch *batch, struct vm_area_struct *vma)
>  {
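The factoring Alistair suggests could look roughly like the sketch below. This
is illustrative only: tlb_alloc_batch() is the name proposed in the review,
and the allocation details are assumed to mirror the existing tlb_next_batch()
in mm/mmu_gather.c (page-sized batches allocated with __get_free_page() and
linked onto the chain).

/*
 * Rough sketch of the suggested split (not code from the series): the
 * "make sure a next batch exists" step is shared, so
 * tlb_get_guaranteed_space() no longer has to switch tlb->active and then
 * switch it back.
 */
static struct mmu_gather_batch *tlb_alloc_batch(struct mmu_gather *tlb)
{
	struct mmu_gather_batch *batch;

	if (tlb->active->next)
		return tlb->active->next;

	/* Batch-count limit checks as in the current tlb_next_batch(). */
	batch = (void *)__get_free_page(GFP_NOWAIT | __GFP_NOWARN);
	if (!batch)
		return NULL;

	batch->next = NULL;
	batch->nr = 0;
	batch->max = MAX_GATHER_BATCH;
	tlb->active->next = batch;

	return batch;
}

static bool tlb_next_batch(struct mmu_gather *tlb)
{
	struct mmu_gather_batch *batch = tlb_alloc_batch(tlb);

	if (!batch)
		return false;

	tlb->active = batch;
	return true;
}

unsigned int tlb_get_guaranteed_space(struct mmu_gather *tlb)
{
	struct mmu_gather_batch *batch = tlb->active;
	struct mmu_gather_batch *next = tlb_alloc_batch(tlb);

	/* Free slots in the active batch, plus a whole extra batch if one
	 * already existed or could be allocated. */
	return batch->max - batch->nr + (next ? next->max : 0);
}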
Ryan Roberts <ryan.roberts@arm.com> writes:

> With the core-mm changes in place to batch-clear ptes during
> zap_pte_range(), we can take advantage of this in arm64 to greatly
> reduce the number of tlbis we have to issue, and recover the lost exit
> performance incured when adding support for transparent contiguous ptes.
>
> If we are clearing a whole contpte range, we can elide first unfolding
> that range and save the tlbis. We just clear the whole range.
>
> The following shows the results of running a kernel compilation workload
> and measuring the cost of arm64_sys_exit_group() (which at ~1.5% is a
> very small part of the overall workload).
>
> Benchmarks were run on Ampere Altra in 2 configs; single numa node and 2
> numa nodes (tlbis are more expensive in 2 node config).
>
> - baseline: v6.7-rc1 + anonfolio-v7
> - no-opt: contpte series without any attempt to optimize exit()
> - simple-ptep_get_clear_full: simple optimization to exploit full=1.
>   ptep_get_clear_full() does not fully conform to its intended semantic
> - robust-ptep_get_clear_full: similar to previous but
>   ptep_get_clear_full() fully conforms to its intended semantic
> - clear_ptes: optimization implemented by this patch
>
> | config                      | numa=1 | numa=2 |
> |-----------------------------|--------|--------|
> | baseline                    |     0% |     0% |
> | no-opt                      |   190% |   768% |
> | simple-ptep_get_clear_full  |     8% |    29% |
> | robust-ptep_get_clear_full  |    21% |    19% |
> | clear_ptes                  |    13% |     9% |
>
> In all cases, the cost of arm64_sys_exit_group() increases; this is
> anticipated because there is more work to do to tear down the page
> tables. But clear_ptes() gives the smallest increase overall.
>
> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
> ---
>  arch/arm64/include/asm/pgtable.h | 32 ++++++++++++++++++++++++
>  arch/arm64/mm/contpte.c          | 42 ++++++++++++++++++++++++++++++++
>  2 files changed, 74 insertions(+)
>
> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
> index 9bd2f57a9e11..ff6b3cc9e819 100644
> --- a/arch/arm64/include/asm/pgtable.h
> +++ b/arch/arm64/include/asm/pgtable.h
> @@ -1145,6 +1145,8 @@ extern pte_t contpte_ptep_get(pte_t *ptep, pte_t orig_pte);
>  extern pte_t contpte_ptep_get_lockless(pte_t *orig_ptep);
>  extern void contpte_set_ptes(struct mm_struct *mm, unsigned long addr,
>  				pte_t *ptep, pte_t pte, unsigned int nr);
> +extern pte_t contpte_clear_ptes(struct mm_struct *mm, unsigned long addr,
> +				pte_t *ptep, unsigned int nr);
>  extern int contpte_ptep_test_and_clear_young(struct vm_area_struct *vma,
>  				unsigned long addr, pte_t *ptep);
>  extern int contpte_ptep_clear_flush_young(struct vm_area_struct *vma,
> @@ -1270,6 +1272,36 @@ static inline void pte_clear(struct mm_struct *mm,
>  	__pte_clear(mm, addr, ptep);
>  }
>
> +#define clear_ptes clear_ptes
> +static inline pte_t clear_ptes(struct mm_struct *mm,
> +			unsigned long addr, pte_t *ptep, int full,
> +			unsigned int nr)
> +{
> +	pte_t pte;
> +
> +	if (!contpte_is_enabled(mm)) {

I think it would be better to call the generic definition of
clear_ptes() here. Obviously that won't exist if clear_ptes is defined
here, but you could call it __clear_ptes() and #define clear_ptes
__clear_ptes when the arch specific helper isn't defined.

> +		unsigned int i;
> +		pte_t tail;
> +
> +		pte = __ptep_get_and_clear(mm, addr, ptep);
> +		for (i = 1; i < nr; i++) {
> +			addr += PAGE_SIZE;
> +			ptep++;
> +			tail = __ptep_get_and_clear(mm, addr, ptep);
> +			if (pte_dirty(tail))
> +				pte = pte_mkdirty(pte);
> +			if (pte_young(tail))
> +				pte = pte_mkyoung(pte);
> +		}
> +	} else if (nr == 1) {
> +		contpte_try_unfold(mm, addr, ptep, __ptep_get(ptep));
> +		pte = __ptep_get_and_clear(mm, addr, ptep);
> +	} else
> +		pte = contpte_clear_ptes(mm, addr, ptep, nr);
> +
> +	return pte;
> +}
> +
>  #define __HAVE_ARCH_PTEP_GET_AND_CLEAR
>  static inline pte_t ptep_get_and_clear(struct mm_struct *mm,
>  				       unsigned long addr, pte_t *ptep)
> diff --git a/arch/arm64/mm/contpte.c b/arch/arm64/mm/contpte.c
> index 2a57df16bf58..34b43bde3fcd 100644
> --- a/arch/arm64/mm/contpte.c
> +++ b/arch/arm64/mm/contpte.c
> @@ -257,6 +257,48 @@ void contpte_set_ptes(struct mm_struct *mm, unsigned long addr,
>  }
>  EXPORT_SYMBOL(contpte_set_ptes);
>
> +pte_t contpte_clear_ptes(struct mm_struct *mm, unsigned long addr, pte_t *ptep,
> +			unsigned int nr)
> +{
> +	/*
> +	 * If we cover a partial contpte block at the beginning or end of the
> +	 * batch, unfold if currently folded. This makes it safe to clear some
> +	 * of the entries while keeping others. contpte blocks in the middle of
> +	 * the range, which are fully covered don't need to be unfolded because
> +	 * we will clear the full block.
> +	 */
> +
> +	unsigned int i;
> +	pte_t pte;
> +	pte_t tail;
> +
> +	if (ptep != contpte_align_down(ptep) || nr < CONT_PTES)
> +		contpte_try_unfold(mm, addr, ptep, __ptep_get(ptep));
> +
> +	if (ptep + nr != contpte_align_down(ptep + nr))
> +		contpte_try_unfold(mm, addr + PAGE_SIZE * (nr - 1),
> +				   ptep + nr - 1,
> +				   __ptep_get(ptep + nr - 1));
> +
> +	pte = __ptep_get_and_clear(mm, addr, ptep);
> +
> +	for (i = 1; i < nr; i++) {
> +		addr += PAGE_SIZE;
> +		ptep++;
> +
> +		tail = __ptep_get_and_clear(mm, addr, ptep);
> +
> +		if (pte_dirty(tail))
> +			pte = pte_mkdirty(pte);
> +
> +		if (pte_young(tail))
> +			pte = pte_mkyoung(pte);
> +	}
> +
> +	return pte;
> +}
> +EXPORT_SYMBOL(contpte_clear_ptes);
> +
>  int contpte_ptep_test_and_clear_young(struct vm_area_struct *vma,
> 				unsigned long addr, pte_t *ptep)
>  {
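The naming scheme suggested in the review above follows the usual override
pattern in include/linux/pgtable.h: the generic batch-clear loop stays
available under a double-underscored name, and clear_ptes() is only aliased to
it when an architecture does not provide its own. Below is a sketch of that
shape, not code from the series; the generic loop follows the default
behaviour described in patch 2, and whether arm64 can reuse it directly for
the !contpte case is exactly the kind of detail left to the next revision.

/* include/linux/pgtable.h (sketch): generic fallback, always available. */
static inline pte_t __clear_ptes(struct mm_struct *mm, unsigned long addr,
				 pte_t *ptep, int full, unsigned int nr)
{
	/* Clear nr consecutive ptes of one folio; return the first pte,
	 * with dirty/young set if any cleared pte had them set. */
	pte_t pte = ptep_get_and_clear_full(mm, addr, ptep, full);
	unsigned int i;

	for (i = 1; i < nr; i++) {
		pte_t tail;

		addr += PAGE_SIZE;
		ptep++;
		tail = ptep_get_and_clear_full(mm, addr, ptep, full);
		if (pte_dirty(tail))
			pte = pte_mkdirty(pte);
		if (pte_young(tail))
			pte = pte_mkyoung(pte);
	}

	return pte;
}

#ifndef clear_ptes
#define clear_ptes __clear_ptes
#endif

/*
 * arch/arm64/include/asm/pgtable.h (sketch): the arm64 override could then
 * delegate the non-contpte case instead of open-coding the same loop.
 */
#define clear_ptes clear_ptes
static inline pte_t clear_ptes(struct mm_struct *mm, unsigned long addr,
			       pte_t *ptep, int full, unsigned int nr)
{
	if (!contpte_is_enabled(mm))
		return __clear_ptes(mm, addr, ptep, full, nr);

	if (nr == 1) {
		contpte_try_unfold(mm, addr, ptep, __ptep_get(ptep));
		return __ptep_get_and_clear(mm, addr, ptep);
	}

	return contpte_clear_ptes(mm, addr, ptep, nr);
}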