Message ID | 20230228213738.272178-1-willy@infradead.org |
---|---|
Headers |
From: "Matthew Wilcox (Oracle)" <willy@infradead.org> To: linux-mm@kvack.org, linux-arch@vger.kernel.org Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>, linux-kernel@vger.kernel.org Subject: [PATCH v3 00/34] New page table range API Date: Tue, 28 Feb 2023 21:37:03 +0000 Message-Id: <20230228213738.272178-1-willy@infradead.org> |
Series |
New page table range API
|
|
Message
Matthew Wilcox
Feb. 28, 2023, 9:37 p.m. UTC
This patchset changes the API used by the MM to set up page table entries.
The four APIs are:

    set_ptes(mm, addr, ptep, pte, nr)
    update_mmu_cache_range(vma, addr, ptep, nr)
    flush_dcache_folio(folio)
    flush_icache_pages(vma, page, nr)

flush_dcache_folio() isn't technically new, but no architecture
implemented it, so I've done that for you.  The old APIs remain around
but are mostly implemented by calling the new interfaces.

The new APIs are based around setting up N page table entries at once.
The N entries belong to the same PMD, the same folio and the same VMA,
so ptep++ is a legitimate operation, and locking is taken care of for
you.  Some architectures can do a better job of it than just a loop,
but I have hesitated to make too deep a change to architectures I don't
understand well.

One thing I have changed in every architecture is that PG_arch_1 is now
a per-folio bit instead of a per-page bit.  This was something that
would have to happen eventually, and it makes sense to do it now rather
than iterate over every page involved in a cache flush and figure out
if it needs to happen.

The point of all this is better performance, and Fengwei Yin has
measured improvement on x86.  I suspect you'll see improvement on your
architecture too.  Try the new will-it-scale test mentioned here:
https://lore.kernel.org/linux-mm/20230206140639.538867-5-fengwei.yin@intel.com/
You'll need to run it on an XFS filesystem and have
CONFIG_TRANSPARENT_HUGEPAGE set.

For testing, I've only run the code on x86.  If an x86->foo compiler
exists in Debian, I've built defconfig.  I'm relying on the buildbots
to tell me what I missed, and people who actually have the hardware to
tell me if it actually works.

I'd like to get this into the MM tree soon after the current merge
window closes, so quick feedback would be appreciated.

v3:
 - Reinstate flush_dcache_icache_phys() on PowerPC
 - Fix folio_flush_mapping().  The documentation was correct and the
   implementation was completely wrong
 - Change the flush_dcache_page() documentation to describe
   flush_dcache_folio() instead
 - Split ARM from ARC.  I messed up my git commands
 - Remove page_mapping_file()
 - Rationalise how flush_icache_pages() and flush_icache_page() are
   defined
 - Use flush_icache_pages() in do_set_pmd()
 - Pick up Guo Ren's Ack for csky

Matthew Wilcox (Oracle) (30):
  mm: Convert page_table_check_pte_set() to page_table_check_ptes_set()
  mm: Add generic flush_icache_pages() and documentation
  mm: Add folio_flush_mapping()
  mm: Remove ARCH_IMPLEMENTS_FLUSH_DCACHE_FOLIO
  alpha: Implement the new page table range API
  arc: Implement the new page table range API
  arm: Implement the new page table range API
  arm64: Implement the new page table range API
  csky: Implement the new page table range API
  hexagon: Implement the new page table range API
  ia64: Implement the new page table range API
  loongarch: Implement the new page table range API
  m68k: Implement the new page table range API
  microblaze: Implement the new page table range API
  mips: Implement the new page table range API
  nios2: Implement the new page table range API
  openrisc: Implement the new page table range API
  parisc: Implement the new page table range API
  powerpc: Implement the new page table range API
  riscv: Implement the new page table range API
  s390: Implement the new page table range API
  superh: Implement the new page table range API
  sparc32: Implement the new page table range API
  sparc64: Implement the new page table range API
  um: Implement the new page table range API
  x86: Implement the new page table range API
  xtensa: Implement the new page table range API
  mm: Remove page_mapping_file()
  mm: Rationalise flush_icache_pages() and flush_icache_page()
  mm: Use flush_icache_pages() in do_set_pmd()

Yin Fengwei (4):
  filemap: Add filemap_map_folio_range()
  rmap: add folio_add_file_rmap_range()
  mm: Convert do_set_pte() to set_pte_range()
  filemap: Batch PTE mappings
 Documentation/core-api/cachetlb.rst       |  51 +++++-----
 Documentation/filesystems/locking.rst     |   2 +-
 arch/alpha/include/asm/cacheflush.h       |  13 ++-
 arch/alpha/include/asm/pgtable.h          |  18 +++-
 arch/arc/include/asm/cacheflush.h         |  14 +--
 arch/arc/include/asm/pgtable-bits-arcv2.h |  20 +++-
 arch/arc/mm/cache.c                       |  61 +++++++-----
 arch/arc/mm/tlb.c                         |  18 ++--
 arch/arm/include/asm/cacheflush.h         |  29 +++---
 arch/arm/include/asm/pgtable.h            |   5 +-
 arch/arm/include/asm/tlbflush.h           |  13 ++-
 arch/arm/mm/copypage-v4mc.c               |   5 +-
 arch/arm/mm/copypage-v6.c                 |   5 +-
 arch/arm/mm/copypage-xscale.c             |   5 +-
 arch/arm/mm/dma-mapping.c                 |  24 ++---
 arch/arm/mm/fault-armv.c                  |  14 +--
 arch/arm/mm/flush.c                       |  99 +++++++++++--------
 arch/arm/mm/mm.h                          |   2 +-
 arch/arm/mm/mmu.c                         |  14 ++-
 arch/arm64/include/asm/cacheflush.h       |   4 +-
 arch/arm64/include/asm/pgtable.h          |  25 +++--
 arch/arm64/mm/flush.c                     |  36 +++----
 arch/csky/abiv1/cacheflush.c              |  32 ++++---
 arch/csky/abiv1/inc/abi/cacheflush.h      |   3 +-
 arch/csky/abiv2/cacheflush.c              |  30 +++---
 arch/csky/abiv2/inc/abi/cacheflush.h      |  11 ++-
 arch/csky/include/asm/pgtable.h           |  21 +++-
 arch/hexagon/include/asm/cacheflush.h     |   9 +-
 arch/hexagon/include/asm/pgtable.h        |  16 +++-
 arch/ia64/hp/common/sba_iommu.c           |  26 ++---
 arch/ia64/include/asm/cacheflush.h        |  14 ++-
 arch/ia64/include/asm/pgtable.h           |  14 ++-
 arch/ia64/mm/init.c                       |  29 ++++--
 arch/loongarch/include/asm/cacheflush.h   |   2 +-
 arch/loongarch/include/asm/pgtable.h      |  30 ++++--
 arch/m68k/include/asm/cacheflush_mm.h     |  25 +++--
 arch/m68k/include/asm/pgtable_mm.h        |  21 +++-
 arch/m68k/mm/motorola.c                   |   2 +-
 arch/microblaze/include/asm/cacheflush.h  |   8 ++
 arch/microblaze/include/asm/pgtable.h     |  17 +++-
 arch/microblaze/include/asm/tlbflush.h    |   4 +-
 arch/mips/include/asm/cacheflush.h        |  32 ++++---
 arch/mips/include/asm/pgtable.h           |  36 ++++---
 arch/mips/mm/c-r4k.c                      |   5 +-
 arch/mips/mm/cache.c                      |  56 +++++------
 arch/mips/mm/init.c                       |  17 ++--
 arch/nios2/include/asm/cacheflush.h       |   6 +-
 arch/nios2/include/asm/pgtable.h          |  27 ++++--
 arch/nios2/mm/cacheflush.c                |  62 ++++++------
 arch/openrisc/include/asm/cacheflush.h    |   8 +-
 arch/openrisc/include/asm/pgtable.h       |  27 +++++-
 arch/openrisc/mm/cache.c                  |  12 ++-
 arch/parisc/include/asm/cacheflush.h      |  14 ++-
 arch/parisc/include/asm/pgtable.h         |  28 ++++--
 arch/parisc/kernel/cache.c                | 101 ++++++++++++------
 arch/powerpc/include/asm/book3s/pgtable.h |  10 +-
 arch/powerpc/include/asm/cacheflush.h     |  14 ++-
 arch/powerpc/include/asm/kvm_ppc.h        |  10 +-
 arch/powerpc/include/asm/nohash/pgtable.h |  13 +--
 arch/powerpc/include/asm/pgtable.h        |   6 ++
 arch/powerpc/mm/book3s64/hash_utils.c     |  11 ++-
 arch/powerpc/mm/cacheflush.c              |  40 +++-----
 arch/powerpc/mm/nohash/e500_hugetlbpage.c |   3 +-
 arch/powerpc/mm/pgtable.c                 |  51 +++++-----
 arch/riscv/include/asm/cacheflush.h       |  19 ++--
 arch/riscv/include/asm/pgtable.h          |  26 +++--
 arch/riscv/mm/cacheflush.c                |  11 +--
 arch/s390/include/asm/pgtable.h           |  34 +++++--
 arch/sh/include/asm/cacheflush.h          |  21 ++--
 arch/sh/include/asm/pgtable.h             |   6 +-
 arch/sh/include/asm/pgtable_32.h          |  16 +++-
 arch/sh/mm/cache-j2.c                     |   4 +-
 arch/sh/mm/cache-sh4.c                    |  26 +++--
 arch/sh/mm/cache-sh7705.c                 |  26 +++--
 arch/sh/mm/cache.c                        |  54 ++++++-----
 arch/sh/mm/kmap.c                         |   3 +-
 arch/sparc/include/asm/cacheflush_32.h    |   9 +-
 arch/sparc/include/asm/cacheflush_64.h    |  19 ++--
 arch/sparc/include/asm/pgtable_32.h       |  15 ++-
 arch/sparc/include/asm/pgtable_64.h       |  25 ++++-
 arch/sparc/kernel/smp_64.c                |  56 +++++++---
 arch/sparc/mm/init_32.c                   |  13 ++-
 arch/sparc/mm/init_64.c                   |  78 ++++++------
 arch/sparc/mm/tlb.c                       |   5 +-
 arch/um/include/asm/pgtable.h             |  15 ++-
 arch/x86/include/asm/pgtable.h            |  21 +++-
 arch/xtensa/include/asm/cacheflush.h      |  11 ++-
 arch/xtensa/include/asm/pgtable.h         |  24 +++--
 arch/xtensa/mm/cache.c                    |  83 +++++++------
 include/asm-generic/cacheflush.h          |   7 --
 include/linux/cacheflush.h                |  13 ++-
 include/linux/mm.h                        |   3 +-
 include/linux/page_table_check.h          |  14 +--
 include/linux/pagemap.h                   |  28 ++++--
 include/linux/rmap.h                      |   2 +
 mm/filemap.c                              | 111 +++++++++++++---------
 mm/memory.c                               |  30 +++--
 mm/page_table_check.c                     |  14 +--
 mm/rmap.c                                 |  60 +++++++---
 mm/util.c                                 |   2 +-
 100 files changed, 1436 insertions(+), 848 deletions(-)
Comments
On Tue, Feb 28, 2023 at 09:37:03PM +0000, Matthew Wilcox (Oracle) wrote:
> This patchset changes the API used by the MM to set up page table entries.
> The four APIs are:
>     set_ptes(mm, addr, ptep, pte, nr)
>     update_mmu_cache_range(vma, addr, ptep, nr)
>     flush_dcache_folio(folio)
>     flush_icache_pages(vma, page, nr)
>
> flush_dcache_folio() isn't technically new, but no architecture
> implemented it, so I've done that for you. The old APIs remain around
> but are mostly implemented by calling the new interfaces.
>
> The new APIs are based around setting up N page table entries at once.
> The N entries belong to the same PMD, the same folio and the same VMA,
> so ptep++ is a legitimate operation, and locking is taken care of for
> you. Some architectures can do a better job of it than just a loop,
> but I have hesitated to make too deep a change to architectures I don't
> understand well.

The new set_ptes() looks unnecessarily duplicated all over arch/
What do you say about adding the patch below on top of the series?
Ideally it should be split into per-arch bits, but I can send it
separately as a cleanup on top.
diff --git a/arch/alpha/include/asm/pgtable.h b/arch/alpha/include/asm/pgtable.h
index 1e3354e9731b..65fb9e66675d 100644
--- a/arch/alpha/include/asm/pgtable.h
+++ b/arch/alpha/include/asm/pgtable.h
@@ -37,6 +37,7 @@ static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
 		pte_val(pte) += 1UL << 32;
 	}
 }
+#define set_ptes set_ptes
 #define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)
 
 /* PMD_SHIFT determines the size of the area a second-level page table can map */
diff --git a/arch/arc/include/asm/pgtable-bits-arcv2.h b/arch/arc/include/asm/pgtable-bits-arcv2.h
index 4a1b2ce204c6..06d8039180c0 100644
--- a/arch/arc/include/asm/pgtable-bits-arcv2.h
+++ b/arch/arc/include/asm/pgtable-bits-arcv2.h
@@ -100,19 +100,6 @@ static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
 	return __pte((pte_val(pte) & _PAGE_CHG_MASK) | pgprot_val(newprot));
 }
 
-static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
-		pte_t *ptep, pte_t pte, unsigned int nr)
-{
-	for (;;) {
-		set_pte(ptep, pte);
-		if (--nr == 0)
-			break;
-		ptep++;
-		pte_val(pte) += PAGE_SIZE;
-	}
-}
-#define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)
-
 void update_mmu_cache_range(struct vm_area_struct *vma, unsigned long address,
 		pte_t *ptep, unsigned int nr);
diff --git a/arch/arm/include/asm/pgtable.h b/arch/arm/include/asm/pgtable.h
index 6525ac82bd50..0d326b201797 100644
--- a/arch/arm/include/asm/pgtable.h
+++ b/arch/arm/include/asm/pgtable.h
@@ -209,6 +209,7 @@ extern void __sync_icache_dcache(pte_t pteval);
 
 void set_ptes(struct mm_struct *mm, unsigned long addr,
 		pte_t *ptep, pte_t pteval, unsigned int nr);
+#define set_ptes set_ptes
 #define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)
 
 static inline pte_t clear_pte_bit(pte_t pte, pgprot_t prot)
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 4d1b79dbff16..a8d6460c5c9f 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -369,6 +369,7 @@ static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
 		pte_val(pte) += PAGE_SIZE;
 	}
 }
+#define set_ptes set_ptes
 #define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)
 
 /*
diff --git a/arch/csky/include/asm/pgtable.h b/arch/csky/include/asm/pgtable.h
index a30ae048233e..e426f1820deb 100644
--- a/arch/csky/include/asm/pgtable.h
+++ b/arch/csky/include/asm/pgtable.h
@@ -91,20 +91,6 @@ static inline void set_pte(pte_t *p, pte_t pte)
 	smp_mb();
 }
 
-static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
-		pte_t *ptep, pte_t pte, unsigned int nr)
-{
-	for (;;) {
-		set_pte(ptep, pte);
-		if (--nr == 0)
-			break;
-		ptep++;
-		pte_val(pte) += PAGE_SIZE;
-	}
-}
-
-#define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)
-
 static inline pte_t *pmd_page_vaddr(pmd_t pmd)
 {
 	unsigned long ptr;
diff --git a/arch/hexagon/include/asm/pgtable.h b/arch/hexagon/include/asm/pgtable.h
index f58f1d920769..67ab91662e83 100644
--- a/arch/hexagon/include/asm/pgtable.h
+++ b/arch/hexagon/include/asm/pgtable.h
@@ -345,26 +345,6 @@ static inline int pte_exec(pte_t pte)
 #define pte_pfn(pte) (pte_val(pte) >> PAGE_SHIFT)
 #define set_pmd(pmdptr, pmdval) (*(pmdptr) = (pmdval))
 
-/*
- * set_ptes - update page table and do whatever magic may be
- * necessary to make the underlying hardware/firmware take note.
- *
- * VM may require a virtual instruction to alert the MMU.
- */
-static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
-		pte_t *ptep, pte_t pte, unsigned int nr)
-{
-	for (;;) {
-		set_pte(ptep, pte);
-		if (--nr == 0)
-			break;
-		ptep++;
-		pte_val(pte) += PAGE_SIZE;
-	}
-}
-
-#define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)
-
 static inline unsigned long pmd_page_vaddr(pmd_t pmd)
 {
 	return (unsigned long)__va(pmd_val(pmd) & PAGE_MASK);
diff --git a/arch/ia64/include/asm/pgtable.h b/arch/ia64/include/asm/pgtable.h
index 0c2be4ea664b..65a6e3b30721 100644
--- a/arch/ia64/include/asm/pgtable.h
+++ b/arch/ia64/include/asm/pgtable.h
@@ -303,19 +303,6 @@ static inline void set_pte(pte_t *ptep, pte_t pteval)
 	*ptep = pteval;
 }
 
-static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
-		pte_t *ptep, pte_t pte, unsigned int nr)
-{
-	for (;;) {
-		set_pte(ptep, pte);
-		if (--nr == 0)
-			break;
-		ptep++;
-		pte_val(pte) += PAGE_SIZE;
-	}
-}
-#define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, add, ptep, pte, 1)
-
 /*
  * Make page protection values cacheable, uncacheable, or write-
  * combining.  Note that "protection" is really a misnomer here as the
diff --git a/arch/loongarch/include/asm/pgtable.h b/arch/loongarch/include/asm/pgtable.h
index 9154d317ffb4..d4b0ca7b4bf7 100644
--- a/arch/loongarch/include/asm/pgtable.h
+++ b/arch/loongarch/include/asm/pgtable.h
@@ -346,6 +346,7 @@ static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
 	}
 }
 
+#define set_ptes set_ptes
 #define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)
 
 static inline void pte_clear(struct mm_struct *mm, unsigned long addr, pte_t *ptep)
diff --git a/arch/m68k/include/asm/pgtable_mm.h b/arch/m68k/include/asm/pgtable_mm.h
index 400206c17c97..8c2db20abdb6 100644
--- a/arch/m68k/include/asm/pgtable_mm.h
+++ b/arch/m68k/include/asm/pgtable_mm.h
@@ -32,20 +32,6 @@
 		*(pteptr) = (pteval);	\
 	} while(0)
 
-static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
-		pte_t *ptep, pte_t pte, unsigned int nr)
-{
-	for (;;) {
-		set_pte(ptep, pte);
-		if (--nr == 0)
-			break;
-		ptep++;
-		pte_val(pte) += PAGE_SIZE;
-	}
-}
-
-#define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)
-
 /* PMD_SHIFT determines the size of the area a second-level page table can map */
 #if CONFIG_PGTABLE_LEVELS == 3
 #define PMD_SHIFT	18
diff --git a/arch/microblaze/include/asm/pgtable.h b/arch/microblaze/include/asm/pgtable.h
index a01e1369b486..3e7643a986ad 100644
--- a/arch/microblaze/include/asm/pgtable.h
+++ b/arch/microblaze/include/asm/pgtable.h
@@ -335,20 +335,6 @@ static inline void set_pte(pte_t *ptep, pte_t pte)
 	*ptep = pte;
 }
 
-static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
-		pte_t *ptep, pte_t pte, unsigned int nr)
-{
-	for (;;) {
-		set_pte(ptep, pte);
-		if (--nr == 0)
-			break;
-		ptep++;
-		pte_val(pte) += 1 << PFN_SHIFT_OFFSET;
-	}
-}
-
-#define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)
-
 #define __HAVE_ARCH_PTEP_TEST_AND_CLEAR_YOUNG
 static inline int ptep_test_and_clear_young(struct vm_area_struct *vma,
 		unsigned long address, pte_t *ptep)
diff --git a/arch/mips/include/asm/pgtable.h b/arch/mips/include/asm/pgtable.h
index 0cf0455e6ae8..18b77567ef72 100644
--- a/arch/mips/include/asm/pgtable.h
+++ b/arch/mips/include/asm/pgtable.h
@@ -108,6 +108,7 @@ do {									\
 
 static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
 		pte_t *ptep, pte_t pte, unsigned int nr);
+#define set_ptes set_ptes
 #define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)
 
 #if defined(CONFIG_PHYS_ADDR_T_64BIT) && defined(CONFIG_CPU_MIPS32)
diff --git a/arch/nios2/include/asm/pgtable.h b/arch/nios2/include/asm/pgtable.h
index 8a77821a17a5..2a994b225a41 100644
--- a/arch/nios2/include/asm/pgtable.h
+++ b/arch/nios2/include/asm/pgtable.h
@@ -193,6 +193,7 @@ static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
 	}
 }
 
+#define set_ptes set_ptes
 #define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)
 
 static inline int pmd_none(pmd_t pmd)
diff --git a/arch/openrisc/include/asm/pgtable.h b/arch/openrisc/include/asm/pgtable.h
index 1a7077150d7b..8f27730a9ab7 100644
--- a/arch/openrisc/include/asm/pgtable.h
+++ b/arch/openrisc/include/asm/pgtable.h
@@ -47,20 +47,6 @@ extern void paging_init(void);
  */
 #define set_pte(pteptr, pteval) ((*(pteptr)) = (pteval))
 
-static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
-		pte_t *ptep, pte_t pte, unsigned int nr)
-{
-	for (;;) {
-		set_pte(ptep, pte);
-		if (--nr == 0)
-			break;
-		ptep++;
-		pte_val(pte) += PAGE_SIZE;
-	}
-}
-
-#define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)
-
 /*
  * (pmds are folded into pgds so this doesn't get actually called,
  * but the define is needed for a generic inline function.)
diff --git a/arch/parisc/include/asm/pgtable.h b/arch/parisc/include/asm/pgtable.h
index 78ee9816f423..cd04e85cb012 100644
--- a/arch/parisc/include/asm/pgtable.h
+++ b/arch/parisc/include/asm/pgtable.h
@@ -73,6 +73,7 @@ extern void __update_cache(pte_t pte);
 		mb();				\
 	} while(0)
 
+#define set_ptes set_ptes
 #define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)
 
 #endif /* !__ASSEMBLY__ */
diff --git a/arch/powerpc/include/asm/pgtable.h b/arch/powerpc/include/asm/pgtable.h
index bf1263ff7e67..f10b6c2f8ade 100644
--- a/arch/powerpc/include/asm/pgtable.h
+++ b/arch/powerpc/include/asm/pgtable.h
@@ -43,6 +43,7 @@ struct mm_struct;
 
 void set_ptes(struct mm_struct *mm, unsigned long addr, pte_t *ptep,
 		pte_t pte, unsigned int nr);
+#define set_ptes set_ptes
 #define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)
 
 #define update_mmu_cache(vma, addr, ptep) \
	update_mmu_cache_range(vma, addr, ptep, 1);
diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index 3a3a776fc047..8bc49496f8a6 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -473,6 +473,7 @@ static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
 		pte_val(pteval) += 1 << _PAGE_PFN_SHIFT;
 	}
 }
+#define set_ptes set_ptes
 #define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)
 
 static inline void pte_clear(struct mm_struct *mm,
diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h
index 46bf475116f1..2fc20558af6b 100644
--- a/arch/s390/include/asm/pgtable.h
+++ b/arch/s390/include/asm/pgtable.h
@@ -1346,6 +1346,7 @@ static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
 	}
 }
 
+#define set_ptes set_ptes
 #define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)
 
 /*
diff --git a/arch/sh/include/asm/pgtable_32.h b/arch/sh/include/asm/pgtable_32.h
index 03ba1834e126..d2f17e944bea 100644
--- a/arch/sh/include/asm/pgtable_32.h
+++ b/arch/sh/include/asm/pgtable_32.h
@@ -319,6 +319,7 @@ static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
 	}
 }
 
+#define set_ptes set_ptes
 #define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)
 
 /*
diff --git a/arch/sparc/include/asm/pgtable_32.h b/arch/sparc/include/asm/pgtable_32.h
index 47ae55ea1837..7fbc7772a9b7 100644
--- a/arch/sparc/include/asm/pgtable_32.h
+++ b/arch/sparc/include/asm/pgtable_32.h
@@ -101,20 +101,6 @@ static inline void set_pte(pte_t *ptep, pte_t pteval)
 	srmmu_swap((unsigned long *)ptep, pte_val(pteval));
 }
 
-static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
-		pte_t *ptep, pte_t pte, unsigned int nr)
-{
-	for (;;) {
-		set_pte(ptep, pte);
-		if (--nr == 0)
-			break;
-		ptep++;
-		pte_val(pte) += PAGE_SIZE;
-	}
-}
-
-#define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)
-
 static inline int srmmu_device_memory(unsigned long x)
 {
 	return ((x & 0xF0000000) != 0);
diff --git a/arch/sparc/include/asm/pgtable_64.h b/arch/sparc/include/asm/pgtable_64.h
index d5c0088e0c6a..fddca662ba1b 100644
--- a/arch/sparc/include/asm/pgtable_64.h
+++ b/arch/sparc/include/asm/pgtable_64.h
@@ -924,6 +924,7 @@ static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
 	}
 }
 
+#define set_ptes set_ptes
 #define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1);
 
 #define pte_clear(mm,addr,ptep)		\
diff --git a/arch/um/include/asm/pgtable.h b/arch/um/include/asm/pgtable.h
index ca78c90ae74f..60d2b20ff218 100644
--- a/arch/um/include/asm/pgtable.h
+++ b/arch/um/include/asm/pgtable.h
@@ -242,20 +242,6 @@ static inline void set_pte(pte_t *pteptr, pte_t pteval)
 	if(pte_present(*pteptr)) *pteptr = pte_mknewprot(*pteptr);
 }
 
-static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
-		pte_t *ptep, pte_t pte, unsigned int nr)
-{
-	for (;;) {
-		set_pte(ptep, pte);
-		if (--nr == 0)
-			break;
-		ptep++;
-		pte_val(pte) += PAGE_SIZE;
-	}
-}
-
-#define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)
-
 #define __HAVE_ARCH_PTE_SAME
 static inline int pte_same(pte_t pte_a, pte_t pte_b)
 {
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index f424371ea143..1e5fd352880d 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -1019,22 +1019,6 @@ static inline pud_t native_local_pudp_get_and_clear(pud_t *pudp)
 	return res;
 }
 
-static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
-		pte_t *ptep, pte_t pte, unsigned int nr)
-{
-	page_table_check_ptes_set(mm, addr, ptep, pte, nr);
-
-	for (;;) {
-		set_pte(ptep, pte);
-		if (--nr == 0)
-			break;
-		ptep++;
-		pte = __pte(pte_val(pte) + PAGE_SIZE);
-	}
-}
-
-#define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)
-
 static inline void set_pmd_at(struct mm_struct *mm, unsigned long addr,
 		pmd_t *pmdp, pmd_t pmd)
 {
diff --git a/arch/xtensa/include/asm/pgtable.h b/arch/xtensa/include/asm/pgtable.h
index 293101530541..adeee96518b9 100644
--- a/arch/xtensa/include/asm/pgtable.h
+++ b/arch/xtensa/include/asm/pgtable.h
@@ -306,20 +306,6 @@ static inline void set_pte(pte_t *ptep, pte_t pte)
 	update_pte(ptep, pte);
 }
 
-static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
-		pte_t *ptep, pte_t pte, unsigned int nr)
-{
-	for (;;) {
-		set_pte(ptep, pte);
-		if (--nr == 0)
-			break;
-		ptep++;
-		pte_val(pte) += PAGE_SIZE;
-	}
-}
-
-#define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)
-
 static inline void set_pmd(pmd_t *pmdp, pmd_t pmdval)
 {
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index c63cd44777ec..ef204712eda3 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -172,6 +172,24 @@ static inline int pmd_young(pmd_t pmd)
 }
 #endif
 
+#ifndef set_ptes
+static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
+		pte_t *ptep, pte_t pte, unsigned int nr)
+{
+	page_table_check_ptes_set(mm, addr, ptep, pte, nr);
+
+	for (;;) {
+		set_pte(ptep, pte);
+		if (--nr == 0)
+			break;
+		ptep++;
+		pte = __pte(pte_val(pte) + PAGE_SIZE);
+	}
+}
+#define set_ptes set_ptes
+#define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)
+#endif
+
 #ifndef __HAVE_ARCH_PTEP_SET_ACCESS_FLAGS
 extern int ptep_set_access_flags(struct vm_area_struct *vma,
 				 unsigned long address, pte_t *ptep,
Hi Willy,

On Tue, Feb 28, 2023 at 10:40 PM Matthew Wilcox (Oracle)
<willy@infradead.org> wrote:
> This patchset changes the API used by the MM to set up page table entries.
> The four APIs are:
>     set_ptes(mm, addr, ptep, pte, nr)
>     update_mmu_cache_range(vma, addr, ptep, nr)
>     flush_dcache_folio(folio)
>     flush_icache_pages(vma, page, nr)
>
> flush_dcache_folio() isn't technically new, but no architecture
> implemented it, so I've done that for you. The old APIs remain around
> but are mostly implemented by calling the new interfaces.
>
> The new APIs are based around setting up N page table entries at once.
> The N entries belong to the same PMD, the same folio and the same VMA,
> so ptep++ is a legitimate operation, and locking is taken care of for
> you. Some architectures can do a better job of it than just a loop,
> but I have hesitated to make too deep a change to architectures I don't
> understand well.
>
> One thing I have changed in every architecture is that PG_arch_1 is now a
> per-folio bit instead of a per-page bit. This was something that would
> have to happen eventually, and it makes sense to do it now rather than
> iterate over every page involved in a cache flush and figure out if it
> needs to happen.
>
> The point of all this is better performance, and Fengwei Yin has
> measured improvement on x86. I suspect you'll see improvement on
> your architecture too. Try the new will-it-scale test mentioned here:
> https://lore.kernel.org/linux-mm/20230206140639.538867-5-fengwei.yin@intel.com/
> You'll need to run it on an XFS filesystem and have
> CONFIG_TRANSPARENT_HUGEPAGE set.

Thanks for your series!

> For testing, I've only run the code on x86. If an x86->foo compiler
> exists in Debian, I've built defconfig. I'm relying on the buildbots
> to tell me what I missed, and people who actually have the hardware to
> tell me if it actually works.

Seems to work fine on ARAnyM and qemu-system-m68k/virt, so
Tested-by: Geert Uytterhoeven <geert@linux-m68k.org>

Gr{oetje,eeting}s,

                        Geert
On 28/02/2023 21:37, Matthew Wilcox (Oracle) wrote:
> This patchset changes the API used by the MM to set up page table entries.
> The four APIs are:
>     set_ptes(mm, addr, ptep, pte, nr)
>     update_mmu_cache_range(vma, addr, ptep, nr)
>     flush_dcache_folio(folio)
>     flush_icache_pages(vma, page, nr)
>
> flush_dcache_folio() isn't technically new, but no architecture
> implemented it, so I've done that for you. The old APIs remain around
> but are mostly implemented by calling the new interfaces.
>
> The new APIs are based around setting up N page table entries at once.
> The N entries belong to the same PMD, the same folio and the same VMA,
> so ptep++ is a legitimate operation, and locking is taken care of for
> you. Some architectures can do a better job of it than just a loop,
> but I have hesitated to make too deep a change to architectures I don't
> understand well.
>
> One thing I have changed in every architecture is that PG_arch_1 is now a
> per-folio bit instead of a per-page bit. This was something that would
> have to happen eventually, and it makes sense to do it now rather than
> iterate over every page involved in a cache flush and figure out if it
> needs to happen.
>
> The point of all this is better performance, and Fengwei Yin has
> measured improvement on x86. I suspect you'll see improvement on
> your architecture too. Try the new will-it-scale test mentioned here:
> https://lore.kernel.org/linux-mm/20230206140639.538867-5-fengwei.yin@intel.com/
> You'll need to run it on an XFS filesystem and have
> CONFIG_TRANSPARENT_HUGEPAGE set.
>
> For testing, I've only run the code on x86. If an x86->foo compiler
> exists in Debian, I've built defconfig. I'm relying on the buildbots
> to tell me what I missed, and people who actually have the hardware to
> tell me if it actually works.
>
> I'd like to get this into the MM tree soon after the current merge window
> closes, so quick feedback would be appreciated.

I've boot-tested the series (with Yin's typo fix for patch 32) on arm64
FVP and Ampere Altra. On the Altra, I also ran page_fault4 from
will-it-scale, and see ~35% improvement from this series. So:

Tested-by: Ryan Roberts <ryan.roberts@arm.com>

Thanks,
Ryan