[v3,00/34] New page table range API

Message ID 20230228213738.272178-1-willy@infradead.org

Message

Matthew Wilcox Feb. 28, 2023, 9:37 p.m. UTC
This patchset changes the API used by the MM to set up page table entries.
The four APIs are:
    set_ptes(mm, addr, ptep, pte, nr)
    update_mmu_cache_range(vma, addr, ptep, nr)
    flush_dcache_folio(folio)
    flush_icache_pages(vma, page, nr)

flush_dcache_folio() isn't technically new, but no architecture
implemented it, so I've done that for you.  The old APIs remain around
but are mostly implemented by calling the new interfaces.
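
For example, most of the per-architecture patches reduce the old
single-entry call to a one-line wrapper around the new one:

	#define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)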

The new APIs are based around setting up N page table entries at once.
The N entries belong to the same PMD, the same folio and the same VMA,
so ptep++ is a legitimate operation, and locking is taken care of for
you.  Some architectures can do a better job of it than just a loop,
but I have hesitated to make too deep a change to architectures I don't
understand well.
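
To illustrate the shape of the new API, here is a simplified sketch (not
verbatim from any patch) of what this series does when mapping nr pages of
a folio; it assumes folio, addr and ptep all lie within one PMD and one
VMA, and that the caller holds the PTE lock:

	struct page *page = folio_page(folio, 0);
	pte_t pte = mk_pte(page, vma->vm_page_prot);

	flush_icache_pages(vma, page, nr);
	set_ptes(vma->vm_mm, addr, ptep, pte, nr);
	update_mmu_cache_range(vma, addr, ptep, nr);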

One thing I have changed in every architecture is that PG_arch_1 is now a
per-folio bit instead of a per-page bit.  This was something that would
have to happen eventually, and it makes sense to do it now rather than
iterate over every page involved in a cache flush and figure out if it
needs to happen.
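
As a sketch of the resulting pattern (PG_arch_1 is aliased to
PG_dcache_clean on several architectures; flush_range() here is a
hypothetical stand-in for the per-arch flush primitive):

	/* Before: one bit per page, so every page is tested in a loop. */
	if (!test_and_set_bit(PG_dcache_clean, &page->flags))
		flush_range(page_address(page), PAGE_SIZE);

	/* After: one bit covers the whole folio. */
	if (!test_and_set_bit(PG_dcache_clean, &folio->flags))
		flush_range(folio_address(folio), folio_size(folio));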

The point of all this is better performance, and Fengwei Yin has
measured improvement on x86.  I suspect you'll see improvement on
your architecture too.  Try the new will-it-scale test mentioned here:
https://lore.kernel.org/linux-mm/20230206140639.538867-5-fengwei.yin@intel.com/
You'll need to run it on an XFS filesystem and have
CONFIG_TRANSPARENT_HUGEPAGE set.

For testing, I've only run the code on x86.  If an x86->foo compiler
exists in Debian, I've built defconfig.  I'm relying on the buildbots
to tell me what I missed, and people who actually have the hardware to
tell me if it actually works.

I'd like to get this into the MM tree soon after the current merge window
closes, so quick feedback would be appreciated.

v3:
 - Reinstate flush_dcache_icache_phys() on PowerPC
 - Fix folio_flush_mapping().  The documentation was correct and the
   implementation was completely wrong
 - Change the flush_dcache_page() documentation to describe
   flush_dcache_folio() instead
 - Split ARM from ARC.  I messed up my git commands
 - Remove page_mapping_file()
 - Rationalise how flush_icache_pages() and flush_icache_page() are defined
 - Use flush_icache_pages() in do_set_pmd()
 - Pick up Guo Ren's Ack for csky

Matthew Wilcox (Oracle) (30):
  mm: Convert page_table_check_pte_set() to page_table_check_ptes_set()
  mm: Add generic flush_icache_pages() and documentation
  mm: Add folio_flush_mapping()
  mm: Remove ARCH_IMPLEMENTS_FLUSH_DCACHE_FOLIO
  alpha: Implement the new page table range API
  arc: Implement the new page table range API
  arm: Implement the new page table range API
  arm64: Implement the new page table range API
  csky: Implement the new page table range API
  hexagon: Implement the new page table range API
  ia64: Implement the new page table range API
  loongarch: Implement the new page table range API
  m68k: Implement the new page table range API
  microblaze: Implement the new page table range API
  mips: Implement the new page table range API
  nios2: Implement the new page table range API
  openrisc: Implement the new page table range API
  parisc: Implement the new page table range API
  powerpc: Implement the new page table range API
  riscv: Implement the new page table range API
  s390: Implement the new page table range API
  superh: Implement the new page table range API
  sparc32: Implement the new page table range API
  sparc64: Implement the new page table range API
  um: Implement the new page table range API
  x86: Implement the new page table range API
  xtensa: Implement the new page table range API
  mm: Remove page_mapping_file()
  mm: Rationalise flush_icache_pages() and flush_icache_page()
  mm: Use flush_icache_pages() in do_set_pmd()

Yin Fengwei (4):
  filemap: Add filemap_map_folio_range()
  rmap: add folio_add_file_rmap_range()
  mm: Convert do_set_pte() to set_pte_range()
  filemap: Batch PTE mappings

 Documentation/core-api/cachetlb.rst       |  51 +++++-----
 Documentation/filesystems/locking.rst     |   2 +-
 arch/alpha/include/asm/cacheflush.h       |  13 ++-
 arch/alpha/include/asm/pgtable.h          |  18 +++-
 arch/arc/include/asm/cacheflush.h         |  14 +--
 arch/arc/include/asm/pgtable-bits-arcv2.h |  20 +++-
 arch/arc/mm/cache.c                       |  61 +++++++-----
 arch/arc/mm/tlb.c                         |  18 ++--
 arch/arm/include/asm/cacheflush.h         |  29 +++---
 arch/arm/include/asm/pgtable.h            |   5 +-
 arch/arm/include/asm/tlbflush.h           |  13 ++-
 arch/arm/mm/copypage-v4mc.c               |   5 +-
 arch/arm/mm/copypage-v6.c                 |   5 +-
 arch/arm/mm/copypage-xscale.c             |   5 +-
 arch/arm/mm/dma-mapping.c                 |  24 ++---
 arch/arm/mm/fault-armv.c                  |  14 +--
 arch/arm/mm/flush.c                       |  99 +++++++++++--------
 arch/arm/mm/mm.h                          |   2 +-
 arch/arm/mm/mmu.c                         |  14 ++-
 arch/arm64/include/asm/cacheflush.h       |   4 +-
 arch/arm64/include/asm/pgtable.h          |  25 +++--
 arch/arm64/mm/flush.c                     |  36 +++----
 arch/csky/abiv1/cacheflush.c              |  32 ++++---
 arch/csky/abiv1/inc/abi/cacheflush.h      |   3 +-
 arch/csky/abiv2/cacheflush.c              |  30 +++---
 arch/csky/abiv2/inc/abi/cacheflush.h      |  11 ++-
 arch/csky/include/asm/pgtable.h           |  21 +++-
 arch/hexagon/include/asm/cacheflush.h     |   9 +-
 arch/hexagon/include/asm/pgtable.h        |  16 +++-
 arch/ia64/hp/common/sba_iommu.c           |  26 ++---
 arch/ia64/include/asm/cacheflush.h        |  14 ++-
 arch/ia64/include/asm/pgtable.h           |  14 ++-
 arch/ia64/mm/init.c                       |  29 ++++--
 arch/loongarch/include/asm/cacheflush.h   |   2 +-
 arch/loongarch/include/asm/pgtable.h      |  30 ++++--
 arch/m68k/include/asm/cacheflush_mm.h     |  25 +++--
 arch/m68k/include/asm/pgtable_mm.h        |  21 +++-
 arch/m68k/mm/motorola.c                   |   2 +-
 arch/microblaze/include/asm/cacheflush.h  |   8 ++
 arch/microblaze/include/asm/pgtable.h     |  17 +++-
 arch/microblaze/include/asm/tlbflush.h    |   4 +-
 arch/mips/include/asm/cacheflush.h        |  32 ++++---
 arch/mips/include/asm/pgtable.h           |  36 ++++---
 arch/mips/mm/c-r4k.c                      |   5 +-
 arch/mips/mm/cache.c                      |  56 +++++------
 arch/mips/mm/init.c                       |  17 ++--
 arch/nios2/include/asm/cacheflush.h       |   6 +-
 arch/nios2/include/asm/pgtable.h          |  27 ++++--
 arch/nios2/mm/cacheflush.c                |  62 ++++++------
 arch/openrisc/include/asm/cacheflush.h    |   8 +-
 arch/openrisc/include/asm/pgtable.h       |  27 +++++-
 arch/openrisc/mm/cache.c                  |  12 ++-
 arch/parisc/include/asm/cacheflush.h      |  14 ++-
 arch/parisc/include/asm/pgtable.h         |  28 ++++--
 arch/parisc/kernel/cache.c                | 101 ++++++++++++++------
 arch/powerpc/include/asm/book3s/pgtable.h |  10 +-
 arch/powerpc/include/asm/cacheflush.h     |  14 ++-
 arch/powerpc/include/asm/kvm_ppc.h        |  10 +-
 arch/powerpc/include/asm/nohash/pgtable.h |  13 +--
 arch/powerpc/include/asm/pgtable.h        |   6 ++
 arch/powerpc/mm/book3s64/hash_utils.c     |  11 ++-
 arch/powerpc/mm/cacheflush.c              |  40 +++-----
 arch/powerpc/mm/nohash/e500_hugetlbpage.c |   3 +-
 arch/powerpc/mm/pgtable.c                 |  51 +++++-----
 arch/riscv/include/asm/cacheflush.h       |  19 ++--
 arch/riscv/include/asm/pgtable.h          |  26 +++--
 arch/riscv/mm/cacheflush.c                |  11 +--
 arch/s390/include/asm/pgtable.h           |  34 +++++--
 arch/sh/include/asm/cacheflush.h          |  21 ++--
 arch/sh/include/asm/pgtable.h             |   6 +-
 arch/sh/include/asm/pgtable_32.h          |  16 +++-
 arch/sh/mm/cache-j2.c                     |   4 +-
 arch/sh/mm/cache-sh4.c                    |  26 +++--
 arch/sh/mm/cache-sh7705.c                 |  26 +++--
 arch/sh/mm/cache.c                        |  54 ++++++-----
 arch/sh/mm/kmap.c                         |   3 +-
 arch/sparc/include/asm/cacheflush_32.h    |   9 +-
 arch/sparc/include/asm/cacheflush_64.h    |  19 ++--
 arch/sparc/include/asm/pgtable_32.h       |  15 ++-
 arch/sparc/include/asm/pgtable_64.h       |  25 ++++-
 arch/sparc/kernel/smp_64.c                |  56 +++++++----
 arch/sparc/mm/init_32.c                   |  13 ++-
 arch/sparc/mm/init_64.c                   |  78 ++++++++-------
 arch/sparc/mm/tlb.c                       |   5 +-
 arch/um/include/asm/pgtable.h             |  15 ++-
 arch/x86/include/asm/pgtable.h            |  21 +++-
 arch/xtensa/include/asm/cacheflush.h      |  11 ++-
 arch/xtensa/include/asm/pgtable.h         |  24 +++--
 arch/xtensa/mm/cache.c                    |  83 +++++++++-------
 include/asm-generic/cacheflush.h          |   7 --
 include/linux/cacheflush.h                |  13 ++-
 include/linux/mm.h                        |   3 +-
 include/linux/page_table_check.h          |  14 +--
 include/linux/pagemap.h                   |  28 ++++--
 include/linux/rmap.h                      |   2 +
 mm/filemap.c                              | 111 +++++++++++++---------
 mm/memory.c                               |  30 +++---
 mm/page_table_check.c                     |  14 +--
 mm/rmap.c                                 |  60 +++++++++---
 mm/util.c                                 |   2 +-
 100 files changed, 1436 insertions(+), 848 deletions(-)
  

Comments

Mike Rapoport March 3, 2023, 2:19 p.m. UTC | #1
On Tue, Feb 28, 2023 at 09:37:03PM +0000, Matthew Wilcox (Oracle) wrote:
> This patchset changes the API used by the MM to set up page table entries.
> The four APIs are:
>     set_ptes(mm, addr, ptep, pte, nr)
>     update_mmu_cache_range(vma, addr, ptep, nr)
>     flush_dcache_folio(folio)
>     flush_icache_pages(vma, page, nr)
> 
> flush_dcache_folio() isn't technically new, but no architecture
> implemented it, so I've done that for you.  The old APIs remain around
> but are mostly implemented by calling the new interfaces.
> 
> The new APIs are based around setting up N page table entries at once.
> The N entries belong to the same PMD, the same folio and the same VMA,
> so ptep++ is a legitimate operation, and locking is taken care of for
> you.  Some architectures can do a better job of it than just a loop,
> but I have hesitated to make too deep a change to architectures I don't
> understand well.

The new set_ptes() looks unnecessarily duplicated all over arch/
What do you say about adding the patch below on top of the series?
Ideally it should be split into per-arch bits, but I can send it separately
as a cleanup on top.

diff --git a/arch/alpha/include/asm/pgtable.h b/arch/alpha/include/asm/pgtable.h
index 1e3354e9731b..65fb9e66675d 100644
--- a/arch/alpha/include/asm/pgtable.h
+++ b/arch/alpha/include/asm/pgtable.h
@@ -37,6 +37,7 @@ static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
 		pte_val(pte) += 1UL << 32;
 	}
 }
+#define set_ptes set_ptes
 #define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)
 
 /* PMD_SHIFT determines the size of the area a second-level page table can map */
diff --git a/arch/arc/include/asm/pgtable-bits-arcv2.h b/arch/arc/include/asm/pgtable-bits-arcv2.h
index 4a1b2ce204c6..06d8039180c0 100644
--- a/arch/arc/include/asm/pgtable-bits-arcv2.h
+++ b/arch/arc/include/asm/pgtable-bits-arcv2.h
@@ -100,19 +100,6 @@ static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
 	return __pte((pte_val(pte) & _PAGE_CHG_MASK) | pgprot_val(newprot));
 }
 
-static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
-		pte_t *ptep, pte_t pte, unsigned int nr)
-{
-	for (;;) {
-		set_pte(ptep, pte);
-		if (--nr == 0)
-			break;
-		ptep++;
-		pte_val(pte) += PAGE_SIZE;
-	}
-}
-#define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)
-
 void update_mmu_cache_range(struct vm_area_struct *vma, unsigned long address,
 		      pte_t *ptep, unsigned int nr);
 
diff --git a/arch/arm/include/asm/pgtable.h b/arch/arm/include/asm/pgtable.h
index 6525ac82bd50..0d326b201797 100644
--- a/arch/arm/include/asm/pgtable.h
+++ b/arch/arm/include/asm/pgtable.h
@@ -209,6 +209,7 @@ extern void __sync_icache_dcache(pte_t pteval);
 
 void set_ptes(struct mm_struct *mm, unsigned long addr,
 		      pte_t *ptep, pte_t pteval, unsigned int nr);
+#define set_ptes set_ptes
 #define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)
 
 static inline pte_t clear_pte_bit(pte_t pte, pgprot_t prot)
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 4d1b79dbff16..a8d6460c5c9f 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -369,6 +369,7 @@ static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
 		pte_val(pte) += PAGE_SIZE;
 	}
 }
+#define set_ptes set_ptes
 #define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)
 
 /*
diff --git a/arch/csky/include/asm/pgtable.h b/arch/csky/include/asm/pgtable.h
index a30ae048233e..e426f1820deb 100644
--- a/arch/csky/include/asm/pgtable.h
+++ b/arch/csky/include/asm/pgtable.h
@@ -91,20 +91,6 @@ static inline void set_pte(pte_t *p, pte_t pte)
 	smp_mb();
 }
 
-static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
-		pte_t *ptep, pte_t pte, unsigned int nr)
-{
-	for (;;) {
-		set_pte(ptep, pte);
-		if (--nr == 0)
-			break;
-		ptep++;
-		pte_val(pte) += PAGE_SIZE;
-	}
-}
-
-#define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)
-
 static inline pte_t *pmd_page_vaddr(pmd_t pmd)
 {
 	unsigned long ptr;
diff --git a/arch/hexagon/include/asm/pgtable.h b/arch/hexagon/include/asm/pgtable.h
index f58f1d920769..67ab91662e83 100644
--- a/arch/hexagon/include/asm/pgtable.h
+++ b/arch/hexagon/include/asm/pgtable.h
@@ -345,26 +345,6 @@ static inline int pte_exec(pte_t pte)
 #define pte_pfn(pte) (pte_val(pte) >> PAGE_SHIFT)
 #define set_pmd(pmdptr, pmdval) (*(pmdptr) = (pmdval))
 
-/*
- * set_ptes - update page table and do whatever magic may be
- * necessary to make the underlying hardware/firmware take note.
- *
- * VM may require a virtual instruction to alert the MMU.
- */
-static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
-		pte_t *ptep, pte_t pte, unsigned int nr)
-{
-	for (;;) {
-		set_pte(ptep, pte);
-		if (--nr == 0)
-			break;
-		ptep++;
-		pte_val(pte) += PAGE_SIZE;
-	}
-}
-
-#define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)
-
 static inline unsigned long pmd_page_vaddr(pmd_t pmd)
 {
 	return (unsigned long)__va(pmd_val(pmd) & PAGE_MASK);
diff --git a/arch/ia64/include/asm/pgtable.h b/arch/ia64/include/asm/pgtable.h
index 0c2be4ea664b..65a6e3b30721 100644
--- a/arch/ia64/include/asm/pgtable.h
+++ b/arch/ia64/include/asm/pgtable.h
@@ -303,19 +303,6 @@ static inline void set_pte(pte_t *ptep, pte_t pteval)
 	*ptep = pteval;
 }
 
-static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
-		pte_t *ptep, pte_t pte, unsigned int nr)
-{
-	for (;;) {
-		set_pte(ptep, pte);
-		if (--nr == 0)
-			break;
-		ptep++;
-		pte_val(pte) += PAGE_SIZE;
-	}
-}
-#define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, add, ptep, pte, 1)
-
 /*
  * Make page protection values cacheable, uncacheable, or write-
  * combining.  Note that "protection" is really a misnomer here as the
diff --git a/arch/loongarch/include/asm/pgtable.h b/arch/loongarch/include/asm/pgtable.h
index 9154d317ffb4..d4b0ca7b4bf7 100644
--- a/arch/loongarch/include/asm/pgtable.h
+++ b/arch/loongarch/include/asm/pgtable.h
@@ -346,6 +346,7 @@ static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
 	}
 }
 
+#define set_ptes set_ptes
 #define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)
 
 static inline void pte_clear(struct mm_struct *mm, unsigned long addr, pte_t *ptep)
diff --git a/arch/m68k/include/asm/pgtable_mm.h b/arch/m68k/include/asm/pgtable_mm.h
index 400206c17c97..8c2db20abdb6 100644
--- a/arch/m68k/include/asm/pgtable_mm.h
+++ b/arch/m68k/include/asm/pgtable_mm.h
@@ -32,20 +32,6 @@
 		*(pteptr) = (pteval);				\
 	} while(0)
 
-static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
-		pte_t *ptep, pte_t pte, unsigned int nr)
-{
-	for (;;) {
-		set_pte(ptep, pte);
-		if (--nr == 0)
-			break;
-		ptep++;
-		pte_val(pte) += PAGE_SIZE;
-	}
-}
-
-#define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)
-
 /* PMD_SHIFT determines the size of the area a second-level page table can map */
 #if CONFIG_PGTABLE_LEVELS == 3
 #define PMD_SHIFT	18
diff --git a/arch/microblaze/include/asm/pgtable.h b/arch/microblaze/include/asm/pgtable.h
index a01e1369b486..3e7643a986ad 100644
--- a/arch/microblaze/include/asm/pgtable.h
+++ b/arch/microblaze/include/asm/pgtable.h
@@ -335,20 +335,6 @@ static inline void set_pte(pte_t *ptep, pte_t pte)
 	*ptep = pte;
 }
 
-static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
-		pte_t *ptep, pte_t pte, unsigned int nr)
-{
-	for (;;) {
-		set_pte(ptep, pte);
-		if (--nr == 0)
-			break;
-		ptep++;
-		pte_val(pte) += 1 << PFN_SHIFT_OFFSET;
-	}
-}
-
-#define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)
-
 #define __HAVE_ARCH_PTEP_TEST_AND_CLEAR_YOUNG
 static inline int ptep_test_and_clear_young(struct vm_area_struct *vma,
 		unsigned long address, pte_t *ptep)
diff --git a/arch/mips/include/asm/pgtable.h b/arch/mips/include/asm/pgtable.h
index 0cf0455e6ae8..18b77567ef72 100644
--- a/arch/mips/include/asm/pgtable.h
+++ b/arch/mips/include/asm/pgtable.h
@@ -108,6 +108,7 @@ do {									\
 static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
 		pte_t *ptep, pte_t pte, unsigned int nr);
 
+#define set_ptes set_ptes
 #define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)
 
 #if defined(CONFIG_PHYS_ADDR_T_64BIT) && defined(CONFIG_CPU_MIPS32)
diff --git a/arch/nios2/include/asm/pgtable.h b/arch/nios2/include/asm/pgtable.h
index 8a77821a17a5..2a994b225a41 100644
--- a/arch/nios2/include/asm/pgtable.h
+++ b/arch/nios2/include/asm/pgtable.h
@@ -193,6 +193,7 @@ static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
 	}
 }
 
+#define set_ptes set_ptes
 #define set_pte_at(mm, addr, ptep, pte)	set_ptes(mm, addr, ptep, pte, 1)
 
 static inline int pmd_none(pmd_t pmd)
diff --git a/arch/openrisc/include/asm/pgtable.h b/arch/openrisc/include/asm/pgtable.h
index 1a7077150d7b..8f27730a9ab7 100644
--- a/arch/openrisc/include/asm/pgtable.h
+++ b/arch/openrisc/include/asm/pgtable.h
@@ -47,20 +47,6 @@ extern void paging_init(void);
  */
 #define set_pte(pteptr, pteval) ((*(pteptr)) = (pteval))
 
-static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
-		pte_t *ptep, pte_t pte, unsigned int nr)
-{
-	for (;;) {
-		set_pte(ptep, pte);
-		if (--nr == 0)
-			break;
-		ptep++;
-		pte_val(pte) += PAGE_SIZE;
-	}
-}
-
-#define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)
-
 /*
  * (pmds are folded into pgds so this doesn't get actually called,
  * but the define is needed for a generic inline function.)
diff --git a/arch/parisc/include/asm/pgtable.h b/arch/parisc/include/asm/pgtable.h
index 78ee9816f423..cd04e85cb012 100644
--- a/arch/parisc/include/asm/pgtable.h
+++ b/arch/parisc/include/asm/pgtable.h
@@ -73,6 +73,7 @@ extern void __update_cache(pte_t pte);
 		mb();				\
 	} while(0)
 
+#define set_ptes set_ptes
 #define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)
 
 #endif /* !__ASSEMBLY__ */
diff --git a/arch/powerpc/include/asm/pgtable.h b/arch/powerpc/include/asm/pgtable.h
index bf1263ff7e67..f10b6c2f8ade 100644
--- a/arch/powerpc/include/asm/pgtable.h
+++ b/arch/powerpc/include/asm/pgtable.h
@@ -43,6 +43,7 @@ struct mm_struct;
 
 void set_ptes(struct mm_struct *mm, unsigned long addr, pte_t *ptep,
 		pte_t pte, unsigned int nr);
+#define set_ptes set_ptes
 #define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)
 #define update_mmu_cache(vma, addr, ptep) \
 	update_mmu_cache_range(vma, addr, ptep, 1);
diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index 3a3a776fc047..8bc49496f8a6 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -473,6 +473,7 @@ static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
 		pte_val(pteval) += 1 << _PAGE_PFN_SHIFT;
 	}
 }
+#define set_ptes set_ptes
 #define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)
 
 static inline void pte_clear(struct mm_struct *mm,
diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h
index 46bf475116f1..2fc20558af6b 100644
--- a/arch/s390/include/asm/pgtable.h
+++ b/arch/s390/include/asm/pgtable.h
@@ -1346,6 +1346,7 @@ static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
 	}
 }
 
+#define set_ptes set_ptes
 #define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)
 
 /*
diff --git a/arch/sh/include/asm/pgtable_32.h b/arch/sh/include/asm/pgtable_32.h
index 03ba1834e126..d2f17e944bea 100644
--- a/arch/sh/include/asm/pgtable_32.h
+++ b/arch/sh/include/asm/pgtable_32.h
@@ -319,6 +319,7 @@ static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
 	}
 }
 
+#define set_ptes set_ptes
 #define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)
 
 /*
diff --git a/arch/sparc/include/asm/pgtable_32.h b/arch/sparc/include/asm/pgtable_32.h
index 47ae55ea1837..7fbc7772a9b7 100644
--- a/arch/sparc/include/asm/pgtable_32.h
+++ b/arch/sparc/include/asm/pgtable_32.h
@@ -101,20 +101,6 @@ static inline void set_pte(pte_t *ptep, pte_t pteval)
 	srmmu_swap((unsigned long *)ptep, pte_val(pteval));
 }
 
-static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
-		pte_t *ptep, pte_t pte, unsigned int nr)
-{
-	for (;;) {
-		set_pte(ptep, pte);
-		if (--nr == 0)
-			break;
-		ptep++;
-		pte_val(pte) += PAGE_SIZE;
-	}
-}
-
-#define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)
-
 static inline int srmmu_device_memory(unsigned long x)
 {
 	return ((x & 0xF0000000) != 0);
diff --git a/arch/sparc/include/asm/pgtable_64.h b/arch/sparc/include/asm/pgtable_64.h
index d5c0088e0c6a..fddca662ba1b 100644
--- a/arch/sparc/include/asm/pgtable_64.h
+++ b/arch/sparc/include/asm/pgtable_64.h
@@ -924,6 +924,7 @@ static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
 	}
 }
 
+#define set_ptes set_ptes
 #define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1);
 
 #define pte_clear(mm,addr,ptep)		\
diff --git a/arch/um/include/asm/pgtable.h b/arch/um/include/asm/pgtable.h
index ca78c90ae74f..60d2b20ff218 100644
--- a/arch/um/include/asm/pgtable.h
+++ b/arch/um/include/asm/pgtable.h
@@ -242,20 +242,6 @@ static inline void set_pte(pte_t *pteptr, pte_t pteval)
 	if(pte_present(*pteptr)) *pteptr = pte_mknewprot(*pteptr);
 }
 
-static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
-		pte_t *ptep, pte_t pte, unsigned int nr)
-{
-	for (;;) {
-		set_pte(ptep, pte);
-		if (--nr == 0)
-			break;
-		ptep++;
-		pte_val(pte) += PAGE_SIZE;
-	}
-}
-
-#define set_pte_at(mm, addr, ptep, pte)	set_ptes(mm, addr, ptep, pte, 1)
-
 #define __HAVE_ARCH_PTE_SAME
 static inline int pte_same(pte_t pte_a, pte_t pte_b)
 {
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index f424371ea143..1e5fd352880d 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -1019,22 +1019,6 @@ static inline pud_t native_local_pudp_get_and_clear(pud_t *pudp)
 	return res;
 }
 
-static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
-			      pte_t *ptep, pte_t pte, unsigned int nr)
-{
-	page_table_check_ptes_set(mm, addr, ptep, pte, nr);
-
-	for (;;) {
-		set_pte(ptep, pte);
-		if (--nr == 0)
-			break;
-		ptep++;
-		pte = __pte(pte_val(pte) + PAGE_SIZE);
-	}
-}
-
-#define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)
-
 static inline void set_pmd_at(struct mm_struct *mm, unsigned long addr,
 			      pmd_t *pmdp, pmd_t pmd)
 {
diff --git a/arch/xtensa/include/asm/pgtable.h b/arch/xtensa/include/asm/pgtable.h
index 293101530541..adeee96518b9 100644
--- a/arch/xtensa/include/asm/pgtable.h
+++ b/arch/xtensa/include/asm/pgtable.h
@@ -306,20 +306,6 @@ static inline void set_pte(pte_t *ptep, pte_t pte)
 	update_pte(ptep, pte);
 }
 
-static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
-		pte_t *ptep, pte_t pte, unsigned int nr)
-{
-	for (;;) {
-		set_pte(ptep, pte);
-		if (--nr == 0)
-			break;
-		ptep++;
-		pte_val(pte) += PAGE_SIZE;
-	}
-}
-
-#define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)
-
 static inline void
 set_pmd(pmd_t *pmdp, pmd_t pmdval)
 {
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index c63cd44777ec..ef204712eda3 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -172,6 +172,24 @@ static inline int pmd_young(pmd_t pmd)
 }
 #endif
 
+#ifndef set_ptes
+static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
+			    pte_t *ptep, pte_t pte, unsigned int nr)
+{
+	page_table_check_ptes_set(mm, addr, ptep, pte, nr);
+
+	for (;;) {
+		set_pte(ptep, pte);
+		if (--nr == 0)
+			break;
+		ptep++;
+		pte = __pte(pte_val(pte) + PAGE_SIZE);
+	}
+}
+#define set_ptes set_ptes
+#define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)
+#endif
+
 #ifndef __HAVE_ARCH_PTEP_SET_ACCESS_FLAGS
 extern int ptep_set_access_flags(struct vm_area_struct *vma,
 				 unsigned long address, pte_t *ptep,
  
Geert Uytterhoeven March 5, 2023, 10:15 a.m. UTC | #2
Hi Willy,

On Tue, Feb 28, 2023 at 10:40 PM Matthew Wilcox (Oracle)
<willy@infradead.org> wrote:
> This patchset changes the API used by the MM to set up page table entries.
> The four APIs are:
>     set_ptes(mm, addr, ptep, pte, nr)
>     update_mmu_cache_range(vma, addr, ptep, nr)
>     flush_dcache_folio(folio)
>     flush_icache_pages(vma, page, nr)
>
> flush_dcache_folio() isn't technically new, but no architecture
> implemented it, so I've done that for you.  The old APIs remain around
> but are mostly implemented by calling the new interfaces.
>
> The new APIs are based around setting up N page table entries at once.
> The N entries belong to the same PMD, the same folio and the same VMA,
> so ptep++ is a legitimate operation, and locking is taken care of for
> you.  Some architectures can do a better job of it than just a loop,
> but I have hesitated to make too deep a change to architectures I don't
> understand well.
>
> One thing I have changed in every architecture is that PG_arch_1 is now a
> per-folio bit instead of a per-page bit.  This was something that would
> have to happen eventually, and it makes sense to do it now rather than
> iterate over every page involved in a cache flush and figure out if it
> needs to happen.
>
> The point of all this is better performance, and Fengwei Yin has
> measured improvement on x86.  I suspect you'll see improvement on
> your architecture too.  Try the new will-it-scale test mentioned here:
> https://lore.kernel.org/linux-mm/20230206140639.538867-5-fengwei.yin@intel.com/
> You'll need to run it on an XFS filesystem and have
> CONFIG_TRANSPARENT_HUGEPAGE set.

Thanks for your series!

> For testing, I've only run the code on x86.  If an x86->foo compiler
> exists in Debian, I've built defconfig.  I'm relying on the buildbots
> to tell me what I missed, and people who actually have the hardware to
> tell me if it actually works.

Seems to work fine on ARAnyM and qemu-system-m68k/virt, so
Tested-by: Geert Uytterhoeven <geert@linux-m68k.org>

Gr{oetje,eeting}s,

                        Geert
  
Ryan Roberts March 9, 2023, 11:09 a.m. UTC | #3
On 28/02/2023 21:37, Matthew Wilcox (Oracle) wrote:
> This patchset changes the API used by the MM to set up page table entries.
> The four APIs are:
>     set_ptes(mm, addr, ptep, pte, nr)
>     update_mmu_cache_range(vma, addr, ptep, nr)
>     flush_dcache_folio(folio)
>     flush_icache_pages(vma, page, nr)
> 
> flush_dcache_folio() isn't technically new, but no architecture
> implemented it, so I've done that for you.  The old APIs remain around
> but are mostly implemented by calling the new interfaces.
> 
> The new APIs are based around setting up N page table entries at once.
> The N entries belong to the same PMD, the same folio and the same VMA,
> so ptep++ is a legitimate operation, and locking is taken care of for
> you.  Some architectures can do a better job of it than just a loop,
> but I have hesitated to make too deep a change to architectures I don't
> understand well.
> 
> One thing I have changed in every architecture is that PG_arch_1 is now a
> per-folio bit instead of a per-page bit.  This was something that would
> have to happen eventually, and it makes sense to do it now rather than
> iterate over every page involved in a cache flush and figure out if it
> needs to happen.
> 
> The point of all this is better performance, and Fengwei Yin has
> measured improvement on x86.  I suspect you'll see improvement on
> your architecture too.  Try the new will-it-scale test mentioned here:
> https://lore.kernel.org/linux-mm/20230206140639.538867-5-fengwei.yin@intel.com/
> You'll need to run it on an XFS filesystem and have
> CONFIG_TRANSPARENT_HUGEPAGE set.
> 
> For testing, I've only run the code on x86.  If an x86->foo compiler
> exists in Debian, I've built defconfig.  I'm relying on the buildbots
> to tell me what I missed, and people who actually have the hardware to
> tell me if it actually works.
> 
> I'd like to get this into the MM tree soon after the current merge window
> closes, so quick feedback would be appreciated.

I've boot-tested the series (with Yin's typo fix for patch 32) on arm64 FVP
and Ampere Altra. On the Altra, I also ran page_fault4 from will-it-scale and
saw a ~35% improvement from this series. So:

Tested-by: Ryan Roberts <ryan.roberts@arm.com>

Thanks,
Ryan