[v2,14/46] hugetlb: split PTE markers when doing HGM walks

Message ID 20230218002819.1486479-15-jthoughton@google.com
State New
Series hugetlb: introduce HugeTLB high-granularity mapping

Commit Message

James Houghton Feb. 18, 2023, 12:27 a.m. UTC
Fix how UFFDIO_CONTINUE and UFFDIO_WRITEPROTECT interact in these two
ways:
 - UFFDIO_WRITEPROTECT no longer prevents a high-granularity
   UFFDIO_CONTINUE.
 - UFFD-WP PTE markers installed with UFFDIO_WRITEPROTECT will be
   properly propagated when high-granularity UFFDIO_CONTINUEs are
   performed.

Note: UFFDIO_WRITEPROTECT is not yet permitted at PAGE_SIZE granularity.

Signed-off-by: James Houghton <jthoughton@google.com>
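As a rough illustration of what "splitting" a PTE marker means in this patch, the userspace C model below copies a marker found in a non-present parent entry into every slot of a freshly allocated child table, leaves the child empty for a "none" parent, and refuses to split any other non-present entry. It is a sketch under stated assumptions, not kernel code: the entry encoding (`MODEL_PRESENT`, `MODEL_MARKER`, `MODEL_UFFD_WP`), the 512-entry table size, and the `model_*` helpers are hypothetical stand-ins for `pte_t`, `PTRS_PER_PTE`/`PTRS_PER_PMD`, `is_pte_marker()` and `hugetlb_install_markers_pte()`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical entry encoding for this model only (not the real pte_t layout). */
#define MODEL_PTRS_PER_TABLE 512
#define MODEL_PRESENT 0x1u /* entry maps memory */
#define MODEL_MARKER  0x2u /* non-present PTE marker */
#define MODEL_UFFD_WP 0x4u /* marker payload: userfaultfd write-protect */

typedef uint64_t model_entry;

enum split_result { SPLIT_OK, SPLIT_EEXIST };

/* Copy one marker into every slot of a child table that is not yet visible. */
static void model_install_markers(model_entry *child, model_entry marker)
{
	for (size_t i = 0; i < MODEL_PTRS_PER_TABLE; i++)
		child[i] = marker;
}

/*
 * Split a non-present parent entry into a child table:
 *   none   -> empty child table
 *   marker -> every child slot carries the same marker
 *   other  -> refuse (some other swap-like entry lives here)
 */
static enum split_result model_split(model_entry parent, model_entry *child)
{
	assert(!(parent & MODEL_PRESENT)); /* caller handles present entries */

	if (parent != 0 && !(parent & MODEL_MARKER))
		return SPLIT_EEXIST;

	for (size_t i = 0; i < MODEL_PTRS_PER_TABLE; i++)
		child[i] = 0;
	if (parent & MODEL_MARKER)
		model_install_markers(child, parent);
	return SPLIT_OK;
}
```

In this model every child slot inherits the same uffd-wp marker, so resolving one sub-range later does not drop the write-protect state recorded for the others.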
  

Comments

kernel test robot Feb. 18, 2023, 7:49 p.m. UTC | #1
Hi James,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on next-20230217]
[cannot apply to kvm/queue shuah-kselftest/next shuah-kselftest/fixes arnd-asm-generic/master linus/master kvm/linux-next v6.2-rc8 v6.2-rc7 v6.2-rc6 v6.2-rc8]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/James-Houghton/hugetlb-don-t-set-PageUptodate-for-UFFDIO_CONTINUE/20230218-083216
patch link:    https://lore.kernel.org/r/20230218002819.1486479-15-jthoughton%40google.com
patch subject: [PATCH v2 14/46] hugetlb: split PTE markers when doing HGM walks
config: powerpc-randconfig-r001-20230217 (https://download.01.org/0day-ci/archive/20230219/202302190304.YdPwtMZS-lkp@intel.com/config)
compiler: clang version 17.0.0 (https://github.com/llvm/llvm-project db89896bbbd2251fff457699635acbbedeead27f)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install powerpc cross compiling tool for clang build
        # apt-get install binutils-powerpc-linux-gnu
        # https://github.com/intel-lab-lkp/linux/commit/55c33d65b06ad109b87a418540fe98f7365185d4
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review James-Houghton/hugetlb-don-t-set-PageUptodate-for-UFFDIO_CONTINUE/20230218-083216
        git checkout 55c33d65b06ad109b87a418540fe98f7365185d4
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=powerpc olddefconfig
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=powerpc SHELL=/bin/bash

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <lkp@intel.com>
| Link: https://lore.kernel.org/oe-kbuild-all/202302190304.YdPwtMZS-lkp@intel.com/

All errors (new ones prefixed by >>):

   In file included from include/linux/highmem.h:12:
   In file included from include/linux/hardirq.h:11:
   In file included from arch/powerpc/include/asm/hardirq.h:6:
   In file included from include/linux/irq.h:20:
   In file included from include/linux/io.h:13:
   In file included from arch/powerpc/include/asm/io.h:640:
   arch/powerpc/include/asm/io-defs.h:47:1: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
   DEF_PCI_AC_NORET(insl, (unsigned long p, void *b, unsigned long c),
   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   arch/powerpc/include/asm/io.h:637:3: note: expanded from macro 'DEF_PCI_AC_NORET'
                   __do_##name al;                                 \
                   ^~~~~~~~~~~~~~
   <scratch space>:104:1: note: expanded from here
   __do_insl
   ^
   arch/powerpc/include/asm/io.h:579:56: note: expanded from macro '__do_insl'
   #define __do_insl(p, b, n)      readsl((PCI_IO_ADDR)_IO_BASE+(p), (b), (n))
                                          ~~~~~~~~~~~~~~~~~~~~~^
   In file included from mm/hugetlb.c:11:
   In file included from include/linux/highmem.h:12:
   In file included from include/linux/hardirq.h:11:
   In file included from arch/powerpc/include/asm/hardirq.h:6:
   In file included from include/linux/irq.h:20:
   In file included from include/linux/io.h:13:
   In file included from arch/powerpc/include/asm/io.h:640:
   arch/powerpc/include/asm/io-defs.h:49:1: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
   DEF_PCI_AC_NORET(outsb, (unsigned long p, const void *b, unsigned long c),
   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   arch/powerpc/include/asm/io.h:637:3: note: expanded from macro 'DEF_PCI_AC_NORET'
                   __do_##name al;                                 \
                   ^~~~~~~~~~~~~~
   <scratch space>:106:1: note: expanded from here
   __do_outsb
   ^
   arch/powerpc/include/asm/io.h:580:58: note: expanded from macro '__do_outsb'
   #define __do_outsb(p, b, n)     writesb((PCI_IO_ADDR)_IO_BASE+(p),(b),(n))
                                           ~~~~~~~~~~~~~~~~~~~~~^
   In file included from mm/hugetlb.c:11:
   In file included from include/linux/highmem.h:12:
   In file included from include/linux/hardirq.h:11:
   In file included from arch/powerpc/include/asm/hardirq.h:6:
   In file included from include/linux/irq.h:20:
   In file included from include/linux/io.h:13:
   In file included from arch/powerpc/include/asm/io.h:640:
   arch/powerpc/include/asm/io-defs.h:51:1: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
   DEF_PCI_AC_NORET(outsw, (unsigned long p, const void *b, unsigned long c),
   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   arch/powerpc/include/asm/io.h:637:3: note: expanded from macro 'DEF_PCI_AC_NORET'
                   __do_##name al;                                 \
                   ^~~~~~~~~~~~~~
   <scratch space>:108:1: note: expanded from here
   __do_outsw
   ^
   arch/powerpc/include/asm/io.h:581:58: note: expanded from macro '__do_outsw'
   #define __do_outsw(p, b, n)     writesw((PCI_IO_ADDR)_IO_BASE+(p),(b),(n))
                                           ~~~~~~~~~~~~~~~~~~~~~^
   In file included from mm/hugetlb.c:11:
   In file included from include/linux/highmem.h:12:
   In file included from include/linux/hardirq.h:11:
   In file included from arch/powerpc/include/asm/hardirq.h:6:
   In file included from include/linux/irq.h:20:
   In file included from include/linux/io.h:13:
   In file included from arch/powerpc/include/asm/io.h:640:
   arch/powerpc/include/asm/io-defs.h:53:1: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
   DEF_PCI_AC_NORET(outsl, (unsigned long p, const void *b, unsigned long c),
   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   arch/powerpc/include/asm/io.h:637:3: note: expanded from macro 'DEF_PCI_AC_NORET'
                   __do_##name al;                                 \
                   ^~~~~~~~~~~~~~
   <scratch space>:110:1: note: expanded from here
   __do_outsl
   ^
   arch/powerpc/include/asm/io.h:582:58: note: expanded from macro '__do_outsl'
   #define __do_outsl(p, b, n)     writesl((PCI_IO_ADDR)_IO_BASE+(p),(b),(n))
                                           ~~~~~~~~~~~~~~~~~~~~~^
   mm/hugetlb.c:653:8: error: call to undeclared function '__pte_alloc_one'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
           new = __pte_alloc_one(mm, GFP_PGTABLE_USER);
                 ^
   mm/hugetlb.c:653:8: note: did you mean 'pte_alloc_one'?
   arch/powerpc/include/asm/pgalloc.h:30:25: note: 'pte_alloc_one' declared here
   static inline pgtable_t pte_alloc_one(struct mm_struct *mm)
                           ^
   mm/hugetlb.c:653:28: error: use of undeclared identifier 'GFP_PGTABLE_USER'
           new = __pte_alloc_one(mm, GFP_PGTABLE_USER);
                                     ^
   mm/hugetlb.c:660:25: error: incompatible pointer types passing 'pgtable_t' (aka 'unsigned long *') to parameter of type 'struct page *' [-Werror,-Wincompatible-pointer-types]
                   pgtable_pte_page_dtor(new);
                                         ^~~
   include/linux/mm.h:2661:55: note: passing argument to parameter 'page' here
   static inline void pgtable_pte_page_dtor(struct page *page)
                                                         ^
   mm/hugetlb.c:661:3: error: incompatible pointer types passing 'pgtable_t' (aka 'unsigned long *') to parameter of type 'struct page *' [-Werror,-Wincompatible-pointer-types]
                   __free_page(new);
                   ^~~~~~~~~~~~~~~~
   include/linux/gfp.h:319:40: note: expanded from macro '__free_page'
   #define __free_page(page) __free_pages((page), 0)
                                          ^~~~~~
   include/linux/gfp.h:302:39: note: passing argument to parameter 'page' here
   extern void __free_pages(struct page *page, unsigned int order);
                                         ^
>> mm/hugetlb.c:666:44: error: incompatible pointer types passing 'pgtable_t' (aka 'unsigned long *') to parameter of type 'const struct page *' [-Werror,-Wincompatible-pointer-types]
                   hugetlb_install_markers_pte(page_address(new), marker);
                                                            ^~~
   include/linux/mm.h:2001:39: note: passing argument to parameter 'page' here
   void *page_address(const struct page *page);
                                         ^
   6 warnings and 5 errors generated.


vim +666 mm/hugetlb.c

   606	
   607	/*
   608	 * hugetlb_alloc_pte -- Allocate a PTE beneath a pmd_none PMD-level hpte.
   609	 *
   610	 * See the comment above hugetlb_alloc_pmd.
   611	 */
   612	pte_t *hugetlb_alloc_pte(struct mm_struct *mm, struct hugetlb_pte *hpte,
   613			unsigned long addr)
   614	{
   615		spinlock_t *ptl = hugetlb_pte_lockptr(hpte);
   616		pgtable_t new;
   617		pmd_t *pmdp;
   618		pmd_t pmd;
   619		bool is_marker;
   620		pte_marker marker;
   621	
   622		if (hpte->level != HUGETLB_LEVEL_PMD)
   623			return ERR_PTR(-EINVAL);
   624	
   625		pmdp = (pmd_t *)hpte->ptep;
   626	retry:
   627		is_marker = false;
   628		pmd = READ_ONCE(*pmdp);
   629		if (likely(pmd_present(pmd)))
   630			return unlikely(pmd_leaf(pmd))
   631				? ERR_PTR(-EEXIST)
   632				: pte_offset_kernel(pmdp, addr);
   633		else if (!pmd_none(pmd)) {
   634			/*
   635			 * Not present and not none means that a swap entry lives here.
   636			 * If it's a PTE marker, we can deal with it. If it's another
   637			 * swap entry, we don't attempt to split it.
   638			 */
   639			is_marker = is_pte_marker(__pte(pmd_val(pmd)));
   640			if (!is_marker)
   641				return ERR_PTR(-EEXIST);
   642	
   643			marker = pte_marker_get(pte_to_swp_entry(__pte(pmd_val(pmd))));
   644		}
   645	
   646		/*
   647		 * With CONFIG_HIGHPTE, calling `pte_alloc_one` directly may result
   648		 * in page tables being allocated in high memory, needing a kmap to
   649		 * access. Instead, we call __pte_alloc_one directly with
   650		 * GFP_PGTABLE_USER to prevent these PTEs being allocated in high
   651		 * memory.
   652		 */
   653		new = __pte_alloc_one(mm, GFP_PGTABLE_USER);
   654		if (!new)
   655			return ERR_PTR(-ENOMEM);
   656	
   657		spin_lock(ptl);
   658		if (!pmd_same(pmd, *pmdp)) {
   659			spin_unlock(ptl);
   660			pgtable_pte_page_dtor(new);
   661			__free_page(new);
   662			goto retry;
   663		}
   664	
   665		if (is_marker)
 > 666			hugetlb_install_markers_pte(page_address(new), marker);
   667	
   668		mm_inc_nr_ptes(mm);
   669		smp_wmb(); /* See comment in pmd_install() */
   670		pmd_populate(mm, pmdp, new);
   671		spin_unlock(ptl);
   672		return pte_offset_kernel(pmdp, addr);
   673	}
   674
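The listing follows an allocate-then-recheck pattern: read the PMD locklessly, allocate a table without holding the lock, retake the lock and confirm with `pmd_same()` that the entry has not changed, fill in markers, and only then publish the table with `pmd_populate()` after `smp_wmb()`. A minimal userspace sketch of the same idea follows. It swaps the page-table spinlock for a single C11 compare-and-swap, so the recheck and the publish happen in one atomic step, and it assumes a hypothetical slot encoding in which 0 is "none", an odd value is a marker, and any other value is a pointer to a child table. `model_alloc_child()` and `TABLE_ENTRIES` are illustrative names, not kernel APIs.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define TABLE_ENTRIES 512

typedef uint64_t entry_t;

/*
 * Hypothetical slot encoding: 0 = none, odd value = PTE marker,
 * any other value = pointer to an installed child table.
 */
static entry_t *model_alloc_child(_Atomic(uintptr_t) *slot)
{
	uintptr_t seen;
	entry_t *child;

retry:
	seen = atomic_load_explicit(slot, memory_order_acquire);
	if (seen != 0 && !(seen & 1))
		return (entry_t *)seen; /* a child table is already installed */

	/* Allocate before publishing anything, as hugetlb_alloc_pte() does. */
	child = calloc(TABLE_ENTRIES, sizeof(*child));
	if (!child)
		return NULL;

	/* Fill markers while the table is still private to this caller. */
	if (seen & 1)
		for (size_t i = 0; i < TABLE_ENTRIES; i++)
			child[i] = seen;

	/*
	 * Publish only if the slot still holds what we read (the pmd_same()
	 * recheck). Release ordering stands in for smp_wmb(): the markers
	 * become visible no later than the pointer to the table.
	 */
	if (!atomic_compare_exchange_strong_explicit(slot, &seen, (uintptr_t)child,
						     memory_order_acq_rel,
						     memory_order_acquire)) {
		free(child); /* lost the race: discard and re-read */
		goto retry;
	}
	return child;
}
```

The property the sketch preserves is the one the patch relies on: markers are written into the new table before any other walker can reach it, so no walker ever observes a half-initialized table.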
  
Mike Kravetz Feb. 28, 2023, 10:48 p.m. UTC | #2
On 02/18/23 00:27, James Houghton wrote:
> Fix how UFFDIO_CONTINUE and UFFDIO_WRITEPROTECT interact in these two
> ways:
>  - UFFDIO_WRITEPROTECT no longer prevents a high-granularity
>    UFFDIO_CONTINUE.
>  - UFFD-WP PTE markers installed with UFFDIO_WRITEPROTECT will be
>    properly propagated when high-granularity UFFDIO_CONTINUEs are
>    performed.
> 
> Note: UFFDIO_WRITEPROTECT is not yet permitted at PAGE_SIZE granularity.
> 
> Signed-off-by: James Houghton <jthoughton@google.com>
> 
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 810c05feb41f..f74183acc521 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c

Seems relatively straightforward,

Acked-by: Mike Kravetz <mike.kravetz@oracle.com>
  

Patch

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 810c05feb41f..f74183acc521 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -506,6 +506,30 @@  static bool has_same_uncharge_info(struct file_region *rg,
 #endif
 }
 
+static void hugetlb_install_markers_pmd(pmd_t *pmdp, pte_marker marker)
+{
+	int i;
+
+	for (i = 0; i < PTRS_PER_PMD; ++i)
+		/*
+		 * WRITE_ONCE not needed because the pud hasn't been
+		 * installed yet.
+		 */
+		pmdp[i] = __pmd(pte_val(make_pte_marker(marker)));
+}
+
+static void hugetlb_install_markers_pte(pte_t *ptep, pte_marker marker)
+{
+	int i;
+
+	for (i = 0; i < PTRS_PER_PTE; ++i)
+		/*
+		 * WRITE_ONCE not needed because the pmd hasn't been
+		 * installed yet.
+		 */
+		ptep[i] = make_pte_marker(marker);
+}
+
 /*
  * hugetlb_alloc_pmd -- Allocate or find a PMD beneath a PUD-level hpte.
  *
@@ -528,23 +552,32 @@  pmd_t *hugetlb_alloc_pmd(struct mm_struct *mm, struct hugetlb_pte *hpte,
 	pmd_t *new;
 	pud_t *pudp;
 	pud_t pud;
+	bool is_marker;
+	pte_marker marker;
 
 	if (hpte->level != HUGETLB_LEVEL_PUD)
 		return ERR_PTR(-EINVAL);
 
 	pudp = (pud_t *)hpte->ptep;
 retry:
+	is_marker = false;
 	pud = READ_ONCE(*pudp);
 	if (likely(pud_present(pud)))
 		return unlikely(pud_leaf(pud))
 			? ERR_PTR(-EEXIST)
 			: pmd_offset(pudp, addr);
-	else if (!pud_none(pud))
+	else if (!pud_none(pud)) {
 		/*
-		 * Not present and not none means that a swap entry lives here,
-		 * and we can't get rid of it.
+		 * Not present and not none means that a swap entry lives here.
+		 * If it's a PTE marker, we can deal with it. If it's another
+		 * swap entry, we don't attempt to split it.
 		 */
-		return ERR_PTR(-EEXIST);
+		is_marker = is_pte_marker(__pte(pud_val(pud)));
+		if (!is_marker)
+			return ERR_PTR(-EEXIST);
+
+		marker = pte_marker_get(pte_to_swp_entry(__pte(pud_val(pud))));
+	}
 
 	new = pmd_alloc_one(mm, addr);
 	if (!new)
@@ -557,6 +590,13 @@  pmd_t *hugetlb_alloc_pmd(struct mm_struct *mm, struct hugetlb_pte *hpte,
 		goto retry;
 	}
 
+	/*
+	 * Install markers before PUD to avoid races with other
+	 * page table walks.
+	 */
+	if (is_marker)
+		hugetlb_install_markers_pmd(new, marker);
+
 	mm_inc_nr_pmds(mm);
 	smp_wmb(); /* See comment in pmd_install() */
 	pud_populate(mm, pudp, new);
@@ -576,23 +616,32 @@  pte_t *hugetlb_alloc_pte(struct mm_struct *mm, struct hugetlb_pte *hpte,
 	pgtable_t new;
 	pmd_t *pmdp;
 	pmd_t pmd;
+	bool is_marker;
+	pte_marker marker;
 
 	if (hpte->level != HUGETLB_LEVEL_PMD)
 		return ERR_PTR(-EINVAL);
 
 	pmdp = (pmd_t *)hpte->ptep;
 retry:
+	is_marker = false;
 	pmd = READ_ONCE(*pmdp);
 	if (likely(pmd_present(pmd)))
 		return unlikely(pmd_leaf(pmd))
 			? ERR_PTR(-EEXIST)
 			: pte_offset_kernel(pmdp, addr);
-	else if (!pmd_none(pmd))
+	else if (!pmd_none(pmd)) {
 		/*
-		 * Not present and not none means that a swap entry lives here,
-		 * and we can't get rid of it.
+		 * Not present and not none means that a swap entry lives here.
+		 * If it's a PTE marker, we can deal with it. If it's another
+		 * swap entry, we don't attempt to split it.
 		 */
-		return ERR_PTR(-EEXIST);
+		is_marker = is_pte_marker(__pte(pmd_val(pmd)));
+		if (!is_marker)
+			return ERR_PTR(-EEXIST);
+
+		marker = pte_marker_get(pte_to_swp_entry(__pte(pmd_val(pmd))));
+	}
 
 	/*
 	 * With CONFIG_HIGHPTE, calling `pte_alloc_one` directly may result
@@ -613,6 +662,9 @@  pte_t *hugetlb_alloc_pte(struct mm_struct *mm, struct hugetlb_pte *hpte,
 		goto retry;
 	}
 
+	if (is_marker)
+		hugetlb_install_markers_pte(page_address(new), marker);
+
 	mm_inc_nr_ptes(mm);
 	smp_wmb(); /* See comment in pmd_install() */
 	pmd_populate(mm, pmdp, new);
@@ -7384,7 +7436,12 @@  static int __hugetlb_hgm_walk(struct mm_struct *mm, struct vm_area_struct *vma,
 		if (!pte_present(pte)) {
 			if (!alloc)
 				return 0;
-			if (unlikely(!huge_pte_none(pte)))
+			/*
+			 * In hugetlb_alloc_pmd and hugetlb_alloc_pte,
+			 * we split PTE markers, so we can tolerate
+			 * PTE markers here.
+			 */
+			if (unlikely(!huge_pte_none_mostly(pte)))
 				return -EEXIST;
 		} else if (hugetlb_pte_present_leaf(hpte, pte))
 			return 0;
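The last hunk relaxes the check in `__hugetlb_hgm_walk()` from `huge_pte_none()` to `huge_pte_none_mostly()`, which also accepts PTE markers, so a write-protected entry is handed to the allocation helpers (which now split it) instead of failing with `-EEXIST`. The decision can be modeled as a small C function. The entry kinds and `STEP_*` outcomes are hypothetical, and because the rest of `__hugetlb_hgm_walk()` is not shown here, the "allocate" and "descend" outcomes are assumptions about what follows the check.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical classification of the entry the walk is looking at. */
enum model_kind {
	KIND_NONE,          /* empty entry */
	KIND_MARKER,        /* non-present PTE marker, e.g. uffd-wp */
	KIND_SWAP,          /* any other non-present (swap-like) entry */
	KIND_PRESENT_LEAF,  /* maps memory at this level */
	KIND_PRESENT_TABLE, /* points to a lower-level page table */
};

enum model_step { STEP_STOP, STEP_ALLOC, STEP_DESCEND, STEP_EEXIST };

/* Models huge_pte_none_mostly(): an empty entry or a PTE marker. */
static bool model_none_mostly(enum model_kind k)
{
	return k == KIND_NONE || k == KIND_MARKER;
}

static enum model_step model_next_step(enum model_kind k, bool alloc)
{
	bool present = k == KIND_PRESENT_LEAF || k == KIND_PRESENT_TABLE;

	if (!present) {
		if (!alloc)
			return STEP_STOP;   /* "return 0" in the hunk */
		if (!model_none_mostly(k))
			return STEP_EEXIST; /* before the patch, KIND_MARKER also ended here */
		return STEP_ALLOC;          /* assumed: allocate (and split) a lower table */
	}
	if (k == KIND_PRESENT_LEAF)
		return STEP_STOP;           /* hugetlb_pte_present_leaf() -> return 0 */
	return STEP_DESCEND;                /* assumed: continue one level down */
}
```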