[v2,14/46] hugetlb: split PTE markers when doing HGM walks
Commit Message
Fix how UFFDIO_CONTINUE and UFFDIO_WRITEPROTECT interact in these two
ways:
- UFFDIO_WRITEPROTECT no longer prevents a high-granularity
UFFDIO_CONTINUE.
- UFFD-WP PTE markers installed with UFFDIO_WRITEPROTECT will be
properly propagated when high-granularity UFFDIO_CONTINUEs are
performed.
Note: UFFDIO_WRITEPROTECT is not yet permitted at PAGE_SIZE granularity.
Signed-off-by: James Houghton <jthoughton@google.com>
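As a reading aid for the second bullet, here is a minimal userspace sketch of what "propagating" a marker means: one marker entry at a higher level is replaced by a lower-level table in which every entry carries the same marker. This is only a model. Every name in it (split_slot, split_make_marker, SPLIT_MARKER_UFFD_WP, the entry encoding) is hypothetical and is not the kernel's definition; the real helpers are the ones in the diff further down.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical encoding, chosen only for this model. */
#define SPLIT_PTRS_PER_TABLE 512
#define SPLIT_ENTRY_NONE     ((uint64_t)0)
#define SPLIT_MARKER_TAG     ((uint64_t)1 << 62) /* "this entry is a marker" */
#define SPLIT_MARKER_UFFD_WP ((uint64_t)1 << 0)  /* marker payload bit */

typedef uint64_t split_entry_t;

/* One higher-level slot: either a plain entry or a link to a child table. */
struct split_slot {
	split_entry_t  val;
	split_entry_t *child;
};

static int split_is_marker(split_entry_t e)
{
	return (e & SPLIT_MARKER_TAG) != 0;
}

static split_entry_t split_make_marker(uint64_t payload)
{
	return SPLIT_MARKER_TAG | payload;
}

static uint64_t split_marker_get(split_entry_t e)
{
	return e & ~SPLIT_MARKER_TAG;
}

/*
 * Replace a marker in 'slot' with a child table whose entries all carry
 * the same marker. Returns the child, or NULL if the slot does not hold
 * a marker, is already split, or the allocation fails.
 */
static split_entry_t *split_marker(struct split_slot *slot)
{
	split_entry_t *child;
	uint64_t payload;
	size_t i;

	assert(slot != NULL);
	if (slot->child || !split_is_marker(slot->val))
		return NULL;

	payload = split_marker_get(slot->val);
	child = calloc(SPLIT_PTRS_PER_TABLE, sizeof(*child));
	if (!child)
		return NULL;

	/* Fill the child before it becomes reachable from the parent. */
	for (i = 0; i < SPLIT_PTRS_PER_TABLE; i++)
		child[i] = split_make_marker(payload);

	slot->val = SPLIT_ENTRY_NONE;
	slot->child = child;
	return child;
}
```

The model ignores locking and memory ordering entirely; it only shows that no marker information is lost when the mapping granularity changes.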
Comments
Hi James,
Thank you for the patch! Yet something to improve:
[auto build test ERROR on next-20230217]
[cannot apply to kvm/queue shuah-kselftest/next shuah-kselftest/fixes arnd-asm-generic/master linus/master kvm/linux-next v6.2-rc8 v6.2-rc7 v6.2-rc6 v6.2-rc8]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/James-Houghton/hugetlb-don-t-set-PageUptodate-for-UFFDIO_CONTINUE/20230218-083216
patch link: https://lore.kernel.org/r/20230218002819.1486479-15-jthoughton%40google.com
patch subject: [PATCH v2 14/46] hugetlb: split PTE markers when doing HGM walks
config: powerpc-randconfig-r001-20230217 (https://download.01.org/0day-ci/archive/20230219/202302190304.YdPwtMZS-lkp@intel.com/config)
compiler: clang version 17.0.0 (https://github.com/llvm/llvm-project db89896bbbd2251fff457699635acbbedeead27f)
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# install powerpc cross compiling tool for clang build
# apt-get install binutils-powerpc-linux-gnu
# https://github.com/intel-lab-lkp/linux/commit/55c33d65b06ad109b87a418540fe98f7365185d4
git remote add linux-review https://github.com/intel-lab-lkp/linux
git fetch --no-tags linux-review James-Houghton/hugetlb-don-t-set-PageUptodate-for-UFFDIO_CONTINUE/20230218-083216
git checkout 55c33d65b06ad109b87a418540fe98f7365185d4
# save the config file
mkdir build_dir && cp config build_dir/.config
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=powerpc olddefconfig
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=powerpc SHELL=/bin/bash
If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <lkp@intel.com>
| Link: https://lore.kernel.org/oe-kbuild-all/202302190304.YdPwtMZS-lkp@intel.com/
All errors (new ones prefixed by >>):
In file included from include/linux/highmem.h:12:
In file included from include/linux/hardirq.h:11:
In file included from arch/powerpc/include/asm/hardirq.h:6:
In file included from include/linux/irq.h:20:
In file included from include/linux/io.h:13:
In file included from arch/powerpc/include/asm/io.h:640:
arch/powerpc/include/asm/io-defs.h:47:1: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
DEF_PCI_AC_NORET(insl, (unsigned long p, void *b, unsigned long c),
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
arch/powerpc/include/asm/io.h:637:3: note: expanded from macro 'DEF_PCI_AC_NORET'
__do_##name al; \
^~~~~~~~~~~~~~
<scratch space>:104:1: note: expanded from here
__do_insl
^
arch/powerpc/include/asm/io.h:579:56: note: expanded from macro '__do_insl'
#define __do_insl(p, b, n) readsl((PCI_IO_ADDR)_IO_BASE+(p), (b), (n))
~~~~~~~~~~~~~~~~~~~~~^
In file included from mm/hugetlb.c:11:
In file included from include/linux/highmem.h:12:
In file included from include/linux/hardirq.h:11:
In file included from arch/powerpc/include/asm/hardirq.h:6:
In file included from include/linux/irq.h:20:
In file included from include/linux/io.h:13:
In file included from arch/powerpc/include/asm/io.h:640:
arch/powerpc/include/asm/io-defs.h:49:1: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
DEF_PCI_AC_NORET(outsb, (unsigned long p, const void *b, unsigned long c),
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
arch/powerpc/include/asm/io.h:637:3: note: expanded from macro 'DEF_PCI_AC_NORET'
__do_##name al; \
^~~~~~~~~~~~~~
<scratch space>:106:1: note: expanded from here
__do_outsb
^
arch/powerpc/include/asm/io.h:580:58: note: expanded from macro '__do_outsb'
#define __do_outsb(p, b, n) writesb((PCI_IO_ADDR)_IO_BASE+(p),(b),(n))
~~~~~~~~~~~~~~~~~~~~~^
In file included from mm/hugetlb.c:11:
In file included from include/linux/highmem.h:12:
In file included from include/linux/hardirq.h:11:
In file included from arch/powerpc/include/asm/hardirq.h:6:
In file included from include/linux/irq.h:20:
In file included from include/linux/io.h:13:
In file included from arch/powerpc/include/asm/io.h:640:
arch/powerpc/include/asm/io-defs.h:51:1: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
DEF_PCI_AC_NORET(outsw, (unsigned long p, const void *b, unsigned long c),
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
arch/powerpc/include/asm/io.h:637:3: note: expanded from macro 'DEF_PCI_AC_NORET'
__do_##name al; \
^~~~~~~~~~~~~~
<scratch space>:108:1: note: expanded from here
__do_outsw
^
arch/powerpc/include/asm/io.h:581:58: note: expanded from macro '__do_outsw'
#define __do_outsw(p, b, n) writesw((PCI_IO_ADDR)_IO_BASE+(p),(b),(n))
~~~~~~~~~~~~~~~~~~~~~^
In file included from mm/hugetlb.c:11:
In file included from include/linux/highmem.h:12:
In file included from include/linux/hardirq.h:11:
In file included from arch/powerpc/include/asm/hardirq.h:6:
In file included from include/linux/irq.h:20:
In file included from include/linux/io.h:13:
In file included from arch/powerpc/include/asm/io.h:640:
arch/powerpc/include/asm/io-defs.h:53:1: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
DEF_PCI_AC_NORET(outsl, (unsigned long p, const void *b, unsigned long c),
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
arch/powerpc/include/asm/io.h:637:3: note: expanded from macro 'DEF_PCI_AC_NORET'
__do_##name al; \
^~~~~~~~~~~~~~
<scratch space>:110:1: note: expanded from here
__do_outsl
^
arch/powerpc/include/asm/io.h:582:58: note: expanded from macro '__do_outsl'
#define __do_outsl(p, b, n) writesl((PCI_IO_ADDR)_IO_BASE+(p),(b),(n))
~~~~~~~~~~~~~~~~~~~~~^
mm/hugetlb.c:653:8: error: call to undeclared function '__pte_alloc_one'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
new = __pte_alloc_one(mm, GFP_PGTABLE_USER);
^
mm/hugetlb.c:653:8: note: did you mean 'pte_alloc_one'?
arch/powerpc/include/asm/pgalloc.h:30:25: note: 'pte_alloc_one' declared here
static inline pgtable_t pte_alloc_one(struct mm_struct *mm)
^
mm/hugetlb.c:653:28: error: use of undeclared identifier 'GFP_PGTABLE_USER'
new = __pte_alloc_one(mm, GFP_PGTABLE_USER);
^
mm/hugetlb.c:660:25: error: incompatible pointer types passing 'pgtable_t' (aka 'unsigned long *') to parameter of type 'struct page *' [-Werror,-Wincompatible-pointer-types]
pgtable_pte_page_dtor(new);
^~~
include/linux/mm.h:2661:55: note: passing argument to parameter 'page' here
static inline void pgtable_pte_page_dtor(struct page *page)
^
mm/hugetlb.c:661:3: error: incompatible pointer types passing 'pgtable_t' (aka 'unsigned long *') to parameter of type 'struct page *' [-Werror,-Wincompatible-pointer-types]
__free_page(new);
^~~~~~~~~~~~~~~~
include/linux/gfp.h:319:40: note: expanded from macro '__free_page'
#define __free_page(page) __free_pages((page), 0)
^~~~~~
include/linux/gfp.h:302:39: note: passing argument to parameter 'page' here
extern void __free_pages(struct page *page, unsigned int order);
^
>> mm/hugetlb.c:666:44: error: incompatible pointer types passing 'pgtable_t' (aka 'unsigned long *') to parameter of type 'const struct page *' [-Werror,-Wincompatible-pointer-types]
hugetlb_install_markers_pte(page_address(new), marker);
^~~
include/linux/mm.h:2001:39: note: passing argument to parameter 'page' here
void *page_address(const struct page *page);
^
6 warnings and 5 errors generated.
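The three pointer-type errors (lines 660, 661 and 666) share one cause that the diagnostics spell out: in this powerpc configuration pgtable_t is 'unsigned long *', while the code passes 'new' to functions that take a 'struct page *'. The sketch below is a userspace model of one way such a difference can be hidden behind a single accessor. It is not the fix proposed for this series, and every name in it (pgt_table_t, pgt_ptes, PGT_TABLE_IS_PTE_POINTER) is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these are the kernel's definitions. */
typedef unsigned long pgt_pte_t;

struct pgt_page {
	pgt_pte_t *vaddr; /* where the table's entries live */
};

#ifdef PGT_TABLE_IS_PTE_POINTER
/* Same shape the robot reports: the table handle is a pointer to entries. */
typedef pgt_pte_t *pgt_table_t;

static pgt_pte_t *pgt_ptes(pgt_table_t table)
{
	return table;
}
#else
/* Shape assumed by the failing code: the table handle is a page pointer. */
typedef struct pgt_page *pgt_table_t;

static pgt_pte_t *pgt_ptes(pgt_table_t table)
{
	return table->vaddr;
}
#endif

/* Callers go through pgt_ptes() and never look at the handle's real type. */
static void pgt_install_markers(pgt_table_t table, size_t n, pgt_pte_t marker)
{
	pgt_pte_t *ptes = pgt_ptes(table);
	size_t i;

	assert(ptes != NULL);
	for (i = 0; i < n; i++)
		ptes[i] = marker;
}
```

Building the model with -DPGT_TABLE_IS_PTE_POINTER switches the handle type without touching pgt_install_markers(), which is the property the failing call sites lack.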
vim +666 mm/hugetlb.c
606
607 /*
608 * hugetlb_alloc_pte -- Allocate a PTE beneath a pmd_none PMD-level hpte.
609 *
610 * See the comment above hugetlb_alloc_pmd.
611 */
612 pte_t *hugetlb_alloc_pte(struct mm_struct *mm, struct hugetlb_pte *hpte,
613 unsigned long addr)
614 {
615 spinlock_t *ptl = hugetlb_pte_lockptr(hpte);
616 pgtable_t new;
617 pmd_t *pmdp;
618 pmd_t pmd;
619 bool is_marker;
620 pte_marker marker;
621
622 if (hpte->level != HUGETLB_LEVEL_PMD)
623 return ERR_PTR(-EINVAL);
624
625 pmdp = (pmd_t *)hpte->ptep;
626 retry:
627 is_marker = false;
628 pmd = READ_ONCE(*pmdp);
629 if (likely(pmd_present(pmd)))
630 return unlikely(pmd_leaf(pmd))
631 ? ERR_PTR(-EEXIST)
632 : pte_offset_kernel(pmdp, addr);
633 else if (!pmd_none(pmd)) {
634 /*
635 * Not present and not none means that a swap entry lives here.
636 * If it's a PTE marker, we can deal with it. If it's another
637 * swap entry, we don't attempt to split it.
638 */
639 is_marker = is_pte_marker(__pte(pmd_val(pmd)));
640 if (!is_marker)
641 return ERR_PTR(-EEXIST);
642
643 marker = pte_marker_get(pte_to_swp_entry(__pte(pmd_val(pmd))));
644 }
645
646 /*
647 * With CONFIG_HIGHPTE, calling `pte_alloc_one` directly may result
648 * in page tables being allocated in high memory, needing a kmap to
649 * access. Instead, we call __pte_alloc_one directly with
650 * GFP_PGTABLE_USER to prevent these PTEs being allocated in high
651 * memory.
652 */
653 new = __pte_alloc_one(mm, GFP_PGTABLE_USER);
654 if (!new)
655 return ERR_PTR(-ENOMEM);
656
657 spin_lock(ptl);
658 if (!pmd_same(pmd, *pmdp)) {
659 spin_unlock(ptl);
660 pgtable_pte_page_dtor(new);
661 __free_page(new);
662 goto retry;
663 }
664
665 if (is_marker)
> 666 hugetlb_install_markers_pte(page_address(new), marker);
667
668 mm_inc_nr_ptes(mm);
669 smp_wmb(); /* See comment in pmd_install() */
670 pmd_populate(mm, pmdp, new);
671 spin_unlock(ptl);
672 return pte_offset_kernel(pmdp, addr);
673 }
674
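The listing above follows an allocate-then-recheck pattern: the entry is sampled without the lock, the new table is allocated outside the lock, and pmd_same() under the lock decides whether to install it or free it and retry. A rough userspace model of that control flow is sketched below. It uses a C11 atomic_flag as a stand-in spinlock, and all names (retry_slot, retry_alloc_table, the race hook) are hypothetical; the hook exists only so a single-threaded run can simulate a concurrent writer.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define RETRY_NONE    ((uint64_t)0)
#define RETRY_PRESENT ((uint64_t)1)
#define RETRY_MARKER  ((uint64_t)2)
#define RETRY_ENTRIES 512

struct retry_slot {
	atomic_flag      lock;  /* stand-in for the page table lock */
	_Atomic uint64_t val;   /* stand-in for the pmd value */
	uint64_t        *table; /* valid once val == RETRY_PRESENT */
};

static void retry_slot_init(struct retry_slot *s)
{
	atomic_flag_clear(&s->lock);
	atomic_init(&s->val, RETRY_NONE);
	s->table = NULL;
}

static void retry_lock(struct retry_slot *s)
{
	while (atomic_flag_test_and_set_explicit(&s->lock, memory_order_acquire))
		; /* spin */
}

static void retry_unlock(struct retry_slot *s)
{
	atomic_flag_clear_explicit(&s->lock, memory_order_release);
}

/* Simulated concurrent writer: turns the empty entry into a marker. */
static void retry_racing_writer(struct retry_slot *s)
{
	retry_lock(s);
	atomic_store(&s->val, RETRY_MARKER);
	retry_unlock(s);
}

static uint64_t *retry_alloc_table(struct retry_slot *s,
				   void (*race_hook)(struct retry_slot *),
				   int *retries)
{
	uint64_t snap, *new;

	assert(s != NULL && retries != NULL);
retry:
	snap = atomic_load(&s->val);        /* unlocked sample */
	if (snap == RETRY_PRESENT)
		return s->table;

	new = calloc(RETRY_ENTRIES, sizeof(*new)); /* allocate outside the lock */
	if (!new)
		return NULL;

	if (race_hook) {                    /* fires once, between sample and lock */
		race_hook(s);
		race_hook = NULL;
	}

	retry_lock(s);
	if (atomic_load(&s->val) != snap) { /* entry changed: undo and retry */
		retry_unlock(s);
		free(new);
		(*retries)++;
		goto retry;
	}
	s->table = new;
	atomic_store(&s->val, RETRY_PRESENT); /* publish only after setup */
	retry_unlock(s);
	return new;
}
```

Passing retry_racing_writer as the hook forces exactly one retry; passing NULL exercises the uncontended path.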
On 02/18/23 00:27, James Houghton wrote:
> Fix how UFFDIO_CONTINUE and UFFDIO_WRITEPROTECT interact in these two
> ways:
> - UFFDIO_WRITEPROTECT no longer prevents a high-granularity
> UFFDIO_CONTINUE.
> - UFFD-WP PTE markers installed with UFFDIO_WRITEPROTECT will be
> properly propagated when high-granularity UFFDIO_CONTINUEs are
> performed.
>
> Note: UFFDIO_WRITEPROTECT is not yet permitted at PAGE_SIZE granularity.
>
> Signed-off-by: James Houghton <jthoughton@google.com>
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 810c05feb41f..f74183acc521 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
Seems relatively straightforward,
Acked-by: Mike Kravetz <mike.kravetz@oracle.com>
@@ -506,6 +506,30 @@ static bool has_same_uncharge_info(struct file_region *rg,
#endif
}
+static void hugetlb_install_markers_pmd(pmd_t *pmdp, pte_marker marker)
+{
+ int i;
+
+ for (i = 0; i < PTRS_PER_PMD; ++i)
+ /*
+ * WRITE_ONCE not needed because the pud hasn't been
+ * installed yet.
+ */
+ pmdp[i] = __pmd(pte_val(make_pte_marker(marker)));
+}
+
+static void hugetlb_install_markers_pte(pte_t *ptep, pte_marker marker)
+{
+ int i;
+
+ for (i = 0; i < PTRS_PER_PTE; ++i)
+ /*
+ * WRITE_ONCE not needed because the pmd hasn't been
+ * installed yet.
+ */
+ ptep[i] = make_pte_marker(marker);
+}
+
/*
* hugetlb_alloc_pmd -- Allocate or find a PMD beneath a PUD-level hpte.
*
@@ -528,23 +552,32 @@ pmd_t *hugetlb_alloc_pmd(struct mm_struct *mm, struct hugetlb_pte *hpte,
pmd_t *new;
pud_t *pudp;
pud_t pud;
+ bool is_marker;
+ pte_marker marker;
if (hpte->level != HUGETLB_LEVEL_PUD)
return ERR_PTR(-EINVAL);
pudp = (pud_t *)hpte->ptep;
retry:
+ is_marker = false;
pud = READ_ONCE(*pudp);
if (likely(pud_present(pud)))
return unlikely(pud_leaf(pud))
? ERR_PTR(-EEXIST)
: pmd_offset(pudp, addr);
- else if (!pud_none(pud))
+ else if (!pud_none(pud)) {
/*
- * Not present and not none means that a swap entry lives here,
- * and we can't get rid of it.
+ * Not present and not none means that a swap entry lives here.
+ * If it's a PTE marker, we can deal with it. If it's another
+ * swap entry, we don't attempt to split it.
*/
- return ERR_PTR(-EEXIST);
+ is_marker = is_pte_marker(__pte(pud_val(pud)));
+ if (!is_marker)
+ return ERR_PTR(-EEXIST);
+
+ marker = pte_marker_get(pte_to_swp_entry(__pte(pud_val(pud))));
+ }
new = pmd_alloc_one(mm, addr);
if (!new)
@@ -557,6 +590,13 @@ pmd_t *hugetlb_alloc_pmd(struct mm_struct *mm, struct hugetlb_pte *hpte,
goto retry;
}
+ /*
+ * Install markers before PUD to avoid races with other
+ * page table walks.
+ */
+ if (is_marker)
+ hugetlb_install_markers_pmd(new, marker);
+
mm_inc_nr_pmds(mm);
smp_wmb(); /* See comment in pmd_install() */
pud_populate(mm, pudp, new);
@@ -576,23 +616,32 @@ pte_t *hugetlb_alloc_pte(struct mm_struct *mm, struct hugetlb_pte *hpte,
pgtable_t new;
pmd_t *pmdp;
pmd_t pmd;
+ bool is_marker;
+ pte_marker marker;
if (hpte->level != HUGETLB_LEVEL_PMD)
return ERR_PTR(-EINVAL);
pmdp = (pmd_t *)hpte->ptep;
retry:
+ is_marker = false;
pmd = READ_ONCE(*pmdp);
if (likely(pmd_present(pmd)))
return unlikely(pmd_leaf(pmd))
? ERR_PTR(-EEXIST)
: pte_offset_kernel(pmdp, addr);
- else if (!pmd_none(pmd))
+ else if (!pmd_none(pmd)) {
/*
- * Not present and not none means that a swap entry lives here,
- * and we can't get rid of it.
+ * Not present and not none means that a swap entry lives here.
+ * If it's a PTE marker, we can deal with it. If it's another
+ * swap entry, we don't attempt to split it.
*/
- return ERR_PTR(-EEXIST);
+ is_marker = is_pte_marker(__pte(pmd_val(pmd)));
+ if (!is_marker)
+ return ERR_PTR(-EEXIST);
+
+ marker = pte_marker_get(pte_to_swp_entry(__pte(pmd_val(pmd))));
+ }
/*
* With CONFIG_HIGHPTE, calling `pte_alloc_one` directly may result
@@ -613,6 +662,9 @@ pte_t *hugetlb_alloc_pte(struct mm_struct *mm, struct hugetlb_pte *hpte,
goto retry;
}
+ if (is_marker)
+ hugetlb_install_markers_pte(page_address(new), marker);
+
mm_inc_nr_ptes(mm);
smp_wmb(); /* See comment in pmd_install() */
pmd_populate(mm, pmdp, new);
@@ -7384,7 +7436,12 @@ static int __hugetlb_hgm_walk(struct mm_struct *mm, struct vm_area_struct *vma,
if (!pte_present(pte)) {
if (!alloc)
return 0;
- if (unlikely(!huge_pte_none(pte)))
+ /*
+ * In hugetlb_alloc_pmd and hugetlb_alloc_pte,
+ * we split PTE markers, so we can tolerate
+ * PTE markers here.
+ */
+ if (unlikely(!huge_pte_none_mostly(pte)))
return -EEXIST;
} else if (hugetlb_pte_present_leaf(hpte, pte))
return 0;
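The last hunk changes which non-present entries the allocating walk accepts. A small userspace model of the before/after behaviour is sketched below, assuming "none mostly" means "none, or a PTE marker", as the new comment in the hunk implies. The entry encoding and all names (walk_check, WALK_MARKER and so on) are hypothetical, and only the allocating path is modelled.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical entry encoding for this model only. */
#define WALK_PRESENT ((uint64_t)1 << 0)
#define WALK_MARKER  ((uint64_t)1 << 1)
#define WALK_SWAP    ((uint64_t)1 << 2) /* any other non-present entry */

static bool walk_none(uint64_t e)
{
	return e == 0;
}

static bool walk_present(uint64_t e)
{
	return (e & WALK_PRESENT) != 0;
}

static bool walk_is_marker(uint64_t e)
{
	return !walk_present(e) && (e & WALK_MARKER) != 0;
}

/* "None mostly": empty, or holding only a marker. */
static bool walk_none_mostly(uint64_t e)
{
	return walk_none(e) || walk_is_marker(e);
}

/*
 * Returns 0 if an allocating walk may continue past this entry, or
 * -EEXIST if it must stop. 'tolerate_markers' selects the check used
 * after the patch (true) or before it (false).
 */
static int walk_check(uint64_t e, bool tolerate_markers)
{
	/* A marker is by construction a non-present entry. */
	assert(!(walk_present(e) && walk_is_marker(e)));

	if (walk_present(e))
		return 0;
	if (tolerate_markers ? !walk_none_mostly(e) : !walk_none(e))
		return -EEXIST;
	return 0;
}
```

In this model a marker entry is rejected by the old check and accepted by the new one, while any other swap-style entry is rejected by both, which matches the intent stated in the commit message.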