[v2,20/46] hugetlb: add HGM support to follow_hugetlb_page

Message ID 20230218002819.1486479-21-jthoughton@google.com
State New
Series hugetlb: introduce HugeTLB high-granularity mapping

Commit Message

James Houghton Feb. 18, 2023, 12:27 a.m. UTC
Enable high-granularity mapping support in GUP.

In case it is confusing: pfn_offset is the offset of vaddr (in PAGE_SIZE
units) within the region that hpte maps, i.e. relative to the subpage that
hpte points to.
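
As a rough illustration (not part of the patch), the userspace sketch below
mirrors the pfn_offset/pages_per_hpte/refs arithmetic that the new code
performs. The 2M leaf size, PAGE_SHIFT value, sample addresses, and the
open-coded min3() helper are assumptions chosen only for this example:

/*
 * Userspace sketch of the index arithmetic in the patch; not kernel code.
 * The values below are assumptions picked to show how pfn_offset,
 * pages_per_hpte, and refs relate to hugetlb_pte_size()/hugetlb_pte_mask().
 */
#include <stdio.h>

#define PAGE_SHIFT	12UL
#define PAGE_SIZE	(1UL << PAGE_SHIFT)

/* Open-coded stand-in for the kernel's min3() macro. */
static unsigned long min3(unsigned long a, unsigned long b, unsigned long c)
{
	unsigned long m = a < b ? a : b;

	return m < c ? m : c;
}

int main(void)
{
	/* Assume the walk stopped at a 2M leaf of a larger hugepage. */
	unsigned long hpte_size = 2UL << 20;		/* hugetlb_pte_size(&hpte) */
	unsigned long hpte_mask = ~(hpte_size - 1);	/* hugetlb_pte_mask(&hpte) */
	/* Sample address: 5 base pages into the 2M leaf. */
	unsigned long vaddr = (1UL << 30) + 5 * PAGE_SIZE;
	unsigned long vm_end = (1UL << 30) + (4UL << 20);	/* vma->vm_end */
	unsigned long remainder = 1000;			/* pages still requested */

	/* Page index of vaddr within the region this hpte maps. */
	unsigned long pfn_offset = (vaddr & ~hpte_mask) >> PAGE_SHIFT;
	/* Base pages covered by the leaf: 512 here, not the whole hugepage. */
	unsigned long pages_per_hpte = hpte_size / PAGE_SIZE;
	/* Same clamping as the refs computation in the patch. */
	/* (vaddr & ~(PAGE_SIZE - 1)) is ALIGN_DOWN(vaddr, PAGE_SIZE). */
	unsigned long refs = min3(pages_per_hpte - pfn_offset, remainder,
				  (vm_end - (vaddr & ~(PAGE_SIZE - 1))) >> PAGE_SHIFT);

	printf("pfn_offset=%lu pages_per_hpte=%lu refs=%lu\n",
	       pfn_offset, pages_per_hpte, refs);
	return 0;
}

When the hugetlb PTE has not been split, hugetlb_pte_size() is simply
huge_page_size(h), so the arithmetic matches the old pages_per_huge_page(h)
behavior.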

Signed-off-by: James Houghton <jthoughton@google.com>
  

Patch

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 7321c6602d6f..c26b040f4fb5 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -6634,11 +6634,9 @@  static void record_subpages_vmas(struct page *page, struct vm_area_struct *vma,
 }
 
 static inline bool __follow_hugetlb_must_fault(struct vm_area_struct *vma,
-					       unsigned int flags, pte_t *pte,
+					       unsigned int flags, pte_t pteval,
 					       bool *unshare)
 {
-	pte_t pteval = huge_ptep_get(pte);
-
 	*unshare = false;
 	if (is_swap_pte(pteval))
 		return true;
@@ -6713,11 +6711,13 @@  long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
 	int err = -EFAULT, refs;
 
 	while (vaddr < vma->vm_end && remainder) {
-		pte_t *pte;
+		pte_t *ptep, pte;
 		spinlock_t *ptl = NULL;
 		bool unshare = false;
 		int absent;
-		struct page *page;
+		unsigned long pages_per_hpte;
+		struct page *page, *subpage;
+		struct hugetlb_pte hpte;
 
 		/*
 		 * If we have a pending SIGKILL, don't keep faulting pages and
@@ -6734,13 +6734,19 @@  long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
 		 * each hugepage.  We have to make sure we get the
 		 * first, for the page indexing below to work.
 		 *
-		 * Note that page table lock is not held when pte is null.
+		 * hugetlb_full_walk will mask the address appropriately.
+		 *
+		 * Note that page table lock is not held when ptep is null.
 		 */
-		pte = hugetlb_walk(vma, vaddr & huge_page_mask(h),
-				   huge_page_size(h));
-		if (pte)
-			ptl = huge_pte_lock(h, mm, pte);
-		absent = !pte || huge_pte_none(huge_ptep_get(pte));
+		if (hugetlb_full_walk(&hpte, vma, vaddr)) {
+			ptep = NULL;
+			absent = true;
+		} else {
+			ptl = hugetlb_pte_lock(&hpte);
+			ptep = hpte.ptep;
+			pte = huge_ptep_get(ptep);
+			absent = huge_pte_none(pte);
+		}
 
 		/*
 		 * When coredumping, it suits get_dump_page if we just return
@@ -6751,13 +6757,21 @@  long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
 		 */
 		if (absent && (flags & FOLL_DUMP) &&
 		    !hugetlbfs_pagecache_present(h, vma, vaddr)) {
-			if (pte)
+			if (ptep)
 				spin_unlock(ptl);
 			hugetlb_vma_unlock_read(vma);
 			remainder = 0;
 			break;
 		}
 
+		if (!absent && pte_present(pte) &&
+				!hugetlb_pte_present_leaf(&hpte, pte)) {
+			/* We raced with someone splitting the PTE, so retry. */
+			spin_unlock(ptl);
+			hugetlb_vma_unlock_read(vma);
+			continue;
+		}
+
 		/*
 		 * We need call hugetlb_fault for both hugepages under migration
 		 * (in which case hugetlb_fault waits for the migration,) and
@@ -6773,7 +6787,7 @@  long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
 			vm_fault_t ret;
 			unsigned int fault_flags = 0;
 
-			if (pte)
+			if (ptep)
 				spin_unlock(ptl);
 			hugetlb_vma_unlock_read(vma);
 
@@ -6822,8 +6836,10 @@  long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
 			continue;
 		}
 
-		pfn_offset = (vaddr & ~huge_page_mask(h)) >> PAGE_SHIFT;
-		page = pte_page(huge_ptep_get(pte));
+		pfn_offset = (vaddr & ~hugetlb_pte_mask(&hpte)) >> PAGE_SHIFT;
+		subpage = pte_page(pte);
+		pages_per_hpte = hugetlb_pte_size(&hpte) / PAGE_SIZE;
+		page = compound_head(subpage);
 
 		VM_BUG_ON_PAGE((flags & FOLL_PIN) && PageAnon(page) &&
 			       !PageAnonExclusive(page), page);
@@ -6833,22 +6849,22 @@  long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
 		 * and skip the same_page loop below.
 		 */
 		if (!pages && !vmas && !pfn_offset &&
-		    (vaddr + huge_page_size(h) < vma->vm_end) &&
-		    (remainder >= pages_per_huge_page(h))) {
-			vaddr += huge_page_size(h);
-			remainder -= pages_per_huge_page(h);
-			i += pages_per_huge_page(h);
+		    (vaddr + hugetlb_pte_size(&hpte) < vma->vm_end) &&
+		    (remainder >= pages_per_hpte)) {
+			vaddr += hugetlb_pte_size(&hpte);
+			remainder -= pages_per_hpte;
+			i += pages_per_hpte;
 			spin_unlock(ptl);
 			hugetlb_vma_unlock_read(vma);
 			continue;
 		}
 
 		/* vaddr may not be aligned to PAGE_SIZE */
-		refs = min3(pages_per_huge_page(h) - pfn_offset, remainder,
+		refs = min3(pages_per_hpte - pfn_offset, remainder,
 		    (vma->vm_end - ALIGN_DOWN(vaddr, PAGE_SIZE)) >> PAGE_SHIFT);
 
 		if (pages || vmas)
-			record_subpages_vmas(nth_page(page, pfn_offset),
+			record_subpages_vmas(nth_page(subpage, pfn_offset),
 					     vma, refs,
 					     likely(pages) ? pages + i : NULL,
 					     vmas ? vmas + i : NULL);