diff mbox series

[RFC,08/10] mm/hugetlb: Make follow_hugetlb_page RCU-safe

Message ID	20221030213032.335589-1-peterx@redhat.com
State	New
Headers	Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20;
	From: Peter Xu <peterx@redhat.com>
	To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
	Cc: Andrew Morton <akpm@linux-foundation.org>, Miaohe Lin <linmiaohe@huawei.com>, Muchun Song <songmuchun@bytedance.com>, Rik van Riel <riel@surriel.com>, James Houghton <jthoughton@google.com>, Nadav Amit <nadav.amit@gmail.com>, Mike Kravetz <mike.kravetz@oracle.com>, David Hildenbrand <david@redhat.com>, Andrea Arcangeli <aarcange@redhat.com>, peterx@redhat.com
	Subject: [PATCH RFC 08/10] mm/hugetlb: Make follow_hugetlb_page RCU-safe
	Date: Sun, 30 Oct 2022 17:30:32 -0400
	Message-Id: <20221030213032.335589-1-peterx@redhat.com>
	In-Reply-To: <20221030212929.335473-1-peterx@redhat.com>
	References: <20221030212929.335473-1-peterx@redhat.com>
	MIME-Version: 1.0
	Content-type: text/plain
	Content-Transfer-Encoding: 8bit
	Precedence: bulk
Series	mm/hugetlb: Make huge_pte_offset() thread-safe for pmd unshare
	[RFC,00/10] mm/hugetlb: Make huge_pte_offset() thread-safe for pmd unshare
	[RFC,01/10] mm/hugetlb: Let vma_offset_start() to return start
	[RFC,02/10] mm/hugetlb: Comment huge_pte_offset() for its locking requirements
	[RFC,03/10] mm/hugetlb: Make hugetlb_vma_maps_page() RCU-safe
	[RFC,04/10] mm/hugetlb: Make userfaultfd_huge_must_wait() RCU-safe
	[RFC,05/10] mm/hugetlb: Make walk_hugetlb_range() RCU-safe
	[RFC,06/10] mm/hugetlb: Make page_vma_mapped_walk() RCU-safe
	[RFC,07/10] mm/hugetlb: Make hugetlb_follow_page_mask() RCU-safe
	[RFC,08/10] mm/hugetlb: Make follow_hugetlb_page RCU-safe
	[RFC,09/10] mm/hugetlb: Make hugetlb_fault() RCU-safe
	[RFC,10/10] mm/hugetlb: Comment at rest huge_pte_offset() places

Commit Message

Peter Xu Oct. 30, 2022, 9:30 p.m. UTC

Take the RCU read lock around huge_pte_offset() and the use of its result.
RCU makes sure the pte_t * won't go away from under us.  Please refer to
the comment above huge_pte_offset() for more information.

A small trick is used to release the RCU read lock slightly earlier than
the pgtable lock.  That should be safe, and it is cleaner (see the detailed
comment in the code).

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 mm/hugetlb.c | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)
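
Note on lock pairing: the sketch below is a userspace model, not kernel
code, of the per-iteration control flow this patch touches.  It is meant
only to make it easy to check that every exit path drops the RCU read lock
exactly once, and that on the "present" path RCU is dropped while the
pgtable lock is still held.  All names (model_*, enum pte_state,
enum outcome) are hypothetical stand-ins, and the branch conditions are a
simplification of the real ones in follow_hugetlb_page().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins: count nesting instead of taking real locks. */
static int rcu_depth;
static int ptl_depth;

static void model_rcu_read_lock(void)   { rcu_depth++; }
static void model_rcu_read_unlock(void) { assert(rcu_depth > 0); rcu_depth--; }
static void model_spin_lock(void)       { ptl_depth++; }
static void model_spin_unlock(void)     { assert(ptl_depth > 0); ptl_depth--; }

/* Simplified view of what huge_pte_offset() plus the pte checks can find. */
enum pte_state { PTE_MISSING, PTE_NONE, PTE_NEEDS_FAULT, PTE_PRESENT };

/* Which way one loop iteration leaves the locked region. */
enum outcome { OUT_BREAK, OUT_FAULT, OUT_PINNED };

static enum outcome model_iteration(enum pte_state st, bool foll_dump,
				    bool pagecache_present)
{
	bool have_pte = (st != PTE_MISSING);
	bool absent = !have_pte || st == PTE_NONE;

	/* RCU is taken before the pte lookup; ptl only if a pte was found. */
	model_rcu_read_lock();
	if (have_pte)
		model_spin_lock();

	/* Dump of a hole with nothing in the page cache: stop the walk. */
	if (absent && foll_dump && !pagecache_present) {
		if (have_pte)
			model_spin_unlock();
		model_rcu_read_unlock();
		return OUT_BREAK;
	}

	/* Needs a fault: drop both locks before calling the fault handler. */
	if (absent || st == PTE_NEEDS_FAULT) {
		if (have_pte)
			model_spin_unlock();
		model_rcu_read_unlock();
		return OUT_FAULT;
	}

	/* Present pte: RCU is released early while ptl is still held. */
	model_rcu_read_unlock();
	assert(ptl_depth == 1 && rcu_depth == 0);
	model_spin_unlock();
	return OUT_PINNED;
}

/* True when no modelled lock is left held. */
static bool model_is_balanced(void)
{
	return rcu_depth == 0 && ptl_depth == 0;
}
```

Calling model_iteration() for each pte_state and then checking
model_is_balanced() exercises all three exit paths.  The model says nothing
about whether dropping RCU early is actually safe; that argument is the one
given in the code comment added by the last hunk.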

Patch

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 85214095fb85..5dc87e4e6780 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -6300,6 +6300,9 @@  long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
 			break;
 		}
 
+		/* For huge_pte_offset() */
+		rcu_read_lock();
+
 		/*
 		 * Some archs (sparc64, sh*) have multiple pte_ts to
 		 * each hugepage.  We have to make sure we get the
@@ -6324,6 +6327,7 @@  long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
 		    !hugetlbfs_pagecache_present(h, vma, vaddr)) {
 			if (pte)
 				spin_unlock(ptl);
+			rcu_read_unlock();
 			remainder = 0;
 			break;
 		}
@@ -6345,6 +6349,8 @@  long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
 
 			if (pte)
 				spin_unlock(ptl);
+			rcu_read_unlock();
+
 			if (flags & FOLL_WRITE)
 				fault_flags |= FAULT_FLAG_WRITE;
 			else if (unshare)
@@ -6387,6 +6393,19 @@  long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
 			continue;
 		}
 
+		/*
+		 * Reaching here means the pteval is not absent, so anyone
+		 * who wants to free and invalidate the pgtable page (aka,
+		 * pte*) must first unmap the entries, which requires the
+		 * pgtable lock.  Since we are holding that lock, we are
+		 * safe even without RCU from here on.
+		 *
+		 * We could instead release RCU after each unlock of the
+		 * pgtable lock below, but doing it here is cleaner and
+		 * keeps the RCU critical section smaller.
+		 */
+		rcu_read_unlock();
+
 		pfn_offset = (vaddr & ~huge_page_mask(h)) >> PAGE_SHIFT;
 		page = pte_page(huge_ptep_get(pte));