mm: hugetlb_vmemmap: fix a race between vmemmap pmd split

Message ID 20230707033859.16148-1-songmuchun@bytedance.com
State New
Headers
Series mm: hugetlb_vmemmap: fix a race between vmemmap pmd split |

Commit Message

Muchun Song July 7, 2023, 3:38 a.m. UTC
  The local variable @page in __split_vmemmap_huge_pmd() to obtain a pmd
page without holding page_table_lock may possiblely get the page table
page instead of a huge pmd page.  The effect may be in set_pte_at()
since we may pass an invalid page struct, if set_pte_at() wants to
access the page struct (e.g. CONFIG_PAGE_TABLE_CHECK is enabled), it
may crash the kernel.  So fix it. And inline __split_vmemmap_huge_pmd()
since it only has one user.

Fixes: d8d55f5616cf ("mm: sparsemem: use page table lock to protect kernel pmd operations")
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
---
 mm/hugetlb_vmemmap.c | 34 ++++++++++++++--------------------
 1 file changed, 14 insertions(+), 20 deletions(-)
  

Comments

Andrew Morton July 7, 2023, 7:38 p.m. UTC | #1
On Fri,  7 Jul 2023 11:38:59 +0800 Muchun Song <songmuchun@bytedance.com> wrote:

> The local variable @page in __split_vmemmap_huge_pmd() to obtain a pmd
> page without holding page_table_lock may possiblely get the page table
> page instead of a huge pmd page.  The effect may be in set_pte_at()
> since we may pass an invalid page struct, if set_pte_at() wants to
> access the page struct (e.g. CONFIG_PAGE_TABLE_CHECK is enabled), it
> may crash the kernel.  So fix it. And inline __split_vmemmap_huge_pmd()
> since it only has one user.

Is this likely enough to justify a backport?

I'm thinking "add cc:stable and merge into 6.6-rc1", so it hits -stable
after a couple of months of testing.
  
Andrew Morton July 7, 2023, 7:41 p.m. UTC | #2
On Fri,  7 Jul 2023 11:38:59 +0800 Muchun Song <songmuchun@bytedance.com> wrote:

> And inline __split_vmemmap_huge_pmd()
> since it only has one user.

"open code" would be a better term than "inline" in this situation.

If we are to offer this change to -stable then it would be better to do
the open-coding of __split_vmemmap_huge_pmd() in a separate, later
patch.
  
Muchun Song July 8, 2023, 1:40 a.m. UTC | #3
> On Jul 8, 2023, at 03:41, Andrew Morton <akpm@linux-foundation.org> wrote:
> 
> On Fri,  7 Jul 2023 11:38:59 +0800 Muchun Song <songmuchun@bytedance.com> wrote:
> 
>> And inline __split_vmemmap_huge_pmd()
>> since it only has one user.
> 
> "open code" would be a better term than "inline" in this situation.
> 
> If we are to offer this change to -stable then it would be better to do
> the open-coding of __split_vmemmap_huge_pmd() in a separate, later
> patch.
> 

I see. Bug fix is better to "open code" instead of "inline". However, it
is a simpler and cleaner way to fix this bug by using "inline". Because
we must hold init_mm.page_table_lock to get the page table page in __split_vmemmap_huge_pmd(), then it is just a couple of duplicated
code from split_vmemmap_huge_pmd(). Consequently, split_vmemmap_huge_pmd()
is redundant, just remove it. And rename __split_vmemmap_huge_pmd()
to split_vmemmap_huge_pmd(). The result is the same as the "inline" way.
So I keep "inline" to fix this.

Thanks.
  
Muchun Song July 8, 2023, 1:42 a.m. UTC | #4
> On Jul 8, 2023, at 03:38, Andrew Morton <akpm@linux-foundation.org> wrote:
> 
> On Fri,  7 Jul 2023 11:38:59 +0800 Muchun Song <songmuchun@bytedance.com> wrote:
> 
>> The local variable @page in __split_vmemmap_huge_pmd() to obtain a pmd
>> page without holding page_table_lock may possiblely get the page table
>> page instead of a huge pmd page.  The effect may be in set_pte_at()
>> since we may pass an invalid page struct, if set_pte_at() wants to
>> access the page struct (e.g. CONFIG_PAGE_TABLE_CHECK is enabled), it
>> may crash the kernel.  So fix it. And inline __split_vmemmap_huge_pmd()
>> since it only has one user.
> 
> Is this likely enough to justify a backport?
> 
> I'm thinking "add cc:stable and merge into 6.6-rc1", so it hits -stable
> after a couple of months of testing.
> 

Hi Andrew,

It is better to backport it to stable. Could you help me add it?

Thanks.
  
Andrew Morton July 9, 2023, 12:46 a.m. UTC | #5
On Sat, 8 Jul 2023 09:42:57 +0800 Muchun Song <muchun.song@linux.dev> wrote:

> 
> 
> > On Jul 8, 2023, at 03:38, Andrew Morton <akpm@linux-foundation.org> wrote:
> > 
> > On Fri,  7 Jul 2023 11:38:59 +0800 Muchun Song <songmuchun@bytedance.com> wrote:
> > 
> >> The local variable @page in __split_vmemmap_huge_pmd() to obtain a pmd
> >> page without holding page_table_lock may possiblely get the page table
> >> page instead of a huge pmd page.  The effect may be in set_pte_at()
> >> since we may pass an invalid page struct, if set_pte_at() wants to
> >> access the page struct (e.g. CONFIG_PAGE_TABLE_CHECK is enabled), it
> >> may crash the kernel.  So fix it. And inline __split_vmemmap_huge_pmd()
> >> since it only has one user.
> > 
> > Is this likely enough to justify a backport?
> > 
> > I'm thinking "add cc:stable and merge into 6.6-rc1", so it hits -stable
> > after a couple of months of testing.
> > 
> 
> Hi Andrew,
> 
> It is better to backport it to stable. Could you help me add it?
> 

I have added cc:stable to this and I have staged it for 6.6-rc1.
  

Patch

diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
index c2007ef5e9b0..4b9734777f69 100644
--- a/mm/hugetlb_vmemmap.c
+++ b/mm/hugetlb_vmemmap.c
@@ -36,14 +36,22 @@  struct vmemmap_remap_walk {
 	struct list_head	*vmemmap_pages;
 };
 
-static int __split_vmemmap_huge_pmd(pmd_t *pmd, unsigned long start)
+static int split_vmemmap_huge_pmd(pmd_t *pmd, unsigned long start)
 {
 	pmd_t __pmd;
 	int i;
 	unsigned long addr = start;
-	struct page *page = pmd_page(*pmd);
-	pte_t *pgtable = pte_alloc_one_kernel(&init_mm);
+	struct page *head;
+	pte_t *pgtable;
+
+	spin_lock(&init_mm.page_table_lock);
+	head = pmd_leaf(*pmd) ? pmd_page(*pmd) : NULL;
+	spin_unlock(&init_mm.page_table_lock);
 
+	if (!head)
+		return 0;
+
+	pgtable = pte_alloc_one_kernel(&init_mm);
 	if (!pgtable)
 		return -ENOMEM;
 
@@ -53,7 +61,7 @@  static int __split_vmemmap_huge_pmd(pmd_t *pmd, unsigned long start)
 		pte_t entry, *pte;
 		pgprot_t pgprot = PAGE_KERNEL;
 
-		entry = mk_pte(page + i, pgprot);
+		entry = mk_pte(head + i, pgprot);
 		pte = pte_offset_kernel(&__pmd, addr);
 		set_pte_at(&init_mm, addr, pte, entry);
 	}
@@ -65,8 +73,8 @@  static int __split_vmemmap_huge_pmd(pmd_t *pmd, unsigned long start)
 		 * be treated as indepdenent small pages (as they can be freed
 		 * individually).
 		 */
-		if (!PageReserved(page))
-			split_page(page, get_order(PMD_SIZE));
+		if (!PageReserved(head))
+			split_page(head, get_order(PMD_SIZE));
 
 		/* Make pte visible before pmd. See comment in pmd_install(). */
 		smp_wmb();
@@ -80,20 +88,6 @@  static int __split_vmemmap_huge_pmd(pmd_t *pmd, unsigned long start)
 	return 0;
 }
 
-static int split_vmemmap_huge_pmd(pmd_t *pmd, unsigned long start)
-{
-	int leaf;
-
-	spin_lock(&init_mm.page_table_lock);
-	leaf = pmd_leaf(*pmd);
-	spin_unlock(&init_mm.page_table_lock);
-
-	if (!leaf)
-		return 0;
-
-	return __split_vmemmap_huge_pmd(pmd, start);
-}
-
 static void vmemmap_pte_range(pmd_t *pmd, unsigned long addr,
 			      unsigned long end,
 			      struct vmemmap_remap_walk *walk)