mm/gup: Fix follow_devmap_p[mu]d() on page==NULL handling

Message ID 20231123180222.1048297-1-peterx@redhat.com
State New
Series mm/gup: Fix follow_devmap_p[mu]d() on page==NULL handling

Commit Message

Peter Xu Nov. 23, 2023, 6:02 p.m. UTC
This bug was found by code inspection, not from any report.

When GUP sees a devmap PMD/PUD and follow_devmap_p[mu]d() returns
page==NULL, a fault is probably required.  Falling through in that case
can cause unexpected behavior.

Fix both cases by catching the page==NULL cases with no_page_table().

Fixes: 3565fce3a659 ("mm, x86: get_user_pages() for dax mappings")
Fixes: 080dbb618b4b ("mm/follow_page_mask: split follow_page_mask to smaller functions.")
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 mm/gup.c | 2 ++
 1 file changed, 2 insertions(+)
  

Comments

Andrew Morton Nov. 24, 2023, 7:20 p.m. UTC | #1
On Thu, 23 Nov 2023 13:02:22 -0500 Peter Xu <peterx@redhat.com> wrote:

> This is a bug found not by any report but only by code observations.
> 
> When GUP sees a devpmd/devpud and if page==NULL is returned, it means a
> fault is probably required.  Here falling through when page==NULL can cause
> unexpected behavior.
> 

Well this is worrisome.  We aren't able to construct a test case to
demonstrate this bug?  Why is that?  Is it perhaps just dead code?
  
Peter Xu Nov. 26, 2023, 9:55 p.m. UTC | #2
On Fri, Nov 24, 2023 at 11:20:59AM -0800, Andrew Morton wrote:
> On Thu, 23 Nov 2023 13:02:22 -0500 Peter Xu <peterx@redhat.com> wrote:
> 
> > This is a bug found not by any report but only by code observations.
> > 
> > When GUP sees a devpmd/devpud and if page==NULL is returned, it means a
> > fault is probably required.  Here falling through when page==NULL can cause
> > unexpected behavior.
> > 
> 
> Well this is worrisome.  We aren't able to construct a test case to
> demonstrate this bug?  Why is that?  Is it perhaps just dead code?

IIUC it's not dead code. Take follow_devmap_pmd() as an example: it can
return page==NULL at least when the write bit is missing:

	if (flags & FOLL_WRITE && !pmd_write(*pmd))
		return NULL;

AFAICT this can happen if someone does "echo 4 > /proc/$PID/clear_refs" while
the mm contains a devmap pmd.  The same applies to the pud.

It would be nice if someone who works with dax could verify it.  In
my series (refactor hugetlb gup, part 2) IIUC some hugetlb selftest can
start to trigger this path, but I'll need to check.  So far it's dax-only.

Thanks,
  
David Hildenbrand Nov. 27, 2023, 4:22 p.m. UTC | #3
On 26.11.23 22:55, Peter Xu wrote:
> On Fri, Nov 24, 2023 at 11:20:59AM -0800, Andrew Morton wrote:
>> On Thu, 23 Nov 2023 13:02:22 -0500 Peter Xu <peterx@redhat.com> wrote:
>>
>>> This is a bug found not by any report but only by code observations.
>>>
>>> When GUP sees a devpmd/devpud and if page==NULL is returned, it means a
>>> fault is probably required.  Here falling through when page==NULL can cause
>>> unexpected behavior.
>>>
>>
>> Well this is worrisome.  We aren't able to construct a test case to
>> demonstrate this bug?  Why is that?  Is it perhaps just dead code?
> 
> IIUC it's not dead code. Take the example of follow_devmap_pmd(), it can
> return page==NULL at least when seeing write bit missing:
> 
> 	if (flags & FOLL_WRITE && !pmd_write(*pmd))
> 		return NULL;
> 
> AFAICT it can happen if someone does "echo 4 > /proc/$PID/clear_refs" when
> the mm contains the devmap pmd.  Same to pud.
> 
> It'll be nice if someone that works with dax would like to verify it.  In
> my series (refactor hugetlb gup, part 2) IIUC some hugetlb selftest can
> start to trigger this path, but I'll need to check.  So far it's dax-only.

It certainly looks weird to continue there. Triggering it by mmapping
some devdax device might be possible (e.g., using devdax emulation).

We know the PMD is present and the PMD is devmap. We take the pmd lock, 
and in follow_devmap_pmd() we recheck both.

I suspect the original idea was: if it's suddenly no longer present or 
no longer devmap, it was replaced by a PTE table. So we know a deeper 
level is there and can simply continue instead of triggering a fault.

But that does not seem to be the case, because I suspect the PMD could 
have been zapped (MADV_DONTNEED?) in the meantime, and the "writability" 
check is similarly weird.

So I assume the patch from Peter is ok: even if the PMD got replaced by 
a PTE table, we'd trigger a fault and simply retry.

Acked-by: David Hildenbrand <david@redhat.com>
  

Patch

diff --git a/mm/gup.c b/mm/gup.c
index 231711efa390..0a5f0e91bfec 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -710,6 +710,7 @@  static struct page *follow_pmd_mask(struct vm_area_struct *vma,
 		spin_unlock(ptl);
 		if (page)
 			return page;
+		return no_page_table(vma, flags);
 	}
 	if (likely(!pmd_trans_huge(pmdval)))
 		return follow_page_pte(vma, address, pmd, flags, &ctx->pgmap);
@@ -758,6 +759,7 @@  static struct page *follow_pud_mask(struct vm_area_struct *vma,
 		spin_unlock(ptl);
 		if (page)
 			return page;
+		return no_page_table(vma, flags);
 	}
 	if (unlikely(pud_bad(*pud)))
 		return no_page_table(vma, flags);