[15/31] mm/userfaultfd: allow pte_offset_map_lock() to fail

Message ID 49d92b15-3442-4e84-39bd-c77c316bf844@google.com
State New
Headers
Series mm: allow pte_offset_map[_lock]() to fail |

Commit Message

Hugh Dickins May 22, 2023, 5:07 a.m. UTC
  mfill_atomic_install_pte() and mfill_atomic_pte_zeropage() treat
failed pte_offset_map_lock() as -EFAULT, with no attempt to retry.

Signed-off-by: Hugh Dickins <hughd@google.com>
---
 mm/userfaultfd.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)
  

Comments

Peter Xu May 24, 2023, 10:44 p.m. UTC | #1
Hi, Hugh,

On Sun, May 21, 2023 at 10:07:35PM -0700, Hugh Dickins wrote:
> mfill_atomic_install_pte() and mfill_atomic_pte_zeropage() treat
> failed pte_offset_map_lock() as -EFAULT, with no attempt to retry.

Could you help explain why it should be -EFAULT, not -EAGAIN or -EEXIST?

IIUC right now if pte existed we have -EEXIST returned as part of the
userfault ABI, no matter whether it's pte or thp.

IMHO it may boil down to my limited knowledge on how pte_offset_map_lock()
is used after this part 2 series, and I assume the core changes will be in
your 3rd series (besides this one and the arch one).

Please shed some light if there's quick answers (IIUC this is for speeding
up collapsing shmem thps, but still no much clue here), or I can also wait
for reading the 3rd part if it'll come soon in any form.

Thanks,
  
Hugh Dickins May 25, 2023, 10:06 p.m. UTC | #2
On Wed, 24 May 2023, Peter Xu wrote:
> On Sun, May 21, 2023 at 10:07:35PM -0700, Hugh Dickins wrote:
> > mfill_atomic_install_pte() and mfill_atomic_pte_zeropage() treat
> > failed pte_offset_map_lock() as -EFAULT, with no attempt to retry.
> 
> Could you help explain why it should be -EFAULT, not -EAGAIN or -EEXIST?

Thanks a lot for looking, Peter.

No good justification for -EFAULT: I just grabbed the closest, fairly
neutral, error code that I could see already being in use there: but now
that you mention -EAGAIN, which I can see being used from mfill_atomic(),
yes, that would be ideal - and consistent with how it's already being used.

I'll make that change, thanks for suggesting.  (And it had bugged me how
my fs/userfaultfd.c was electing to retry, but this one electing to fail.)

> 
> IIUC right now if pte existed we have -EEXIST returned as part of the
> userfault ABI, no matter whether it's pte or thp.

It might or might not correspond to -EEXIST - it might even end up as
-EFAULT on a retry after -EAGAIN: I see mfill_atomic() contains both
-EEXIST and -EFAULT cases for pmd_trans_huge().  Actually, I could
say that the -EFAULT case there corresponds to the -EFAULT in this
15/31 patch, but that would be by coincidence not design: I'm happier
with your -EAGAIN suggestion.

> 
> IMHO it may boil down to my limited knowledge on how pte_offset_map_lock()
> is used after this part 2 series, and I assume the core changes will be in
> your 3rd series (besides this one and the arch one).
> 
> Please shed some light if there's quick answers (IIUC this is for speeding
> up collapsing shmem thps, but still no much clue here), or I can also wait
> for reading the 3rd part if it'll come soon in any form.

It wouldn't be particularly easy to deduce from the third series of
patches, rather submerged in implementation details.  Just keep in mind
that, like in the "old" pmd_trans_unstable() cases, there may be instants
at which, when trying to get the lock on a page table, that page table
might already have gone, or been replaced by something else e.g. a THP,
and a retry necessary at the outer level (if it's important to persist).

Hugh
  
Peter Xu May 26, 2023, 4:25 p.m. UTC | #3
On Thu, May 25, 2023 at 03:06:27PM -0700, Hugh Dickins wrote:
> On Wed, 24 May 2023, Peter Xu wrote:
> > On Sun, May 21, 2023 at 10:07:35PM -0700, Hugh Dickins wrote:
> > > mfill_atomic_install_pte() and mfill_atomic_pte_zeropage() treat
> > > failed pte_offset_map_lock() as -EFAULT, with no attempt to retry.
> > 
> > Could you help explain why it should be -EFAULT, not -EAGAIN or -EEXIST?
> 
> Thanks a lot for looking, Peter.
> 
> No good justification for -EFAULT: I just grabbed the closest, fairly
> neutral, error code that I could see already being in use there: but now
> that you mention -EAGAIN, which I can see being used from mfill_atomic(),
> yes, that would be ideal - and consistent with how it's already being used.
> 
> I'll make that change, thanks for suggesting.  (And it had bugged me how
> my fs/userfaultfd.c was electing to retry, but this one electing to fail.)

Thanks.

> 
> > 
> > IIUC right now if pte existed we have -EEXIST returned as part of the
> > userfault ABI, no matter whether it's pte or thp.
> 
> It might or might not correspond to -EEXIST - it might even end up as
> -EFAULT on a retry after -EAGAIN: I see mfill_atomic() contains both
> -EEXIST and -EFAULT cases for pmd_trans_huge().  Actually, I could
> say that the -EFAULT case there corresponds to the -EFAULT in this
> 15/31 patch, but that would be by coincidence not design: I'm happier
> with your -EAGAIN suggestion.

I had a feeling that that 2nd -EFAULT there could crash some userapp
already if it got returned somewhere, because the userapp shouldn't expect
that.  IMHO it should also return -EAGAIN, or even -EEXIST because even if
user retries, we should highly possibly see that thp again, so the -EEXIST
should possibly follow anyway.

Not a big deal here I think - if an userapp can trigger that -EFAULT I'd
say it's also a user bug because it made two decisions already on resolving
page fault for single VA, and it's racy between them..

> 
> > 
> > IMHO it may boil down to my limited knowledge on how pte_offset_map_lock()
> > is used after this part 2 series, and I assume the core changes will be in
> > your 3rd series (besides this one and the arch one).
> > 
> > Please shed some light if there's quick answers (IIUC this is for speeding
> > up collapsing shmem thps, but still no much clue here), or I can also wait
> > for reading the 3rd part if it'll come soon in any form.
> 
> It wouldn't be particularly easy to deduce from the third series of
> patches, rather submerged in implementation details.  Just keep in mind
> that, like in the "old" pmd_trans_unstable() cases, there may be instants
> at which, when trying to get the lock on a page table, that page table
> might already have gone, or been replaced by something else e.g. a THP,
> and a retry necessary at the outer level (if it's important to persist).

I'm actually still curious how the 3rd series will look like; would love to
read it when it comes.

Thanks,
  

Patch

diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index e97a0b4889fc..b1554286a31c 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -76,14 +76,16 @@  int mfill_atomic_install_pte(pmd_t *dst_pmd,
 	if (flags & MFILL_ATOMIC_WP)
 		_dst_pte = pte_mkuffd_wp(_dst_pte);
 
+	ret = -EFAULT;
 	dst_pte = pte_offset_map_lock(dst_mm, dst_pmd, dst_addr, &ptl);
+	if (!dst_pte)
+		goto out;
 
 	if (vma_is_shmem(dst_vma)) {
 		/* serialize against truncate with the page table lock */
 		inode = dst_vma->vm_file->f_inode;
 		offset = linear_page_index(dst_vma, dst_addr);
 		max_off = DIV_ROUND_UP(i_size_read(inode), PAGE_SIZE);
-		ret = -EFAULT;
 		if (unlikely(offset >= max_off))
 			goto out_unlock;
 	}
@@ -121,6 +123,7 @@  int mfill_atomic_install_pte(pmd_t *dst_pmd,
 	ret = 0;
 out_unlock:
 	pte_unmap_unlock(dst_pte, ptl);
+out:
 	return ret;
 }
 
@@ -212,13 +215,15 @@  static int mfill_atomic_pte_zeropage(pmd_t *dst_pmd,
 
 	_dst_pte = pte_mkspecial(pfn_pte(my_zero_pfn(dst_addr),
 					 dst_vma->vm_page_prot));
+	ret = -EFAULT;
 	dst_pte = pte_offset_map_lock(dst_vma->vm_mm, dst_pmd, dst_addr, &ptl);
+	if (!dst_pte)
+		goto out;
 	if (dst_vma->vm_file) {
 		/* the shmem MAP_PRIVATE case requires checking the i_size */
 		inode = dst_vma->vm_file->f_inode;
 		offset = linear_page_index(dst_vma, dst_addr);
 		max_off = DIV_ROUND_UP(i_size_read(inode), PAGE_SIZE);
-		ret = -EFAULT;
 		if (unlikely(offset >= max_off))
 			goto out_unlock;
 	}
@@ -231,6 +236,7 @@  static int mfill_atomic_pte_zeropage(pmd_t *dst_pmd,
 	ret = 0;
 out_unlock:
 	pte_unmap_unlock(dst_pte, ptl);
+out:
 	return ret;
 }