[0/3] mm/uffd: Fix missing markers on hugetlb

Message ID 20230104225207.1066932-1-peterx@redhat.com

Peter Xu Jan. 4, 2023, 10:52 p.m. UTC
  When James was developing the vma split fix for hugetlb pmd sharing, he
found that hugetlb uffd-wp is broken with the test case he developed [1]:

[1] https://lore.kernel.org/r/CADrL8HWSym93=yNpTUdWebOEzUOTR2ffbfUk04XdK6O+PNJNoA@mail.gmail.com

Missing hugetlb pgtable pages caused uffd-wp to lose messages when the vma
split happened to fall across a shared huge pmd range in the test.

The issue is that pgtable pre-allocation on the hugetlb path was
overlooked.  That is fixed in patch 1.

Meanwhile there's another issue with proper reporting of pgtable
allocation failures during UFFDIO_WRITEPROTECT.  When pgtable allocation
fails during ioctl(UFFDIO_WRITEPROTECT), we silently drop the error so the
user cannot detect it (even if that is extremely rare).  This issue can
happen not only on hugetlb but also on shmem.  Anon is not affected because
anon doesn't require pgtable allocation during wr-protection.  Patch 2
prepares for such a change, then patch 3 allows the error to be reported to
users.
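
To illustrate what the error reporting means for userspace, here is a
minimal sketch of a wrapper around ioctl(UFFDIO_WRITEPROTECT).  The helper
name uffd_wp_range() is hypothetical and not part of this series; it
assumes the range was already registered in write-protect mode on the given
uffd.  With patch 3 applied, a pgtable allocation failure on shmem or
hugetlb would be expected to surface here as -ENOMEM instead of being
reported as success.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/userfaultfd.h>

/*
 * Hypothetical helper (illustration only, not from this series):
 * write-protect (protect != 0) or un-protect (protect == 0) the range
 * [addr, addr + len) on an already-registered userfaultfd.
 *
 * Returns 0 on success or a negative errno value, so the caller can
 * tell an allocation failure (-ENOMEM) apart from other errors.
 */
static int uffd_wp_range(int uffd, void *addr, size_t len, int protect)
{
	struct uffdio_writeprotect wp = {
		.range = {
			.start = (uint64_t)(uintptr_t)addr,
			.len = len,
		},
		.mode = protect ? UFFDIO_WRITEPROTECT_MODE_WP : 0,
	};

	if (ioctl(uffd, UFFDIO_WRITEPROTECT, &wp) == -1)
		return -errno;

	return 0;
}
```

A caller can then check for "ret == -ENOMEM" and decide whether to retry or
give up on the range; how to react is a policy choice for the application.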

This set only marks patch 1 for stable, because it's a real bug that should
be fixed on all 5.19+ kernels.

Patches 2-3 are an enhancement to handle pgtable allocation errors.  Such
errors were hardly ever hit in my past tests, even under heavy workloads,
but the change should make the interface clearer.  For that reason patches
2-3 are not copied to stable.  I'll prepare a man page update after patches
2-3 land.

Tested with:

  - James's reproducer above [1], which starts to pass together with the
    vma split fix:
    https://lore.kernel.org/r/20230101230042.244286-1-jthoughton@google.com
  - Faked memory pressure to make sure -ENOMEM is returned with both shmem
    and hugetlbfs
  - Some general uffd routines

Peter Xu (3):
  mm/hugetlb: Pre-allocate pgtable pages for uffd wr-protects
  mm/mprotect: Use long for page accountings and retval
  mm/uffd: Detect pgtable allocation failures

 include/linux/hugetlb.h       |  4 +-
 include/linux/mm.h            |  2 +-
 include/linux/userfaultfd_k.h |  2 +-
 mm/hugetlb.c                  | 21 +++++++--
 mm/mempolicy.c                |  4 +-
 mm/mprotect.c                 | 89 ++++++++++++++++++++++-------------
 mm/userfaultfd.c              | 16 +++++--
 7 files changed, 88 insertions(+), 50 deletions(-)
  

Comments

David Hildenbrand Jan. 5, 2023, 8:16 a.m. UTC | #1
On 04.01.23 23:52, Peter Xu wrote:
> When James was developing the vma split fix for hugetlb pmd sharing, he
> found that hugetlb uffd-wp is broken with the test case he developed [1]:
> 
> https://lore.kernel.org/r/CADrL8HWSym93=yNpTUdWebOEzUOTR2ffbfUk04XdK6O+PNJNoA@mail.gmail.com
> 
> Missing hugetlb pgtable pages caused uffd-wp to lose message when vma split
> happens to be across a shared huge pmd range in the test.
> 
> The issue is pgtable pre-allocation on hugetlb path was overlooked.  That
> was fixed in patch 1.

Nice timing, I stumbled over that while adjusting background snapshot 
code in QEMU and wondered why we are not allocating page tables in that 
case -- and wanted to ask you why :)