[v3,4/7] mm/gup: introduce the FOLL_SAME_FILE GUP flag

Message ID 2feedd2bad6fd1ec4bc4639f9d9012c5ae2faf1f.1681558407.git.lstoakes@gmail.com
State New
Headers
Series remove the vmas parameter from GUP APIs |

Commit Message

Lorenzo Stoakes April 15, 2023, 12:09 p.m. UTC
  This flag causes GUP to assert that all VMAs within the input range possess
the same vma->vm_file. If not, the operation fails.

This is part of a patch series which eliminates the vmas parameter from the
GUP API, implementing the one remaining assertion within the entire kernel
that requires access to the VMAs associated with a GUP range.

Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
---
 include/linux/mm_types.h |  2 ++
 mm/gup.c                 | 16 ++++++++++++----
 2 files changed, 14 insertions(+), 4 deletions(-)
  

Comments

David Hildenbrand April 17, 2023, 11:13 a.m. UTC | #1
On 15.04.23 14:09, Lorenzo Stoakes wrote:
> This flag causes GUP to assert that all VMAs within the input range possess
> the same vma->vm_file. If not, the operation fails.
> 
> This is part of a patch series which eliminates the vmas parameter from the
> GUP API, implementing the one remaining assertion within the entire kernel
> that requires access to the VMAs associated with a GUP range.
> 
> Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>

[...]

> ---
>   						&start, &nr_pages, i,
> @@ -1595,7 +1603,7 @@ long faultin_vma_page_range(struct vm_area_struct *vma, unsigned long start,
>   	 * We want to report -EINVAL instead of -EFAULT for any permission
>   	 * problems or incompatible mappings.
>   	 */
> -	if (check_vma_flags(vma, gup_flags))
> +	if (check_vma_flags(vma, vma->vm_file, gup_flags))
>   		return -EINVAL;

FOLL_SAME_FILE is never set here, just pass NULL instead of vma->vm_file.


As we're not allowing to drop the mmap lock, why can't io_uring simply 
go over all VMAs once, after pinning succeeded, and make sure that the 
files match (or even before pinning)?

In most cases, we're dealing with a single VMA only, it's not like the 
common case is that io_uring pins accross 100s of VMAs.

So I really wonder if the GUP complexity is justified by something. 
(removing the VMAs is certainly a welcome surprise -- as it doesn't make 
any sense when used with FOLL_UNLOCKABLE).
  
Lorenzo Stoakes April 17, 2023, 11:27 a.m. UTC | #2
On Mon, Apr 17, 2023 at 01:13:58PM +0200, David Hildenbrand wrote:
> On 15.04.23 14:09, Lorenzo Stoakes wrote:
> > This flag causes GUP to assert that all VMAs within the input range possess
> > the same vma->vm_file. If not, the operation fails.
> >
> > This is part of a patch series which eliminates the vmas parameter from the
> > GUP API, implementing the one remaining assertion within the entire kernel
> > that requires access to the VMAs associated with a GUP range.
> >
> > Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
>
> [...]
>
> > ---
> >   						&start, &nr_pages, i,
> > @@ -1595,7 +1603,7 @@ long faultin_vma_page_range(struct vm_area_struct *vma, unsigned long start,
> >   	 * We want to report -EINVAL instead of -EFAULT for any permission
> >   	 * problems or incompatible mappings.
> >   	 */
> > -	if (check_vma_flags(vma, gup_flags))
> > +	if (check_vma_flags(vma, vma->vm_file, gup_flags))
> >   		return -EINVAL;
>
> FOLL_SAME_FILE is never set here, just pass NULL instead of vma->vm_file.
>
>
> As we're not allowing to drop the mmap lock, why can't io_uring simply go
> over all VMAs once, after pinning succeeded, and make sure that the files
> match (or even before pinning)?
>
> In most cases, we're dealing with a single VMA only, it's not like the
> common case is that io_uring pins accross 100s of VMAs.
>
> So I really wonder if the GUP complexity is justified by something.

This is one that needs some input from Jens I think (added to cc, for some
reason I didn't include him on this one but I did on the one updating uring
to use it).

I agree it'd be better not to introduce this flag if we can avoid it,
intent was to avoid having to add complexity to the uring code when we're
already iterating through VMAs in the GUP code, but as you say it's highly
likely most cases will consist of a single VMA anyway.

If Jens is OK with us iterating here I'm more than happy to respin without
adding the flag!

> (removing the VMAs is certainly a welcome surprise -- as it doesn't make any
> sense when used with FOLL_UNLOCKABLE).

This is a product of reading the GUP code when writing the GUP bit for the
book and wishing it were more sane :) I suspect I'll have some more patches
in this area aimed at marrying the disparate APIs where sensible to do so.

>
> --
> Thanks,
>
> David / dhildenb
>
>
  

Patch

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 3fc9e680f174..84d1aec9dbab 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -1185,6 +1185,8 @@  enum {
 	FOLL_PCI_P2PDMA = 1 << 10,
 	/* allow interrupts from generic signals */
 	FOLL_INTERRUPTIBLE = 1 << 11,
+	/* assert that the range spans VMAs with the same vma->vm_file */
+	FOLL_SAME_FILE = 1 << 12,
 
 	/* See also internal only FOLL flags in mm/internal.h */
 };
diff --git a/mm/gup.c b/mm/gup.c
index 9440aa54c741..3954ce499a4a 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -959,7 +959,8 @@  static int faultin_page(struct vm_area_struct *vma,
 	return 0;
 }
 
-static int check_vma_flags(struct vm_area_struct *vma, unsigned long gup_flags)
+static int check_vma_flags(struct vm_area_struct *vma, struct file *file,
+			   unsigned long gup_flags)
 {
 	vm_flags_t vm_flags = vma->vm_flags;
 	int write = (gup_flags & FOLL_WRITE);
@@ -968,7 +969,7 @@  static int check_vma_flags(struct vm_area_struct *vma, unsigned long gup_flags)
 	if (vm_flags & (VM_IO | VM_PFNMAP))
 		return -EFAULT;
 
-	if (gup_flags & FOLL_ANON && !vma_is_anonymous(vma))
+	if ((gup_flags & FOLL_ANON) && !vma_is_anonymous(vma))
 		return -EFAULT;
 
 	if ((gup_flags & FOLL_LONGTERM) && vma_is_fsdax(vma))
@@ -977,6 +978,9 @@  static int check_vma_flags(struct vm_area_struct *vma, unsigned long gup_flags)
 	if (vma_is_secretmem(vma))
 		return -EFAULT;
 
+	if ((gup_flags & FOLL_SAME_FILE) && vma->vm_file != file)
+		return -EFAULT;
+
 	if (write) {
 		if (!(vm_flags & VM_WRITE)) {
 			if (!(gup_flags & FOLL_FORCE))
@@ -1081,6 +1085,7 @@  static long __get_user_pages(struct mm_struct *mm,
 	long ret = 0, i = 0;
 	struct vm_area_struct *vma = NULL;
 	struct follow_page_context ctx = { NULL };
+	struct file *file = NULL;
 
 	if (!nr_pages)
 		return 0;
@@ -1111,10 +1116,13 @@  static long __get_user_pages(struct mm_struct *mm,
 				ret = -EFAULT;
 				goto out;
 			}
-			ret = check_vma_flags(vma, gup_flags);
+			ret = check_vma_flags(vma, i == 0 ? vma->vm_file : file,
+					      gup_flags);
 			if (ret)
 				goto out;
 
+			file = vma->vm_file;
+
 			if (is_vm_hugetlb_page(vma)) {
 				i = follow_hugetlb_page(mm, vma, pages, vmas,
 						&start, &nr_pages, i,
@@ -1595,7 +1603,7 @@  long faultin_vma_page_range(struct vm_area_struct *vma, unsigned long start,
 	 * We want to report -EINVAL instead of -EFAULT for any permission
 	 * problems or incompatible mappings.
 	 */
-	if (check_vma_flags(vma, gup_flags))
+	if (check_vma_flags(vma, vma->vm_file, gup_flags))
 		return -EINVAL;
 
 	ret = __get_user_pages(mm, start, nr_pages, gup_flags,