[v5,1/6] mm/gup: remove unused vmas parameter from get_user_pages()
Commit Message
No invocation of get_user_pages() use the vmas parameter, so remove it.
The GUP API is confusing and caveated. Recent changes have done much to
improve that, however there is more we can do. Exporting vmas is a prime
target as the caller has to be extremely careful to preclude their use
after the mmap_lock has expired or otherwise be left with dangling
pointers.
Removing the vmas parameter focuses the GUP functions upon their primary
purpose - pinning (and outputting) pages as well as performing the actions
implied by the input flags.
This is part of a patch series aiming to remove the vmas parameter
altogether.
Suggested-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Acked-by: Christian König <christian.koenig@amd.com> (for radeon parts)
Acked-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
---
arch/x86/kernel/cpu/sgx/ioctl.c | 2 +-
drivers/gpu/drm/radeon/radeon_ttm.c | 2 +-
drivers/misc/sgi-gru/grufault.c | 2 +-
include/linux/mm.h | 3 +--
mm/gup.c | 9 +++------
mm/gup_test.c | 5 ++---
virt/kvm/kvm_main.c | 2 +-
7 files changed, 10 insertions(+), 15 deletions(-)
Comments
Looks good:
Reviewed-by: Christoph Hellwig <hch@lst.de>
On Sun, May 14, 2023, Lorenzo Stoakes wrote:
> No invocation of get_user_pages() use the vmas parameter, so remove it.
>
> The GUP API is confusing and caveated. Recent changes have done much to
> improve that, however there is more we can do. Exporting vmas is a prime
> target as the caller has to be extremely careful to preclude their use
> after the mmap_lock has expired or otherwise be left with dangling
> pointers.
>
> Removing the vmas parameter focuses the GUP functions upon their primary
> purpose - pinning (and outputting) pages as well as performing the actions
> implied by the input flags.
>
> This is part of a patch series aiming to remove the vmas parameter
> altogether.
>
> Suggested-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Acked-by: David Hildenbrand <david@redhat.com>
> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
> Acked-by: Christian K�nig <christian.koenig@amd.com> (for radeon parts)
> Acked-by: Jarkko Sakkinen <jarkko@kernel.org>
> Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
> ---
> arch/x86/kernel/cpu/sgx/ioctl.c | 2 +-
> drivers/gpu/drm/radeon/radeon_ttm.c | 2 +-
> drivers/misc/sgi-gru/grufault.c | 2 +-
> include/linux/mm.h | 3 +--
> mm/gup.c | 9 +++------
> mm/gup_test.c | 5 ++---
> virt/kvm/kvm_main.c | 2 +-
> 7 files changed, 10 insertions(+), 15 deletions(-)
Acked-by: Sean Christopherson <seanjc@google.com> (KVM)
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index cb5c13eee193..eaa5bb8dbadc 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -2477,7 +2477,7 @@ static inline int check_user_page_hwpoison(unsigned long addr)
> {
> int rc, flags = FOLL_HWPOISON | FOLL_WRITE;
>
> - rc = get_user_pages(addr, 1, flags, NULL, NULL);
> + rc = get_user_pages(addr, 1, flags, NULL);
> return rc == -EHWPOISON;
Unrelated to this patch, I think there's a pre-existing bug here. If gup() returns
a valid page, KVM will leak the refcount and unintentionally pin the page. That's
highly unlikely as check_user_page_hwpoison() is called iff get_user_pages_unlocked()
fails (called by hva_to_pfn_slow()), but it's theoretically possible that userspace
could change the VMAs between hva_to_pfn_slow() and check_user_page_hwpoison() since
KVM doesn't hold any relevant locks at this point.
E.g. if there's no VMA during hva_to_pfn_{fast,slow}(), npages==-EFAULT and KVM
will invoke check_user_page_hwpoison(). If userspace installs a valid mapping
after hva_to_pfn_slow() but before KVM acquires mmap_lock, then gup() will find
a valid page.
I _think_ the fix is to simply delete this code. The bug was introduced by commit
fafc3dbaac64 ("KVM: Replace is_hwpoison_address with __get_user_pages"). At that
time, KVM didn't check for "npages == -EHWPOISON" from the first call to
get_user_pages_unlocked(). Later on, commit 0857b9e95c1a ("KVM: Enable async page
fault processing") reworked the caller to be:
mmap_read_lock(current->mm);
if (npages == -EHWPOISON ||
(!async && check_user_page_hwpoison(addr))) {
pfn = KVM_PFN_ERR_HWPOISON;
goto exit;
}
where async really means NOWAIT, so that the hwpoison use of gup() didn't sleep.
KVM: Enable async page fault processing
If asynchronous hva_to_pfn() is requested call GUP with FOLL_NOWAIT to
avoid sleeping on IO. Check for hwpoison is done at the same time,
otherwise check_user_page_hwpoison() will call GUP again and will put
vcpu to sleep.
There are other potential problems too, e.g. the hwpoison call doesn't honor
the recently introduced @interruptible flag.
I don't see any reason to keep check_user_page_hwpoison(), KVM can simply rely on
the "npages == -EHWPOISON" check. get_user_pages_unlocked() is guaranteed to be
called with roughly equivalent flags, and the flags that aren't equivalent are
arguably bugs in check_user_page_hwpoison(), e.g. assuming FOLL_WRITE is wrong.
TL;DR: Go ahead with this change, I'll submit a separate patch to delete the
buggy KVM code.
On 15.05.23 21:07, Sean Christopherson wrote:
> On Sun, May 14, 2023, Lorenzo Stoakes wrote:
>> No invocation of get_user_pages() use the vmas parameter, so remove it.
>>
>> The GUP API is confusing and caveated. Recent changes have done much to
>> improve that, however there is more we can do. Exporting vmas is a prime
>> target as the caller has to be extremely careful to preclude their use
>> after the mmap_lock has expired or otherwise be left with dangling
>> pointers.
>>
>> Removing the vmas parameter focuses the GUP functions upon their primary
>> purpose - pinning (and outputting) pages as well as performing the actions
>> implied by the input flags.
>>
>> This is part of a patch series aiming to remove the vmas parameter
>> altogether.
>>
>> Suggested-by: Matthew Wilcox (Oracle) <willy@infradead.org>
>> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>> Acked-by: David Hildenbrand <david@redhat.com>
>> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
>> Acked-by: Christian K�nig <christian.koenig@amd.com> (for radeon parts)
>> Acked-by: Jarkko Sakkinen <jarkko@kernel.org>
>> Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
>> ---
>> arch/x86/kernel/cpu/sgx/ioctl.c | 2 +-
>> drivers/gpu/drm/radeon/radeon_ttm.c | 2 +-
>> drivers/misc/sgi-gru/grufault.c | 2 +-
>> include/linux/mm.h | 3 +--
>> mm/gup.c | 9 +++------
>> mm/gup_test.c | 5 ++---
>> virt/kvm/kvm_main.c | 2 +-
>> 7 files changed, 10 insertions(+), 15 deletions(-)
>
> Acked-by: Sean Christopherson <seanjc@google.com> (KVM)
>
>> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
>> index cb5c13eee193..eaa5bb8dbadc 100644
>> --- a/virt/kvm/kvm_main.c
>> +++ b/virt/kvm/kvm_main.c
>> @@ -2477,7 +2477,7 @@ static inline int check_user_page_hwpoison(unsigned long addr)
>> {
>> int rc, flags = FOLL_HWPOISON | FOLL_WRITE;
>>
>> - rc = get_user_pages(addr, 1, flags, NULL, NULL);
>> + rc = get_user_pages(addr, 1, flags, NULL);
>> return rc == -EHWPOISON;
>
> Unrelated to this patch, I think there's a pre-existing bug here. If gup() returns
> a valid page, KVM will leak the refcount and unintentionally pin the page. That's
When passing NULL as "pages" to get_user_pages(),
__get_user_pages_locked() won't set FOLL_GET. As FOLL_PIN is also not
set, we won't be messing with the mapcount of the page.
So even if get_user_pages() returns "1", we should be fine.
Or am I misunderstanding your concern? At least hva_to_pfn_slow() most
certainly didn't return "1" if we end up calling
check_user_page_hwpoison(), so nothing would have been pinned there as well.
On Tue, May 16, 2023, David Hildenbrand wrote:
> On 15.05.23 21:07, Sean Christopherson wrote:
> > On Sun, May 14, 2023, Lorenzo Stoakes wrote:
> > > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > > index cb5c13eee193..eaa5bb8dbadc 100644
> > > --- a/virt/kvm/kvm_main.c
> > > +++ b/virt/kvm/kvm_main.c
> > > @@ -2477,7 +2477,7 @@ static inline int check_user_page_hwpoison(unsigned long addr)
> > > {
> > > int rc, flags = FOLL_HWPOISON | FOLL_WRITE;
> > > - rc = get_user_pages(addr, 1, flags, NULL, NULL);
> > > + rc = get_user_pages(addr, 1, flags, NULL);
> > > return rc == -EHWPOISON;
> >
> > Unrelated to this patch, I think there's a pre-existing bug here. If gup() returns
> > a valid page, KVM will leak the refcount and unintentionally pin the page. That's
>
> When passing NULL as "pages" to get_user_pages(), __get_user_pages_locked()
> won't set FOLL_GET. As FOLL_PIN is also not set, we won't be messing with
> the mapcount of the page.
Ah, that's what I'm missing.
> So even if get_user_pages() returns "1", we should be fine.
>
>
> Or am I misunderstanding your concern?
Nope, you covered everything. I do think we can drop the extra gup() though,
AFAICT it's 100% redundant. But it's not a bug.
Thanks!
On 16.05.23 16:30, Sean Christopherson wrote:
> On Tue, May 16, 2023, David Hildenbrand wrote:
>> On 15.05.23 21:07, Sean Christopherson wrote:
>>> On Sun, May 14, 2023, Lorenzo Stoakes wrote:
>>>> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
>>>> index cb5c13eee193..eaa5bb8dbadc 100644
>>>> --- a/virt/kvm/kvm_main.c
>>>> +++ b/virt/kvm/kvm_main.c
>>>> @@ -2477,7 +2477,7 @@ static inline int check_user_page_hwpoison(unsigned long addr)
>>>> {
>>>> int rc, flags = FOLL_HWPOISON | FOLL_WRITE;
>>>> - rc = get_user_pages(addr, 1, flags, NULL, NULL);
>>>> + rc = get_user_pages(addr, 1, flags, NULL);
>>>> return rc == -EHWPOISON;
>>>
>>> Unrelated to this patch, I think there's a pre-existing bug here. If gup() returns
>>> a valid page, KVM will leak the refcount and unintentionally pin the page. That's
>>
>> When passing NULL as "pages" to get_user_pages(), __get_user_pages_locked()
>> won't set FOLL_GET. As FOLL_PIN is also not set, we won't be messing with
>> the mapcount of the page.
For completeness: s/mapcount/refcount/ :)
>
> Ah, that's what I'm missing.
On 5/16/23 07:35, David Hildenbrand wrote:
...
>>> When passing NULL as "pages" to get_user_pages(), __get_user_pages_locked()
>>> won't set FOLL_GET. As FOLL_PIN is also not set, we won't be messing with
>>> the mapcount of the page.
>
> For completeness: s/mapcount/refcount/ :)
whew, you had me going there! Now it all adds up. :)
thanks,
@@ -214,7 +214,7 @@ static int __sgx_encl_add_page(struct sgx_encl *encl,
if (!(vma->vm_flags & VM_MAYEXEC))
return -EACCES;
- ret = get_user_pages(src, 1, 0, &src_page, NULL);
+ ret = get_user_pages(src, 1, 0, &src_page);
if (ret < 1)
return -EFAULT;
@@ -359,7 +359,7 @@ static int radeon_ttm_tt_pin_userptr(struct ttm_device *bdev, struct ttm_tt *ttm
struct page **pages = ttm->pages + pinned;
r = get_user_pages(userptr, num_pages, write ? FOLL_WRITE : 0,
- pages, NULL);
+ pages);
if (r < 0)
goto release_pages;
@@ -185,7 +185,7 @@ static int non_atomic_pte_lookup(struct vm_area_struct *vma,
#else
*pageshift = PAGE_SHIFT;
#endif
- if (get_user_pages(vaddr, 1, write ? FOLL_WRITE : 0, &page, NULL) <= 0)
+ if (get_user_pages(vaddr, 1, write ? FOLL_WRITE : 0, &page) <= 0)
return -EFAULT;
*paddr = page_to_phys(page);
put_page(page);
@@ -2382,8 +2382,7 @@ long pin_user_pages_remote(struct mm_struct *mm,
unsigned int gup_flags, struct page **pages,
struct vm_area_struct **vmas, int *locked);
long get_user_pages(unsigned long start, unsigned long nr_pages,
- unsigned int gup_flags, struct page **pages,
- struct vm_area_struct **vmas);
+ unsigned int gup_flags, struct page **pages);
long pin_user_pages(unsigned long start, unsigned long nr_pages,
unsigned int gup_flags, struct page **pages,
struct vm_area_struct **vmas);
@@ -2294,8 +2294,6 @@ long get_user_pages_remote(struct mm_struct *mm,
* @pages: array that receives pointers to the pages pinned.
* Should be at least nr_pages long. Or NULL, if caller
* only intends to ensure the pages are faulted in.
- * @vmas: array of pointers to vmas corresponding to each page.
- * Or NULL if the caller does not require them.
*
* This is the same as get_user_pages_remote(), just with a less-flexible
* calling convention where we assume that the mm being operated on belongs to
@@ -2303,16 +2301,15 @@ long get_user_pages_remote(struct mm_struct *mm,
* obviously don't pass FOLL_REMOTE in here.
*/
long get_user_pages(unsigned long start, unsigned long nr_pages,
- unsigned int gup_flags, struct page **pages,
- struct vm_area_struct **vmas)
+ unsigned int gup_flags, struct page **pages)
{
int locked = 1;
- if (!is_valid_gup_args(pages, vmas, NULL, &gup_flags, FOLL_TOUCH))
+ if (!is_valid_gup_args(pages, NULL, NULL, &gup_flags, FOLL_TOUCH))
return -EINVAL;
return __get_user_pages_locked(current->mm, start, nr_pages, pages,
- vmas, &locked, gup_flags);
+ NULL, &locked, gup_flags);
}
EXPORT_SYMBOL(get_user_pages);
@@ -139,8 +139,7 @@ static int __gup_test_ioctl(unsigned int cmd,
pages + i);
break;
case GUP_BASIC_TEST:
- nr = get_user_pages(addr, nr, gup->gup_flags, pages + i,
- NULL);
+ nr = get_user_pages(addr, nr, gup->gup_flags, pages + i);
break;
case PIN_FAST_BENCHMARK:
nr = pin_user_pages_fast(addr, nr, gup->gup_flags,
@@ -161,7 +160,7 @@ static int __gup_test_ioctl(unsigned int cmd,
pages + i, NULL);
else
nr = get_user_pages(addr, nr, gup->gup_flags,
- pages + i, NULL);
+ pages + i);
break;
default:
ret = -EINVAL;
@@ -2477,7 +2477,7 @@ static inline int check_user_page_hwpoison(unsigned long addr)
{
int rc, flags = FOLL_HWPOISON | FOLL_WRITE;
- rc = get_user_pages(addr, 1, flags, NULL, NULL);
+ rc = get_user_pages(addr, 1, flags, NULL);
return rc == -EHWPOISON;
}