Message ID | 20230213104323.1792839-1-usama.anjum@collabora.com |
---|---|
State | New |
Headers |
From: Muhammad Usama Anjum <usama.anjum@collabora.com>
To: peterx@redhat.com, david@redhat.com, Andrew Morton <akpm@linux-foundation.org>
Cc: Muhammad Usama Anjum <usama.anjum@collabora.com>, kernel@collabora.com, Paul Gofman <pgofman@codeweavers.com>, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: [PATCH] mm/userfaultfd: Support operation on multiple VMAs
Date: Mon, 13 Feb 2023 15:43:23 +0500
Message-Id: <20230213104323.1792839-1-usama.anjum@collabora.com>
X-Mailer: git-send-email 2.39.1
X-Mailing-List: linux-kernel@vger.kernel.org |
Series |
mm/userfaultfd: Support operation on multiple VMAs
|
|
Commit Message
Muhammad Usama Anjum
Feb. 13, 2023, 10:43 a.m. UTC
mwriteprotect_range() errors out if [start, end) doesn't fall in one
VMA. We are facing a use case where multiple VMAs are present in one
range of interest. For example, the following steps reproduce the
error we are trying to fix:
- Allocate memory of size 16 pages with PROT_NONE with mmap
- Register userfaultfd
- Change the protection of the first half (pages 1 to 8) of the memory
to PROT_READ | PROT_WRITE. This splits the memory area into two VMAs.
- Now UFFDIO_WRITEPROTECT_MODE_WP on the whole memory of 16 pages errors
out.
This is a simple use case where the user may or may not know if the
memory area has been divided into multiple VMAs.
Reported-by: Paul Gofman <pgofman@codeweavers.com>
Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
---
mm/userfaultfd.c | 36 +++++++++++++++++++-----------------
1 file changed, 19 insertions(+), 17 deletions(-)
Comments
On 13.02.23 11:43, Muhammad Usama Anjum wrote:
> mwriteprotect_range() errors out if [start, end) doesn't fall in one
> VMA. We are facing a use case where multiple VMAs are present in one
> range of interest.
[...]
> +		uffd_wp_range(dst_mm, dst_vma, start, len, enable_wp);

I suspect you should be adjusting the range to only cover that specific
VMA here.
Hi David,

Thank you for the quick review!

On 2/13/23 4:44 PM, David Hildenbrand wrote:
> On 13.02.23 11:43, Muhammad Usama Anjum wrote:
[...]
>> +		uffd_wp_range(dst_mm, dst_vma, start, len, enable_wp);
>
> I suspect you should be adjusting the range to only cover that specific
> VMA here.
Sorry, you are right. I don't know why it is still working with the
blunder. Will send a v2.

Thanks,
Usama
On 13.02.23 16:04, Muhammad Usama Anjum wrote:
> On 2/13/23 4:44 PM, David Hildenbrand wrote:
>> On 13.02.23 11:43, Muhammad Usama Anjum wrote:
[...]
>>> +		uffd_wp_range(dst_mm, dst_vma, start, len, enable_wp);
>>
>> I suspect you should be adjusting the range to only cover that specific
>> VMA here.
> Sorry, you are right. I don't know why it is still working with the
> blunder. Will send a v2.

Maybe worth adding some sanity checks (VM_WARN_ONCE()) in there (e.g.,
change_protection()) to catch that.
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index 65ad172add27..46e0a014af68 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -738,9 +738,11 @@ int mwriteprotect_range(struct mm_struct *dst_mm, unsigned long start,
 			unsigned long len, bool enable_wp,
 			atomic_t *mmap_changing)
 {
+	unsigned long end = start + len;
 	struct vm_area_struct *dst_vma;
 	unsigned long page_mask;
 	int err;
+	VMA_ITERATOR(vmi, dst_mm, start);
 
 	/*
 	 * Sanitize the command parameters:
@@ -762,26 +764,26 @@ int mwriteprotect_range(struct mm_struct *dst_mm, unsigned long start,
 	if (mmap_changing && atomic_read(mmap_changing))
 		goto out_unlock;
 
-	err = -ENOENT;
-	dst_vma = find_dst_vma(dst_mm, start, len);
-
-	if (!dst_vma)
-		goto out_unlock;
-	if (!userfaultfd_wp(dst_vma))
-		goto out_unlock;
-	if (!vma_can_userfault(dst_vma, dst_vma->vm_flags))
-		goto out_unlock;
+	for_each_vma_range(vmi, dst_vma, end) {
+		err = -ENOENT;
 
-	if (is_vm_hugetlb_page(dst_vma)) {
-		err = -EINVAL;
-		page_mask = vma_kernel_pagesize(dst_vma) - 1;
-		if ((start & page_mask) || (len & page_mask))
-			goto out_unlock;
-	}
+		if (!dst_vma->vm_userfaultfd_ctx.ctx)
+			break;
+		if (!userfaultfd_wp(dst_vma))
+			break;
+		if (!vma_can_userfault(dst_vma, dst_vma->vm_flags))
+			break;
 
-	uffd_wp_range(dst_mm, dst_vma, start, len, enable_wp);
+		if (is_vm_hugetlb_page(dst_vma)) {
+			err = -EINVAL;
+			page_mask = vma_kernel_pagesize(dst_vma) - 1;
+			if ((start & page_mask) || (len & page_mask))
+				break;
+		}
 
-	err = 0;
+		uffd_wp_range(dst_mm, dst_vma, start, len, enable_wp);
+		err = 0;
+	}
 
 out_unlock:
 	mmap_read_unlock(dst_mm);
 	return err;