[v3,1/4] mm: Non-pmd-mappable, large folios for folio_add_new_anon_rmap()

Message ID 20230714161733.4144503-1-ryan.roberts@arm.com
State New
Headers
Series variable-order, large folios for anonymous memory |

Commit Message

Ryan Roberts July 14, 2023, 4:17 p.m. UTC
  In preparation for FLEXIBLE_THP support, improve
folio_add_new_anon_rmap() to allow a non-pmd-mappable, large folio to be
passed to it. In this case, all contained pages are accounted using the
order-0 folio (or base page) scheme.

Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Reviewed-by: Yu Zhao <yuzhao@google.com>
Reviewed-by: Yin Fengwei <fengwei.yin@intel.com>
---
 mm/rmap.c | 28 +++++++++++++++++++++-------
 1 file changed, 21 insertions(+), 7 deletions(-)

--
2.25.1
  

Comments

Yu Zhao July 14, 2023, 4:52 p.m. UTC | #1
On Fri, Jul 14, 2023 at 10:17 AM Ryan Roberts <ryan.roberts@arm.com> wrote:
>
> In preparation for FLEXIBLE_THP support, improve
> folio_add_new_anon_rmap() to allow a non-pmd-mappable, large folio to be
> passed to it. In this case, all contained pages are accounted using the
> order-0 folio (or base page) scheme.
>
> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
> Reviewed-by: Yu Zhao <yuzhao@google.com>
> Reviewed-by: Yin Fengwei <fengwei.yin@intel.com>

This patch doesn't depend on the rest of the series and therefore can
be merged separately in case the rest needs more discussion.

Ryan, please feel free to post other code paths (those from v1) you've
optimized for large anon folios at any time, since each code path can
be reviewed and merged individually as well.
  
Ryan Roberts July 14, 2023, 6:01 p.m. UTC | #2
On 14/07/2023 17:52, Yu Zhao wrote:
> On Fri, Jul 14, 2023 at 10:17 AM Ryan Roberts <ryan.roberts@arm.com> wrote:
>>
>> In preparation for FLEXIBLE_THP support, improve
>> folio_add_new_anon_rmap() to allow a non-pmd-mappable, large folio to be
>> passed to it. In this case, all contained pages are accounted using the
>> order-0 folio (or base page) scheme.
>>
>> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
>> Reviewed-by: Yu Zhao <yuzhao@google.com>
>> Reviewed-by: Yin Fengwei <fengwei.yin@intel.com>
> 
> This patch doesn't depend on the rest of the series and therefore can
> be merged separately in case the rest needs more discussion.
> 
> Ryan, please feel free to post other code paths (those from v1) you've
> optimized for large anon folios at any time, since each code path can
> be reviewed and merged individually as well.

Will do. Hoping to get the "batch zap" series out on Monday. Others need a bit
more time to bake as they will need to be aligned to the changes from the review
of this series first.
  
David Hildenbrand July 17, 2023, 1 p.m. UTC | #3
On 14.07.23 18:17, Ryan Roberts wrote:
> In preparation for FLEXIBLE_THP support, improve
> folio_add_new_anon_rmap() to allow a non-pmd-mappable, large folio to be
> passed to it. In this case, all contained pages are accounted using the
> order-0 folio (or base page) scheme.
> 
> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
> Reviewed-by: Yu Zhao <yuzhao@google.com>
> Reviewed-by: Yin Fengwei <fengwei.yin@intel.com>
> ---
>   mm/rmap.c | 28 +++++++++++++++++++++-------
>   1 file changed, 21 insertions(+), 7 deletions(-)
> 
> diff --git a/mm/rmap.c b/mm/rmap.c
> index 0c0d8857dfce..f293d072368a 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -1278,31 +1278,45 @@ void page_add_anon_rmap(struct page *page, struct vm_area_struct *vma,
>    * This means the inc-and-test can be bypassed.
>    * The folio does not have to be locked.
>    *
> - * If the folio is large, it is accounted as a THP.  As the folio
> + * If the folio is pmd-mappable, it is accounted as a THP.  As the folio
>    * is new, it's assumed to be mapped exclusively by a single process.
>    */
>   void folio_add_new_anon_rmap(struct folio *folio, struct vm_area_struct *vma,
>   		unsigned long address)
>   {
> -	int nr;
> +	int nr = folio_nr_pages(folio);
> 
> -	VM_BUG_ON_VMA(address < vma->vm_start || address >= vma->vm_end, vma);
> +	VM_BUG_ON_VMA(address < vma->vm_start ||
> +			address + (nr << PAGE_SHIFT) > vma->vm_end, vma);
>   	__folio_set_swapbacked(folio);
> 
> -	if (likely(!folio_test_pmd_mappable(folio))) {
> +	if (!folio_test_large(folio)) {

Why remove the "likely" here? The patch itself does not change anything 
about that condition.

>   		/* increment count (starts at -1) */
>   		atomic_set(&folio->_mapcount, 0);
> -		nr = 1;
> +		__page_set_anon_rmap(folio, &folio->page, vma, address, 1);
> +	} else if (!folio_test_pmd_mappable(folio)) {
> +		int i;
> +
> +		for (i = 0; i < nr; i++) {
> +			struct page *page = folio_page(folio, i);
> +
> +			/* increment count (starts at -1) */
> +			atomic_set(&page->_mapcount, 0);
> +			__page_set_anon_rmap(folio, page, vma,
> +					address + (i << PAGE_SHIFT), 1);
> +		}
> +
> +		/* increment count (starts at 0) */

That comment is a bit misleading. We're not talking about a mapcount as 
in the other cases here.

> +		atomic_set(&folio->_nr_pages_mapped, nr);
>   	} else {
>   		/* increment count (starts at -1) */
>   		atomic_set(&folio->_entire_mapcount, 0);
>   		atomic_set(&folio->_nr_pages_mapped, COMPOUND_MAPPED);
> -		nr = folio_nr_pages(folio);
> +		__page_set_anon_rmap(folio, &folio->page, vma, address, 1);
>   		__lruvec_stat_mod_folio(folio, NR_ANON_THPS, nr);
>   	}
> 

Apart from that, LGTM.
  
Ryan Roberts July 17, 2023, 1:13 p.m. UTC | #4
On 17/07/2023 14:00, David Hildenbrand wrote:
> On 14.07.23 18:17, Ryan Roberts wrote:
>> In preparation for FLEXIBLE_THP support, improve
>> folio_add_new_anon_rmap() to allow a non-pmd-mappable, large folio to be
>> passed to it. In this case, all contained pages are accounted using the
>> order-0 folio (or base page) scheme.
>>
>> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
>> Reviewed-by: Yu Zhao <yuzhao@google.com>
>> Reviewed-by: Yin Fengwei <fengwei.yin@intel.com>
>> ---
>>   mm/rmap.c | 28 +++++++++++++++++++++-------
>>   1 file changed, 21 insertions(+), 7 deletions(-)
>>
>> diff --git a/mm/rmap.c b/mm/rmap.c
>> index 0c0d8857dfce..f293d072368a 100644
>> --- a/mm/rmap.c
>> +++ b/mm/rmap.c
>> @@ -1278,31 +1278,45 @@ void page_add_anon_rmap(struct page *page, struct
>> vm_area_struct *vma,
>>    * This means the inc-and-test can be bypassed.
>>    * The folio does not have to be locked.
>>    *
>> - * If the folio is large, it is accounted as a THP.  As the folio
>> + * If the folio is pmd-mappable, it is accounted as a THP.  As the folio
>>    * is new, it's assumed to be mapped exclusively by a single process.
>>    */
>>   void folio_add_new_anon_rmap(struct folio *folio, struct vm_area_struct *vma,
>>           unsigned long address)
>>   {
>> -    int nr;
>> +    int nr = folio_nr_pages(folio);
>>
>> -    VM_BUG_ON_VMA(address < vma->vm_start || address >= vma->vm_end, vma);
>> +    VM_BUG_ON_VMA(address < vma->vm_start ||
>> +            address + (nr << PAGE_SHIFT) > vma->vm_end, vma);
>>       __folio_set_swapbacked(folio);
>>
>> -    if (likely(!folio_test_pmd_mappable(folio))) {
>> +    if (!folio_test_large(folio)) {
> 
> Why remove the "likely" here? The patch itself does not change anything about
> that condition.

Good question; I'm not sure why. Will have to put it down to bad copy/paste
fixup. Will put it back in the next version.

> 
>>           /* increment count (starts at -1) */
>>           atomic_set(&folio->_mapcount, 0);
>> -        nr = 1;
>> +        __page_set_anon_rmap(folio, &folio->page, vma, address, 1);
>> +    } else if (!folio_test_pmd_mappable(folio)) {
>> +        int i;
>> +
>> +        for (i = 0; i < nr; i++) {
>> +            struct page *page = folio_page(folio, i);
>> +
>> +            /* increment count (starts at -1) */
>> +            atomic_set(&page->_mapcount, 0);
>> +            __page_set_anon_rmap(folio, page, vma,
>> +                    address + (i << PAGE_SHIFT), 1);
>> +        }
>> +
>> +        /* increment count (starts at 0) */
> 
> That comment is a bit misleading. We're not talking about a mapcount as in the
> other cases here.

Correct, I'm talking about _nr_pages_mapped, which starts 0, not -1 like
_mapcount. The comment was intended to be in the style used in other similar
places in rmap.c. I could change it to: "_nr_pages_mapped is 0-based, so set it
to the number of pages in the folio" or remove it entirely? What do you prefer?

> 
>> +        atomic_set(&folio->_nr_pages_mapped, nr);
>>       } else {
>>           /* increment count (starts at -1) */
>>           atomic_set(&folio->_entire_mapcount, 0);
>>           atomic_set(&folio->_nr_pages_mapped, COMPOUND_MAPPED);
>> -        nr = folio_nr_pages(folio);
>> +        __page_set_anon_rmap(folio, &folio->page, vma, address, 1);
>>           __lruvec_stat_mod_folio(folio, NR_ANON_THPS, nr);
>>       }
>>
> 
> Apart from that, LGTM.
>
  
David Hildenbrand July 17, 2023, 1:19 p.m. UTC | #5
On 17.07.23 15:13, Ryan Roberts wrote:
> On 17/07/2023 14:00, David Hildenbrand wrote:
>> On 14.07.23 18:17, Ryan Roberts wrote:
>>> In preparation for FLEXIBLE_THP support, improve
>>> folio_add_new_anon_rmap() to allow a non-pmd-mappable, large folio to be
>>> passed to it. In this case, all contained pages are accounted using the
>>> order-0 folio (or base page) scheme.
>>>
>>> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
>>> Reviewed-by: Yu Zhao <yuzhao@google.com>
>>> Reviewed-by: Yin Fengwei <fengwei.yin@intel.com>
>>> ---
>>>    mm/rmap.c | 28 +++++++++++++++++++++-------
>>>    1 file changed, 21 insertions(+), 7 deletions(-)
>>>
>>> diff --git a/mm/rmap.c b/mm/rmap.c
>>> index 0c0d8857dfce..f293d072368a 100644
>>> --- a/mm/rmap.c
>>> +++ b/mm/rmap.c
>>> @@ -1278,31 +1278,45 @@ void page_add_anon_rmap(struct page *page, struct
>>> vm_area_struct *vma,
>>>     * This means the inc-and-test can be bypassed.
>>>     * The folio does not have to be locked.
>>>     *
>>> - * If the folio is large, it is accounted as a THP.  As the folio
>>> + * If the folio is pmd-mappable, it is accounted as a THP.  As the folio
>>>     * is new, it's assumed to be mapped exclusively by a single process.
>>>     */
>>>    void folio_add_new_anon_rmap(struct folio *folio, struct vm_area_struct *vma,
>>>            unsigned long address)
>>>    {
>>> -    int nr;
>>> +    int nr = folio_nr_pages(folio);
>>>
>>> -    VM_BUG_ON_VMA(address < vma->vm_start || address >= vma->vm_end, vma);
>>> +    VM_BUG_ON_VMA(address < vma->vm_start ||
>>> +            address + (nr << PAGE_SHIFT) > vma->vm_end, vma);
>>>        __folio_set_swapbacked(folio);
>>>
>>> -    if (likely(!folio_test_pmd_mappable(folio))) {
>>> +    if (!folio_test_large(folio)) {
>>
>> Why remove the "likely" here? The patch itself does not change anything about
>> that condition.
> 
> Good question; I'm not sure why. Will have to put it down to bad copy/paste
> fixup. Will put it back in the next version.
> 
>>
>>>            /* increment count (starts at -1) */
>>>            atomic_set(&folio->_mapcount, 0);
>>> -        nr = 1;
>>> +        __page_set_anon_rmap(folio, &folio->page, vma, address, 1);
>>> +    } else if (!folio_test_pmd_mappable(folio)) {
>>> +        int i;
>>> +
>>> +        for (i = 0; i < nr; i++) {
>>> +            struct page *page = folio_page(folio, i);
>>> +
>>> +            /* increment count (starts at -1) */
>>> +            atomic_set(&page->_mapcount, 0);
>>> +            __page_set_anon_rmap(folio, page, vma,
>>> +                    address + (i << PAGE_SHIFT), 1);
>>> +        }
>>> +
>>> +        /* increment count (starts at 0) */
>>
>> That comment is a bit misleading. We're not talking about a mapcount as in the
>> other cases here.
> 
> Correct, I'm talking about _nr_pages_mapped, which starts 0, not -1 like
> _mapcount. The comment was intended to be in the style used in other similar
> places in rmap.c. I could change it to: "_nr_pages_mapped is 0-based, so set it
> to the number of pages in the folio" or remove it entirely? What do you prefer?
> 

We only have to comment what's weird, not what's normal.

IOW, we also didn't have such a comment in the existing code when doing 
atomic_set(&folio->_nr_pages_mapped, COMPOUND_MAPPED);


What might make sense here is a simple

"All pages of the folio are PTE-mapped."
  
Ryan Roberts July 17, 2023, 1:21 p.m. UTC | #6
On 17/07/2023 14:19, David Hildenbrand wrote:
> On 17.07.23 15:13, Ryan Roberts wrote:
>> On 17/07/2023 14:00, David Hildenbrand wrote:
>>> On 14.07.23 18:17, Ryan Roberts wrote:
>>>> In preparation for FLEXIBLE_THP support, improve
>>>> folio_add_new_anon_rmap() to allow a non-pmd-mappable, large folio to be
>>>> passed to it. In this case, all contained pages are accounted using the
>>>> order-0 folio (or base page) scheme.
>>>>
>>>> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
>>>> Reviewed-by: Yu Zhao <yuzhao@google.com>
>>>> Reviewed-by: Yin Fengwei <fengwei.yin@intel.com>
>>>> ---
>>>>    mm/rmap.c | 28 +++++++++++++++++++++-------
>>>>    1 file changed, 21 insertions(+), 7 deletions(-)
>>>>
>>>> diff --git a/mm/rmap.c b/mm/rmap.c
>>>> index 0c0d8857dfce..f293d072368a 100644
>>>> --- a/mm/rmap.c
>>>> +++ b/mm/rmap.c
>>>> @@ -1278,31 +1278,45 @@ void page_add_anon_rmap(struct page *page, struct
>>>> vm_area_struct *vma,
>>>>     * This means the inc-and-test can be bypassed.
>>>>     * The folio does not have to be locked.
>>>>     *
>>>> - * If the folio is large, it is accounted as a THP.  As the folio
>>>> + * If the folio is pmd-mappable, it is accounted as a THP.  As the folio
>>>>     * is new, it's assumed to be mapped exclusively by a single process.
>>>>     */
>>>>    void folio_add_new_anon_rmap(struct folio *folio, struct vm_area_struct
>>>> *vma,
>>>>            unsigned long address)
>>>>    {
>>>> -    int nr;
>>>> +    int nr = folio_nr_pages(folio);
>>>>
>>>> -    VM_BUG_ON_VMA(address < vma->vm_start || address >= vma->vm_end, vma);
>>>> +    VM_BUG_ON_VMA(address < vma->vm_start ||
>>>> +            address + (nr << PAGE_SHIFT) > vma->vm_end, vma);
>>>>        __folio_set_swapbacked(folio);
>>>>
>>>> -    if (likely(!folio_test_pmd_mappable(folio))) {
>>>> +    if (!folio_test_large(folio)) {
>>>
>>> Why remove the "likely" here? The patch itself does not change anything about
>>> that condition.
>>
>> Good question; I'm not sure why. Will have to put it down to bad copy/paste
>> fixup. Will put it back in the next version.
>>
>>>
>>>>            /* increment count (starts at -1) */
>>>>            atomic_set(&folio->_mapcount, 0);
>>>> -        nr = 1;
>>>> +        __page_set_anon_rmap(folio, &folio->page, vma, address, 1);
>>>> +    } else if (!folio_test_pmd_mappable(folio)) {
>>>> +        int i;
>>>> +
>>>> +        for (i = 0; i < nr; i++) {
>>>> +            struct page *page = folio_page(folio, i);
>>>> +
>>>> +            /* increment count (starts at -1) */
>>>> +            atomic_set(&page->_mapcount, 0);
>>>> +            __page_set_anon_rmap(folio, page, vma,
>>>> +                    address + (i << PAGE_SHIFT), 1);
>>>> +        }
>>>> +
>>>> +        /* increment count (starts at 0) */
>>>
>>> That comment is a bit misleading. We're not talking about a mapcount as in the
>>> other cases here.
>>
>> Correct, I'm talking about _nr_pages_mapped, which starts 0, not -1 like
>> _mapcount. The comment was intended to be in the style used in other similar
>> places in rmap.c. I could change it to: "_nr_pages_mapped is 0-based, so set it
>> to the number of pages in the folio" or remove it entirely? What do you prefer?
>>
> 
> We only have to comment what's weird, not what's normal.
> 
> IOW, we also didn't have such a comment in the existing code when doing
> atomic_set(&folio->_nr_pages_mapped, COMPOUND_MAPPED);
> 
> 
> What might make sense here is a simple
> 
> "All pages of the folio are PTE-mapped."
> 

ACK - thanks.
  

Patch

diff --git a/mm/rmap.c b/mm/rmap.c
index 0c0d8857dfce..f293d072368a 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1278,31 +1278,45 @@  void page_add_anon_rmap(struct page *page, struct vm_area_struct *vma,
  * This means the inc-and-test can be bypassed.
  * The folio does not have to be locked.
  *
- * If the folio is large, it is accounted as a THP.  As the folio
+ * If the folio is pmd-mappable, it is accounted as a THP.  As the folio
  * is new, it's assumed to be mapped exclusively by a single process.
  */
 void folio_add_new_anon_rmap(struct folio *folio, struct vm_area_struct *vma,
 		unsigned long address)
 {
-	int nr;
+	int nr = folio_nr_pages(folio);

-	VM_BUG_ON_VMA(address < vma->vm_start || address >= vma->vm_end, vma);
+	VM_BUG_ON_VMA(address < vma->vm_start ||
+			address + (nr << PAGE_SHIFT) > vma->vm_end, vma);
 	__folio_set_swapbacked(folio);

-	if (likely(!folio_test_pmd_mappable(folio))) {
+	if (!folio_test_large(folio)) {
 		/* increment count (starts at -1) */
 		atomic_set(&folio->_mapcount, 0);
-		nr = 1;
+		__page_set_anon_rmap(folio, &folio->page, vma, address, 1);
+	} else if (!folio_test_pmd_mappable(folio)) {
+		int i;
+
+		for (i = 0; i < nr; i++) {
+			struct page *page = folio_page(folio, i);
+
+			/* increment count (starts at -1) */
+			atomic_set(&page->_mapcount, 0);
+			__page_set_anon_rmap(folio, page, vma,
+					address + (i << PAGE_SHIFT), 1);
+		}
+
+		/* increment count (starts at 0) */
+		atomic_set(&folio->_nr_pages_mapped, nr);
 	} else {
 		/* increment count (starts at -1) */
 		atomic_set(&folio->_entire_mapcount, 0);
 		atomic_set(&folio->_nr_pages_mapped, COMPOUND_MAPPED);
-		nr = folio_nr_pages(folio);
+		__page_set_anon_rmap(folio, &folio->page, vma, address, 1);
 		__lruvec_stat_mod_folio(folio, NR_ANON_THPS, nr);
 	}

 	__lruvec_stat_mod_folio(folio, NR_ANON_MAPPED, nr);
-	__page_set_anon_rmap(folio, &folio->page, vma, address, 1);
 }

 /**