[v6,7/9] arm64/mm: Override arch_wants_pte_order()

Message ID 20230929114421.3761121-8-ryan.roberts@arm.com
State New
Series variable-order, large folios for anonymous memory

Commit Message

Ryan Roberts Sept. 29, 2023, 11:44 a.m. UTC
  Define an arch-specific override of arch_wants_pte_order() so that when
anon_orders=recommend is set, large folios will be allocated for
anonymous memory with an order that is compatible with arm64's HPA uarch
feature.

Reviewed-by: Yu Zhao <yuzhao@google.com>
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
---
 arch/arm64/include/asm/pgtable.h | 10 ++++++++++
 1 file changed, 10 insertions(+)
  

Comments

Catalin Marinas Oct. 2, 2023, 3:21 p.m. UTC | #1
On Fri, Sep 29, 2023 at 12:44:18PM +0100, Ryan Roberts wrote:
> Define an arch-specific override of arch_wants_pte_order() so that when
> anon_orders=recommend is set, large folios will be allocated for
> anonymous memory with an order that is compatible with arm64's HPA uarch
> feature.
> 
> Reviewed-by: Yu Zhao <yuzhao@google.com>
> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>

Acked-by: Catalin Marinas <catalin.marinas@arm.com>

> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
> index 7f7d9b1df4e5..e3d2449dec5c 100644
> --- a/arch/arm64/include/asm/pgtable.h
> +++ b/arch/arm64/include/asm/pgtable.h
> @@ -1110,6 +1110,16 @@ extern pte_t ptep_modify_prot_start(struct vm_area_struct *vma,
>  extern void ptep_modify_prot_commit(struct vm_area_struct *vma,
>  				    unsigned long addr, pte_t *ptep,
>  				    pte_t old_pte, pte_t new_pte);
> +
> +#define arch_wants_pte_order arch_wants_pte_order
> +static inline int arch_wants_pte_order(void)
> +{
> +	/*
> +	 * Many arm64 CPUs support hardware page aggregation (HPA), which can
> +	 * coalesce 4 contiguous pages into a single TLB entry.
> +	 */
> +	return 2;
> +}

I haven't followed the discussions on previous revisions of this series
but I wonder why not return a bitmap from arch_wants_pte_order(). For
arm64 we may want an order 6 at some point (contiguous ptes) with a
fallback to order 2 as the next best.
  
Ryan Roberts Oct. 3, 2023, 7:32 a.m. UTC | #2
On 02/10/2023 16:21, Catalin Marinas wrote:
> On Fri, Sep 29, 2023 at 12:44:18PM +0100, Ryan Roberts wrote:
>> Define an arch-specific override of arch_wants_pte_order() so that when
>> anon_orders=recommend is set, large folios will be allocated for
>> anonymous memory with an order that is compatible with arm64's HPA uarch
>> feature.
>>
>> Reviewed-by: Yu Zhao <yuzhao@google.com>
>> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
> 
> Acked-by: Catalin Marinas <catalin.marinas@arm.com>
> 
>> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
>> index 7f7d9b1df4e5..e3d2449dec5c 100644
>> --- a/arch/arm64/include/asm/pgtable.h
>> +++ b/arch/arm64/include/asm/pgtable.h
>> @@ -1110,6 +1110,16 @@ extern pte_t ptep_modify_prot_start(struct vm_area_struct *vma,
>>  extern void ptep_modify_prot_commit(struct vm_area_struct *vma,
>>  				    unsigned long addr, pte_t *ptep,
>>  				    pte_t old_pte, pte_t new_pte);
>> +
>> +#define arch_wants_pte_order arch_wants_pte_order
>> +static inline int arch_wants_pte_order(void)
>> +{
>> +	/*
>> +	 * Many arm64 CPUs support hardware page aggregation (HPA), which can
>> +	 * coalesce 4 contiguous pages into a single TLB entry.
>> +	 */
>> +	return 2;
>> +}
> 
> I haven't followed the discussions on previous revisions of this series
> but I wonder why not return a bitmap from arch_wants_pte_order(). For
> arm64 we may want an order 6 at some point (contiguous ptes) with a
> fallback to order 2 as the next best.
> 

This sounds like a good idea to me - I'll implement it, assuming there is a next
rev. (Or in the unlikely event that this is the only pending change, I'd rather
defer it to when we actually need it with the contpte series).

This is just a hangover from the "MVP" approach that I was pursuing in v5, where
we didn't want to configure too many orders for fear of fragmentation. But in v6
I've introduced UABI to configure the set of orders, and this function feeds
into the special "recommend" set. So I think it is appropriate that this API
allows expression of multiple orders as you suggest.
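
To make the bitmap idea concrete, here is a rough sketch of what such a return value could look like. It is not code from this series: `ORDER_BIT()`, `arch_wants_pte_orders_sketch()` and `highest_order_at_most()` are hypothetical names invented for illustration, and the choice of order 4 assumes 4K pages (the contpte size discussed in this thread), with order 2 as the HPA fallback.

```c
/* Hypothetical sketch: none of these names exist in the kernel. */
#define ORDER_BIT(order)	(1UL << (order))

/* Bit N set means "order N is a preferred folio order". */
static inline unsigned long arch_wants_pte_orders_sketch(void)
{
	/* Order 4: contpte with 4K pages (assumed); order 2: HPA fallback. */
	return ORDER_BIT(4) | ORDER_BIT(2);
}

/* Return the highest preferred order that is <= max_order, or -1 if none. */
static inline int highest_order_at_most(unsigned long orders, int max_order)
{
	int order;

	for (order = max_order; order >= 0; order--)
		if (orders & ORDER_BIT(order))
			return order;

	return -1;
}
```

With a mask like this, a caller that cannot use order 4 (for example because the VMA is too small) could fall back to order 2 by calling `highest_order_at_most(mask, 3)`.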

Side note: I don't think order-6 is ever a contpte size? It's order-4 for 4K,
order-7 for 16K and order-5 for 64K.
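
The arithmetic behind those orders can be sketched as follows. The block assumes the usual arm64 contiguous-bit ranges at the PTE level (16 entries for the 4K granule, 128 for 16K, 32 for 64K); `contpte_nr_ptes()` and `contpte_order()` are illustrative helper names, not kernel functions.

```c
/*
 * Sketch: number of PTEs covered by one contiguous-bit range for each
 * arm64 translation granule. The helper names are hypothetical.
 */
static inline int contpte_nr_ptes(int page_shift)
{
	switch (page_shift) {
	case 12:
		return 16;	/* 4K granule:  16 x 4K  = 64K */
	case 14:
		return 128;	/* 16K granule: 128 x 16K = 2M */
	case 16:
		return 32;	/* 64K granule: 32 x 64K  = 2M */
	default:
		return 0;	/* not an arm64 granule size */
	}
}

/* Folio order matching one contpte range, or -1 for an unknown granule. */
static inline int contpte_order(int page_shift)
{
	int nr = contpte_nr_ptes(page_shift);
	int order = 0;

	if (!nr)
		return -1;

	/* order = log2(nr); nr is always a power of two here. */
	while ((1 << order) < nr)
		order++;

	return order;
}
```

None of the three granules yields order 6, which would need 64 contiguous PTEs.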
  
Catalin Marinas Oct. 3, 2023, 12:05 p.m. UTC | #3
On Tue, Oct 03, 2023 at 08:32:29AM +0100, Ryan Roberts wrote:
> On 02/10/2023 16:21, Catalin Marinas wrote:
> > On Fri, Sep 29, 2023 at 12:44:18PM +0100, Ryan Roberts wrote:
> >> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
> >> index 7f7d9b1df4e5..e3d2449dec5c 100644
> >> --- a/arch/arm64/include/asm/pgtable.h
> >> +++ b/arch/arm64/include/asm/pgtable.h
> >> @@ -1110,6 +1110,16 @@ extern pte_t ptep_modify_prot_start(struct vm_area_struct *vma,
> >>  extern void ptep_modify_prot_commit(struct vm_area_struct *vma,
> >>  				    unsigned long addr, pte_t *ptep,
> >>  				    pte_t old_pte, pte_t new_pte);
> >> +
> >> +#define arch_wants_pte_order arch_wants_pte_order
> >> +static inline int arch_wants_pte_order(void)
> >> +{
> >> +	/*
> >> +	 * Many arm64 CPUs support hardware page aggregation (HPA), which can
> >> +	 * coalesce 4 contiguous pages into a single TLB entry.
> >> +	 */
> >> +	return 2;
> >> +}
> > 
> > I haven't followed the discussions on previous revisions of this series
> > but I wonder why not return a bitmap from arch_wants_pte_order(). For
> > arm64 we may want an order 6 at some point (contiguous ptes) with a
> > fallback to order 2 as the next best.
> 
> This sounds like a good idea to me - I'll implement it, assuming there is a next
> rev. (Or in the unlikely event that this is the only pending change, I'd rather
> defer it to when we actually need it with the contpte series).

Fine by me, at the moment there wouldn't be any user, so a patch on top
later would do.

> Side note: I don't think order-6 is ever a contpte size? It's order-4 for 4K,
> order-7 for 16K and order-5 for 64K.

Yes, it's order-4 for 4K pages (I was thinking too much of the "64" in 64KB).
  

Patch

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 7f7d9b1df4e5..e3d2449dec5c 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -1110,6 +1110,16 @@  extern pte_t ptep_modify_prot_start(struct vm_area_struct *vma,
 extern void ptep_modify_prot_commit(struct vm_area_struct *vma,
 				    unsigned long addr, pte_t *ptep,
 				    pte_t old_pte, pte_t new_pte);
+
+#define arch_wants_pte_order arch_wants_pte_order
+static inline int arch_wants_pte_order(void)
+{
+	/*
+	 * Many arm64 CPUs support hardware page aggregation (HPA), which can
+	 * coalesce 4 contiguous pages into a single TLB entry.
+	 */
+	return 2;
+}
 #endif /* !__ASSEMBLY__ */
 
 #endif /* __ASM_PGTABLE_H */
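
The `#define arch_wants_pte_order arch_wants_pte_order` line in the patch follows the common kernel idiom for arch overrides: the arch header defines a macro with the same name as the function, and the generic header presumably guards its default with `#ifndef`. The standalone sketch below shows the mechanism only. The generic fallback is not part of this patch, so its return value here is a placeholder, not the real default.

```c
/* --- stand-in for the arch header (arch/arm64/include/asm/pgtable.h) --- */
#define arch_wants_pte_order arch_wants_pte_order
static inline int arch_wants_pte_order(void)
{
	/* HPA can coalesce 4 contiguous pages, i.e. order 2. */
	return 2;
}

/* --- stand-in for the generic header (assumed shape, not shown above) --- */
#ifndef arch_wants_pte_order
static inline int arch_wants_pte_order(void)
{
	return 0;	/* placeholder default, chosen for illustration only */
}
#endif
```

Because the macro is already defined when the generic block is reached, the preprocessor skips the fallback and every caller sees the arm64 version, which returns 2 (a folio of `1 << 2` = 4 pages).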