Message ID | 20230929114421.3761121-8-ryan.roberts@arm.com |
---|---|
State | New |
Headers |
From: Ryan Roberts <ryan.roberts@arm.com>
To: Andrew Morton <akpm@linux-foundation.org>, Matthew Wilcox <willy@infradead.org>, Yin Fengwei <fengwei.yin@intel.com>, David Hildenbrand <david@redhat.com>, Yu Zhao <yuzhao@google.com>, Catalin Marinas <catalin.marinas@arm.com>, Anshuman Khandual <anshuman.khandual@arm.com>, Yang Shi <shy828301@gmail.com>, "Huang, Ying" <ying.huang@intel.com>, Zi Yan <ziy@nvidia.com>, Luis Chamberlain <mcgrof@kernel.org>, Itaru Kitayama <itaru.kitayama@gmail.com>, "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>, John Hubbard <jhubbard@nvidia.com>, David Rientjes <rientjes@google.com>, Vlastimil Babka <vbabka@suse.cz>, Hugh Dickins <hughd@google.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>, linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-arm-kernel@lists.infradead.org
Subject: [PATCH v6 7/9] arm64/mm: Override arch_wants_pte_order()
Date: Fri, 29 Sep 2023 12:44:18 +0100
Message-Id: <20230929114421.3761121-8-ryan.roberts@arm.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20230929114421.3761121-1-ryan.roberts@arm.com>
References: <20230929114421.3761121-1-ryan.roberts@arm.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Precedence: bulk
List-ID: <linux-kernel.vger.kernel.org> |
Series |
variable-order, large folios for anonymous memory
|
|
Commit Message
Ryan Roberts
Sept. 29, 2023, 11:44 a.m. UTC
Define an arch-specific override of arch_wants_pte_order() so that when
anon_orders=recommend is set, large folios will be allocated for
anonymous memory with an order that is compatible with arm64's HPA uarch
feature.

Reviewed-by: Yu Zhao <yuzhao@google.com>
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
---
 arch/arm64/include/asm/pgtable.h | 10 ++++++++++
 1 file changed, 10 insertions(+)
Comments
On Fri, Sep 29, 2023 at 12:44:18PM +0100, Ryan Roberts wrote:
> Define an arch-specific override of arch_wants_pte_order() so that when
> anon_orders=recommend is set, large folios will be allocated for
> anonymous memory with an order that is compatible with arm64's HPA uarch
> feature.
>
> Reviewed-by: Yu Zhao <yuzhao@google.com>
> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>

Acked-by: Catalin Marinas <catalin.marinas@arm.com>

> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
> index 7f7d9b1df4e5..e3d2449dec5c 100644
> --- a/arch/arm64/include/asm/pgtable.h
> +++ b/arch/arm64/include/asm/pgtable.h
> @@ -1110,6 +1110,16 @@ extern pte_t ptep_modify_prot_start(struct vm_area_struct *vma,
>  extern void ptep_modify_prot_commit(struct vm_area_struct *vma,
>  				    unsigned long addr, pte_t *ptep,
>  				    pte_t old_pte, pte_t new_pte);
> +
> +#define arch_wants_pte_order arch_wants_pte_order
> +static inline int arch_wants_pte_order(void)
> +{
> +	/*
> +	 * Many arm64 CPUs support hardware page aggregation (HPA), which can
> +	 * coalesce 4 contiguous pages into a single TLB entry.
> +	 */
> +	return 2;
> +}

I haven't followed the discussions on previous revisions of this series
but I wonder why not return a bitmap from arch_wants_pte_order(). For
arm64 we may want an order 6 at some point (contiguous ptes) with a
fallback to order 2 as the next best.
On 02/10/2023 16:21, Catalin Marinas wrote:
> On Fri, Sep 29, 2023 at 12:44:18PM +0100, Ryan Roberts wrote:
>> Define an arch-specific override of arch_wants_pte_order() so that when
>> anon_orders=recommend is set, large folios will be allocated for
>> anonymous memory with an order that is compatible with arm64's HPA uarch
>> feature.
>>
>> Reviewed-by: Yu Zhao <yuzhao@google.com>
>> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
>
> Acked-by: Catalin Marinas <catalin.marinas@arm.com>
>
>> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
>> index 7f7d9b1df4e5..e3d2449dec5c 100644
>> --- a/arch/arm64/include/asm/pgtable.h
>> +++ b/arch/arm64/include/asm/pgtable.h
>> @@ -1110,6 +1110,16 @@ extern pte_t ptep_modify_prot_start(struct vm_area_struct *vma,
>>  extern void ptep_modify_prot_commit(struct vm_area_struct *vma,
>>  				    unsigned long addr, pte_t *ptep,
>>  				    pte_t old_pte, pte_t new_pte);
>> +
>> +#define arch_wants_pte_order arch_wants_pte_order
>> +static inline int arch_wants_pte_order(void)
>> +{
>> +	/*
>> +	 * Many arm64 CPUs support hardware page aggregation (HPA), which can
>> +	 * coalesce 4 contiguous pages into a single TLB entry.
>> +	 */
>> +	return 2;
>> +}
>
> I haven't followed the discussions on previous revisions of this series
> but I wonder why not return a bitmap from arch_wants_pte_order(). For
> arm64 we may want an order 6 at some point (contiguous ptes) with a
> fallback to order 2 as the next best.
>

This sounds like a good idea to me - I'll implement it, assuming there is a next
rev. (Or in the unlikely event that this is the only pending change, I'd rather
defer it to when we actually need it with the contpte series).

This is just a hangover from the "MVP" approach that I was pursuing in v5, where
we didn't want to configure too many orders for fear of fragmentation. But in v6
I've introduced UABI to configure the set of orders, and this function feeds
into the special "recommend" set. So I think it is appropriate that this API
allows expression of multiple orders as you suggest.

Side note: I don't think order-6 is ever a contpte size? It's order-4 for 4K,
order-7 for 16K and order-5 for 64K.
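The per-granule orders in the side note above can be checked with a little arithmetic. The following is a minimal userspace sketch, not kernel code: `contpte_order()` and `order_to_bytes()` are hypothetical helper names introduced only for illustration, and the table simply encodes the orders stated in this thread (order-4 for 4K, order-7 for 16K, order-5 for 64K).

```c
#include <assert.h>

/*
 * Hypothetical helper (not a kernel API): map a base page shift to the
 * contpte order quoted in the thread. Returns -1 for other page sizes.
 */
static int contpte_order(unsigned int page_shift)
{
	switch (page_shift) {
	case 12: return 4;	/* 4K pages:  16 contiguous PTEs  */
	case 14: return 7;	/* 16K pages: 128 contiguous PTEs */
	case 16: return 5;	/* 64K pages: 32 contiguous PTEs  */
	default: return -1;
	}
}

/* Size in bytes of a naturally aligned block of the given order. */
static unsigned long order_to_bytes(unsigned int page_shift, int order)
{
	assert(order >= 0);
	return 1UL << (page_shift + order);
}
```

With these definitions, `order_to_bytes(12, contpte_order(12))` is 64K, while the 16K and 64K granules both give 2M. The order-2 value returned by the patch corresponds to 16K with 4K pages, i.e. the 4 contiguous pages that HPA can coalesce.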
On Tue, Oct 03, 2023 at 08:32:29AM +0100, Ryan Roberts wrote:
> On 02/10/2023 16:21, Catalin Marinas wrote:
> > On Fri, Sep 29, 2023 at 12:44:18PM +0100, Ryan Roberts wrote:
> >> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
> >> index 7f7d9b1df4e5..e3d2449dec5c 100644
> >> --- a/arch/arm64/include/asm/pgtable.h
> >> +++ b/arch/arm64/include/asm/pgtable.h
> >> @@ -1110,6 +1110,16 @@ extern pte_t ptep_modify_prot_start(struct vm_area_struct *vma,
> >>  extern void ptep_modify_prot_commit(struct vm_area_struct *vma,
> >>  				    unsigned long addr, pte_t *ptep,
> >>  				    pte_t old_pte, pte_t new_pte);
> >> +
> >> +#define arch_wants_pte_order arch_wants_pte_order
> >> +static inline int arch_wants_pte_order(void)
> >> +{
> >> +	/*
> >> +	 * Many arm64 CPUs support hardware page aggregation (HPA), which can
> >> +	 * coalesce 4 contiguous pages into a single TLB entry.
> >> +	 */
> >> +	return 2;
> >> +}
> >
> > I haven't followed the discussions on previous revisions of this series
> > but I wonder why not return a bitmap from arch_wants_pte_order(). For
> > arm64 we may want an order 6 at some point (contiguous ptes) with a
> > fallback to order 2 as the next best.
>
> This sounds like a good idea to me - I'll implement it, assuming there is a next
> rev. (Or in the unlikely event that this is the only pending change, I'd rather
> defer it to when we actually need it with the contpte series).

Fine by me, at the moment there wouldn't be any user, so a patch on top
later would do.

> Side note: I don't think order-6 is ever a contpte size? It's order-4 for 4K,
> order-7 for 16K and order-5 for 64K.

Yes, it's order-4 for 4K pages (I was thinking too much of the "64" in
64KB).
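To make the bitmap suggestion discussed above concrete, here is a rough userspace sketch of what a multi-order variant could look like. It is not the code from this series: `arch_wants_pte_orders_sketch()` and `highest_order_at_most()` are hypothetical names, the local `BIT()` macro stands in for the kernel's, and the choice of orders (contpte order-4 with HPA order-2 as fallback) assumes 4K base pages.

```c
#include <assert.h>

#define BIT(n)	(1U << (n))

/*
 * Hypothetical bitmap-returning variant of arch_wants_pte_order().
 * Assumption: 4K base pages, so contpte is order-4 and HPA is order-2.
 */
static unsigned int arch_wants_pte_orders_sketch(void)
{
	return BIT(4) | BIT(2);
}

/*
 * Return the highest order set in 'orders' that does not exceed
 * 'max_order', or -1 if no such order is set. A caller could use this
 * to fall back from the preferred order to the next best one.
 */
static int highest_order_at_most(unsigned int orders, int max_order)
{
	int order;

	assert(max_order >= 0 && max_order < 32);
	for (order = max_order; order >= 0; order--)
		if (orders & BIT(order))
			return order;
	return -1;
}
```

A caller that can fit an order-9 folio would get order-4 from this bitmap; one limited to order-3 would fall back to order-2; one limited to order-1 would get -1 and use a single page. Returning a bitmap rather than a single integer is what lets the architecture express "preferred, then next best" without the core code hard-coding the fallback.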
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 7f7d9b1df4e5..e3d2449dec5c 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -1110,6 +1110,16 @@ extern pte_t ptep_modify_prot_start(struct vm_area_struct *vma,
 extern void ptep_modify_prot_commit(struct vm_area_struct *vma,
 				    unsigned long addr, pte_t *ptep,
 				    pte_t old_pte, pte_t new_pte);
+
+#define arch_wants_pte_order arch_wants_pte_order
+static inline int arch_wants_pte_order(void)
+{
+	/*
+	 * Many arm64 CPUs support hardware page aggregation (HPA), which can
+	 * coalesce 4 contiguous pages into a single TLB entry.
+	 */
+	return 2;
+}
 #endif /* !__ASSEMBLY__ */
 
 #endif /* __ASM_PGTABLE_H */