Message ID: 20230703135330.1865927-3-ryan.roberts@arm.com
State: New
Headers:
From: Ryan Roberts <ryan.roberts@arm.com>
To: Andrew Morton <akpm@linux-foundation.org>, Matthew Wilcox <willy@infradead.org>, "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>, Yin Fengwei <fengwei.yin@intel.com>, David Hildenbrand <david@redhat.com>, Yu Zhao <yuzhao@google.com>, Catalin Marinas <catalin.marinas@arm.com>, Will Deacon <will@kernel.org>, Anshuman Khandual <anshuman.khandual@arm.com>, Yang Shi <shy828301@gmail.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: [PATCH v2 2/5] mm: Allow deferred splitting of arbitrary large anon folios
Date: Mon, 3 Jul 2023 14:53:27 +0100
Message-Id: <20230703135330.1865927-3-ryan.roberts@arm.com>
In-Reply-To: <20230703135330.1865927-1-ryan.roberts@arm.com>
Series: variable-order, large folios for anonymous memory
Commit Message
Ryan Roberts
July 3, 2023, 1:53 p.m. UTC
With the introduction of large folios for anonymous memory, we would like to
be able to split them when they have unmapped subpages, in order to free those
unused pages under memory pressure. So remove the artificial requirement that
the large folio needed to be at least PMD-sized.

Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Reviewed-by: Yu Zhao <yuzhao@google.com>
Reviewed-by: Yin Fengwei <fengwei.yin@intel.com>
---
 mm/rmap.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
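The hunk in question sits at the tail of page_remove_rmap(). Below is an annotated sketch of the check being widened, condensed from the diff in this patch; the comments are editorial paraphrase, not kernel source:

	/*
	 * At this point in page_remove_rmap():
	 *  - compound is true when a whole PMD-level mapping was removed;
	 *  - nr counts the pages of this folio that just became unmapped;
	 *  - nr_pmdmapped is non-zero if the folio had a PMD mapping.
	 *
	 * Queue a large anon folio for deferred splitting when it may now
	 * be only partially mapped: either PTEs were removed individually
	 * (!compound), or a PMD mapping went away while some PTE mappings
	 * remain (nr < nr_pmdmapped). Previously the outer check was
	 * folio_test_pmd_mappable(), so anon folios smaller than PMD size
	 * could never be queued.
	 */
	if (folio_test_large(folio) && folio_test_anon(folio))
		if (!compound || nr < nr_pmdmapped)
			deferred_split_folio(folio);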
Comments
Ryan Roberts <ryan.roberts@arm.com> writes:

> With the introduction of large folios for anonymous memory, we would
> like to be able to split them when they have unmapped subpages, in order
> to free those unused pages under memory pressure. So remove the
> artificial requirement that the large folio needed to be at least
> PMD-sized.
>
> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
> Reviewed-by: Yu Zhao <yuzhao@google.com>
> Reviewed-by: Yin Fengwei <fengwei.yin@intel.com>
> ---
>  mm/rmap.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/rmap.c b/mm/rmap.c
> index 82ef5ba363d1..bbcb2308a1c5 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -1474,7 +1474,7 @@ void page_remove_rmap(struct page *page, struct vm_area_struct *vma,
>  	 * page of the folio is unmapped and at least one page
>  	 * is still mapped.
>  	 */
> -	if (folio_test_pmd_mappable(folio) && folio_test_anon(folio))
> +	if (folio_test_large(folio) && folio_test_anon(folio))
>  		if (!compound || nr < nr_pmdmapped)
>  			deferred_split_folio(folio);
>  }

One possible issue is that even for large folios mapped only in one
process, in zap_pte_range(), we will always call deferred_split_folio()
unnecessarily before freeing a large folio.

Best Regards,
Huang, Ying
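To illustrate the concern: zap_pte_range() tears PTEs down one at a time, and each one drops the rmap for its page. A heavily simplified sketch (not the actual mm/memory.c code; the loop body is condensed):

	/* zap_pte_range(), simplified: each iteration zaps one PTE and
	 * drops the rmap for the page it mapped. */
	do {
		pte_t ptent = ptep_get(pte);
		struct page *page = vm_normal_page(vma, addr, ptent);

		ptent = ptep_get_and_clear_full(mm, addr, pte, tlb->fullmm);
		/* ... dirty/accessed propagation elided ... */
		page_remove_rmap(page, vma, false);
		/* With folio_test_large() in the check, the call that
		 * unmaps the final page of a large anon folio still takes
		 * the split-queue lock to queue the folio -- only for the
		 * free path to take it again to dequeue, since the folio
		 * is freed immediately afterwards. */
	} while (pte++, addr += PAGE_SIZE, addr != end);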
Somehow I managed to reply only to the linux-arm-kernel list on first attempt
so resending:

On 07/07/2023 09:21, Huang, Ying wrote:
> Ryan Roberts <ryan.roberts@arm.com> writes:
>
>> With the introduction of large folios for anonymous memory, we would
>> like to be able to split them when they have unmapped subpages, in order
>> to free those unused pages under memory pressure. So remove the
>> artificial requirement that the large folio needed to be at least
>> PMD-sized.
>>
>> [...]
>
> One possible issue is that even for large folios mapped only in one
> process, in zap_pte_range(), we will always call deferred_split_folio()
> unnecessarily before freeing a large folio.

Hi Huang, thanks for reviewing!

I have a patch that solves this problem by determining a range of ptes
covered by a single folio and doing a "batch zap". This prevents the need to
add the folio to the deferred split queue, only to remove it again shortly
afterwards. This reduces lock contention and I can measure a performance
improvement for the kernel compilation benchmark. See [1].

However, I decided to remove it from this patch set on Yu Zhao's advice. We
are aiming for the minimal patch set to start with and wanted to focus people
on that. I intend to submit it separately later on.

[1] https://lore.kernel.org/linux-mm/20230626171430.3167004-8-ryan.roberts@arm.com/

Thanks,
Ryan
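For reference, the shape of the "batch zap" optimisation described above, in outline; the helper names below are illustrative placeholders, not the identifiers used in the patch at [1]:

	/* In zap_pte_range(): detect a run of consecutive PTEs that all
	 * map the same large folio, and zap the whole run as one unit. */
	nr = folio_count_cont_ptes(folio, pte, addr, end);	/* illustrative */
	for (i = 0; i < nr; i++)
		ptep_get_and_clear_full(mm, addr + i * PAGE_SIZE,
					pte + i, tlb->fullmm);
	/* Drop the rmap for all nr pages in one call. Because the folio
	 * goes from fully mapped to fully unmapped in a single step, it
	 * never appears "partially mapped", so the deferred-split
	 * queueing in page_remove_rmap() is never triggered and the
	 * folio is freed directly. */
	folio_remove_rmap_range(folio, page, nr, vma);		/* illustrative */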
Ryan Roberts <ryan.roberts@arm.com> writes:

> Somehow I managed to reply only to the linux-arm-kernel list on first
> attempt so resending:
>
> On 07/07/2023 09:21, Huang, Ying wrote:
>> [...]
>>
>> One possible issue is that even for large folios mapped only in one
>> process, in zap_pte_range(), we will always call deferred_split_folio()
>> unnecessarily before freeing a large folio.
>
> Hi Huang, thanks for reviewing!
>
> I have a patch that solves this problem by determining a range of ptes
> covered by a single folio and doing a "batch zap". This prevents the need
> to add the folio to the deferred split queue, only to remove it again
> shortly afterwards. This reduces lock contention and I can measure a
> performance improvement for the kernel compilation benchmark. See [1].
>
> However, I decided to remove it from this patch set on Yu Zhao's advice.
> We are aiming for the minimal patch set to start with and wanted to focus
> people on that. I intend to submit it separately later on.
>
> [1] https://lore.kernel.org/linux-mm/20230626171430.3167004-8-ryan.roberts@arm.com/

Thanks for your information! "batch zap" can solve the problem.

And I agree with Matthew's comments in the email below: the large-folio
interaction issues should be fixed before merging the patches that allocate
large folios.

https://lore.kernel.org/linux-mm/ZKVdUDuwNWDUCWc5@casper.infradead.org/

If so, we don't need to introduce the above problem or a large patchset.

Best Regards,
Huang, Ying
On 10/07/2023 06:37, Huang, Ying wrote:
> Ryan Roberts <ryan.roberts@arm.com> writes:
>> [...]
>
> Thanks for your information! "batch zap" can solve the problem.
>
> And I agree with Matthew's comments in the email below: the large-folio
> interaction issues should be fixed before merging the patches that allocate
> large folios.
>
> https://lore.kernel.org/linux-mm/ZKVdUDuwNWDUCWc5@casper.infradead.org/
>
> If so, we don't need to introduce the above problem or a large patchset.

I appreciate Matthew's and others' position about not wanting to merge a
minimal implementation while there are some fundamental features (e.g.
compaction) it doesn't play well with - I'm working to create a definitive
list so these items can be tracked and tackled.

That said, I don't see this "batch zap" patch as an example of this. It's
just a performance enhancement that improves things even further than large
anon folios on their own. I'd rather concentrate on the core changes first,
then deal with this type of thing later. Does that work for you?
Ryan Roberts <ryan.roberts@arm.com> writes:

> On 10/07/2023 06:37, Huang, Ying wrote:
>> [...]
>
> I appreciate Matthew's and others' position about not wanting to merge a
> minimal implementation while there are some fundamental features (e.g.
> compaction) it doesn't play well with - I'm working to create a definitive
> list so these items can be tracked and tackled.

Good to know this, thanks!

> That said, I don't see this "batch zap" patch as an example of this. It's
> just a performance enhancement that improves things even further than large
> anon folios on their own. I'd rather concentrate on the core changes first,
> then deal with this type of thing later. Does that work for you?

IIUC, allocating large folios upon page fault depends on splitting large
folios in page_remove_rmap() to avoid memory wastage. Splitting large folios
in page_remove_rmap() depends on "batch zap" to avoid a performance
regression in zap_pte_range(). So we need them to be done earlier. Or do I
miss something?

Best Regards,
Huang, Ying
On 10/07/2023 10:01, Huang, Ying wrote:
> Ryan Roberts <ryan.roberts@arm.com> writes:
>> [...]
>>
>> That said, I don't see this "batch zap" patch as an example of this. It's
>> just a performance enhancement that improves things even further than
>> large anon folios on their own. I'd rather concentrate on the core changes
>> first, then deal with this type of thing later. Does that work for you?
>
> IIUC, allocating large folios upon page fault depends on splitting large
> folios in page_remove_rmap() to avoid memory wastage. Splitting large folios
> in page_remove_rmap() depends on "batch zap" to avoid a performance
> regression in zap_pte_range(). So we need them to be done earlier. Or do I
> miss something?

My point was just that large anon folios improve performance significantly
overall, despite a small perf regression in zap_pte_range(). That regression
is reduced further by a patch from Yin Fengwei to reduce the lock contention
[1]. So it doesn't seem urgent to me to get the "batch zap" change in.

I'll add it to my list, then prioritize it against the other stuff.

[1] https://lore.kernel.org/linux-mm/20230429082759.1600796-1-fengwei.yin@intel.com/
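The lock-contention patch referenced as [1] is characterised later in this thread as optimising the case where the folio is not on the deferred split queue. A rough sketch of that pattern on the folio free path, based on that description rather than on the patch itself (ds_queue and flags assumed in scope, as in mm/huge_memory.c):

	/* Folio free path: peek at the list head locklessly and only
	 * take split_queue_lock when the folio was actually queued. A
	 * folio that page_remove_rmap() queued moments before the free
	 * still pays for the lock twice -- which is why Huang argues
	 * below that this does not help the zap_pte_range() case. */
	if (!data_race(list_empty(&folio->_deferred_list))) {
		spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
		if (!list_empty(&folio->_deferred_list)) {
			ds_queue->split_queue_len--;
			list_del_init(&folio->_deferred_list);
		}
		spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);
	}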
Ryan Roberts <ryan.roberts@arm.com> writes:

> On 10/07/2023 10:01, Huang, Ying wrote:
>> [...]
>>
>> IIUC, allocating large folios upon page fault depends on splitting large
>> folios in page_remove_rmap() to avoid memory wastage. Splitting large
>> folios in page_remove_rmap() depends on "batch zap" to avoid a performance
>> regression in zap_pte_range(). So we need them to be done earlier. Or do I
>> miss something?
>
> My point was just that large anon folios improve performance significantly
> overall, despite a small perf regression in zap_pte_range(). That regression
> is reduced further by a patch from Yin Fengwei to reduce the lock contention
> [1]. So it doesn't seem urgent to me to get the "batch zap" change in.

I don't think Fengwei's patch will help much here, because that patch
optimizes the case where the folio isn't on the deferred split queue, but
now the folio will be put on the deferred split queue.

And I don't think allocating large folios upon page fault is more urgent. We
should avoid regression if possible.

> I'll add it to my list, then prioritize it against the other stuff.
>
> [1] https://lore.kernel.org/linux-mm/20230429082759.1600796-1-fengwei.yin@intel.com/

Best Regards,
Huang, Ying
diff --git a/mm/rmap.c b/mm/rmap.c
index 82ef5ba363d1..bbcb2308a1c5 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1474,7 +1474,7 @@ void page_remove_rmap(struct page *page, struct vm_area_struct *vma,
 	 * page of the folio is unmapped and at least one page
 	 * is still mapped.
 	 */
-	if (folio_test_pmd_mappable(folio) && folio_test_anon(folio))
+	if (folio_test_large(folio) && folio_test_anon(folio))
 		if (!compound || nr < nr_pmdmapped)
 			deferred_split_folio(folio);
 }
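For reference, the two predicates differ only in the folio-order threshold. Simplified from the kernel headers of this era (a sketch, not verbatim source):

	/* include/linux/page-flags.h: true for any compound folio,
	 * i.e. order >= 1 */
	static inline bool folio_test_large(struct folio *folio)
	{
		return folio_test_head(folio);
	}

	/* include/linux/huge_mm.h: true only when the folio is large
	 * enough to be mapped by a single PMD entry */
	static inline bool folio_test_pmd_mappable(struct folio *folio)
	{
		return folio_order(folio) >= HPAGE_PMD_ORDER;
	}

So the one-line change widens deferred-split eligibility from PMD-sized THPs to all large anon folios, including the sub-PMD sizes this series introduces.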