[v2,0/5] Optimize mmap_exit for large folios

Message ID 20230830095011.1228673-1-ryan.roberts@arm.com

Ryan Roberts Aug. 30, 2023, 9:50 a.m. UTC
  Hi All,

This is v2 of a series to improve the performance of process teardown,
taking advantage of the fact that large folios are increasingly often
pte-mapped in user space; supporting filesystems already use large
folios for pagecache memory, and large folios for anonymous memory are
(hopefully) on the horizon.

See last patch for performance numbers, including measurements that show
this approach doesn't regress (and actually improves a little bit) when
all folios are small.

The basic approach is to accumulate contiguous ranges of pages in the
mmu_gather structure (instead of storing each individual page pointer),
then take advantage of this internal format to efficiently batch rmap
removal, swapcache removal and page release - see the commit messages
for more details.
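
Roughly, the gathering side ends up looking something like the sketch
below. This is illustrative only - tlb_add_page() is a stand-in name,
and while struct pfn_range and the mmu_gather_batch fields follow the
hunks quoted in the discussion further down, the details differ from
the actual patches:

struct pfn_range {
	unsigned long start;	/* first pfn in the range */
	unsigned long end;	/* one past the last pfn */
};

/* Returns false when the batch is full and a new one is needed. */
static bool tlb_add_page(struct mmu_gather_batch *batch, struct page *page)
{
	unsigned long pfn = page_to_pfn(page);

	if (batch->nr) {
		struct pfn_range *last = &batch->folios[batch->nr - 1];

		/*
		 * Extend the previous range if this page is the next
		 * pfn and still belongs to the same folio.
		 */
		if (pfn == last->end &&
		    page_folio(pfn_to_page(last->start)) == page_folio(page)) {
			last->end++;
			return true;
		}
	}

	if (batch->nr == batch->max)
		return false;

	batch->folios[batch->nr].start = pfn;
	batch->folios[batch->nr].end = pfn + 1;
	batch->nr++;
	return true;
}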

This series replaces the previous approach I took at [2], which was much
smaller in scope, only attempting to batch rmap removal for anon pages.
Feedback was that I should do something more general that would also
batch-remove pagecache pages from the rmap. But while designing that, I
found it was also possible to improve swapcache removal and page
release. Hopefully I haven't gone too far the other way now! Note that
patch 1 is unchanged from that original series.

Note that this series will conflict with Matthew's series at [3]. I
figure we both race to mm-unstable and the loser has to do the conflict
resolution?

This series is based on mm-unstable (b93868dbf9bc).


Changes since v1 [1]
--------------------

- Now using pfns for start and end of page ranges within a folio.
  `struct page`s may not be contiguous on some setups, so using
  pointers breaks those systems. (Thanks to Zi Yan). See the sketch
  below this list.
- Fixed zone_device folio reference putting. (Thanks to Matthew and
  David).
- Refactored release_pages() and folios_put_refs() so that they now
  share a common implementation.
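
To illustrate the pfn point from the first change above (a sketch
only, not code from the series): with a sparse memmap, the `struct
page`s for adjacent pfns are not guaranteed to be virtually
contiguous, so a range has to be walked by pfn rather than by plain
pointer arithmetic:

/* Illustrative: visit each page of a range via pfns, not `page++`. */
static void for_each_page_in_range(const struct pfn_range *range,
				   void (*fn)(struct page *))
{
	unsigned long pfn;

	for (pfn = range->start; pfn < range->end; pfn++)
		fn(pfn_to_page(pfn));
}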


[1] https://lore.kernel.org/linux-mm/20230810103332.3062143-1-ryan.roberts@arm.com/
[2] https://lore.kernel.org/linux-mm/20230727141837.3386072-1-ryan.roberts@arm.com/
[3] https://lore.kernel.org/linux-mm/20230825135918.4164671-1-willy@infradead.org/


Thanks,
Ryan

Ryan Roberts (5):
  mm: Implement folio_remove_rmap_range()
  mm/mmu_gather: generalize mmu_gather rmap removal mechanism
  mm/mmu_gather: Remove encoded_page infrastructure
  mm: Refactor release_pages()
  mm/mmu_gather: Store and process pages in contig ranges

 arch/s390/include/asm/tlb.h |   9 +-
 include/asm-generic/tlb.h   |  49 ++++-----
 include/linux/mm.h          |  11 +-
 include/linux/mm_types.h    |  34 +-----
 include/linux/rmap.h        |   2 +
 include/linux/swap.h        |   6 +-
 mm/memory.c                 |  24 +++--
 mm/mmu_gather.c             | 114 ++++++++++++++------
 mm/rmap.c                   | 125 ++++++++++++++++------
 mm/swap.c                   | 201 ++++++++++++++++++++++--------------
 mm/swap_state.c             |  11 +-
 11 files changed, 367 insertions(+), 219 deletions(-)

--
2.25.1
  

Comments

Ryan Roberts Aug. 30, 2023, 3:32 p.m. UTC | #1
On 30/08/2023 16:07, Matthew Wilcox wrote:
> On Wed, Aug 30, 2023 at 10:50:11AM +0100, Ryan Roberts wrote:
>> +++ b/include/asm-generic/tlb.h
>> @@ -246,11 +246,11 @@ struct mmu_gather_batch {
>>  	struct mmu_gather_batch	*next;
>>  	unsigned int		nr;
>>  	unsigned int		max;
>> -	struct page		*pages[];
>> +	struct pfn_range	folios[];
> 
> I think it's dangerous to call this 'folios' as it lets you think that
> each entry is a single folio.  But as I understand this patch, you can
> coagulate contiguous ranges across multiple folios.

No, that's not quite the case; each contiguous range only ever spans a
*single* folio. If there are 2 contiguous folios, they will be
represented as separate ranges. This is done so that we can
subsequently do the per-folio operations without having to figure out
how many folios are within each range - one range = one (contiguous
part of a) folio.
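
To make that concrete, the consumer side can do something like this
(an illustrative sketch with a made-up function name, not the actual
patch code):

static void put_folio_ranges(struct pfn_range *folios, int nr)
{
	int i;

	for (i = 0; i < nr; i++) {
		struct folio *folio = pfn_folio(folios[i].start);
		int nr_pages = folios[i].end - folios[i].start;

		/*
		 * One per-folio operation per range, e.g. dropping the
		 * nr_pages references taken when the pages were
		 * gathered. (Sketch only: the real code also batches
		 * rmap, swapcache and LRU work.)
		 */
		folio_put_refs(folio, nr_pages);
	}
}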

On naming, I was calling this variable "ranges" in v1 but thought folios was
actually clearer. How about "folio_regions"?

> 
>> -void free_pages_and_swap_cache(struct page **pages, int nr)
>> +void free_folios_and_swap_cache(struct pfn_range *folios, int nr)
>>  {
>>  	lru_add_drain();
>>  	for (int i = 0; i < nr; i++)
>> -		free_swap_cache(pages[i]);
>> -	release_pages(pages, nr);
>> +		free_swap_cache(pfn_to_page(folios[i].start));
> 
> ... but here, you only put the swapcache for the first folio covered by
> the range, not for each folio.

Yes, that's intentional - one range only ever covers one folio, so I
only need to call free_swap_cache() once for the folio. Unless I've
misunderstood and free_swap_cache() is actually decrementing a
reference count and needs to be called for every page? (But it doesn't
look like that in the code.)
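
For reference, free_swap_cache() currently looks roughly like this
(paraphrased from mm/swap_state.c, so treat the details as from
memory); it operates on the whole folio and doesn't drop a per-page
reference, which is why one call per range should be enough:

void free_swap_cache(struct page *page)
{
	struct folio *folio = page_folio(page);

	if (folio_test_swapcache(folio) && !folio_mapped(folio) &&
	    folio_trylock(folio)) {
		folio_free_swap(folio);
		folio_unlock(folio);
	}
}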

> 
>> +	folios_put_refs(folios, nr);
> 
> It's kind of confusing to have folios_put() which takes a struct folio *
> and then folios_put_refs() which takes a struct pfn_range *.
> pfn_range_put()?

I think it's less confusing if you know that each pfn_range represents
a single contiguous range of pages within a *single* folio.
pfn_range_put() would make it sound like it's ok to pass a pfn_range
that spans multiple folios (this would break). I could rename `struct
pfn_range` to `struct sub_folio` or something like that. Would that
help make the semantics clearer?