Message ID | 20230315051444.3229621-35-willy@infradead.org |
---|---|
State | New |
Headers |
From: "Matthew Wilcox (Oracle)" <willy@infradead.org> To: linux-arch@vger.kernel.org Cc: Yin Fengwei <fengwei.yin@intel.com>, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Matthew Wilcox <willy@infradead.org> Subject: [PATCH v4 34/36] rmap: add folio_add_file_rmap_range() Date: Wed, 15 Mar 2023 05:14:42 +0000 Message-Id: <20230315051444.3229621-35-willy@infradead.org> In-Reply-To: <20230315051444.3229621-1-willy@infradead.org> References: <20230315051444.3229621-1-willy@infradead.org> MIME-Version: 1.0 |
Series | New page table range API |
Commit Message
Matthew Wilcox
March 15, 2023, 5:14 a.m. UTC
From: Yin Fengwei <fengwei.yin@intel.com>

folio_add_file_rmap_range() allows adding a pte mapping to a specific range of a file folio. Compared to page_add_file_rmap(), it batches the __lruvec_stat updates for large folios.

Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 include/linux/rmap.h |  2 ++
 mm/rmap.c            | 60 +++++++++++++++++++++++++++++++++-----------
 2 files changed, 48 insertions(+), 14 deletions(-)
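For illustration, a minimal sketch of how a PTE-populating fault path might use the new range API: the function name below is hypothetical and not taken from this series, vmf->pte is assumed to already point at the PTE for addr, and the caller is assumed to hold the PTE lock.

#include <linux/mm.h>
#include <linux/rmap.h>
#include <linux/pgtable.h>

/*
 * Illustrative sketch only: map 'nr' consecutive pages of a file folio,
 * starting at 'page', into userspace at 'addr', registering the whole
 * range with rmap in a single call so the folio-level statistics are
 * updated once rather than once per page.
 */
static void sketch_map_file_folio_range(struct vm_fault *vmf,
                                        struct folio *folio, struct page *page,
                                        unsigned int nr, unsigned long addr)
{
        struct vm_area_struct *vma = vmf->vma;
        unsigned int i;

        /* One rmap/accounting call covers the whole batch of small pages. */
        folio_add_file_rmap_range(folio, page, nr, vma, false);
        folio_ref_add(folio, nr);

        for (i = 0; i < nr; i++) {
                pte_t entry = mk_pte(page + i, vma->vm_page_prot);

                set_pte_at(vma->vm_mm, addr + i * PAGE_SIZE,
                           vmf->pte + i, entry);
        }
}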
Comments
On 15/03/2023 05:14, Matthew Wilcox (Oracle) wrote: > From: Yin Fengwei <fengwei.yin@intel.com> > > folio_add_file_rmap_range() allows to add pte mapping to a specific > range of file folio. Comparing to page_add_file_rmap(), it batched > updates __lruvec_stat for large folio. > > Signed-off-by: Yin Fengwei <fengwei.yin@intel.com> > Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> > --- > include/linux/rmap.h | 2 ++ > mm/rmap.c | 60 +++++++++++++++++++++++++++++++++----------- > 2 files changed, 48 insertions(+), 14 deletions(-) > > diff --git a/include/linux/rmap.h b/include/linux/rmap.h > index b87d01660412..a3825ce81102 100644 > --- a/include/linux/rmap.h > +++ b/include/linux/rmap.h > @@ -198,6 +198,8 @@ void folio_add_new_anon_rmap(struct folio *, struct vm_area_struct *, > unsigned long address); > void page_add_file_rmap(struct page *, struct vm_area_struct *, > bool compound); > +void folio_add_file_rmap_range(struct folio *, struct page *, unsigned int nr, > + struct vm_area_struct *, bool compound); > void page_remove_rmap(struct page *, struct vm_area_struct *, > bool compound); > > diff --git a/mm/rmap.c b/mm/rmap.c > index 4898e10c569a..a91906b28835 100644 > --- a/mm/rmap.c > +++ b/mm/rmap.c > @@ -1301,31 +1301,39 @@ void folio_add_new_anon_rmap(struct folio *folio, struct vm_area_struct *vma, > } > > /** > - * page_add_file_rmap - add pte mapping to a file page > - * @page: the page to add the mapping to > + * folio_add_file_rmap_range - add pte mapping to page range of a folio > + * @folio: The folio to add the mapping to > + * @page: The first page to add > + * @nr_pages: The number of pages which will be mapped > * @vma: the vm area in which the mapping is added > * @compound: charge the page as compound or small page > * > + * The page range of folio is defined by [first_page, first_page + nr_pages) > + * > * The caller needs to hold the pte lock. > */ > -void page_add_file_rmap(struct page *page, struct vm_area_struct *vma, > - bool compound) > +void folio_add_file_rmap_range(struct folio *folio, struct page *page, > + unsigned int nr_pages, struct vm_area_struct *vma, > + bool compound) > { > - struct folio *folio = page_folio(page); > atomic_t *mapped = &folio->_nr_pages_mapped; > - int nr = 0, nr_pmdmapped = 0; > - bool first; > + unsigned int nr_pmdmapped = 0, first; > + int nr = 0; > > - VM_BUG_ON_PAGE(compound && !PageTransHuge(page), page); > + VM_WARN_ON_FOLIO(compound && !folio_test_pmd_mappable(folio), folio); > > /* Is page being mapped by PTE? Is this its first map to be added? */ > if (likely(!compound)) { > - first = atomic_inc_and_test(&page->_mapcount); > - nr = first; > - if (first && folio_test_large(folio)) { > - nr = atomic_inc_return_relaxed(mapped); > - nr = (nr < COMPOUND_MAPPED); > - } > + do { > + first = atomic_inc_and_test(&page->_mapcount); > + if (first && folio_test_large(folio)) { > + first = atomic_inc_return_relaxed(mapped); > + first = (nr < COMPOUND_MAPPED); This still contains the typo that Yin Fengwei spotted in the previous version: https://lore.kernel.org/linux-mm/20230228213738.272178-1-willy@infradead.org/T/#m84673899e25bc31356093a1177941f2cc35e5da8 FYI, I'm seeing a perf regression of about 1% when compiling the kernel on Ampere Altra (arm64) with this whole series on top of v6.3-rc1 (In a VM using ext4 filesystem). Looks like instruction aborts are taking much longer and a selection of syscalls are a bit slower. Still hunting down the root cause. Will report once I have conclusive diagnosis. 
Thanks, Ryan > + } > + > + if (first) > + nr++; > + } while (page++, --nr_pages > 0); > } else if (folio_test_pmd_mappable(folio)) { > /* That test is redundant: it's for safety or to optimize out */ > > @@ -1354,6 +1362,30 @@ void page_add_file_rmap(struct page *page, struct vm_area_struct *vma, > mlock_vma_folio(folio, vma, compound); > } > > +/** > + * page_add_file_rmap - add pte mapping to a file page > + * @page: the page to add the mapping to > + * @vma: the vm area in which the mapping is added > + * @compound: charge the page as compound or small page > + * > + * The caller needs to hold the pte lock. > + */ > +void page_add_file_rmap(struct page *page, struct vm_area_struct *vma, > + bool compound) > +{ > + struct folio *folio = page_folio(page); > + unsigned int nr_pages; > + > + VM_WARN_ON_ONCE_PAGE(compound && !PageTransHuge(page), page); > + > + if (likely(!compound)) > + nr_pages = 1; > + else > + nr_pages = folio_nr_pages(folio); > + > + folio_add_file_rmap_range(folio, page, nr_pages, vma, compound); > +} > + > /** > * page_remove_rmap - take down pte mapping from a page > * @page: page to remove mapping from
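For readers tracking the typo called out above: the pre-existing single-page code reads the folio counter with nr = atomic_inc_return_relaxed(mapped) and then tests that value, so the presumed intent of the range version is to compare the value just returned, not the local nr counter. A sketch of the presumably corrected loop body (an editorial illustration, not a fix confirmed by the author):

		do {
			first = atomic_inc_and_test(&page->_mapcount);
			if (first && folio_test_large(folio)) {
				first = atomic_inc_return_relaxed(mapped);
				/* presumed fix: test the counter value just read */
				first = (first < COMPOUND_MAPPED);
			}

			if (first)
				nr++;
		} while (page++, --nr_pages > 0);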
On 15/03/2023 13:34, Ryan Roberts wrote: > On 15/03/2023 05:14, Matthew Wilcox (Oracle) wrote: >> From: Yin Fengwei <fengwei.yin@intel.com> >> >> folio_add_file_rmap_range() allows to add pte mapping to a specific >> range of file folio. Comparing to page_add_file_rmap(), it batched >> updates __lruvec_stat for large folio. >> >> Signed-off-by: Yin Fengwei <fengwei.yin@intel.com> >> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> >> --- >> include/linux/rmap.h | 2 ++ >> mm/rmap.c | 60 +++++++++++++++++++++++++++++++++----------- >> 2 files changed, 48 insertions(+), 14 deletions(-) >> >> diff --git a/include/linux/rmap.h b/include/linux/rmap.h >> index b87d01660412..a3825ce81102 100644 >> --- a/include/linux/rmap.h >> +++ b/include/linux/rmap.h >> @@ -198,6 +198,8 @@ void folio_add_new_anon_rmap(struct folio *, struct vm_area_struct *, >> unsigned long address); >> void page_add_file_rmap(struct page *, struct vm_area_struct *, >> bool compound); >> +void folio_add_file_rmap_range(struct folio *, struct page *, unsigned int nr, >> + struct vm_area_struct *, bool compound); >> void page_remove_rmap(struct page *, struct vm_area_struct *, >> bool compound); >> >> diff --git a/mm/rmap.c b/mm/rmap.c >> index 4898e10c569a..a91906b28835 100644 >> --- a/mm/rmap.c >> +++ b/mm/rmap.c >> @@ -1301,31 +1301,39 @@ void folio_add_new_anon_rmap(struct folio *folio, struct vm_area_struct *vma, >> } >> >> /** >> - * page_add_file_rmap - add pte mapping to a file page >> - * @page: the page to add the mapping to >> + * folio_add_file_rmap_range - add pte mapping to page range of a folio >> + * @folio: The folio to add the mapping to >> + * @page: The first page to add >> + * @nr_pages: The number of pages which will be mapped >> * @vma: the vm area in which the mapping is added >> * @compound: charge the page as compound or small page >> * >> + * The page range of folio is defined by [first_page, first_page + nr_pages) >> + * >> * The caller needs to hold the pte lock. >> */ >> -void page_add_file_rmap(struct page *page, struct vm_area_struct *vma, >> - bool compound) >> +void folio_add_file_rmap_range(struct folio *folio, struct page *page, >> + unsigned int nr_pages, struct vm_area_struct *vma, >> + bool compound) >> { >> - struct folio *folio = page_folio(page); >> atomic_t *mapped = &folio->_nr_pages_mapped; >> - int nr = 0, nr_pmdmapped = 0; >> - bool first; >> + unsigned int nr_pmdmapped = 0, first; >> + int nr = 0; >> >> - VM_BUG_ON_PAGE(compound && !PageTransHuge(page), page); >> + VM_WARN_ON_FOLIO(compound && !folio_test_pmd_mappable(folio), folio); >> >> /* Is page being mapped by PTE? Is this its first map to be added? */ >> if (likely(!compound)) { >> - first = atomic_inc_and_test(&page->_mapcount); >> - nr = first; >> - if (first && folio_test_large(folio)) { >> - nr = atomic_inc_return_relaxed(mapped); >> - nr = (nr < COMPOUND_MAPPED); >> - } >> + do { >> + first = atomic_inc_and_test(&page->_mapcount); >> + if (first && folio_test_large(folio)) { >> + first = atomic_inc_return_relaxed(mapped); >> + first = (nr < COMPOUND_MAPPED); > > This still contains the typo that Yin Fengwei spotted in the previous version: > https://lore.kernel.org/linux-mm/20230228213738.272178-1-willy@infradead.org/T/#m84673899e25bc31356093a1177941f2cc35e5da8 > > FYI, I'm seeing a perf regression of about 1% when compiling the kernel on > Ampere Altra (arm64) with this whole series on top of v6.3-rc1 (In a VM using > ext4 filesystem). 
Looks like instruction aborts are taking much longer and a > selection of syscalls are a bit slower. Still hunting down the root cause. Will > report once I have conclusive diagnosis. I'm sorry - I'm struggling to find the exact cause. But its spending over 2x the amount of time in the instruction abort handling code once patches 32-36 are included. Everything in the flame graph is just taking longer. Perhaps we are getting more instruction aborts somehow? I have the flamegraphs if anyone wants them - just shout and I'll email them separately. > > Thanks, > Ryan > > >> + } >> + >> + if (first) >> + nr++; >> + } while (page++, --nr_pages > 0); >> } else if (folio_test_pmd_mappable(folio)) { >> /* That test is redundant: it's for safety or to optimize out */ >> >> @@ -1354,6 +1362,30 @@ void page_add_file_rmap(struct page *page, struct vm_area_struct *vma, >> mlock_vma_folio(folio, vma, compound); >> } >> >> +/** >> + * page_add_file_rmap - add pte mapping to a file page >> + * @page: the page to add the mapping to >> + * @vma: the vm area in which the mapping is added >> + * @compound: charge the page as compound or small page >> + * >> + * The caller needs to hold the pte lock. >> + */ >> +void page_add_file_rmap(struct page *page, struct vm_area_struct *vma, >> + bool compound) >> +{ >> + struct folio *folio = page_folio(page); >> + unsigned int nr_pages; >> + >> + VM_WARN_ON_ONCE_PAGE(compound && !PageTransHuge(page), page); >> + >> + if (likely(!compound)) >> + nr_pages = 1; >> + else >> + nr_pages = folio_nr_pages(folio); >> + >> + folio_add_file_rmap_range(folio, page, nr_pages, vma, compound); >> +} >> + >> /** >> * page_remove_rmap - take down pte mapping from a page >> * @page: page to remove mapping from >
Hi Matthew, On 3/16/2023 12:08 AM, Ryan Roberts wrote: > On 15/03/2023 13:34, Ryan Roberts wrote: >> On 15/03/2023 05:14, Matthew Wilcox (Oracle) wrote: >>> From: Yin Fengwei <fengwei.yin@intel.com> >>> >>> folio_add_file_rmap_range() allows to add pte mapping to a specific >>> range of file folio. Comparing to page_add_file_rmap(), it batched >>> updates __lruvec_stat for large folio. >>> >>> Signed-off-by: Yin Fengwei <fengwei.yin@intel.com> >>> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> >>> --- >>> include/linux/rmap.h | 2 ++ >>> mm/rmap.c | 60 +++++++++++++++++++++++++++++++++----------- >>> 2 files changed, 48 insertions(+), 14 deletions(-) >>> >>> diff --git a/include/linux/rmap.h b/include/linux/rmap.h >>> index b87d01660412..a3825ce81102 100644 >>> --- a/include/linux/rmap.h >>> +++ b/include/linux/rmap.h >>> @@ -198,6 +198,8 @@ void folio_add_new_anon_rmap(struct folio *, struct vm_area_struct *, >>> unsigned long address); >>> void page_add_file_rmap(struct page *, struct vm_area_struct *, >>> bool compound); >>> +void folio_add_file_rmap_range(struct folio *, struct page *, unsigned int nr, >>> + struct vm_area_struct *, bool compound); >>> void page_remove_rmap(struct page *, struct vm_area_struct *, >>> bool compound); >>> >>> diff --git a/mm/rmap.c b/mm/rmap.c >>> index 4898e10c569a..a91906b28835 100644 >>> --- a/mm/rmap.c >>> +++ b/mm/rmap.c >>> @@ -1301,31 +1301,39 @@ void folio_add_new_anon_rmap(struct folio *folio, struct vm_area_struct *vma, >>> } >>> >>> /** >>> - * page_add_file_rmap - add pte mapping to a file page >>> - * @page: the page to add the mapping to >>> + * folio_add_file_rmap_range - add pte mapping to page range of a folio >>> + * @folio: The folio to add the mapping to >>> + * @page: The first page to add >>> + * @nr_pages: The number of pages which will be mapped >>> * @vma: the vm area in which the mapping is added >>> * @compound: charge the page as compound or small page >>> * >>> + * The page range of folio is defined by [first_page, first_page + nr_pages) >>> + * >>> * The caller needs to hold the pte lock. >>> */ >>> -void page_add_file_rmap(struct page *page, struct vm_area_struct *vma, >>> - bool compound) >>> +void folio_add_file_rmap_range(struct folio *folio, struct page *page, >>> + unsigned int nr_pages, struct vm_area_struct *vma, >>> + bool compound) >>> { >>> - struct folio *folio = page_folio(page); >>> atomic_t *mapped = &folio->_nr_pages_mapped; >>> - int nr = 0, nr_pmdmapped = 0; >>> - bool first; >>> + unsigned int nr_pmdmapped = 0, first; >>> + int nr = 0; >>> >>> - VM_BUG_ON_PAGE(compound && !PageTransHuge(page), page); >>> + VM_WARN_ON_FOLIO(compound && !folio_test_pmd_mappable(folio), folio); >>> >>> /* Is page being mapped by PTE? Is this its first map to be added? 
*/ >>> if (likely(!compound)) { >>> - first = atomic_inc_and_test(&page->_mapcount); >>> - nr = first; >>> - if (first && folio_test_large(folio)) { >>> - nr = atomic_inc_return_relaxed(mapped); >>> - nr = (nr < COMPOUND_MAPPED); >>> - } >>> + do { >>> + first = atomic_inc_and_test(&page->_mapcount); >>> + if (first && folio_test_large(folio)) { >>> + first = atomic_inc_return_relaxed(mapped); >>> + first = (nr < COMPOUND_MAPPED); >> >> This still contains the typo that Yin Fengwei spotted in the previous version: >> https://lore.kernel.org/linux-mm/20230228213738.272178-1-willy@infradead.org/T/#m84673899e25bc31356093a1177941f2cc35e5da8 >> >> FYI, I'm seeing a perf regression of about 1% when compiling the kernel on >> Ampere Altra (arm64) with this whole series on top of v6.3-rc1 (In a VM using >> ext4 filesystem). Looks like instruction aborts are taking much longer and a >> selection of syscalls are a bit slower. Still hunting down the root cause. Will >> report once I have conclusive diagnosis. > > I'm sorry - I'm struggling to find the exact cause. But its spending over 2x the > amount of time in the instruction abort handling code once patches 32-36 are > included. Everything in the flame graph is just taking longer. Perhaps we are > getting more instruction aborts somehow? I have the flamegraphs if anyone wants > them - just shout and I'll email them separately. Thanks a lot to Ryan for sharing the flamegraphs to me. I found the __do_fault() is called with patch 32-36 while no __do_fault() just with first 31 patches. I suspect the folio_add_file_rmap_range() missed some PTEs population. Please give me few days to find the root cause and fix. Sorry for this. Regards Yin, Fengwei > >> >> Thanks, >> Ryan >> >> >>> + } >>> + >>> + if (first) >>> + nr++; >>> + } while (page++, --nr_pages > 0); >>> } else if (folio_test_pmd_mappable(folio)) { >>> /* That test is redundant: it's for safety or to optimize out */ >>> >>> @@ -1354,6 +1362,30 @@ void page_add_file_rmap(struct page *page, struct vm_area_struct *vma, >>> mlock_vma_folio(folio, vma, compound); >>> } >>> >>> +/** >>> + * page_add_file_rmap - add pte mapping to a file page >>> + * @page: the page to add the mapping to >>> + * @vma: the vm area in which the mapping is added >>> + * @compound: charge the page as compound or small page >>> + * >>> + * The caller needs to hold the pte lock. >>> + */ >>> +void page_add_file_rmap(struct page *page, struct vm_area_struct *vma, >>> + bool compound) >>> +{ >>> + struct folio *folio = page_folio(page); >>> + unsigned int nr_pages; >>> + >>> + VM_WARN_ON_ONCE_PAGE(compound && !PageTransHuge(page), page); >>> + >>> + if (likely(!compound)) >>> + nr_pages = 1; >>> + else >>> + nr_pages = folio_nr_pages(folio); >>> + >>> + folio_add_file_rmap_range(folio, page, nr_pages, vma, compound); >>> +} >>> + >>> /** >>> * page_remove_rmap - take down pte mapping from a page >>> * @page: page to remove mapping from >> >
On 16/03/2023 16:27, Yin, Fengwei wrote: > Hi Matthew, > > On 3/16/2023 12:08 AM, Ryan Roberts wrote: >> On 15/03/2023 13:34, Ryan Roberts wrote: >>> On 15/03/2023 05:14, Matthew Wilcox (Oracle) wrote: >>>> From: Yin Fengwei <fengwei.yin@intel.com> >>>> >>>> folio_add_file_rmap_range() allows to add pte mapping to a specific >>>> range of file folio. Comparing to page_add_file_rmap(), it batched >>>> updates __lruvec_stat for large folio. >>>> >>>> Signed-off-by: Yin Fengwei <fengwei.yin@intel.com> >>>> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> >>>> --- >>>> include/linux/rmap.h | 2 ++ >>>> mm/rmap.c | 60 +++++++++++++++++++++++++++++++++----------- >>>> 2 files changed, 48 insertions(+), 14 deletions(-) >>>> >>>> diff --git a/include/linux/rmap.h b/include/linux/rmap.h >>>> index b87d01660412..a3825ce81102 100644 >>>> --- a/include/linux/rmap.h >>>> +++ b/include/linux/rmap.h >>>> @@ -198,6 +198,8 @@ void folio_add_new_anon_rmap(struct folio *, struct vm_area_struct *, >>>> unsigned long address); >>>> void page_add_file_rmap(struct page *, struct vm_area_struct *, >>>> bool compound); >>>> +void folio_add_file_rmap_range(struct folio *, struct page *, unsigned int nr, >>>> + struct vm_area_struct *, bool compound); >>>> void page_remove_rmap(struct page *, struct vm_area_struct *, >>>> bool compound); >>>> >>>> diff --git a/mm/rmap.c b/mm/rmap.c >>>> index 4898e10c569a..a91906b28835 100644 >>>> --- a/mm/rmap.c >>>> +++ b/mm/rmap.c >>>> @@ -1301,31 +1301,39 @@ void folio_add_new_anon_rmap(struct folio *folio, struct vm_area_struct *vma, >>>> } >>>> >>>> /** >>>> - * page_add_file_rmap - add pte mapping to a file page >>>> - * @page: the page to add the mapping to >>>> + * folio_add_file_rmap_range - add pte mapping to page range of a folio >>>> + * @folio: The folio to add the mapping to >>>> + * @page: The first page to add >>>> + * @nr_pages: The number of pages which will be mapped >>>> * @vma: the vm area in which the mapping is added >>>> * @compound: charge the page as compound or small page >>>> * >>>> + * The page range of folio is defined by [first_page, first_page + nr_pages) >>>> + * >>>> * The caller needs to hold the pte lock. >>>> */ >>>> -void page_add_file_rmap(struct page *page, struct vm_area_struct *vma, >>>> - bool compound) >>>> +void folio_add_file_rmap_range(struct folio *folio, struct page *page, >>>> + unsigned int nr_pages, struct vm_area_struct *vma, >>>> + bool compound) >>>> { >>>> - struct folio *folio = page_folio(page); >>>> atomic_t *mapped = &folio->_nr_pages_mapped; >>>> - int nr = 0, nr_pmdmapped = 0; >>>> - bool first; >>>> + unsigned int nr_pmdmapped = 0, first; >>>> + int nr = 0; >>>> >>>> - VM_BUG_ON_PAGE(compound && !PageTransHuge(page), page); >>>> + VM_WARN_ON_FOLIO(compound && !folio_test_pmd_mappable(folio), folio); >>>> >>>> /* Is page being mapped by PTE? Is this its first map to be added? 
*/ >>>> if (likely(!compound)) { >>>> - first = atomic_inc_and_test(&page->_mapcount); >>>> - nr = first; >>>> - if (first && folio_test_large(folio)) { >>>> - nr = atomic_inc_return_relaxed(mapped); >>>> - nr = (nr < COMPOUND_MAPPED); >>>> - } >>>> + do { >>>> + first = atomic_inc_and_test(&page->_mapcount); >>>> + if (first && folio_test_large(folio)) { >>>> + first = atomic_inc_return_relaxed(mapped); >>>> + first = (nr < COMPOUND_MAPPED); >>> >>> This still contains the typo that Yin Fengwei spotted in the previous version: >>> https://lore.kernel.org/linux-mm/20230228213738.272178-1-willy@infradead.org/T/#m84673899e25bc31356093a1177941f2cc35e5da8 >>> >>> FYI, I'm seeing a perf regression of about 1% when compiling the kernel on >>> Ampere Altra (arm64) with this whole series on top of v6.3-rc1 (In a VM using >>> ext4 filesystem). Looks like instruction aborts are taking much longer and a >>> selection of syscalls are a bit slower. Still hunting down the root cause. Will >>> report once I have conclusive diagnosis. >> >> I'm sorry - I'm struggling to find the exact cause. But its spending over 2x the >> amount of time in the instruction abort handling code once patches 32-36 are >> included. Everything in the flame graph is just taking longer. Perhaps we are >> getting more instruction aborts somehow? I have the flamegraphs if anyone wants >> them - just shout and I'll email them separately. > Thanks a lot to Ryan for sharing the flamegraphs to me. I found the __do_fault() > is called with patch 32-36 while no __do_fault() just with first 31 patches. I > suspect the folio_add_file_rmap_range() missed some PTEs population. Please give > me few days to find the root cause and fix. Sorry for this. You're welcome. Give me a shout once you have a re-spin and I'll rerun the tests. > > > Regards > Yin, Fengwei > >> >>> >>> Thanks, >>> Ryan >>> >>> >>>> + } >>>> + >>>> + if (first) >>>> + nr++; >>>> + } while (page++, --nr_pages > 0); >>>> } else if (folio_test_pmd_mappable(folio)) { >>>> /* That test is redundant: it's for safety or to optimize out */ >>>> >>>> @@ -1354,6 +1362,30 @@ void page_add_file_rmap(struct page *page, struct vm_area_struct *vma, >>>> mlock_vma_folio(folio, vma, compound); >>>> } >>>> >>>> +/** >>>> + * page_add_file_rmap - add pte mapping to a file page >>>> + * @page: the page to add the mapping to >>>> + * @vma: the vm area in which the mapping is added >>>> + * @compound: charge the page as compound or small page >>>> + * >>>> + * The caller needs to hold the pte lock. >>>> + */ >>>> +void page_add_file_rmap(struct page *page, struct vm_area_struct *vma, >>>> + bool compound) >>>> +{ >>>> + struct folio *folio = page_folio(page); >>>> + unsigned int nr_pages; >>>> + >>>> + VM_WARN_ON_ONCE_PAGE(compound && !PageTransHuge(page), page); >>>> + >>>> + if (likely(!compound)) >>>> + nr_pages = 1; >>>> + else >>>> + nr_pages = folio_nr_pages(folio); >>>> + >>>> + folio_add_file_rmap_range(folio, page, nr_pages, vma, compound); >>>> +} >>>> + >>>> /** >>>> * page_remove_rmap - take down pte mapping from a page >>>> * @page: page to remove mapping from >>> >>
Hi Ryan, On 3/17/2023 12:34 AM, Ryan Roberts wrote: > On 16/03/2023 16:27, Yin, Fengwei wrote: >> Hi Matthew, >> >> On 3/16/2023 12:08 AM, Ryan Roberts wrote: >>> On 15/03/2023 13:34, Ryan Roberts wrote: >>>> On 15/03/2023 05:14, Matthew Wilcox (Oracle) wrote: >>>>> From: Yin Fengwei <fengwei.yin@intel.com> >>>>> >>>>> folio_add_file_rmap_range() allows to add pte mapping to a specific >>>>> range of file folio. Comparing to page_add_file_rmap(), it batched >>>>> updates __lruvec_stat for large folio. >>>>> >>>>> Signed-off-by: Yin Fengwei <fengwei.yin@intel.com> >>>>> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> >>>>> --- >>>>> include/linux/rmap.h | 2 ++ >>>>> mm/rmap.c | 60 +++++++++++++++++++++++++++++++++----------- >>>>> 2 files changed, 48 insertions(+), 14 deletions(-) >>>>> >>>>> diff --git a/include/linux/rmap.h b/include/linux/rmap.h >>>>> index b87d01660412..a3825ce81102 100644 >>>>> --- a/include/linux/rmap.h >>>>> +++ b/include/linux/rmap.h >>>>> @@ -198,6 +198,8 @@ void folio_add_new_anon_rmap(struct folio *, struct vm_area_struct *, >>>>> unsigned long address); >>>>> void page_add_file_rmap(struct page *, struct vm_area_struct *, >>>>> bool compound); >>>>> +void folio_add_file_rmap_range(struct folio *, struct page *, unsigned int nr, >>>>> + struct vm_area_struct *, bool compound); >>>>> void page_remove_rmap(struct page *, struct vm_area_struct *, >>>>> bool compound); >>>>> >>>>> diff --git a/mm/rmap.c b/mm/rmap.c >>>>> index 4898e10c569a..a91906b28835 100644 >>>>> --- a/mm/rmap.c >>>>> +++ b/mm/rmap.c >>>>> @@ -1301,31 +1301,39 @@ void folio_add_new_anon_rmap(struct folio *folio, struct vm_area_struct *vma, >>>>> } >>>>> >>>>> /** >>>>> - * page_add_file_rmap - add pte mapping to a file page >>>>> - * @page: the page to add the mapping to >>>>> + * folio_add_file_rmap_range - add pte mapping to page range of a folio >>>>> + * @folio: The folio to add the mapping to >>>>> + * @page: The first page to add >>>>> + * @nr_pages: The number of pages which will be mapped >>>>> * @vma: the vm area in which the mapping is added >>>>> * @compound: charge the page as compound or small page >>>>> * >>>>> + * The page range of folio is defined by [first_page, first_page + nr_pages) >>>>> + * >>>>> * The caller needs to hold the pte lock. >>>>> */ >>>>> -void page_add_file_rmap(struct page *page, struct vm_area_struct *vma, >>>>> - bool compound) >>>>> +void folio_add_file_rmap_range(struct folio *folio, struct page *page, >>>>> + unsigned int nr_pages, struct vm_area_struct *vma, >>>>> + bool compound) >>>>> { >>>>> - struct folio *folio = page_folio(page); >>>>> atomic_t *mapped = &folio->_nr_pages_mapped; >>>>> - int nr = 0, nr_pmdmapped = 0; >>>>> - bool first; >>>>> + unsigned int nr_pmdmapped = 0, first; >>>>> + int nr = 0; >>>>> >>>>> - VM_BUG_ON_PAGE(compound && !PageTransHuge(page), page); >>>>> + VM_WARN_ON_FOLIO(compound && !folio_test_pmd_mappable(folio), folio); >>>>> >>>>> /* Is page being mapped by PTE? Is this its first map to be added? 
*/ >>>>> if (likely(!compound)) { >>>>> - first = atomic_inc_and_test(&page->_mapcount); >>>>> - nr = first; >>>>> - if (first && folio_test_large(folio)) { >>>>> - nr = atomic_inc_return_relaxed(mapped); >>>>> - nr = (nr < COMPOUND_MAPPED); >>>>> - } >>>>> + do { >>>>> + first = atomic_inc_and_test(&page->_mapcount); >>>>> + if (first && folio_test_large(folio)) { >>>>> + first = atomic_inc_return_relaxed(mapped); >>>>> + first = (nr < COMPOUND_MAPPED); >>>> >>>> This still contains the typo that Yin Fengwei spotted in the previous version: >>>> https://lore.kernel.org/linux-mm/20230228213738.272178-1-willy@infradead.org/T/#m84673899e25bc31356093a1177941f2cc35e5da8 >>>> >>>> FYI, I'm seeing a perf regression of about 1% when compiling the kernel on >>>> Ampere Altra (arm64) with this whole series on top of v6.3-rc1 (In a VM using >>>> ext4 filesystem). Looks like instruction aborts are taking much longer and a >>>> selection of syscalls are a bit slower. Still hunting down the root cause. Will >>>> report once I have conclusive diagnosis. >>> >>> I'm sorry - I'm struggling to find the exact cause. But its spending over 2x the >>> amount of time in the instruction abort handling code once patches 32-36 are >>> included. Everything in the flame graph is just taking longer. Perhaps we are >>> getting more instruction aborts somehow? I have the flamegraphs if anyone wants >>> them - just shout and I'll email them separately. >> Thanks a lot to Ryan for sharing the flamegraphs to me. I found the __do_fault() >> is called with patch 32-36 while no __do_fault() just with first 31 patches. I >> suspect the folio_add_file_rmap_range() missed some PTEs population. Please give >> me few days to find the root cause and fix. Sorry for this. > > You're welcome. Give me a shout once you have a re-spin and I'll rerun the tests. Could you please help to try following changes? Thanks in advance. diff --git a/mm/filemap.c b/mm/filemap.c index 40be33b5ee46..137011320c68 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -3504,15 +3504,16 @@ static vm_fault_t filemap_map_folio_range(struct vm_fault *vmf, if (!pte_none(vmf->pte[count])) goto skip; - if (vmf->address == addr) - ret = VM_FAULT_NOPAGE; - count++; continue; skip: if (count) { set_pte_range(vmf, folio, page, count, addr); folio_ref_add(folio, count); + if ((vmf->address < (addr + count * PAGE_SIZE)) && + (vmf->address >= addr)) + ret = VM_FAULT_NOPAGE; + } count++; @@ -3525,6 +3526,9 @@ static vm_fault_t filemap_map_folio_range(struct vm_fault *vmf, if (count) { set_pte_range(vmf, folio, page, count, addr); folio_ref_add(folio, count); + if ((vmf->address < (addr + count * PAGE_SIZE)) && + (vmf->address >= addr)) + ret = VM_FAULT_NOPAGE; } vmf->pte = old_ptep; Regards Yin, Fengwei > >> >> >> Regards >> Yin, Fengwei >> >>> >>>> >>>> Thanks, >>>> Ryan >>>> >>>> >>>>> + } >>>>> + >>>>> + if (first) >>>>> + nr++; >>>>> + } while (page++, --nr_pages > 0); >>>>> } else if (folio_test_pmd_mappable(folio)) { >>>>> /* That test is redundant: it's for safety or to optimize out */ >>>>> >>>>> @@ -1354,6 +1362,30 @@ void page_add_file_rmap(struct page *page, struct vm_area_struct *vma, >>>>> mlock_vma_folio(folio, vma, compound); >>>>> } >>>>> >>>>> +/** >>>>> + * page_add_file_rmap - add pte mapping to a file page >>>>> + * @page: the page to add the mapping to >>>>> + * @vma: the vm area in which the mapping is added >>>>> + * @compound: charge the page as compound or small page >>>>> + * >>>>> + * The caller needs to hold the pte lock. 
>>>>> + */ >>>>> +void page_add_file_rmap(struct page *page, struct vm_area_struct *vma, >>>>> + bool compound) >>>>> +{ >>>>> + struct folio *folio = page_folio(page); >>>>> + unsigned int nr_pages; >>>>> + >>>>> + VM_WARN_ON_ONCE_PAGE(compound && !PageTransHuge(page), page); >>>>> + >>>>> + if (likely(!compound)) >>>>> + nr_pages = 1; >>>>> + else >>>>> + nr_pages = folio_nr_pages(folio); >>>>> + >>>>> + folio_add_file_rmap_range(folio, page, nr_pages, vma, compound); >>>>> +} >>>>> + >>>>> /** >>>>> * page_remove_rmap - take down pte mapping from a page >>>>> * @page: page to remove mapping from >>>> >>> > >
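The change proposed above replaces the exact-address check (vmf->address == addr) with a range check, so VM_FAULT_NOPAGE is still returned when the faulting address lies anywhere inside a run of PTEs populated in one batch. A small standalone illustration of that predicate (the helper name is made up for illustration and is not part of the patch):

#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZE 4096UL

/* Does the faulting address fall inside the run of 'count' pages that was
 * just mapped starting at 'start'?  Mirrors the check added in the proposed
 * filemap.c change. */
static bool fault_addr_in_mapped_run(uintptr_t fault_addr, uintptr_t start,
                                     unsigned long count)
{
        return fault_addr >= start && fault_addr < start + count * PAGE_SIZE;
}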
On 17/03/2023 08:23, Yin, Fengwei wrote: [...] >>>>> FYI, I'm seeing a perf regression of about 1% when compiling the kernel on >>>>> Ampere Altra (arm64) with this whole series on top of v6.3-rc1 (In a VM using >>>>> ext4 filesystem). Looks like instruction aborts are taking much longer and a >>>>> selection of syscalls are a bit slower. Still hunting down the root cause. Will >>>>> report once I have conclusive diagnosis. >>>> >>>> I'm sorry - I'm struggling to find the exact cause. But its spending over 2x the >>>> amount of time in the instruction abort handling code once patches 32-36 are >>>> included. Everything in the flame graph is just taking longer. Perhaps we are >>>> getting more instruction aborts somehow? I have the flamegraphs if anyone wants >>>> them - just shout and I'll email them separately. >>> Thanks a lot to Ryan for sharing the flamegraphs to me. I found the __do_fault() >>> is called with patch 32-36 while no __do_fault() just with first 31 patches. I >>> suspect the folio_add_file_rmap_range() missed some PTEs population. Please give >>> me few days to find the root cause and fix. Sorry for this. >> >> You're welcome. Give me a shout once you have a re-spin and I'll rerun the tests. > Could you please help to try following changes? Thanks in advance. > > diff --git a/mm/filemap.c b/mm/filemap.c > index 40be33b5ee46..137011320c68 100644 > --- a/mm/filemap.c > +++ b/mm/filemap.c > @@ -3504,15 +3504,16 @@ static vm_fault_t filemap_map_folio_range(struct vm_fault *vmf, > if (!pte_none(vmf->pte[count])) > goto skip; > > - if (vmf->address == addr) > - ret = VM_FAULT_NOPAGE; > - > count++; > continue; > skip: > if (count) { > set_pte_range(vmf, folio, page, count, addr); > folio_ref_add(folio, count); > + if ((vmf->address < (addr + count * PAGE_SIZE)) && > + (vmf->address >= addr)) > + ret = VM_FAULT_NOPAGE; > + > } > > count++; > @@ -3525,6 +3526,9 @@ static vm_fault_t filemap_map_folio_range(struct vm_fault *vmf, > if (count) { > set_pte_range(vmf, folio, page, count, addr); > folio_ref_add(folio, count); > + if ((vmf->address < (addr + count * PAGE_SIZE)) && > + (vmf->address >= addr)) > + ret = VM_FAULT_NOPAGE; > } > > vmf->pte = old_ptep; > I'm afraid this hasn't fixed it, and I still see __do_fault(). I'll send the flame graph over separately. Given I'm running on ext4, I wasn't expecting to see any large page cache folios? So I don't think we would have expected this patch to help anyway? (or perhaps there are still THP folios? But I think they will get PMD mapped?). > > Regards > Yin, Fengwei > >> >>> >>> >>> Regards >>> Yin, Fengwei >>> >>>> >>>>> >>>>> Thanks, >>>>> Ryan >>>>> >>>>> >>>>>> + } >>>>>> + >>>>>> + if (first) >>>>>> + nr++; >>>>>> + } while (page++, --nr_pages > 0); >>>>>> } else if (folio_test_pmd_mappable(folio)) { >>>>>> /* That test is redundant: it's for safety or to optimize out */ >>>>>> >>>>>> @@ -1354,6 +1362,30 @@ void page_add_file_rmap(struct page *page, struct vm_area_struct *vma, >>>>>> mlock_vma_folio(folio, vma, compound); >>>>>> } >>>>>> >>>>>> +/** >>>>>> + * page_add_file_rmap - add pte mapping to a file page >>>>>> + * @page: the page to add the mapping to >>>>>> + * @vma: the vm area in which the mapping is added >>>>>> + * @compound: charge the page as compound or small page >>>>>> + * >>>>>> + * The caller needs to hold the pte lock. 
>>>>>> + */ >>>>>> +void page_add_file_rmap(struct page *page, struct vm_area_struct *vma, >>>>>> + bool compound) >>>>>> +{ >>>>>> + struct folio *folio = page_folio(page); >>>>>> + unsigned int nr_pages; >>>>>> + >>>>>> + VM_WARN_ON_ONCE_PAGE(compound && !PageTransHuge(page), page); >>>>>> + >>>>>> + if (likely(!compound)) >>>>>> + nr_pages = 1; >>>>>> + else >>>>>> + nr_pages = folio_nr_pages(folio); >>>>>> + >>>>>> + folio_add_file_rmap_range(folio, page, nr_pages, vma, compound); >>>>>> +} >>>>>> + >>>>>> /** >>>>>> * page_remove_rmap - take down pte mapping from a page >>>>>> * @page: page to remove mapping from >>>>> >>>> >> >>
On 3/17/2023 8:46 PM, Ryan Roberts wrote: > On 17/03/2023 08:23, Yin, Fengwei wrote: > [...] > >>>>>> FYI, I'm seeing a perf regression of about 1% when compiling the kernel on >>>>>> Ampere Altra (arm64) with this whole series on top of v6.3-rc1 (In a VM using >>>>>> ext4 filesystem). Looks like instruction aborts are taking much longer and a >>>>>> selection of syscalls are a bit slower. Still hunting down the root cause. Will >>>>>> report once I have conclusive diagnosis. >>>>> >>>>> I'm sorry - I'm struggling to find the exact cause. But its spending over 2x the >>>>> amount of time in the instruction abort handling code once patches 32-36 are >>>>> included. Everything in the flame graph is just taking longer. Perhaps we are >>>>> getting more instruction aborts somehow? I have the flamegraphs if anyone wants >>>>> them - just shout and I'll email them separately. >>>> Thanks a lot to Ryan for sharing the flamegraphs to me. I found the __do_fault() >>>> is called with patch 32-36 while no __do_fault() just with first 31 patches. I >>>> suspect the folio_add_file_rmap_range() missed some PTEs population. Please give >>>> me few days to find the root cause and fix. Sorry for this. >>> >>> You're welcome. Give me a shout once you have a re-spin and I'll rerun the tests. >> Could you please help to try following changes? Thanks in advance. >> >> diff --git a/mm/filemap.c b/mm/filemap.c >> index 40be33b5ee46..137011320c68 100644 >> --- a/mm/filemap.c >> +++ b/mm/filemap.c >> @@ -3504,15 +3504,16 @@ static vm_fault_t filemap_map_folio_range(struct vm_fault *vmf, >> if (!pte_none(vmf->pte[count])) >> goto skip; >> >> - if (vmf->address == addr) >> - ret = VM_FAULT_NOPAGE; >> - >> count++; >> continue; >> skip: >> if (count) { >> set_pte_range(vmf, folio, page, count, addr); >> folio_ref_add(folio, count); >> + if ((vmf->address < (addr + count * PAGE_SIZE)) && >> + (vmf->address >= addr)) >> + ret = VM_FAULT_NOPAGE; >> + >> } >> >> count++; >> @@ -3525,6 +3526,9 @@ static vm_fault_t filemap_map_folio_range(struct vm_fault *vmf, >> if (count) { >> set_pte_range(vmf, folio, page, count, addr); >> folio_ref_add(folio, count); >> + if ((vmf->address < (addr + count * PAGE_SIZE)) && >> + (vmf->address >= addr)) >> + ret = VM_FAULT_NOPAGE; >> } >> >> vmf->pte = old_ptep; >> > > I'm afraid this hasn't fixed it, and I still see __do_fault(). I'll send the > flame graph over separately. > > Given I'm running on ext4, I wasn't expecting to see any large page cache > folios? So I don't think we would have expected this patch to help anyway? (or > perhaps there are still THP folios? But I think they will get PMD mapped?). OK. I will try to reproduce the issue on my local env to see whether I could reproduce it on x86_64 env. 
Regards Yin, Fengwei > > >> >> Regards >> Yin, Fengwei >> >>> >>>> >>>> >>>> Regards >>>> Yin, Fengwei >>>> >>>>> >>>>>> >>>>>> Thanks, >>>>>> Ryan >>>>>> >>>>>> >>>>>>> + } >>>>>>> + >>>>>>> + if (first) >>>>>>> + nr++; >>>>>>> + } while (page++, --nr_pages > 0); >>>>>>> } else if (folio_test_pmd_mappable(folio)) { >>>>>>> /* That test is redundant: it's for safety or to optimize out */ >>>>>>> >>>>>>> @@ -1354,6 +1362,30 @@ void page_add_file_rmap(struct page *page, struct vm_area_struct *vma, >>>>>>> mlock_vma_folio(folio, vma, compound); >>>>>>> } >>>>>>> >>>>>>> +/** >>>>>>> + * page_add_file_rmap - add pte mapping to a file page >>>>>>> + * @page: the page to add the mapping to >>>>>>> + * @vma: the vm area in which the mapping is added >>>>>>> + * @compound: charge the page as compound or small page >>>>>>> + * >>>>>>> + * The caller needs to hold the pte lock. >>>>>>> + */ >>>>>>> +void page_add_file_rmap(struct page *page, struct vm_area_struct *vma, >>>>>>> + bool compound) >>>>>>> +{ >>>>>>> + struct folio *folio = page_folio(page); >>>>>>> + unsigned int nr_pages; >>>>>>> + >>>>>>> + VM_WARN_ON_ONCE_PAGE(compound && !PageTransHuge(page), page); >>>>>>> + >>>>>>> + if (likely(!compound)) >>>>>>> + nr_pages = 1; >>>>>>> + else >>>>>>> + nr_pages = folio_nr_pages(folio); >>>>>>> + >>>>>>> + folio_add_file_rmap_range(folio, page, nr_pages, vma, compound); >>>>>>> +} >>>>>>> + >>>>>>> /** >>>>>>> * page_remove_rmap - take down pte mapping from a page >>>>>>> * @page: page to remove mapping from >>>>>> >>>>> >>> >>> >
diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index b87d01660412..a3825ce81102 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -198,6 +198,8 @@ void folio_add_new_anon_rmap(struct folio *, struct vm_area_struct *,
 		unsigned long address);
 void page_add_file_rmap(struct page *, struct vm_area_struct *,
 		bool compound);
+void folio_add_file_rmap_range(struct folio *, struct page *, unsigned int nr,
+		struct vm_area_struct *, bool compound);
 void page_remove_rmap(struct page *, struct vm_area_struct *,
 		bool compound);
 
diff --git a/mm/rmap.c b/mm/rmap.c
index 4898e10c569a..a91906b28835 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1301,31 +1301,39 @@ void folio_add_new_anon_rmap(struct folio *folio, struct vm_area_struct *vma,
 }
 
 /**
- * page_add_file_rmap - add pte mapping to a file page
- * @page: the page to add the mapping to
+ * folio_add_file_rmap_range - add pte mapping to page range of a folio
+ * @folio: The folio to add the mapping to
+ * @page: The first page to add
+ * @nr_pages: The number of pages which will be mapped
  * @vma: the vm area in which the mapping is added
  * @compound: charge the page as compound or small page
  *
+ * The page range of folio is defined by [first_page, first_page + nr_pages)
+ *
  * The caller needs to hold the pte lock.
  */
-void page_add_file_rmap(struct page *page, struct vm_area_struct *vma,
-		bool compound)
+void folio_add_file_rmap_range(struct folio *folio, struct page *page,
+			unsigned int nr_pages, struct vm_area_struct *vma,
+			bool compound)
 {
-	struct folio *folio = page_folio(page);
 	atomic_t *mapped = &folio->_nr_pages_mapped;
-	int nr = 0, nr_pmdmapped = 0;
-	bool first;
+	unsigned int nr_pmdmapped = 0, first;
+	int nr = 0;
 
-	VM_BUG_ON_PAGE(compound && !PageTransHuge(page), page);
+	VM_WARN_ON_FOLIO(compound && !folio_test_pmd_mappable(folio), folio);
 
 	/* Is page being mapped by PTE? Is this its first map to be added? */
 	if (likely(!compound)) {
-		first = atomic_inc_and_test(&page->_mapcount);
-		nr = first;
-		if (first && folio_test_large(folio)) {
-			nr = atomic_inc_return_relaxed(mapped);
-			nr = (nr < COMPOUND_MAPPED);
-		}
+		do {
+			first = atomic_inc_and_test(&page->_mapcount);
+			if (first && folio_test_large(folio)) {
+				first = atomic_inc_return_relaxed(mapped);
+				first = (nr < COMPOUND_MAPPED);
+			}
+
+			if (first)
+				nr++;
+		} while (page++, --nr_pages > 0);
 	} else if (folio_test_pmd_mappable(folio)) {
 		/* That test is redundant: it's for safety or to optimize out */
 
@@ -1354,6 +1362,30 @@ void page_add_file_rmap(struct page *page, struct vm_area_struct *vma,
 	mlock_vma_folio(folio, vma, compound);
 }
 
+/**
+ * page_add_file_rmap - add pte mapping to a file page
+ * @page: the page to add the mapping to
+ * @vma: the vm area in which the mapping is added
+ * @compound: charge the page as compound or small page
+ *
+ * The caller needs to hold the pte lock.
+ */
+void page_add_file_rmap(struct page *page, struct vm_area_struct *vma,
+		bool compound)
+{
+	struct folio *folio = page_folio(page);
+	unsigned int nr_pages;
+
+	VM_WARN_ON_ONCE_PAGE(compound && !PageTransHuge(page), page);
+
+	if (likely(!compound))
+		nr_pages = 1;
+	else
+		nr_pages = folio_nr_pages(folio);
+
+	folio_add_file_rmap_range(folio, page, nr_pages, vma, compound);
+}
+
 /**
  * page_remove_rmap - take down pte mapping from a page
  * @page: page to remove mapping from
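As a usage summary (illustrative only; the setup of folio, page, nr and vma is assumed, and the PTE lock must be held for both calls): page_add_file_rmap() keeps the old interface as a thin wrapper, while callers that PTE-map a sub-range of a large folio can call the range API directly so the folio-level accounting is batched into a single update.

	/* Old interface, now a wrapper around the range version: one small
	 * page, or a whole PMD-mappable folio when compound == true. */
	page_add_file_rmap(page, vma, false);

	/* New interface: 'nr' consecutive small pages of a large folio,
	 * with the folio-level statistics updated once for the batch. */
	folio_add_file_rmap_range(folio, page, nr, vma, false);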