Message ID | 431d9fb6823036369dcb1d3b2f63732f01df21a7.1698488264.git.baolin.wang@linux.alibaba.com |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org>
From: Baolin Wang <baolin.wang@linux.alibaba.com>
To: akpm@linux-foundation.org
Cc: shy828301@gmail.com, ying.huang@intel.com, baolin.wang@linux.alibaba.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: [PATCH] mm: huge_memory: batch tlb flush when splitting a pte-mapped THP
Date: Mon, 30 Oct 2023 09:11:47 +0800
Message-Id: <431d9fb6823036369dcb1d3b2f63732f01df21a7.1698488264.git.baolin.wang@linux.alibaba.com>
X-Mailer: git-send-email 2.39.3 |
Series | mm: huge_memory: batch tlb flush when splitting a pte-mapped THP |
Commit Message
Baolin Wang
Oct. 30, 2023, 1:11 a.m. UTC
I can observe an obvious TLB flush hotspot when splitting a pte-mapped THP on
my ARM64 server, and the distribution of this hotspot is as follows:
- 16.85% split_huge_page_to_list
    + 7.80% down_write
    - 7.49% try_to_migrate
       - 7.48% rmap_walk_anon
            7.23% ptep_clear_flush
    + 1.52% __split_huge_page
The reason is that split_huge_page_to_list() builds migration entries for
each subpage of a pte-mapped anonymous THP via try_to_migrate() (or unmaps
each subpage of a file THP), clearing and TLB-flushing each subpage's PTE
individually. Moreover, split_huge_page_to_list() sets the TTU_SPLIT_HUGE_PMD
flag to ensure the THP is already pte-mapped before splitting it into normal
pages.
Actually, there is no need to flush the TLB for each subpage immediately;
instead, we can batch the TLB flush for the whole pte-mapped THP to improve
performance.
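To make that concrete, here is a minimal sketch of per-subpage flushing
versus a single batched flush. It is illustrative only, not the patch itself;
the function name is hypothetical, and in the real kernel the deferral is
requested via TTU_BATCH_FLUSH and drained by try_to_unmap_flush():

#include <linux/mm.h>

/*
 * Hedged sketch, not the patch itself: with nr subpage PTEs, the old
 * path issues nr TLB invalidations (one ptep_clear_flush() per PTE);
 * the batched path clears the PTEs without flushing and issues a
 * single ranged flush at the end.
 */
static void unmap_thp_subpages_sketch(struct vm_area_struct *vma,
                                      unsigned long addr, pte_t *ptep,
                                      unsigned int nr)
{
        unsigned int i;

        for (i = 0; i < nr; i++) {
                /* Old: ptep_clear_flush(vma, addr + i * PAGE_SIZE, ptep + i); */

                /* New: clear the PTE only; the TLB flush is deferred. */
                ptep_get_and_clear(vma->vm_mm, addr + i * PAGE_SIZE, ptep + i);

                /* ... install a migration entry for this subpage ... */
        }

        /* One flush covering the whole former-THP range. */
        flush_tlb_range(vma, addr, addr + nr * PAGE_SIZE);
}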
After this patch, the batched TLB flush shows an obvious latency improvement
when running thpscale:
                                 k6.5-base              patched
Amean     fault-both-1      1071.17 (   0.00%)      901.83 *  15.81%*
Amean     fault-both-3      2386.08 (   0.00%)     1865.32 *  21.82%*
Amean     fault-both-5      2851.10 (   0.00%)     2273.84 *  20.25%*
Amean     fault-both-7      3679.91 (   0.00%)     2881.66 *  21.69%*
Amean     fault-both-12     5916.66 (   0.00%)     4369.55 *  26.15%*
Amean     fault-both-18     7981.36 (   0.00%)     6303.57 *  21.02%*
Amean     fault-both-24    10950.79 (   0.00%)     8752.56 *  20.07%*
Amean     fault-both-30    14077.35 (   0.00%)    10170.01 *  27.76%*
Amean     fault-both-32    13061.57 (   0.00%)    11630.08 *  10.96%*
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
---
mm/huge_memory.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
Comments
Baolin Wang <baolin.wang@linux.alibaba.com> writes:

> I can observe an obvious tlb flush hotpot when splitting a pte-mapped THP on
> my ARM64 server, and the distribution of this hotspot is as follows:
>
> [...]
>
> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>

LGTM, Thanks!

Reviewed-by: "Huang, Ying" <ying.huang@intel.com>
On Sun, Oct 29, 2023 at 6:12 PM Baolin Wang <baolin.wang@linux.alibaba.com> wrote:
>
> I can observe an obvious tlb flush hotpot when splitting a pte-mapped THP on
> my ARM64 server, and the distribution of this hotspot is as follows:
>
> [...]
>
> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>

Reviewed-by: Yang Shi <shy828301@gmail.com>
Baolin Wang <baolin.wang@linux.alibaba.com> writes:

> I can observe an obvious tlb flush hotpot when splitting a pte-mapped THP on

A tlb flush hotpot does sound delicious, but I think you meant hotspot :-)

> my ARM64 server, and the distribution of this hotspot is as follows:
>
> [...]
>
> The reason is that the split_huge_page_to_list() will build migration entries
> for each subpage of a pte-mapped Anon THP by try_to_migrate(), or unmap for
> file THP, and it will clear and tlb flush for each subpage's pte. Moreover,
> the split_huge_page_to_list() will set TTU_SPLIT_HUGE_PMD flag to ensure
> the THP is already a pte-mapped THP before splitting it to some normal pages.

The only other user of TTU_SPLIT_HUGE_PMD is vmscan, which also sets
TTU_BATCH_FLUSH, so we could make the former imply the latter, but that
seems dangerous given the requirement to call try_to_unmap_flush(), so
best not to.

Reviewed-by: Alistair Popple <apopple@nvidia.com>
On 11/1/2023 2:13 PM, Alistair Popple wrote:
>
> Baolin Wang <baolin.wang@linux.alibaba.com> writes:
>
>> I can observe an obvious tlb flush hotpot when splitting a pte-mapped THP on
>
> A tlb flush hotpot does sound delicious, but I think you meant hotspot :-)

Ah, yes. Hope Andrew can help to fix it :)

> [...]
>
> The only other user of TTU_SPLIT_HUGE_PMD is vmscan which also sets
> TTU_BATCH_FLUSH so we could make the former imply the latter but that
> seem dangerous given the requirement to call try_to_unmap_flush() so
> best not to.
>
> Reviewed-by: Alistair Popple <apopple@nvidia.com>

Thanks for reviewing, and also thanks to Ying and Yang.
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index f31f02472396..0e4c14bf6872 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2379,7 +2379,7 @@ void vma_adjust_trans_huge(struct vm_area_struct *vma,
 static void unmap_folio(struct folio *folio)
 {
 	enum ttu_flags ttu_flags = TTU_RMAP_LOCKED | TTU_SPLIT_HUGE_PMD |
-		TTU_SYNC;
+		TTU_SYNC | TTU_BATCH_FLUSH;
 
 	VM_BUG_ON_FOLIO(!folio_test_large(folio), folio);
 
@@ -2392,6 +2392,8 @@ static void unmap_folio(struct folio *folio)
 		try_to_migrate(folio, ttu_flags);
 	else
 		try_to_unmap(folio, ttu_flags | TTU_IGNORE_MLOCK);
+
+	try_to_unmap_flush();
 }
 
 static void remap_page(struct folio *folio, unsigned long nr)
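For context, the contract behind TTU_BATCH_FLUSH that both this patch and
Alistair's comment refer to is: any caller that passes the flag must later
call try_to_unmap_flush() to drain the deferred invalidations. A minimal
sketch of that caller-side pattern, loosely modeled on what mm/vmscan.c
already does (simplified and hypothetical, not verbatim kernel code):

#include <linux/list.h>
#include <linux/rmap.h>

/*
 * Hedged sketch of the caller-side TTU_BATCH_FLUSH contract.  Note
 * that try_to_unmap_flush() is declared in mm/internal.h, so this
 * pattern is only available to code inside mm/.
 */
static void unmap_folios_batched_sketch(struct list_head *folio_list)
{
        struct folio *folio, *next;

        list_for_each_entry_safe(folio, next, folio_list, lru) {
                /*
                 * With TTU_BATCH_FLUSH, the rmap walk records pending
                 * TLB invalidations instead of flushing per PTE.
                 */
                try_to_unmap(folio, TTU_BATCH_FLUSH);
        }

        /*
         * Mandatory before any of these folios can be freed or their
         * old mappings assumed gone: drain the deferred flushes once.
         */
        try_to_unmap_flush();
}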