Message ID | 20230410075224.827740-1-ying.huang@intel.com |
---|---|
State | New |
Series | mm,unmap: avoid flushing TLB in batch if PTE is inaccessible |
Commit Message
Huang, Ying
April 10, 2023, 7:52 a.m. UTC
0Day/LKP reported a performance regression for commit
7e12beb8ca2a ("migrate_pages: batch flushing TLB"). In that commit, the
TLB flushing during page migration is batched, so in
try_to_migrate_one(), ptep_clear_flush() is replaced with
set_tlb_ubc_flush_pending(). Further investigation found that
ptep_clear_flush() avoids the TLB flush entirely when the PTE is
inaccessible. We can optimize the batched TLB flushing in the same way
to improve performance.

So in this patch, we check pte_accessible() before
set_tlb_ubc_flush_pending() in try_to_unmap/migrate_one(). Tests show
that the benchmark score of the anon-cow-rand-mt test case of the
vm-scalability test suite improves by up to 2.1% with the patch on an
Intel server machine, and the number of TLB flushing IPIs drops by up
to 44.3%.
Link: https://lore.kernel.org/oe-lkp/202303192325.ecbaf968-yujie.liu@intel.com
Link: https://lore.kernel.org/oe-lkp/ab92aaddf1b52ede15e2c608696c36765a2602c1.camel@intel.com/
Fixes: 7e12beb8ca2a ("migrate_pages: batch flushing TLB")
Reported-by: kernel test robot <yujie.liu@intel.com>
Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Cc: Nadav Amit <namit@vmware.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Hugh Dickins <hughd@google.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: David Hildenbrand <david@redhat.com>
---
mm/rmap.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
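Background for the change: ptep_clear_flush() already skips the TLB flush when pte_accessible() is false, and this patch applies the same test before queueing a batched flush. As a rough illustration, the x86 version of pte_accessible() looks approximately like the sketch below (paraphrased from arch/x86/include/asm/pgtable.h; treat it as a sketch rather than an exact copy of the source):

/* Paraphrased sketch of pte_accessible() on x86, for illustration only. */
static inline bool pte_accessible(struct mm_struct *mm, pte_t a)
{
	if (pte_flags(a) & _PAGE_PRESENT)
		return true;

	/*
	 * A PROT_NONE (NUMA-hinting) PTE may still be cached in some TLB
	 * while another thread has a TLB flush pending for this mm, so it
	 * must still be treated as accessible.
	 */
	if ((pte_flags(a) & _PAGE_PROTNONE) &&
	    atomic_read(&mm->tlb_flush_pending))
		return true;

	return false;
}

In other words, a PTE that is neither present nor in the PROT_NONE-with-pending-flush window cannot be cached by any CPU, so deferring or issuing a flush for it is unnecessary.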
Comments
> On Apr 10, 2023, at 12:52 AM, Huang Ying <ying.huang@intel.com> wrote:
>
> [snip]

LGTM.

I know it’s meaningless for x86 (but perhaps ARM would use this infra
too): do we need smp_mb__after_atomic() after ptep_get_and_clear() and
before pte_accessible()?

In addition, if this goes into stable (based on the Fixes tag), consider
breaking it into 2 patches, when only one would be backported.
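For readers following along: the hypothetical barrier being asked about would sit roughly at the point sketched below in try_to_unmap_one()/try_to_migrate_one(). This is an illustration of the question, not a proposed change:

	pteval = ptep_get_and_clear(mm, address, pvmw.pte);
	/* Hypothetical placement being discussed; not part of the patch. */
	smp_mb__after_atomic();
	if (pte_accessible(mm, pteval))
		set_tlb_ubc_flush_pending(mm, pte_dirty(pteval));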
Hi, Amit,

Thank you very much for review!

Nadav Amit <namit@vmware.com> writes:

>> On Apr 10, 2023, at 12:52 AM, Huang Ying <ying.huang@intel.com> wrote:
>>
>> [snip]
>
> LGTM.

Thanks!

> I know it’s meaningless for x86 (but perhaps ARM would use this infra
> too): do we need smp_mb__after_atomic() after ptep_get_and_clear() and
> before pte_accessible()?

Why do we need the memory barrier? IIUC, the PTL is locked, so PTE
value will not be changed under us. Anything else?

> In addition, if this goes into stable (based on the Fixes tag), consider
> breaking it into 2 patches, when only one would be backported.

The fixed commit (7e12beb8ca2a ("migrate_pages: batch flushing TLB")) is
merged by v6.3-rc1. So this patch will only be backported to v6.3 and
later. Is it OK?

Best Regards,
Huang, Ying
> On Apr 10, 2023, at 6:31 PM, Huang, Ying <ying.huang@intel.com> wrote:
>
> [snip]
>
>> I know it’s meaningless for x86 (but perhaps ARM would use this infra
>> too): do we need smp_mb__after_atomic() after ptep_get_and_clear() and
>> before pte_accessible()?
>
> Why do we need the memory barrier? IIUC, the PTL is locked, so PTE
> value will not be changed under us. Anything else?

I was thinking about the ordering with respect to
atomic_read(&mm->tlb_flush_pending), which is not protected by the PTL.
I guess you can correctly argue that because of other control-flow
dependencies, the barrier is not necessary.

>> In addition, if this goes into stable (based on the Fixes tag), consider
>> breaking it into 2 patches, when only one would be backported.
>
> The fixed commit (7e12beb8ca2a ("migrate_pages: batch flushing TLB")) is
> merged by v6.3-rc1. So this patch will only be backported to v6.3 and
> later. Is it OK?

Of course. I wasn’t sure when the bug was introduced.
Nadav Amit <namit@vmware.com> writes:

>> On Apr 10, 2023, at 6:31 PM, Huang, Ying <ying.huang@intel.com> wrote:
>>
>> [snip]
>>
>>> I know it’s meaningless for x86 (but perhaps ARM would use this infra
>>> too): do we need smp_mb__after_atomic() after ptep_get_and_clear() and
>>> before pte_accessible()?
>>
>> Why do we need the memory barrier? IIUC, the PTL is locked, so PTE
>> value will not be changed under us. Anything else?
>
> I was thinking about the ordering with respect to
> atomic_read(&mm->tlb_flush_pending), which is not protected by the PTL.
> I guess you can correctly argue that because of other control-flow
> dependencies, the barrier is not necessary.

For ordering between ptep_get_and_clear() and
atomic_read(&mm->tlb_flush_pending), I think the PTL has provided the
necessary protection already. The code path that writes
mm->tlb_flush_pending is,

  tlb_gather_mmu
    inc_tlb_flush_pending        a)
  lock PTL
  change PTE                     b)
  unlock PTL
  tlb_finish_mmu
    dec_tlb_flush_pending        c)

while the code path of try_to_unmap/migrate_one is,

  lock PTL
  read and change PTE            d)
  read mm->tlb_flush_pending     e)
  unlock PTL

Even if e) occurs before d), they cannot occur at the same time as b).
Do I miss anything?

Best Regards,
Huang, Ying

[snip]
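Restating the a)-e) scheme above as schematic kernel C may help; the helper names are the real kernel functions, but the locking is shown only schematically and local variable setup is elided:

/* Writer side, e.g. munmap()/zap path (schematic): */
tlb_gather_mmu(&tlb, mm);	/* a) calls inc_tlb_flush_pending(mm) */
spin_lock(ptl);
/* b) clear or modify PTEs under the PTL */
spin_unlock(ptl);
tlb_finish_mmu(&tlb);		/* c) flushes the TLB, then dec_tlb_flush_pending(mm) */

/* try_to_unmap_one()/try_to_migrate_one() side (schematic): */
spin_lock(ptl);
pteval = ptep_get_and_clear(mm, address, pvmw.pte);	/* d) */
if (pte_accessible(mm, pteval))				/* e) reads mm->tlb_flush_pending */
	set_tlb_ubc_flush_pending(mm, pte_dirty(pteval));
spin_unlock(ptl);

Because b) happens under the same PTL that d)/e) hold, e) cannot observe a cleared tlb_flush_pending while a flush for the PTEs changed at b) is still outstanding.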
> On Apr 11, 2023, at 6:50 PM, Huang, Ying <ying.huang@intel.com> wrote:
>
> [snip]
>
> For ordering between ptep_get_and_clear() and
> atomic_read(&mm->tlb_flush_pending), I think the PTL has provided the
> necessary protection already. The code path that writes
> mm->tlb_flush_pending is,
>
>   tlb_gather_mmu
>     inc_tlb_flush_pending        a)
>   lock PTL
>   change PTE                     b)
>   unlock PTL
>   tlb_finish_mmu
>     dec_tlb_flush_pending        c)
>
> while the code path of try_to_unmap/migrate_one is,
>
>   lock PTL
>   read and change PTE            d)
>   read mm->tlb_flush_pending     e)
>   unlock PTL
>
> Even if e) occurs before d), they cannot occur at the same time as b).
> Do I miss anything?

You didn’t miss anything. I went over the comment on
inc_tlb_flush_pending() and you follow the scheme.
Nadav Amit <namit@vmware.com> writes:

> [snip]
>
> You didn’t miss anything. I went over the comment on
> inc_tlb_flush_pending() and you follow the scheme.

Thanks! Can I get your acked-by or reviewed-by for this patch?

Best Regards,
Huang, Ying
> On Apr 17, 2023, at 8:17 PM, Huang, Ying <ying.huang@intel.com> wrote:
>
> [snip]
>
> Thanks! Can I get your acked-by or reviewed-by for this patch?

Reviewed-by: Nadav Amit <namit@vmware.com>

Thanks,
Nadav
On 2023/4/10 3:52 PM, Huang Ying wrote:
> [snip]
>
> diff --git a/mm/rmap.c b/mm/rmap.c
> index 8632e02661ac..3c7c43642d7c 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -1582,7 +1582,8 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
>  			 */
>  			pteval = ptep_get_and_clear(mm, address, pvmw.pte);
>
> -			set_tlb_ubc_flush_pending(mm, pte_dirty(pteval));
> +			if (pte_accessible(mm, pteval))
> +				set_tlb_ubc_flush_pending(mm, pte_dirty(pteval));
>  		} else {
>  			pteval = ptep_clear_flush(vma, address, pvmw.pte);
>  		}
> @@ -1963,7 +1964,8 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
>  			 */
>  			pteval = ptep_get_and_clear(mm, address, pvmw.pte);
>
> -			set_tlb_ubc_flush_pending(mm, pte_dirty(pteval));
> +			if (pte_accessible(mm, pteval))
> +				set_tlb_ubc_flush_pending(mm, pte_dirty(pteval));

Just a piece of advice: can you put the pte_accessible() check into
set_tlb_ubc_flush_pending(), just like ptep_clear_flush()? That way we
would not need to add "if (pte_accessible())" in every place that calls
set_tlb_ubc_flush_pending().

>  		} else {
>  			pteval = ptep_clear_flush(vma, address, pvmw.pte);
>  		}
haoxin <xhao@linux.alibaba.com> writes:

> On 2023/4/10 3:52 PM, Huang Ying wrote:
>
> [snip]
>
> Just a piece of advice: can you put the pte_accessible() check into
> set_tlb_ubc_flush_pending(), just like ptep_clear_flush()? That way we
> would not need to add "if (pte_accessible())" in every place that calls
> set_tlb_ubc_flush_pending().

Sounds reasonable to me, will do that in the next version. Thanks for
the suggestion.

Best Regards,
Huang, Ying
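A hypothetical sketch of what the suggested refactor could look like in the next version follows. It is an illustration only, based on the v6.3-era mm/rmap.c, not the actual follow-up patch: the PTE value is passed into set_tlb_ubc_flush_pending() and the pte_accessible() check moves inside it.

/* Hypothetical refactor sketch; bookkeeping details elided. */
static void set_tlb_ubc_flush_pending(struct mm_struct *mm, pte_t pteval)
{
	struct tlbflush_unmap_batch *tlb_ubc = &current->tlb_ubc;

	/* Nothing can be cached for this PTE, so no deferred flush is needed. */
	if (!pte_accessible(mm, pteval))
		return;

	arch_tlbbatch_add_mm(&tlb_ubc->arch, mm);
	tlb_ubc->flush_required = true;

	/*
	 * If the PTE was dirty, assume the mapping was writable; the caller
	 * must flush before the page can be queued for IO.
	 */
	if (pte_dirty(pteval))
		tlb_ubc->writable = true;

	/* ... mm->tlb_flush_batched bookkeeping elided ... */
}

The call sites in try_to_unmap_one()/try_to_migrate_one() would then shrink back to a single line:

	pteval = ptep_get_and_clear(mm, address, pvmw.pte);
	set_tlb_ubc_flush_pending(mm, pteval);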
diff --git a/mm/rmap.c b/mm/rmap.c
index 8632e02661ac..3c7c43642d7c 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1582,7 +1582,8 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 			 */
 			pteval = ptep_get_and_clear(mm, address, pvmw.pte);
 
-			set_tlb_ubc_flush_pending(mm, pte_dirty(pteval));
+			if (pte_accessible(mm, pteval))
+				set_tlb_ubc_flush_pending(mm, pte_dirty(pteval));
 		} else {
 			pteval = ptep_clear_flush(vma, address, pvmw.pte);
 		}
@@ -1963,7 +1964,8 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
 			 */
 			pteval = ptep_get_and_clear(mm, address, pvmw.pte);
 
-			set_tlb_ubc_flush_pending(mm, pte_dirty(pteval));
+			if (pte_accessible(mm, pteval))
+				set_tlb_ubc_flush_pending(mm, pte_dirty(pteval));
 		} else {
 			pteval = ptep_clear_flush(vma, address, pvmw.pte);
 		}