Message ID | 20230918073318.1181104-1-fengwei.yin@intel.com |
---|---|
Headers | From: Yin Fengwei <fengwei.yin@intel.com> To: linux-mm@kvack.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, yuzhao@google.com, willy@infradead.org, hughd@google.com, yosryahmed@google.com, ryan.roberts@arm.com, david@redhat.com, shy828301@gmail.com Cc: fengwei.yin@intel.com Subject: [PATCH v3 0/3] support large folio for mlock Date: Mon, 18 Sep 2023 15:33:15 +0800 Message-Id: <20230918073318.1181104-1-fengwei.yin@intel.com> X-Mailer: git-send-email 2.39.2 MIME-Version: 1.0 |
Series | support large folio for mlock |
Message
Yin Fengwei
Sept. 18, 2023, 7:33 a.m. UTC
Yu mentioned at [1] that mlock() can't be applied to large folios. I went through the related code and here is my understanding:

- For RLIMIT_MEMLOCK there is no problem, because the RLIMIT_MEMLOCK accounting is not tied to the underlying pages. Whether the underlying pages are mlocked or munlocked does not change the RLIMIT_MEMLOCK accounting, which is always correct.

- For keeping the pages in RAM there is no problem either. At least in try_to_unmap_one(), once the VMA is seen to have VM_LOCKED set in vm_flags, the folio is kept whether the folio itself is mlocked or not.

So mlock works for large folios functionally, but it is not optimal: page reclaim still has to scan these large folios and may split them.

This series classifies a large folio under mlock into four types:

- The large folio is in a VM_LOCKED range and fully mapped to that range.
- The large folio is in a VM_LOCKED range but not fully mapped to that range.
- The large folio crosses a VM_LOCKED VMA boundary.
- The large folio crosses a last-level page table boundary.

For the first type, we mlock the large folio so page reclaim will skip it.

For the second and third types, we don't mlock the large folio. Since the pages not mapped to the VM_LOCKED range are mapped into a non-VM_LOCKED range, the large folio can be picked by page reclaim and split when the system is under memory pressure; the pages not mapped to the VM_LOCKED range can then be reclaimed. (A minimal user-space sketch of the boundary-crossing case follows the diffstat at the end of this message.)

For the fourth type, we don't mlock the large folio because taking one page table lock can't prevent the part covered by another last-level page table from being unmapped. Thanks to Ryan for pointing this out.

To check whether a folio is fully mapped to a range, the PTEs need to be checked to see whether each page of the folio is present. That requires taking the page table lock and is a heavy operation. So far, the only places that need this check are madvise and page reclaim, and those functions already have their own PTE iterators.

Patch 1 introduces an API to check whether a large folio is within a VMA range (a rough, illustrative approximation of such a check also follows the diffstat).
Patch 2 makes page reclaim, mlock_vma_folio() and munlock_vma_folio() support large folio mlock/munlock.
Patch 3 makes the mlock/munlock syscalls support large folios.

Testing done:
- kernel selftests; no extra failures introduced.

v2 was posted here [2]. During the RFC v2 discussion [3], Yu also mentioned a race which can leave a folio unevictable after munlock. We decided that race does not block this series because:
- the race was not introduced by this series
- we have a looks-OK fix for it, which needs to wait for the mlock_count fixing patch as Yosry Ahmed suggested [4]

ChangeLog from V2:
- Rebase to the latest mm-unstable branch
- Add a comment to folio_within_range() per Ryan's suggestion
- Rename folio_in_range() to folio_within_range() per Ryan's suggestion
- No real code change

ChangeLog from V1:
- Remove the PTE check from folio_in_range() and reuse the page table iterator (in madvise and folio_referenced_one) so the callers check whether the folio is fully mapped
- Avoid mlocking a folio which crosses a last-level page table boundary. Thanks to Ryan for pointing this out.
- Drop the pte_none() check when iterating the page table because we only care about the pte_present() case
- Move folio_test_large() out of m(un)lock_vma_folio()

ChangeLog from RFC v2:
- Removed RFC
- Dropped the folio_is_large() check as suggested by both Yu and Hugh
- Besides the address/pgoff check, also check the page table entries when deciding whether the folio is in the range, to handle the mremap case where the address/pgoff is in range but the folio should not be treated as in range
- Fixed an issue in page_add_anon_rmap() and page_add_file_rmap() introduced by RFC v2: these two functions can be called multiple times against one folio while page_remove_rmap() may not be called the same number of times, which can leave an imbalanced mlock_count. Fix it by skipping the mlock of large folios in these two functions.

[1] https://lore.kernel.org/linux-mm/CAOUHufbtNPkdktjt_5qM45GegVO-rCFOMkSh0HQminQ12zsV8Q@mail.gmail.com/
[2] https://lore.kernel.org/linux-mm/20230809061105.3369958-1-fengwei.yin@intel.com/
[3] https://lore.kernel.org/linux-mm/CAOUHufZ6=9P_=CAOQyw0xw-3q707q-1FVV09dBNDC-hpcpj2Pg@mail.gmail.com/
[4] https://lore.kernel.org/linux-mm/CAJD7tkZJFG=7xs=9otc5CKs6odWu48daUuZP9Wd9Z-sZF07hXg@mail.gmail.com/

Yin Fengwei (3):
  mm: add functions folio_in_range() and folio_within_vma()
  mm: handle large folio when large folio in VM_LOCKED VMA range
  mm: mlock: update mlock_pte_range to handle large folio

 mm/internal.h | 73 ++++++++++++++++++++++++++++++++++++++++++++-------
 mm/mlock.c    | 66 ++++++++++++++++++++++++++++++++++++++++++++--
 mm/rmap.c     | 66 ++++++++++++++++++++++++++++++++++++++--------
 3 files changed, 182 insertions(+), 23 deletions(-)
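For readers who want to reproduce the "large folio crosses a VM_LOCKED VMA boundary" situation described above, here is a minimal user-space sketch. It is not part of the series; the 2MB constant, the manual alignment, and the use of MADV_HUGEPAGE are illustrative assumptions, and whether the kernel actually backs the region with a huge folio depends on THP configuration.

```c
/*
 * Illustrative only: map 2MB, encourage THP, fault it in, then
 * mlock() just the first half.  mlock() splits the VMA, so a huge
 * folio backing the region would straddle the new VM_LOCKED VMA and
 * its unlocked neighbour -- the case the series deliberately leaves
 * un-mlocked so reclaim can still split and partially reclaim it.
 */
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

#define SZ_2M (2UL * 1024 * 1024)

int main(void)
{
	/* Over-allocate so a 2MB-aligned start address can be carved out. */
	char *raw = mmap(NULL, 2 * SZ_2M, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (raw == MAP_FAILED) {
		perror("mmap");
		return 1;
	}
	char *p = (char *)(((unsigned long)(raw + SZ_2M - 1)) & ~(SZ_2M - 1));

	madvise(p, SZ_2M, MADV_HUGEPAGE);   /* ask for a large folio */
	memset(p, 1, SZ_2M);                /* fault the whole range in */

	/* Lock only the first half: the VMA is split at p + 1MB. */
	if (mlock(p, SZ_2M / 2))
		perror("mlock");

	/* Inspect /proc/self/smaps here: one 1MB VMA carries the "lo"
	 * VmFlag while its neighbour does not. */
	getchar();

	munlock(p, SZ_2M / 2);
	munmap(raw, 2 * SZ_2M);
	return 0;
}
```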
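As a rough mental model of what patch 1's "is this large folio within the VMA range" helper does, here is a hand-written approximation in kernel style. It is not the code from the series: the function name folio_range_sketch() and the exact arithmetic are assumptions, and, as the cover letter notes, the real helpers leave the "fully mapped" confirmation to the callers' existing PTE iterators (pgoff arithmetic alone is not reliable after mremap).

```c
#include <linux/mm.h>
#include <linux/pagemap.h>

/*
 * Approximation: does the address range that would map this folio
 * linearly fall entirely inside [start, end) of the given VMA?
 * Whether every page of the folio is actually mapped there is left
 * to the caller's PTE walk in the real series.
 */
static bool folio_range_sketch(struct folio *folio,
			       struct vm_area_struct *vma,
			       unsigned long start, unsigned long end)
{
	pgoff_t pgoff = folio_pgoff(folio);
	unsigned long addr;

	/* Clamp the range of interest to the VMA itself. */
	start = max(start, vma->vm_start);
	end = min(end, vma->vm_end);
	if (start >= end)
		return false;

	/* The folio's first page must be covered by this VMA at all. */
	if (pgoff < vma->vm_pgoff ||
	    pgoff - vma->vm_pgoff >= vma_pages(vma))
		return false;

	/* Linear address of the folio's first page in this VMA. */
	addr = vma->vm_start + ((pgoff - vma->vm_pgoff) << PAGE_SHIFT);

	return addr >= start && addr + folio_size(folio) <= end;
}
```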