Message ID: <20230306230004.1387007-1-jthoughton@google.com>
Date: Mon, 6 Mar 2023 23:00:02 +0000
Subject: [PATCH 0/2] mm: rmap: merge HugeTLB mapcount logic with THPs
From: James Houghton <jthoughton@google.com>
To: Mike Kravetz <mike.kravetz@oracle.com>, Hugh Dickins <hughd@google.com>, Muchun Song <songmuchun@bytedance.com>, Peter Xu <peterx@redhat.com>, "Matthew Wilcox (Oracle)" <willy@infradead.org>, Andrew Morton <akpm@linux-foundation.org>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>, David Hildenbrand <david@redhat.com>, David Rientjes <rientjes@google.com>, Axel Rasmussen <axelrasmussen@google.com>, Jiaqi Yan <jiaqiyan@google.com>, linux-mm@kvack.org, linux-kernel@vger.kernel.org, James Houghton <jthoughton@google.com>
Series: mm: rmap: merge HugeTLB mapcount logic with THPs
Message
James Houghton
March 6, 2023, 11 p.m. UTC
HugeTLB pages may soon support being mapped with PTEs. To allow for this
case, merge HugeTLB's mapcount scheme with THP's.

The first patch of this series comes from the HugeTLB high-granularity
mapping series[1], though with some updates, as the original version
was buggy[2] and incomplete.

I am sending this change as part of this smaller series in hopes that it
can be more thoroughly scrutinized.

I haven't run any THP performance tests with this series applied.
HugeTLB pages don't currently support being mapped with
`compound=false`, but this mapcount scheme will make collapsing
compound=false mappings in HugeTLB pages quite slow. This can be
optimized with future patches (likely by taking advantage of HugeTLB's
alignment guarantees).

Matthew Wilcox is working on a mapcounting scheme[3] that will avoid
the use of each subpage's mapcount. If this series is applied, Matthew's
new scheme will automatically apply to HugeTLB pages.

[1]: https://lore.kernel.org/linux-mm/20230218002819.1486479-6-jthoughton@google.com/
[2]: https://lore.kernel.org/linux-mm/CACw3F538H+bYcvSY-qG4-gmrgGPRBgTScDzrX9suLyp_q+v_bQ@mail.gmail.com/
[3]: https://lore.kernel.org/linux-mm/Y9Afwds%2FJl39UjEp@casper.infradead.org/

James Houghton (2):
  mm: rmap: make hugetlb pages participate in _nr_pages_mapped
  mm: rmap: increase COMPOUND_MAPPED to support 512G HugeTLB pages

 include/linux/mm.h |  7 +------
 mm/hugetlb.c       |  4 ++--
 mm/internal.h      |  9 ++++-----
 mm/migrate.c       |  2 +-
 mm/rmap.c          | 35 ++++++++++++++++++++---------------
 5 files changed, 28 insertions(+), 29 deletions(-)

base-commit: 9caa15b8a49949342bdf495bd47660267a3bd371
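For context, the THP scheme being merged into packs two counters into the folio's `_nr_pages_mapped` field. Below is a condensed sketch, loosely following v6.3-era `mm/rmap.c` and `mm/internal.h`: the two constants are the real ones, but `sketch_add_rmap()` is a made-up condensation of the `page_add_*_rmap()` paths, not the series' actual diff.

```c
#include <linux/mm.h>
#include <linux/rmap.h>

/*
 * As in v6.3-era mm/internal.h: the low bits of _nr_pages_mapped count
 * subpages with a nonzero ->_mapcount; the COMPOUND_MAPPED bit records
 * that the folio is mapped by one huge (e.g. PMD-level) entry.
 */
#define COMPOUND_MAPPED		0x800000
#define FOLIO_PAGES_MAPPED	(COMPOUND_MAPPED - 1)

static void sketch_add_rmap(struct folio *folio, struct page *page,
			    bool compound)
{
	atomic_t *mapped = &folio->_nr_pages_mapped;

	if (!compound) {
		/*
		 * PTE-mapping one subpage: on the subpage's first
		 * mapping, count the folio as "one more subpage mapped".
		 */
		if (atomic_inc_and_test(&page->_mapcount))
			atomic_inc(mapped);
	} else {
		/* Mapping the whole folio with one huge entry. */
		if (atomic_inc_and_test(&folio->_entire_mapcount))
			atomic_add(COMPOUND_MAPPED, mapped);
	}
	/*
	 * The real code additionally derives the NR_ANON_MAPPED /
	 * NR_FILE_MAPPED accounting deltas from these transitions.
	 */
}
```

Patch 1 makes HugeTLB folios take this same path instead of keeping their own mapcount scheme; patch 2 follows because a 512G folio has 2^27 subpages, more than the low-bit counter can hold with COMPOUND_MAPPED at bit 23.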
Comments
On Mon, Mar 06, 2023 at 11:00:02PM +0000, James Houghton wrote:
> HugeTLB pages may soon support being mapped with PTEs. To allow for this
> case, merge HugeTLB's mapcount scheme with THP's.
>
> The first patch of this series comes from the HugeTLB high-granularity
> mapping series[1], though with some updates, as the original version
> was buggy[2] and incomplete.
>
> I am sending this change as part of this smaller series in hopes that it
> can be more thoroughly scrutinized.
>
> I haven't run any THP performance tests with this series applied.
> HugeTLB pages don't currently support being mapped with
> `compound=false`, but this mapcount scheme will make collapsing
> compound=false mappings in HugeTLB pages quite slow. This can be
> optimized with future patches (likely by taking advantage of HugeTLB's
> alignment guarantees).
>
> Matthew Wilcox is working on a mapcounting scheme[3] that will avoid
> the use of each subpage's mapcount. If this series is applied, Matthew's
> new scheme will automatically apply to HugeTLB pages.

Is this the plan?

I may not have followed the latest development of Matthew's idea
closely. The thing is, if the design requires the PTEs for a whole folio
to be installed or removed at the same time, then it may not work
directly for HGM if HGM wants to support at least postcopy, IIUC,
because installing all of a folio's PTEs at once seems to defeat the
whole purpose of having HGM.

The patch (especially patch 1) looks good, so this is a pure question
just to make sure we're on the same page. IIUC your other mapcount
proposal may work, but it still needs to be able to take care of PTEs
in less-than-folio sizes, whatever it looks like in the end.

A trivial comment on patch 2 since we're at it: does "a future plan on
some arch to support 512GB huge pages" justify itself? It would be
better justified, IMHO, when that support is added (and it is decided
to use HGM).

What I feel is missing (rather than patch 2 itself) is some guard to
make sure THP mapcounting will not be abused by new hugetlb sizes to
come.

How about another BUG_ON() squashed into patch 1 (probably somewhere in
page_add_file|anon_rmap()) to make sure folio_size() is always smaller
than COMPOUND_MAPPED / 2?
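For concreteness, one way the guard suggested here could look (a hypothetical sketch, not something from the series; the helper name and exact placement are open):

```c
/*
 * Hypothetical overflow guard along the lines suggested above.
 * Subpage mappings are counted in the low bits of _nr_pages_mapped
 * while COMPOUND_MAPPED occupies a high bit, so a folio with too many
 * subpages could carry into the compound bit. Refuse such folios up
 * front, e.g. at the top of page_add_file_rmap()/page_add_anon_rmap().
 */
static inline void folio_check_mapcount_room(struct folio *folio)
{
	VM_BUG_ON_FOLIO(folio_nr_pages(folio) >= COMPOUND_MAPPED / 2,
			folio);
}
```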
On Wed, Mar 8, 2023 at 2:10 PM Peter Xu <peterx@redhat.com> wrote:
>
> On Mon, Mar 06, 2023 at 11:00:02PM +0000, James Houghton wrote:
> > HugeTLB pages may soon support being mapped with PTEs. To allow for this
> > case, merge HugeTLB's mapcount scheme with THP's.
> >
> > The first patch of this series comes from the HugeTLB high-granularity
> > mapping series[1], though with some updates, as the original version
> > was buggy[2] and incomplete.
> >
> > I am sending this change as part of this smaller series in hopes that it
> > can be more thoroughly scrutinized.
> >
> > I haven't run any THP performance tests with this series applied.
> > HugeTLB pages don't currently support being mapped with
> > `compound=false`, but this mapcount scheme will make collapsing
> > compound=false mappings in HugeTLB pages quite slow. This can be
> > optimized with future patches (likely by taking advantage of HugeTLB's
> > alignment guarantees).
> >
> > Matthew Wilcox is working on a mapcounting scheme[3] that will avoid
> > the use of each subpage's mapcount. If this series is applied, Matthew's
> > new scheme will automatically apply to HugeTLB pages.
>
> Is this the plan?
>
> I may not have followed the latest development of Matthew's idea
> closely. The thing is, if the design requires the PTEs for a whole folio
> to be installed or removed at the same time, then it may not work
> directly for HGM if HGM wants to support at least postcopy, IIUC,
> because installing all of a folio's PTEs at once seems to defeat the
> whole purpose of having HGM.

My understanding is that it doesn't *require* all the PTEs in a folio
to be mapped at the same time. I don't see how it possibly could, given
that UFFDIO_CONTINUE exists (which can already create PTE-mapped THPs
today). It would be faster to populate all the PTEs at the same time
(you would only need to traverse the page table once for the entire
group to see if you should be incrementing mapcount).

Though, with respect to unmapping, if PTEs aren't all unmapped at the
same time, then you could end up with a case where mapcount is still
incremented but nothing is really mapped. I'm not really sure what
should be done there, but this problem applies to PTE-mapped THPs the
same way that it applies to HGMed HugeTLB pages.

> The patch (especially patch 1) looks good, so this is a pure question
> just to make sure we're on the same page. IIUC your other mapcount
> proposal may work, but it still needs to be able to take care of PTEs
> in less-than-folio sizes, whatever it looks like in the end.

By my "other mapcount proposal", I assume you mean "using the
PAGE_SPECIAL bit to track whether mapcount has been incremented or
not". It really only serves as an optimization for Matthew's scheme
(see [2] below for some more thoughts), and it doesn't have to apply
only to HugeTLB.

I originally thought[1] that Matthew's scheme would be really painful
for postcopy for HGM without this optimization, but it's actually not
so bad. Let's assume the worst case, that we're UFFDIO_CONTINUEing
from the end to the beginning, like in [1]:

First CONTINUE: pvmw finds an empty PUD, so quickly returns false.
Second CONTINUE: pvmw finds 511 empty PMDs, then finds 511 empty PTEs,
then finds a present PTE (from the first CONTINUE).
Third CONTINUE: pvmw finds 511 empty PMDs, then finds 510 empty PTEs.
...
514th CONTINUE: pvmw finds 510 empty PMDs, then finds 511 empty PTEs.

So it'll be slow, but it won't have to check 262k empty PTEs per
CONTINUE (though you could make this possible with MADV_DONTNEED).
Even with an HGM implementation that only allows PTE-mapping of
HugeTLB pages, it should still behave just like this.

> A trivial comment on patch 2 since we're at it: does "a future plan on
> some arch to support 512GB huge pages" justify itself? It would be
> better justified, IMHO, when that support is added (and it is decided
> to use HGM).

That's fine with me. I'm happy to drop that patch.

> What I feel is missing (rather than patch 2 itself) is some guard to
> make sure THP mapcounting will not be abused by new hugetlb sizes to
> come.
>
> How about another BUG_ON() squashed into patch 1 (probably somewhere in
> page_add_file|anon_rmap()) to make sure folio_size() is always smaller
> than COMPOUND_MAPPED / 2?

Sure, I can add that.

Thanks, Peter!

- James

[1]: https://lore.kernel.org/linux-mm/CADrL8HUrEgt+1qAtEsOHuQeA+WWnggGfLj8_nqHF0k-pqPi52w@mail.gmail.com/

[2]: Some details on what the optimization might look like.

An excerpt of Matthew's scheme would look something like this:

    /* if we're mapping < folio_nr_pages(folio) worth of PTEs. */
    if (!folio_has_ptes(folio, vma))
            atomic_inc(&folio->_mapcount);

where folio_has_ptes() is defined like:

    if (!page_vma_mapped_walk(...))
            return false;
    page_vma_mapped_walk_done(...);
    return true;

You might be able to optimize folio_has_ptes() with a block like this
at the beginning:

    if (folio_is_naturally_aligned(folio, vma)) {
            /* optimization for naturally-aligned folios. */
            if (folio_test_hugetlb(folio)) {
                    /* check hstate-level PTE, and do a similar check as below. */
            }
            /* for naturally-aligned THPs: */
            pmdp = mm_find_pmd(...); /* or just pass it in. */
            pmd = READ_ONCE(*pmdp);
            BUG_ON(!pmd_present(pmd) || pmd_leaf(pmd));
            if (pmd_special(pmd))
                    return true;
            /* we already hold the PTL for the PTE. */
            ptl = pmd_lock(mm, pmdp);
            /* test and set pmd_special */
            pmd_unlock(ptl);
            return if_we_set_pmd_special;
    }

(pmd_special() doesn't currently exist.) If HugeTLB walking code can
be merged with generic mm, then HugeTLB wouldn't have a special case
at all here.
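To make the excerpt in [2] a bit more concrete, here is one way `folio_has_ptes()` could be written against the real v6.3-era `page_vma_mapped_walk()` API. This is a sketch only; the `pmd_special()` fast path is omitted since, as noted above, that bit doesn't exist yet, and the exact walk setup is an assumption.

```c
/*
 * Sketch of folio_has_ptes() from [2]: returns true if some page of
 * the folio is already mapped in this VMA, i.e. the PTE the caller is
 * about to install is not the folio's first mapping there, so the
 * folio-level mapcount should not be bumped again.
 */
static bool folio_has_ptes(struct folio *folio, struct vm_area_struct *vma)
{
	struct page_vma_mapped_walk pvmw = {
		.pfn = folio_pfn(folio),
		.nr_pages = folio_nr_pages(folio),
		.pgoff = folio_pgoff(folio),
		.vma = vma,
		.address = vma_address(&folio->page, vma),
		.flags = PVMW_SYNC,
	};

	if (!page_vma_mapped_walk(&pvmw))
		return false;		/* no PTE of this folio is mapped */
	page_vma_mapped_walk_done(&pvmw);	/* drop the PTL the walk took */
	return true;
}
```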
On Thu, Mar 09, 2023 at 10:05:12AM -0800, James Houghton wrote:
> On Wed, Mar 8, 2023 at 2:10 PM Peter Xu <peterx@redhat.com> wrote:
> > [...]
> > I may not have followed the latest development of Matthew's idea
> > closely. The thing is, if the design requires the PTEs for a whole
> > folio to be installed or removed at the same time, then it may not
> > work directly for HGM if HGM wants to support at least postcopy,
> > IIUC, because installing all of a folio's PTEs at once seems to
> > defeat the whole purpose of having HGM.
>
> My understanding is that it doesn't *require* all the PTEs in a folio
> to be mapped at the same time. I don't see how it possibly could, given
> that UFFDIO_CONTINUE exists (which can already create PTE-mapped THPs
> today). It would be faster to populate all the PTEs at the same time
> (you would only need to traverse the page table once for the entire
> group to see if you should be incrementing mapcount).
>
> Though, with respect to unmapping, if PTEs aren't all unmapped at the
> same time, then you could end up with a case where mapcount is still
> incremented but nothing is really mapped. I'm not really sure what
> should be done there, but this problem applies to PTE-mapped THPs the
> same way that it applies to HGMed HugeTLB pages.
>
> [...]
>
> So it'll be slow, but it won't have to check 262k empty PTEs per
> CONTINUE (though you could make this possible with MADV_DONTNEED).
> Even with an HGM implementation that only allows PTE-mapping of
> HugeTLB pages, it should still behave just like this.
>
> [...]
>
> [2]: Some details on what the optimization might look like.
>
> An excerpt of Matthew's scheme would look something like this:
>
>     /* if we're mapping < folio_nr_pages(folio) worth of PTEs. */
>     if (!folio_has_ptes(folio, vma))
>             atomic_inc(&folio->_mapcount);
>
> [...]
>
> (pmd_special() doesn't currently exist.) If HugeTLB walking code can
> be merged with generic mm, then HugeTLB wouldn't have a special case
> at all here.

I see what you mean now, thanks. That looks fine.

I just suspect the pte_special trick will still be needed if this is to
start applying to HGM, as it still doesn't seem to suit large folio
sizes perfectly: the MADV_DONTNEED worst case of looping over
~folio_size() worth of none PTEs is still possible.
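For concreteness on the worst case both messages reference: a 1G HugeTLB folio mapped at 4K granularity spans 512 PMD entries of 512 PTEs each, so one full rescan after MADV_DONTNEED walks 512 * 512 = 262,144 none PTEs, which is the "262k" figure quoted above.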