Message ID | 20221021163703.3218176-34-jthoughton@google.com |
---|---|
State | New |
Headers |
Date: Fri, 21 Oct 2022 16:36:49 +0000 In-Reply-To: <20221021163703.3218176-1-jthoughton@google.com> References: <20221021163703.3218176-1-jthoughton@google.com> Message-ID: <20221021163703.3218176-34-jthoughton@google.com> Subject: [RFC PATCH v2 33/47] userfaultfd: add UFFD_FEATURE_MINOR_HUGETLBFS_HGM From: James Houghton <jthoughton@google.com> To: Mike Kravetz <mike.kravetz@oracle.com>, Muchun Song <songmuchun@bytedance.com>, Peter Xu <peterx@redhat.com> Cc: David Hildenbrand <david@redhat.com>, David Rientjes <rientjes@google.com>, Axel Rasmussen <axelrasmussen@google.com>, Mina Almasry <almasrymina@google.com>, "Zach O'Keefe" <zokeefe@google.com>, Manish Mishra <manish.mishra@nutanix.com>, Naoya Horiguchi <naoya.horiguchi@nec.com>, "Dr . David Alan Gilbert" <dgilbert@redhat.com>, "Matthew Wilcox (Oracle)" <willy@infradead.org>, Vlastimil Babka <vbabka@suse.cz>, Baolin Wang <baolin.wang@linux.alibaba.com>, Miaohe Lin <linmiaohe@huawei.com>, Yang Shi <shy828301@gmail.com>, Andrew Morton <akpm@linux-foundation.org>, linux-mm@kvack.org, linux-kernel@vger.kernel.org, James Houghton <jthoughton@google.com> Content-Type: text/plain; charset="UTF-8" List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org |
Series |
hugetlb: introduce HugeTLB high-granularity mapping
|
|
Commit Message
James Houghton
Oct. 21, 2022, 4:36 p.m. UTC
Userspace must provide this new feature when it calls UFFDIO_API to
enable HGM. Userspace can check if the feature exists in
uffdio_api.features; if it is not there, the kernel does not support
HGM and therefore did not enable it.
Signed-off-by: James Houghton <jthoughton@google.com>
---
fs/userfaultfd.c | 12 +++++++++++-
include/linux/userfaultfd_k.h | 7 +++++++
include/uapi/linux/userfaultfd.h | 2 ++
3 files changed, 20 insertions(+), 1 deletion(-)
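
The handshake described in the commit message can be sketched from the userspace side as follows. This is only an illustration under stated assumptions: the feature bit is not in mainline headers, so the value below is an arbitrary placeholder, and both helper names are hypothetical. The helpers take plain integers so the decision logic can be read without a userfaultfd file descriptor; in a real program the values would be uffdio_api.features before and after the UFFDIO_API ioctl.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * ASSUMPTION: arbitrary placeholder, NOT the real value. The real bit is
 * whatever include/uapi/linux/userfaultfd.h defines with this patch applied.
 */
#define EXAMPLE_UFFD_FEATURE_MINOR_HUGETLBFS_HGM (1ULL << 62)

/* Hypothetical helper: value to store in uffdio_api.features before the
 * UFFDIO_API ioctl, so that HGM is requested alongside other features. */
static uint64_t hgm_request_features(uint64_t other_features)
{
	return other_features | EXAMPLE_UFFD_FEATURE_MINOR_HUGETLBFS_HGM;
}

/*
 * Hypothetical helper: decide whether HGM was enabled.
 * api_ret           - return value of ioctl(uffd, UFFDIO_API, &api)
 * returned_features - api.features as written back by the kernel
 * A failed ioctl is treated as "not enabled"; otherwise the feature must be
 * present in the returned mask, as the commit message describes.
 */
static bool hgm_negotiated(int api_ret, uint64_t returned_features)
{
	if (api_ret != 0)
		return false;
	return (returned_features & EXAMPLE_UFFD_FEATURE_MINOR_HUGETLBFS_HGM) != 0;
}
```

A caller would fall back to whole-hugepage UFFDIO_CONTINUE when hgm_negotiated() returns false.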
Comments
On Fri, Oct 21, 2022 at 04:36:49PM +0000, James Houghton wrote:
> Userspace must provide this new feature when it calls UFFDIO_API to
> enable HGM. Userspace can check if the feature exists in
> uffdio_api.features, and if it does not exist, the kernel does not
> support and therefore did not enable HGM.
>
> Signed-off-by: James Houghton <jthoughton@google.com>

It's still slightly a pity that this can only be enabled by an uffd context
plus a minor fault, so generic hugetlb users cannot directly leverage this.

The patch itself looks good.
On Wed, Nov 16, 2022 at 2:28 PM Peter Xu <peterx@redhat.com> wrote:
>
> On Fri, Oct 21, 2022 at 04:36:49PM +0000, James Houghton wrote:
> > [...]
>
> It's still slightly a pity that this can only be enabled by an uffd context
> plus a minor fault, so generic hugetlb users cannot directly leverage this.

The idea here is that, for applications that can conceivably benefit
from HGM, we have a mechanism for enabling it for that application. So
this patch creates that mechanism for userfaultfd/UFFDIO_CONTINUE. I
prefer this approach over something more general like MADV_ENABLE_HGM
or something.

For hwpoison, HGM will be automatically enabled, but that isn't
implemented in this series. We could also extend MADV_DONTNEED to do
high-granularity unmapping in some way, but that also isn't attempted
here.

I'm sure that if there are other cases where HGM may be useful, we can
add/change some uapi to make it possible to take advantage of HGM.

- James

> The patch itself looks good.
>
> --
> Peter Xu
James,

On Wed, Nov 16, 2022 at 03:30:00PM -0800, James Houghton wrote:
> [...]
> The idea here is that, for applications that can conceivably benefit
> from HGM, we have a mechanism for enabling it for that application. So
> this patch creates that mechanism for userfaultfd/UFFDIO_CONTINUE. I
> prefer this approach over something more general like MADV_ENABLE_HGM
> or something.

Sorry to get back to this very late - I know this has been discussed since
the very early stage of the feature, but is there any reasoning behind?

When I start to think seriously on applying this to process snapshot with
uffd-wp I found that the minor mode trick won't easily play - normally
that's a case where all the pages were there mapped huge, but when the app
wants UFFDIO_WRITEPROTECT it may want to remap the huge pages into smaller
pages, probably some size that the user can specify. It'll be non-trivial
to enable HGM during that phase using MINOR mode because in that case the
pages are all mapped.

For the long term, I am just still worried the current interface is still
not as flexible.
On Wed, Dec 21, 2022 at 2:23 PM Peter Xu <peterx@redhat.com> wrote:
>
> James,
>
> On Wed, Nov 16, 2022 at 03:30:00PM -0800, James Houghton wrote:
> > [...]
> > The idea here is that, for applications that can conceivably benefit
> > from HGM, we have a mechanism for enabling it for that application. So
> > this patch creates that mechanism for userfaultfd/UFFDIO_CONTINUE. I
> > prefer this approach over something more general like MADV_ENABLE_HGM
> > or something.
>
> Sorry to get back to this very late - I know this has been discussed since
> the very early stage of the feature, but is there any reasoning behind?
>
> When I start to think seriously on applying this to process snapshot with
> uffd-wp I found that the minor mode trick won't easily play - normally
> that's a case where all the pages were there mapped huge, but when the app
> wants UFFDIO_WRITEPROTECT it may want to remap the huge pages into smaller
> pages, probably some size that the user can specify. It'll be non-trivial
> to enable HGM during that phase using MINOR mode because in that case the
> pages are all mapped.
>
> For the long term, I am just still worried the current interface is still
> not as flexible.

Thanks for bringing this up, Peter. I think the main reason was:
having separate UFFD_FEATUREs clearly indicates to userspace what is
and is not supported.

For UFFDIO_WRITEPROTECT, a user could remap huge pages into smaller
pages by issuing a high-granularity UFFDIO_WRITEPROTECT. That isn't
allowed as of this patch series, but it could be allowed in the
future. To add support in the same way as this series, we would add
another feature, say UFFD_FEATURE_WP_HUGETLBFS_HGM. I agree that
having to add another feature isn't great; is this what you're
concerned about?

Considering MADV_ENABLE_HUGETLB...
1. If a user provides this, then the contract becomes: "the kernel may
allow UFFDIO_CONTINUE and UFFDIO_WRITEPROTECT for HugeTLB at
high-granularities, provided the support exists", but it becomes
unclear to userspace to know what's supported and what isn't.
2. We would then need to keep track if a user explicitly enabled it,
or if it got enabled automatically in response to memory poison, for
example. Not a big problem, just a complication. (Otherwise, if HGM
got enabled for poison, suddenly userspace would be allowed to do
things it wasn't allowed to do before.)
3. This API makes sense for enabling HGM for something outside of
userfaultfd, like MADV_DONTNEED.

Maybe (1) is solvable if we provide a bit field that describes what's
supported, or maybe (1) isn't even a problem.

Another possibility is to have a feature like UFFD_FEATURE_HUGETLB_HGM,
which will enable the possibility of HGM for all relevant userfaultfd
ioctls, but we have the same problem where it's unclear what's
supported and what isn't.

I'm happy to change the API to whatever you think makes the most sense.
Thanks!

- James

> --
> Peter Xu
On 12/21/22 15:21, James Houghton wrote:
> On Wed, Dec 21, 2022 at 2:23 PM Peter Xu <peterx@redhat.com> wrote:
> > [...]
> > For the long term, I am just still worried the current interface is still
> > not as flexible.
>
> Thanks for bringing this up, Peter. I think the main reason was:
> having separate UFFD_FEATUREs clearly indicates to userspace what is
> and is not supported.

IIRC, I think we wanted to initially limit the usage to the very
specific use case (live migration). The idea is that we could then
expand usage as more use cases came to light.

Another good thing is that userfaultfd has versioning built into the
API. Thus a user can determine if HGM is enabled in their running
kernel.

> For UFFDIO_WRITEPROTECT, a user could remap huge pages into smaller
> pages by issuing a high-granularity UFFDIO_WRITEPROTECT. That isn't
> allowed as of this patch series, but it could be allowed in the
> future. To add support in the same way as this series, we would add
> another feature, say UFFD_FEATURE_WP_HUGETLBFS_HGM. I agree that
> having to add another feature isn't great; is this what you're
> concerned about?
>
> Considering MADV_ENABLE_HUGETLB...
> [...]
> 3. This API makes sense for enabling HGM for something outside of
> userfaultfd, like MADV_DONTNEED.

I think #3 is key here. Once we start applying HGM to things outside
userfaultfd, then more thought will be required on APIs. The API is
somewhat limited by design until the basic functionality is in place.
On Wed, Dec 21, 2022 at 01:39:39PM -0800, Mike Kravetz wrote:
> On 12/21/22 15:21, James Houghton wrote:
> > [...]
> > Thanks for bringing this up, Peter. I think the main reason was:
> > having separate UFFD_FEATUREs clearly indicates to userspace what is
> > and is not supported.
>
> IIRC, I think we wanted to initially limit the usage to the very
> specific use case (live migration). The idea is that we could then
> expand usage as more use cases came to light.
>
> Another good thing is that userfaultfd has versioning built into the
> API. Thus a user can determine if HGM is enabled in their running
> kernel.

I don't worry much on this one, afaiu if we have any way to enable hgm then
the user can just try enabling it on a test vma, just like when an app
wants to detect whether a new madvise() is present on the current host OS.

Besides, I'm wondering whether something like /sys/kernel/vm/hugepages/hgm
would work too.

> > [...]
> > 2. We would then need to keep track if a user explicitly enabled it,
> > or if it got enabled automatically in response to memory poison, for
> > example. Not a big problem, just a complication. (Otherwise, if HGM
> > got enabled for poison, suddenly userspace would be allowed to do
> > things it wasn't allowed to do before.)

We could alternatively have two flags for each vma: (a) hgm_advised and (b)
hgm_enabled. (a) always sets (b) but not vice versa. We can limit poison
to set (b) only. For this patchset, it can be all about (a).

> > 3. This API makes sense for enabling HGM for something outside of
> > userfaultfd, like MADV_DONTNEED.
>
> I think #3 is key here. Once we start applying HGM to things outside
> userfaultfd, then more thought will be required on APIs. The API is
> somewhat limited by design until the basic functionality is in place.

Mike, could you elaborate what's the major concern of having hgm used
outside uffd and live migration use cases?

I feel like I miss something here. I can understand we want to limit the
usage only when the user specifies using hgm because we want to keep the
old behavior intact. However if we want another way to enable hgm it'll
still need one knob anyway even outside uffd, and I thought that'll service
the same purpose, or maybe not?

Thanks,
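
The two-flag idea above can be sketched in a few lines of C. This is a toy model under stated assumptions, not kernel code: the struct and all function names are hypothetical, and where such flags would actually live (for example in the VMA flags) is left open by the discussion.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-VMA state; not a real kernel structure. */
struct hgm_vma_state {
	bool hgm_advised; /* (a) userspace explicitly asked for HGM */
	bool hgm_enabled; /* (b) high-granularity mappings may exist */
};

/* Explicit opt-in by userspace: (a) always sets (b). */
static void hgm_advise(struct hgm_vma_state *s)
{
	s->hgm_advised = true;
	s->hgm_enabled = true;
}

/* Automatic enablement, e.g. for memory poison: sets (b) only. */
static void hgm_enable_for_poison(struct hgm_vma_state *s)
{
	s->hgm_enabled = true;
}

/* Userspace-driven high-granularity operations are gated on (a), so
 * poison handling alone does not grant userspace anything new. */
static bool hgm_user_op_allowed(const struct hgm_vma_state *s)
{
	assert(!s->hgm_advised || s->hgm_enabled); /* (a) implies (b) */
	return s->hgm_advised;
}
```

With this split, a VMA that only had HGM enabled for poison still rejects high-granularity requests from userspace, which addresses point 2 in the quoted list.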
On 12/21/22 17:10, Peter Xu wrote:
> On Wed, Dec 21, 2022 at 01:39:39PM -0800, Mike Kravetz wrote:
> > [...]
> > I think #3 is key here. Once we start applying HGM to things outside
> > userfaultfd, then more thought will be required on APIs. The API is
> > somewhat limited by design until the basic functionality is in place.
>
> Mike, could you elaborate what's the major concern of having hgm used
> outside uffd and live migration use cases?
>
> I feel like I miss something here. I can understand we want to limit the
> usage only when the user specifies using hgm because we want to keep the
> old behavior intact. However if we want another way to enable hgm it'll
> still need one knob anyway even outside uffd, and I thought that'll service
> the same purpose, or maybe not?

I am not opposed to using hgm outside the use cases targeted by this series.

It seems that when we were previously discussing the API we spent a bunch of
time going around in circles trying to get the API correct. That is expected
as it is more difficult to take all users/uses/abuses of the API into account.

Since the initial use case was fairly limited, it seemed like a good idea to
limit the API to userfaultfd. In this way we could focus on the underlying
code/implementation and then expand as needed. Of course, with an eye on
anything that may be a limiting factor in the future.

I was not aware of the uffd-wp use case, and am more than happy to discuss
expanding the API.
On Wed, Dec 21, 2022 at 5:32 PM Mike Kravetz <mike.kravetz@oracle.com> wrote:
>
> On 12/21/22 17:10, Peter Xu wrote:
> > On Wed, Dec 21, 2022 at 01:39:39PM -0800, Mike Kravetz wrote:
> > > On 12/21/22 15:21, James Houghton wrote:
> > > > Thanks for bringing this up, Peter. I think the main reason was:
> > > > having separate UFFD_FEATUREs clearly indicates to userspace what is
> > > > and is not supported.
> > >
> > > IIRC, I think we wanted to initially limit the usage to the very
> > > specific use case (live migration). The idea is that we could then
> > > expand usage as more use cases came to light.
> > >
> > > Another good thing is that userfaultfd has versioning built into the
> > > API. Thus a user can determine if HGM is enabled in their running
> > > kernel.
> >
> > I don't worry much on this one, afaiu if we have any way to enable hgm then
> > the user can just try enabling it on a test vma, just like when an app
> > wants to detect whether a new madvise() is present on the current host OS.

That would be enough to test if HGM was merely present, but not whether
specific features like 4K UFFDIO_CONTINUEs or 4K UFFDIO_WRITEPROTECTs
were available. You could always check these by making a HugeTLB VMA
and setting it up correctly for userfaultfd/etc., but that's a little
messy.

> >
> > Besides, I'm wondering whether something like /sys/kernel/vm/hugepages/hgm
> > would work too.

I'm not opposed to this.

> >
> > >
> > > > For UFFDIO_WRITEPROTECT, a user could remap huge pages into smaller
> > > > pages by issuing a high-granularity UFFDIO_WRITEPROTECT. That isn't
> > > > allowed as of this patch series, but it could be allowed in the
> > > > future. To add support in the same way as this series, we would add
> > > > another feature, say UFFD_FEATURE_WP_HUGETLBFS_HGM. I agree that
> > > > having to add another feature isn't great; is this what you're
> > > > concerned about?
> > > >
> > > > Considering MADV_ENABLE_HUGETLB...
> > > > 1. If a user provides this, then the contract becomes: "the kernel may
> > > > allow UFFDIO_CONTINUE and UFFDIO_WRITEPROTECT for HugeTLB at
> > > > high-granularities, provided the support exists", but it becomes
> > > > unclear to userspace to know what's supported and what isn't.
> > > > 2. We would then need to keep track if a user explicitly enabled it,
> > > > or if it got enabled automatically in response to memory poison, for
> > > > example. Not a big problem, just a complication. (Otherwise, if HGM
> > > > got enabled for poison, suddenly userspace would be allowed to do
> > > > things it wasn't allowed to do before.)
> >
> > We could alternatively have two flags for each vma: (a) hgm_advised and (b)
> > hgm_enabled. (a) always sets (b) but not vice versa. We can limit poison
> > to set (b) only. For this patchset, it can be all about (a).

My thoughts exactly. :)

> >
> > > > 3. This API makes sense for enabling HGM for something outside of
> > > > userfaultfd, like MADV_DONTNEED.
> > >
> > > I think #3 is key here. Once we start applying HGM to things outside
> > > userfaultfd, then more thought will be required on APIs. The API is
> > > somewhat limited by design until the basic functionality is in place.
> >
> > Mike, could you elaborate what's the major concern of having hgm used
> > outside uffd and live migration use cases?
> >
> > I feel like I miss something here. I can understand we want to limit the
> > usage only when the user specifies using hgm because we want to keep the
> > old behavior intact. However if we want another way to enable hgm it'll
> > still need one knob anyway even outside uffd, and I thought that'll service
> > the same purpose, or maybe not?
>
> I am not opposed to using hgm outside the use cases targeted by this series.
>
> It seems that when we were previously discussing the API we spent a bunch of
> time going around in circles trying to get the API correct. That is expected
> as it is more difficult to take all users/uses/abuses of the API into account.
>
> Since the initial use case was fairly limited, it seemed like a good idea to
> limit the API to userfaultfd. In this way we could focus on the underlying
> code/implementation and then expand as needed. Of course, with an eye on
> anything that may be a limiting factor in the future.
>
> I was not aware of the uffd-wp use case, and am more than happy to discuss
> expanding the API.

So considering two API choices:

1. What we have now: UFFD_FEATURE_MINOR_HUGETLBFS_HGM for
UFFDIO_CONTINUE, and later UFFD_FEATURE_WP_HUGETLBFS_HGM for
UFFDIO_WRITEPROTECT. For MADV_DONTNEED, we could just suddenly start
allowing high-granularity choices (not sure if this is bad; we started
allowing it for HugeTLB recently with no other API change, AFAIA).
2. MADV_ENABLE_HGM or something similar. The changes to
UFFDIO_CONTINUE/UFFDIO_WRITEPROTECT/MADV_DONTNEED come automatically,
provided they are implemented.

I don't mind one way or the other. Peter, I assume you prefer #2.
Mike, what about you? If we decide on something other than #1, I'll
make the change before sending v1 out.

- James
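The two-flag idea raised in the message above (hgm_advised and hgm_enabled, where advising always enables but not the other way round) can be pictured with a small userspace model. This is only a sketch under assumptions: the struct and function names below are hypothetical and are not taken from the patch series, and real kernel state would live in the VMA rather than a standalone struct.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-VMA HGM state; names are illustrative only. */
struct hgm_state {
	bool advised;	/* (a) userspace explicitly asked for HGM */
	bool enabled;	/* (b) the VMA may be mapped at high granularity */
};

/* Explicit opt-in by userspace: (a) always sets (b). */
static void hgm_advise(struct hgm_state *s)
{
	s->advised = true;
	s->enabled = true;
}

/* Memory poison sets (b) only, so userspace gains no new abilities. */
static void hgm_enable_for_poison(struct hgm_state *s)
{
	s->enabled = true;
}

/* Userspace-triggered high-granularity operations are gated on (a). */
static bool hgm_user_ops_allowed(const struct hgm_state *s)
{
	assert(!s->advised || s->enabled);	/* invariant: (a) implies (b) */
	return s->advised;
}
```

With this split, a VMA that only went through `hgm_enable_for_poison()` still reports `hgm_user_ops_allowed() == false`, which matches the concern in point 2 of the quoted list.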
On 12/21/22 19:02, James Houghton wrote:
> On Wed, Dec 21, 2022 at 5:32 PM Mike Kravetz <mike.kravetz@oracle.com> wrote:
> >
> > On 12/21/22 17:10, Peter Xu wrote:
> > > On Wed, Dec 21, 2022 at 01:39:39PM -0800, Mike Kravetz wrote:
> > > > On 12/21/22 15:21, James Houghton wrote:
> > > > > Thanks for bringing this up, Peter. I think the main reason was:
> > > > > having separate UFFD_FEATUREs clearly indicates to userspace what is
> > > > > and is not supported.
> > > >
> > > > IIRC, I think we wanted to initially limit the usage to the very
> > > > specific use case (live migration). The idea is that we could then
> > > > expand usage as more use cases came to light.
> > > >
> > > > Another good thing is that userfaultfd has versioning built into the
> > > > API. Thus a user can determine if HGM is enabled in their running
> > > > kernel.
> > >
> > > I don't worry much on this one, afaiu if we have any way to enable hgm then
> > > the user can just try enabling it on a test vma, just like when an app
> > > wants to detect whether a new madvise() is present on the current host OS.
>
> That would be enough to test if HGM was merely present, but not whether
> specific features like 4K UFFDIO_CONTINUEs or 4K UFFDIO_WRITEPROTECTs
> were available. You could always check these by making a HugeTLB VMA
> and setting it up correctly for userfaultfd/etc., but that's a little
> messy.
>
> > >
> > > Besides, I'm wondering whether something like /sys/kernel/vm/hugepages/hgm
> > > would work too.
>
> I'm not opposed to this.
>
> > >
> > > >
> > > > > For UFFDIO_WRITEPROTECT, a user could remap huge pages into smaller
> > > > > pages by issuing a high-granularity UFFDIO_WRITEPROTECT. That isn't
> > > > > allowed as of this patch series, but it could be allowed in the
> > > > > future. To add support in the same way as this series, we would add
> > > > > another feature, say UFFD_FEATURE_WP_HUGETLBFS_HGM. I agree that
> > > > > having to add another feature isn't great; is this what you're
> > > > > concerned about?
> > > > >
> > > > > Considering MADV_ENABLE_HUGETLB...
> > > > > 1. If a user provides this, then the contract becomes: "the kernel may
> > > > > allow UFFDIO_CONTINUE and UFFDIO_WRITEPROTECT for HugeTLB at
> > > > > high-granularities, provided the support exists", but it becomes
> > > > > unclear to userspace to know what's supported and what isn't.
> > > > > 2. We would then need to keep track if a user explicitly enabled it,
> > > > > or if it got enabled automatically in response to memory poison, for
> > > > > example. Not a big problem, just a complication. (Otherwise, if HGM
> > > > > got enabled for poison, suddenly userspace would be allowed to do
> > > > > things it wasn't allowed to do before.)
> > >
> > > We could alternatively have two flags for each vma: (a) hgm_advised and (b)
> > > hgm_enabled. (a) always sets (b) but not vice versa. We can limit poison
> > > to set (b) only. For this patchset, it can be all about (a).
>
> My thoughts exactly. :)
>
> > >
> > > > > 3. This API makes sense for enabling HGM for something outside of
> > > > > userfaultfd, like MADV_DONTNEED.
> > > >
> > > > I think #3 is key here. Once we start applying HGM to things outside
> > > > userfaultfd, then more thought will be required on APIs. The API is
> > > > somewhat limited by design until the basic functionality is in place.
> > >
> > > Mike, could you elaborate what's the major concern of having hgm used
> > > outside uffd and live migration use cases?
> > >
> > > I feel like I miss something here. I can understand we want to limit the
> > > usage only when the user specifies using hgm because we want to keep the
> > > old behavior intact. However if we want another way to enable hgm it'll
> > > still need one knob anyway even outside uffd, and I thought that'll service
> > > the same purpose, or maybe not?
> >
> > I am not opposed to using hgm outside the use cases targeted by this series.
> >
> > It seems that when we were previously discussing the API we spent a bunch of
> > time going around in circles trying to get the API correct. That is expected
> > as it is more difficult to take all users/uses/abuses of the API into account.
> >
> > Since the initial use case was fairly limited, it seemed like a good idea to
> > limit the API to userfaultfd. In this way we could focus on the underlying
> > code/implementation and then expand as needed. Of course, with an eye on
> > anything that may be a limiting factor in the future.
> >
> > I was not aware of the uffd-wp use case, and am more than happy to discuss
> > expanding the API.
>
> So considering two API choices:
>
> 1. What we have now: UFFD_FEATURE_MINOR_HUGETLBFS_HGM for
> UFFDIO_CONTINUE, and later UFFD_FEATURE_WP_HUGETLBFS_HGM for
> UFFDIO_WRITEPROTECT. For MADV_DONTNEED, we could just suddenly start
> allowing high-granularity choices (not sure if this is bad; we started
> allowing it for HugeTLB recently with no other API change, AFAIA).

I don't think we can just start allowing HGM for MADV_DONTNEED without
some type of user interaction/request. Otherwise, a user that passes
in non-hugetlb page size requests may get unexpected results. And, one
of the threads about MADV_DONTNEED points out a valid use case where
the caller may not know whether the mapping is hugetlb and is likely to
pass in non-hugetlb page size requests.

> 2. MADV_ENABLE_HGM or something similar. The changes to
> UFFDIO_CONTINUE/UFFDIO_WRITEPROTECT/MADV_DONTNEED come automatically,
> provided they are implemented.
>
> I don't mind one way or the other. Peter, I assume you prefer #2.
> Mike, what about you? If we decide on something other than #1, I'll
> make the change before sending v1 out.

Since I do not believe 1) is an option, MADV_ENABLE_HGM might be the way
to go. Any thoughts about MADV_ENABLE_HGM? I'm thinking:
- Make it have same restrictions as other madvise hugetlb calls,
  . addr must be huge page aligned
  . length is rounded down to a multiple of huge page size
- We split the vma as required
- Flags carrying HGM state reside in the hugetlb_shared_vma_data struct
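The range restrictions listed at the end of the message above (addr must be huge page aligned, length rounded down to a multiple of the huge page size) amount to two mask operations. The following is a rough userspace sketch, not code from the series: `hgm_madvise_range` and its parameters are hypothetical names, and the huge page size is assumed to be a power of two.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper modelling the proposed range checks.
 * Returns false if addr is not huge page aligned. Otherwise stores
 * len rounded down to a multiple of hpage_size in *out_len.
 */
static bool hgm_madvise_range(uint64_t addr, uint64_t len,
			      uint64_t hpage_size, uint64_t *out_len)
{
	/* The mask arithmetic below only works for power-of-two sizes. */
	assert(hpage_size != 0 && (hpage_size & (hpage_size - 1)) == 0);

	if (addr & (hpage_size - 1))
		return false;			/* addr must be huge page aligned */

	*out_len = len & ~(hpage_size - 1);	/* round length down */
	return true;
}
```

For a 2 MiB huge page size, a 3 MiB length at an aligned address would be trimmed to 2 MiB, while an address offset by 4 KiB would be rejected outright.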
> > So considering two API choices:
> >
> > 1. What we have now: UFFD_FEATURE_MINOR_HUGETLBFS_HGM for
> > UFFDIO_CONTINUE, and later UFFD_FEATURE_WP_HUGETLBFS_HGM for
> > UFFDIO_WRITEPROTECT. For MADV_DONTNEED, we could just suddenly start
> > allowing high-granularity choices (not sure if this is bad; we started
> > allowing it for HugeTLB recently with no other API change, AFAIA).
>
> I don't think we can just start allowing HGM for MADV_DONTNEED without
> some type of user interaction/request. Otherwise, a user that passes
> in non-hugetlb page size requests may get unexpected results. And, one
> of the threads about MADV_DONTNEED points out a valid use case where
> the caller may not know whether the mapping is hugetlb and is likely to
> pass in non-hugetlb page size requests.
>
> > 2. MADV_ENABLE_HGM or something similar. The changes to
> > UFFDIO_CONTINUE/UFFDIO_WRITEPROTECT/MADV_DONTNEED come automatically,
> > provided they are implemented.
> >
> > I don't mind one way or the other. Peter, I assume you prefer #2.
> > Mike, what about you? If we decide on something other than #1, I'll
> > make the change before sending v1 out.
>
> Since I do not believe 1) is an option, MADV_ENABLE_HGM might be the way
> to go. Any thoughts about MADV_ENABLE_HGM? I'm thinking:
> - Make it have same restrictions as other madvise hugetlb calls,
>   . addr must be huge page aligned
>   . length is rounded down to a multiple of huge page size
> - We split the vma as required

I agree with these.

> - Flags carrying HGM state reside in the hugetlb_shared_vma_data struct

I actually changed this in v1 to storing HGM state as a VMA flag to
avoid problems with splitting VMAs (like, when we split a VMA, it's
possible the VMA data/lock struct doesn't get allocated). It seems
better to me; I can change it back if you disagree.

Not sure what the best name for this flag is either. MADV_ENABLE_HGM
sounds ok. MADV_HUGETLB_HGM or MADV_HUGETLB_SMALL_PAGES could work
too. No need to figure it out now.

Thanks Mike and Peter :) I'll make this change for v1 and send it out
sometime soon.

- James
On Wed, Dec 21, 2022 at 08:24:45PM -0500, James Houghton wrote:
> Not sure what the best name for this flag is either. MADV_ENABLE_HGM
> sounds ok. MADV_HUGETLB_HGM or MADV_HUGETLB_SMALL_PAGES could work
> too. No need to figure it out now.

One more option to consider is MADV_SPLIT (hopefully to be more generic).

We already decided to reuse thp MADV_COLLAPSE, we can also introduce
MADV_SPLIT and leave thp for later if it can be anything helpful (I
remember we used to discuss this for thp split).

For hugetlb one SPLIT should enable hgm advise bit on the vma forever.

Thanks,
On Thu, Dec 22, 2022 at 9:30 AM Peter Xu <peterx@redhat.com> wrote:
>
> On Wed, Dec 21, 2022 at 08:24:45PM -0500, James Houghton wrote:
> > Not sure what the best name for this flag is either. MADV_ENABLE_HGM
> > sounds ok. MADV_HUGETLB_HGM or MADV_HUGETLB_SMALL_PAGES could work
> > too. No need to figure it out now.
>
> One more option to consider is MADV_SPLIT (hopefully to be more generic).
>
> We already decided to reuse thp MADV_COLLAPSE, we can also introduce
> MADV_SPLIT and leave thp for later if it can be anything helpful (I
> remember we used to discuss this for thp split).
>
> For hugetlb one SPLIT should enable hgm advise bit on the vma forever.

MADV_SPLIT sounds okay to me -- we'll see how it turns out when I send
v1. However, there's an interesting API question regarding what
address userfaultfd provides. We previously required
UFFD_FEATURE_EXACT_ADDRESS when you specified
UFFD_FEATURE_MINOR_HUGETLBFS_HGM so that there was no ambiguity. Now,
we can do:

1. When MADV_SPLIT is given, userfaultfd will now round addresses to
PAGE_SIZE instead of huge_page_size(hstate), and
UFFD_FEATURE_EXACT_ADDRESS is not needed.
2. Don't change anything. A user must know to provide
UFFD_FEATURE_EXACT_ADDRESS to get the real address, otherwise they get
an (unusable) hugepage-aligned address.

I think #1 sounds fine; let me know if you disagree.

Thanks!

- James
On Tue, Dec 27, 2022 at 12:02:52PM -0500, James Houghton wrote:
> On Thu, Dec 22, 2022 at 9:30 AM Peter Xu <peterx@redhat.com> wrote:
> >
> > On Wed, Dec 21, 2022 at 08:24:45PM -0500, James Houghton wrote:
> > > Not sure what the best name for this flag is either. MADV_ENABLE_HGM
> > > sounds ok. MADV_HUGETLB_HGM or MADV_HUGETLB_SMALL_PAGES could work
> > > too. No need to figure it out now.
> >
> > One more option to consider is MADV_SPLIT (hopefully to be more generic).
> >
> > We already decided to reuse thp MADV_COLLAPSE, we can also introduce
> > MADV_SPLIT and leave thp for later if it can be anything helpful (I
> > remember we used to discuss this for thp split).
> >
> > For hugetlb one SPLIT should enable hgm advise bit on the vma forever.
>
> MADV_SPLIT sounds okay to me -- we'll see how it turns out when I send
> v1. However, there's an interesting API question regarding what
> address userfaultfd provides. We previously required
> UFFD_FEATURE_EXACT_ADDRESS when you specified
> UFFD_FEATURE_MINOR_HUGETLBFS_HGM so that there was no ambiguity. Now,
> we can do:
>
> 1. When MADV_SPLIT is given, userfaultfd will now round addresses to
> PAGE_SIZE instead of huge_page_size(hstate), and
> UFFD_FEATURE_EXACT_ADDRESS is not needed.
> 2. Don't change anything. A user must know to provide
> UFFD_FEATURE_EXACT_ADDRESS to get the real address, otherwise they get
> an (unusable) hugepage-aligned address.
>
> I think #1 sounds fine; let me know if you disagree.

Sounds good to me, thanks!
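Option 1 from the exchange above changes which address a fault message would carry. The userspace model below is a sketch under assumptions: `reported_fault_addr` is a hypothetical name, MADV_SPLIT is only a proposed flag at this point in the thread, and treating UFFD_FEATURE_EXACT_ADDRESS as taking precedence over any rounding is an assumption of this sketch rather than something the thread states.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Round addr down to a multiple of size; size must be a power of two. */
static uint64_t round_down_pow2(uint64_t addr, uint64_t size)
{
	assert(size != 0 && (size & (size - 1)) == 0);
	return addr & ~(size - 1);
}

/*
 * Hypothetical model of option 1.
 * exact_address: UFFD_FEATURE_EXACT_ADDRESS was requested.
 * vma_split:     the VMA had the proposed MADV_SPLIT applied.
 */
static uint64_t reported_fault_addr(uint64_t fault_addr, bool exact_address,
				    bool vma_split, uint64_t page_size,
				    uint64_t hpage_size)
{
	if (exact_address)
		return fault_addr;	/* assumed: no masking at all */
	if (vma_split)
		return round_down_pow2(fault_addr, page_size);
	return round_down_pow2(fault_addr, hpage_size);
}
```

Under option 2, the `vma_split` branch would not exist, so a handler that forgot UFFD_FEATURE_EXACT_ADDRESS would only ever see the hugepage-aligned value from the last line.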
diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 07c81ab3fd4d..3a3e9ef74dab 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -226,6 +226,11 @@ static inline struct uffd_msg userfault_msg(unsigned long address,
 	return msg;
 }
 
+bool uffd_ctx_has_hgm(struct vm_userfaultfd_ctx *ctx)
+{
+	return ctx->ctx->features & UFFD_FEATURE_MINOR_HUGETLBFS_HGM;
+}
+
 #ifdef CONFIG_HUGETLB_PAGE
 /*
  * Same functionality as userfaultfd_must_wait below with modifications for
@@ -1954,10 +1959,15 @@ static int userfaultfd_api(struct userfaultfd_ctx *ctx,
 		goto err_out;
 	/* report all available features and ioctls to userland */
 	uffdio_api.features = UFFD_API_FEATURES;
+
 #ifndef CONFIG_HAVE_ARCH_USERFAULTFD_MINOR
 	uffdio_api.features &=
 		~(UFFD_FEATURE_MINOR_HUGETLBFS | UFFD_FEATURE_MINOR_SHMEM);
-#endif
+#ifndef CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING
+	uffdio_api.features &= ~UFFD_FEATURE_MINOR_HUGETLBFS_HGM;
+#endif  /* CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING */
+#endif  /* CONFIG_HAVE_ARCH_USERFAULTFD_MINOR */
+
 #ifndef CONFIG_HAVE_ARCH_USERFAULTFD_WP
 	uffdio_api.features &= ~UFFD_FEATURE_PAGEFAULT_FLAG_WP;
 #endif
diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h
index f07e6998bb68..d8fa37f308f7 100644
--- a/include/linux/userfaultfd_k.h
+++ b/include/linux/userfaultfd_k.h
@@ -162,6 +162,8 @@ static inline bool vma_can_userfault(struct vm_area_struct *vma,
 	    vma_is_shmem(vma);
 }
 
+extern bool uffd_ctx_has_hgm(struct vm_userfaultfd_ctx *);
+
 extern int dup_userfaultfd(struct vm_area_struct *, struct list_head *);
 extern void dup_userfaultfd_complete(struct list_head *);
 
@@ -228,6 +230,11 @@ static inline bool userfaultfd_armed(struct vm_area_struct *vma)
 	return false;
 }
 
+static inline bool uffd_ctx_has_hgm(struct vm_userfaultfd_ctx *ctx)
+{
+	return false;
+}
+
 static inline int dup_userfaultfd(struct vm_area_struct *vma,
 				  struct list_head *l)
 {
diff --git a/include/uapi/linux/userfaultfd.h b/include/uapi/linux/userfaultfd.h
index 005e5e306266..ae8080003560 100644
--- a/include/uapi/linux/userfaultfd.h
+++ b/include/uapi/linux/userfaultfd.h
@@ -36,6 +36,7 @@
 			   UFFD_FEATURE_SIGBUS |		\
 			   UFFD_FEATURE_THREAD_ID |		\
 			   UFFD_FEATURE_MINOR_HUGETLBFS |	\
+			   UFFD_FEATURE_MINOR_HUGETLBFS_HGM |	\
 			   UFFD_FEATURE_MINOR_SHMEM |		\
 			   UFFD_FEATURE_EXACT_ADDRESS |		\
 			   UFFD_FEATURE_WP_HUGETLBFS_SHMEM)
@@ -217,6 +218,7 @@ struct uffdio_api {
 #define UFFD_FEATURE_MINOR_SHMEM		(1<<10)
 #define UFFD_FEATURE_EXACT_ADDRESS		(1<<11)
 #define UFFD_FEATURE_WP_HUGETLBFS_SHMEM		(1<<12)
+#define UFFD_FEATURE_MINOR_HUGETLBFS_HGM	(1<<13)
 	__u64 features;
 
 	__u64 ioctls;
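From userspace, the new bit would be tested against the `features` value the kernel writes back in `struct uffdio_api` after the UFFDIO_API ioctl. The sketch below only shows the bit tests and does not issue the ioctl. The `EX_` constants are local copies of the values in the uapi hunk above (the HGM bit exists only in this series, not in released headers), and both helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local copies of the bit values from the uapi hunk; not real header names. */
#define EX_UFFD_FEATURE_EXACT_ADDRESS		(1ULL << 11)
#define EX_UFFD_FEATURE_MINOR_HUGETLBFS_HGM	(1ULL << 13)

/* reported: the uffdio_api.features value returned by UFFDIO_API. */
static bool uffd_reports_hgm(uint64_t reported)
{
	return (reported & EX_UFFD_FEATURE_MINOR_HUGETLBFS_HGM) != 0;
}

/* Returns the subset of wanted feature bits that the kernel did not report. */
static uint64_t uffd_missing_features(uint64_t wanted, uint64_t reported)
{
	return wanted & ~reported;
}
```

A caller that wants both HGM and exact addresses could pass the OR of the two constants as `wanted` and fall back to hugepage-granularity handling whenever the result is nonzero.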