Message ID | 20221021163703.3218176-9-jthoughton@google.com |
---|---|
State | New |
Headers | Subject: [RFC PATCH v2 08/47] hugetlb: add HGM enablement functions; From: James Houghton <jthoughton@google.com>; Date: Fri, 21 Oct 2022 16:36:24 +0000 |
Series | hugetlb: introduce HugeTLB high-granularity mapping |
Commit Message
James Houghton
Oct. 21, 2022, 4:36 p.m. UTC
Currently it is possible for all shared VMAs to use HGM, but it must be
enabled first. This is because with HGM, we lose PMD sharing, and page
table walks require additional synchronization (we need to take the VMA
lock).
Signed-off-by: James Houghton <jthoughton@google.com>
---
include/linux/hugetlb.h | 22 +++++++++++++
mm/hugetlb.c | 69 +++++++++++++++++++++++++++++++++++++++++
2 files changed, 91 insertions(+)
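For orientation before the review thread: enable_hugetlb_hgm() is meant to be called with the mmap lock held for writing. A minimal sketch of a hypothetical call site follows; mmap_write_lock(), find_vma() and is_vm_hugetlb_page() are standard kernel APIs, but this exact entry point is illustrative only and is not part of the series.

```c
/*
 * Hypothetical call site: enable HGM on the shared hugetlb VMA covering
 * @addr. Assumes <linux/mm.h> and <linux/hugetlb.h>.
 */
static int example_enable_hgm(struct mm_struct *mm, unsigned long addr)
{
	struct vm_area_struct *vma;
	int ret;

	mmap_write_lock(mm);
	vma = find_vma(mm, addr);
	if (!vma || !is_vm_hugetlb_page(vma)) {
		ret = -EINVAL;
		goto out;
	}
	/* Returns -EINVAL for ineligible (private) VMAs, 0 if already enabled. */
	ret = enable_hugetlb_hgm(vma);
out:
	mmap_write_unlock(mm);
	return ret;
}
```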
Comments
On Fri, Oct 21, 2022 at 04:36:24PM +0000, James Houghton wrote: > Currently it is possible for all shared VMAs to use HGM, but it must be > enabled first. This is because with HGM, we lose PMD sharing, and page > table walks require additional synchronization (we need to take the VMA > lock). > > Signed-off-by: James Houghton <jthoughton@google.com> > --- > include/linux/hugetlb.h | 22 +++++++++++++ > mm/hugetlb.c | 69 +++++++++++++++++++++++++++++++++++++++++ > 2 files changed, 91 insertions(+) > > diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h > index 534958499ac4..6e0c36b08a0c 100644 > --- a/include/linux/hugetlb.h > +++ b/include/linux/hugetlb.h > @@ -123,6 +123,9 @@ struct hugetlb_vma_lock { > > struct hugetlb_shared_vma_data { > struct hugetlb_vma_lock vma_lock; > +#ifdef CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING > + bool hgm_enabled; > +#endif > }; > > extern struct resv_map *resv_map_alloc(void); > @@ -1179,6 +1182,25 @@ static inline void hugetlb_unregister_node(struct node *node) > } > #endif /* CONFIG_HUGETLB_PAGE */ > > +#ifdef CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING > +bool hugetlb_hgm_enabled(struct vm_area_struct *vma); > +bool hugetlb_hgm_eligible(struct vm_area_struct *vma); > +int enable_hugetlb_hgm(struct vm_area_struct *vma); > +#else > +static inline bool hugetlb_hgm_enabled(struct vm_area_struct *vma) > +{ > + return false; > +} > +static inline bool hugetlb_hgm_eligible(struct vm_area_struct *vma) > +{ > + return false; > +} > +static inline int enable_hugetlb_hgm(struct vm_area_struct *vma) > +{ > + return -EINVAL; > +} > +#endif > + > static inline spinlock_t *huge_pte_lock(struct hstate *h, > struct mm_struct *mm, pte_t *pte) > { > diff --git a/mm/hugetlb.c b/mm/hugetlb.c > index 5ae8bc8c928e..a18143add956 100644 > --- a/mm/hugetlb.c > +++ b/mm/hugetlb.c > @@ -6840,6 +6840,10 @@ static bool pmd_sharing_possible(struct vm_area_struct *vma) > #ifdef CONFIG_USERFAULTFD > if (uffd_disable_huge_pmd_share(vma)) > return false; > +#endif > +#ifdef CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING > + if (hugetlb_hgm_enabled(vma)) > + return false; > #endif > /* > * Only shared VMAs can share PMDs. > @@ -7033,6 +7037,9 @@ static int hugetlb_vma_data_alloc(struct vm_area_struct *vma) > kref_init(&data->vma_lock.refs); > init_rwsem(&data->vma_lock.rw_sema); > data->vma_lock.vma = vma; > +#ifdef CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING > + data->hgm_enabled = false; > +#endif > vma->vm_private_data = data; > return 0; > } > @@ -7290,6 +7297,68 @@ __weak unsigned long hugetlb_mask_last_page(struct hstate *h) > > #endif /* CONFIG_ARCH_WANT_GENERAL_HUGETLB */ > > +#ifdef CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING > +bool hugetlb_hgm_eligible(struct vm_area_struct *vma) > +{ > + /* > + * All shared VMAs may have HGM. > + * > + * HGM requires using the VMA lock, which only exists for shared VMAs. > + * To make HGM work for private VMAs, we would need to use another > + * scheme to prevent collapsing/splitting from invalidating other > + * threads' page table walks. > + */ > + return vma && (vma->vm_flags & VM_MAYSHARE); > +} > +bool hugetlb_hgm_enabled(struct vm_area_struct *vma) > +{ > + struct hugetlb_shared_vma_data *data = vma->vm_private_data; > + > + if (!vma || !(vma->vm_flags & VM_MAYSHARE)) > + return false; Nit: smells like a open-coded hugetlb_hgm_eligible(). > + > + return data && data->hgm_enabled; > +} > + > +/* > + * Enable high-granularity mapping (HGM) for this VMA. Once enabled, HGM > + * cannot be turned off. > + * > + * PMDs cannot be shared in HGM VMAs. 
> + */ > +int enable_hugetlb_hgm(struct vm_area_struct *vma) > +{ > + int ret; > + struct hugetlb_shared_vma_data *data; > + > + if (!hugetlb_hgm_eligible(vma)) > + return -EINVAL; > + > + if (hugetlb_hgm_enabled(vma)) > + return 0; > + > + /* > + * We must hold the mmap lock for writing so that callers can rely on > + * hugetlb_hgm_enabled returning a consistent result while holding > + * the mmap lock for reading. > + */ > + mmap_assert_write_locked(vma->vm_mm); > + > + /* HugeTLB HGM requires the VMA lock to synchronize collapsing. */ > + ret = hugetlb_vma_data_alloc(vma); > + if (ret) > + return ret; > + > + data = vma->vm_private_data; > + BUG_ON(!data); Let's avoid BUG_ON() as afaiu it's mostly not welcomed unless extremely necessary. In this case it'll crash immediately in next dereference anyway with the whole stack dumped, so we won't miss anything important. :) > + data->hgm_enabled = true; > + > + /* We don't support PMD sharing with HGM. */ > + hugetlb_unshare_all_pmds(vma); > + return 0; > +} > +#endif /* CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING */ > + > /* > * These functions are overwritable if your architecture needs its own > * behavior. > -- > 2.38.0.135.g90850a2211-goog >
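One way to address the nit above would be to have hugetlb_hgm_enabled() reuse hugetlb_hgm_eligible() instead of open-coding the VM_MAYSHARE check; a sketch, not the author's final code:

```c
/* Sketch of the suggested cleanup: reuse the eligibility check. */
bool hugetlb_hgm_enabled(struct vm_area_struct *vma)
{
	struct hugetlb_shared_vma_data *data;

	if (!hugetlb_hgm_eligible(vma))
		return false;

	data = vma->vm_private_data;
	return data && data->hgm_enabled;
}
```

This ordering also avoids reading vma->vm_private_data before the !vma check, which the posted version does.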
On Fri, Oct 21, 2022 at 9:37 AM James Houghton <jthoughton@google.com> wrote: > > Currently it is possible for all shared VMAs to use HGM, but it must be > enabled first. This is because with HGM, we lose PMD sharing, and page > table walks require additional synchronization (we need to take the VMA > lock). > > Signed-off-by: James Houghton <jthoughton@google.com> > --- > include/linux/hugetlb.h | 22 +++++++++++++ > mm/hugetlb.c | 69 +++++++++++++++++++++++++++++++++++++++++ > 2 files changed, 91 insertions(+) > > diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h > index 534958499ac4..6e0c36b08a0c 100644 > --- a/include/linux/hugetlb.h > +++ b/include/linux/hugetlb.h > @@ -123,6 +123,9 @@ struct hugetlb_vma_lock { > > struct hugetlb_shared_vma_data { > struct hugetlb_vma_lock vma_lock; > +#ifdef CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING > + bool hgm_enabled; > +#endif > }; > > extern struct resv_map *resv_map_alloc(void); > @@ -1179,6 +1182,25 @@ static inline void hugetlb_unregister_node(struct node *node) > } > #endif /* CONFIG_HUGETLB_PAGE */ > > +#ifdef CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING > +bool hugetlb_hgm_enabled(struct vm_area_struct *vma); > +bool hugetlb_hgm_eligible(struct vm_area_struct *vma); > +int enable_hugetlb_hgm(struct vm_area_struct *vma); > +#else > +static inline bool hugetlb_hgm_enabled(struct vm_area_struct *vma) > +{ > + return false; > +} > +static inline bool hugetlb_hgm_eligible(struct vm_area_struct *vma) > +{ > + return false; > +} > +static inline int enable_hugetlb_hgm(struct vm_area_struct *vma) > +{ > + return -EINVAL; > +} > +#endif > + > static inline spinlock_t *huge_pte_lock(struct hstate *h, > struct mm_struct *mm, pte_t *pte) > { > diff --git a/mm/hugetlb.c b/mm/hugetlb.c > index 5ae8bc8c928e..a18143add956 100644 > --- a/mm/hugetlb.c > +++ b/mm/hugetlb.c > @@ -6840,6 +6840,10 @@ static bool pmd_sharing_possible(struct vm_area_struct *vma) > #ifdef CONFIG_USERFAULTFD > if (uffd_disable_huge_pmd_share(vma)) > return false; > +#endif > +#ifdef CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING > + if (hugetlb_hgm_enabled(vma)) > + return false; > #endif > /* > * Only shared VMAs can share PMDs. > @@ -7033,6 +7037,9 @@ static int hugetlb_vma_data_alloc(struct vm_area_struct *vma) > kref_init(&data->vma_lock.refs); > init_rwsem(&data->vma_lock.rw_sema); > data->vma_lock.vma = vma; > +#ifdef CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING > + data->hgm_enabled = false; > +#endif > vma->vm_private_data = data; > return 0; > } > @@ -7290,6 +7297,68 @@ __weak unsigned long hugetlb_mask_last_page(struct hstate *h) > > #endif /* CONFIG_ARCH_WANT_GENERAL_HUGETLB */ > > +#ifdef CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING > +bool hugetlb_hgm_eligible(struct vm_area_struct *vma) > +{ > + /* > + * All shared VMAs may have HGM. > + * > + * HGM requires using the VMA lock, which only exists for shared VMAs. > + * To make HGM work for private VMAs, we would need to use another > + * scheme to prevent collapsing/splitting from invalidating other > + * threads' page table walks. > + */ > + return vma && (vma->vm_flags & VM_MAYSHARE); > +} > +bool hugetlb_hgm_enabled(struct vm_area_struct *vma) > +{ > + struct hugetlb_shared_vma_data *data = vma->vm_private_data; > + > + if (!vma || !(vma->vm_flags & VM_MAYSHARE)) > + return false; > + > + return data && data->hgm_enabled; Don't you need to lock data->vma_lock before you access data? Or did I misunderstand the locking? Or are you assuming this is safe before hgm_enabled can't be disabled? 
> +} > + > +/* > + * Enable high-granularity mapping (HGM) for this VMA. Once enabled, HGM > + * cannot be turned off. > + * > + * PMDs cannot be shared in HGM VMAs. > + */ > +int enable_hugetlb_hgm(struct vm_area_struct *vma) > +{ > + int ret; > + struct hugetlb_shared_vma_data *data; > + > + if (!hugetlb_hgm_eligible(vma)) > + return -EINVAL; > + > + if (hugetlb_hgm_enabled(vma)) > + return 0; > + > + /* > + * We must hold the mmap lock for writing so that callers can rely on > + * hugetlb_hgm_enabled returning a consistent result while holding > + * the mmap lock for reading. > + */ > + mmap_assert_write_locked(vma->vm_mm); > + > + /* HugeTLB HGM requires the VMA lock to synchronize collapsing. */ > + ret = hugetlb_vma_data_alloc(vma); Confused we need to vma_data_alloc() here. Shouldn't this be done by hugetlb_vm_op_open()? > + if (ret) > + return ret; > + > + data = vma->vm_private_data; > + BUG_ON(!data); > + data->hgm_enabled = true; > + > + /* We don't support PMD sharing with HGM. */ > + hugetlb_unshare_all_pmds(vma); > + return 0; > +} > +#endif /* CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING */ > + > /* > * These functions are overwritable if your architecture needs its own > * behavior. > -- > 2.38.0.135.g90850a2211-goog >
On Wed, Dec 7, 2022 at 7:26 PM Mina Almasry <almasrymina@google.com> wrote: > > On Fri, Oct 21, 2022 at 9:37 AM James Houghton <jthoughton@google.com> wrote: > > > > Currently it is possible for all shared VMAs to use HGM, but it must be > > enabled first. This is because with HGM, we lose PMD sharing, and page > > table walks require additional synchronization (we need to take the VMA > > lock). > > > > Signed-off-by: James Houghton <jthoughton@google.com> > > --- > > include/linux/hugetlb.h | 22 +++++++++++++ > > mm/hugetlb.c | 69 +++++++++++++++++++++++++++++++++++++++++ > > 2 files changed, 91 insertions(+) > > > > diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h > > index 534958499ac4..6e0c36b08a0c 100644 > > --- a/include/linux/hugetlb.h > > +++ b/include/linux/hugetlb.h > > @@ -123,6 +123,9 @@ struct hugetlb_vma_lock { > > > > struct hugetlb_shared_vma_data { > > struct hugetlb_vma_lock vma_lock; > > +#ifdef CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING > > + bool hgm_enabled; > > +#endif > > }; > > > > extern struct resv_map *resv_map_alloc(void); > > @@ -1179,6 +1182,25 @@ static inline void hugetlb_unregister_node(struct node *node) > > } > > #endif /* CONFIG_HUGETLB_PAGE */ > > > > +#ifdef CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING > > +bool hugetlb_hgm_enabled(struct vm_area_struct *vma); > > +bool hugetlb_hgm_eligible(struct vm_area_struct *vma); > > +int enable_hugetlb_hgm(struct vm_area_struct *vma); > > +#else > > +static inline bool hugetlb_hgm_enabled(struct vm_area_struct *vma) > > +{ > > + return false; > > +} > > +static inline bool hugetlb_hgm_eligible(struct vm_area_struct *vma) > > +{ > > + return false; > > +} > > +static inline int enable_hugetlb_hgm(struct vm_area_struct *vma) > > +{ > > + return -EINVAL; > > +} > > +#endif > > + > > static inline spinlock_t *huge_pte_lock(struct hstate *h, > > struct mm_struct *mm, pte_t *pte) > > { > > diff --git a/mm/hugetlb.c b/mm/hugetlb.c > > index 5ae8bc8c928e..a18143add956 100644 > > --- a/mm/hugetlb.c > > +++ b/mm/hugetlb.c > > @@ -6840,6 +6840,10 @@ static bool pmd_sharing_possible(struct vm_area_struct *vma) > > #ifdef CONFIG_USERFAULTFD > > if (uffd_disable_huge_pmd_share(vma)) > > return false; > > +#endif > > +#ifdef CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING > > + if (hugetlb_hgm_enabled(vma)) > > + return false; > > #endif > > /* > > * Only shared VMAs can share PMDs. > > @@ -7033,6 +7037,9 @@ static int hugetlb_vma_data_alloc(struct vm_area_struct *vma) > > kref_init(&data->vma_lock.refs); > > init_rwsem(&data->vma_lock.rw_sema); > > data->vma_lock.vma = vma; > > +#ifdef CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING > > + data->hgm_enabled = false; > > +#endif > > vma->vm_private_data = data; > > return 0; > > } > > @@ -7290,6 +7297,68 @@ __weak unsigned long hugetlb_mask_last_page(struct hstate *h) > > > > #endif /* CONFIG_ARCH_WANT_GENERAL_HUGETLB */ > > > > +#ifdef CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING > > +bool hugetlb_hgm_eligible(struct vm_area_struct *vma) > > +{ > > + /* > > + * All shared VMAs may have HGM. > > + * > > + * HGM requires using the VMA lock, which only exists for shared VMAs. > > + * To make HGM work for private VMAs, we would need to use another > > + * scheme to prevent collapsing/splitting from invalidating other > > + * threads' page table walks. 
> > + */ > > + return vma && (vma->vm_flags & VM_MAYSHARE); > > +} > > +bool hugetlb_hgm_enabled(struct vm_area_struct *vma) > > +{ > > + struct hugetlb_shared_vma_data *data = vma->vm_private_data; > > + > > + if (!vma || !(vma->vm_flags & VM_MAYSHARE)) > > + return false; > > + > > + return data && data->hgm_enabled; > > Don't you need to lock data->vma_lock before you access data? Or did I > misunderstand the locking? Or are you assuming this is safe before > hgm_enabled can't be disabled? This should be protected by the mmap_lock (we must be holding it for at least reading here). `data` and `data->hgm_enabled` are only changed when holding the mmap_lock for writing. > > +} > > + > > +/* > > + * Enable high-granularity mapping (HGM) for this VMA. Once enabled, HGM > > + * cannot be turned off. > > + * > > + * PMDs cannot be shared in HGM VMAs. > > + */ > > +int enable_hugetlb_hgm(struct vm_area_struct *vma) > > +{ > > + int ret; > > + struct hugetlb_shared_vma_data *data; > > + > > + if (!hugetlb_hgm_eligible(vma)) > > + return -EINVAL; > > + > > + if (hugetlb_hgm_enabled(vma)) > > + return 0; > > + > > + /* > > + * We must hold the mmap lock for writing so that callers can rely on > > + * hugetlb_hgm_enabled returning a consistent result while holding > > + * the mmap lock for reading. > > + */ > > + mmap_assert_write_locked(vma->vm_mm); > > + > > + /* HugeTLB HGM requires the VMA lock to synchronize collapsing. */ > > + ret = hugetlb_vma_data_alloc(vma); > > Confused we need to vma_data_alloc() here. Shouldn't this be done by > hugetlb_vm_op_open()? hugetlb_vma_data_alloc() can fail. In hugetlb_vm_op_open()/other places, it is allowed to fail, and so we call it again here and check that it succeeded so that we can rely on the VMA lock. I think I need to be a little bit more careful with how I handle VMA splitting, though. It's possible for `data` not to be allocated after we split, but for some things to be mapped at high-granularity. The easiest solution here would be to disallow splitting when HGM is enabled; not sure what the best solution is though. Thanks for the review, Mina! > > > + if (ret) > > + return ret; > > + > > + data = vma->vm_private_data; > > + BUG_ON(!data); > > + data->hgm_enabled = true; > > + > > + /* We don't support PMD sharing with HGM. */ > > + hugetlb_unshare_all_pmds(vma); > > + return 0; > > +} > > +#endif /* CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING */ > > + > > /* > > * These functions are overwritable if your architecture needs its own > > * behavior. > > -- > > 2.38.0.135.g90850a2211-goog > >
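Since `data` and `data->hgm_enabled` are only written with the mmap lock held for writing, one way to document that contract for readers is a lockdep-style assertion in the reader. A sketch (not part of the posted patch), assuming mmap_assert_locked():

```c
bool hugetlb_hgm_enabled(struct vm_area_struct *vma)
{
	struct hugetlb_shared_vma_data *data;

	if (!vma || !(vma->vm_flags & VM_MAYSHARE))
		return false;

	/*
	 * hgm_enabled only transitions false -> true, and only with the
	 * mmap lock held for writing; readers must hold it at least for
	 * reading for this result to stay valid.
	 */
	mmap_assert_locked(vma->vm_mm);

	data = vma->vm_private_data;
	return data && data->hgm_enabled;
}
```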
On 10/21/22 16:36, James Houghton wrote: > Currently it is possible for all shared VMAs to use HGM, but it must be > enabled first. This is because with HGM, we lose PMD sharing, and page > table walks require additional synchronization (we need to take the VMA > lock). Not sure yet, but I expect Peter's series will help with locking for hugetlb specific page table walks. > > Signed-off-by: James Houghton <jthoughton@google.com> > --- > include/linux/hugetlb.h | 22 +++++++++++++ > mm/hugetlb.c | 69 +++++++++++++++++++++++++++++++++++++++++ > 2 files changed, 91 insertions(+) > > diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h > index 534958499ac4..6e0c36b08a0c 100644 > --- a/include/linux/hugetlb.h > +++ b/include/linux/hugetlb.h > @@ -123,6 +123,9 @@ struct hugetlb_vma_lock { > > struct hugetlb_shared_vma_data { > struct hugetlb_vma_lock vma_lock; > +#ifdef CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING > + bool hgm_enabled; > +#endif > }; > > extern struct resv_map *resv_map_alloc(void); > @@ -1179,6 +1182,25 @@ static inline void hugetlb_unregister_node(struct node *node) > } > #endif /* CONFIG_HUGETLB_PAGE */ > > +#ifdef CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING > +bool hugetlb_hgm_enabled(struct vm_area_struct *vma); > +bool hugetlb_hgm_eligible(struct vm_area_struct *vma); > +int enable_hugetlb_hgm(struct vm_area_struct *vma); > +#else > +static inline bool hugetlb_hgm_enabled(struct vm_area_struct *vma) > +{ > + return false; > +} > +static inline bool hugetlb_hgm_eligible(struct vm_area_struct *vma) > +{ > + return false; > +} > +static inline int enable_hugetlb_hgm(struct vm_area_struct *vma) > +{ > + return -EINVAL; > +} > +#endif > + > static inline spinlock_t *huge_pte_lock(struct hstate *h, > struct mm_struct *mm, pte_t *pte) > { > diff --git a/mm/hugetlb.c b/mm/hugetlb.c > index 5ae8bc8c928e..a18143add956 100644 > --- a/mm/hugetlb.c > +++ b/mm/hugetlb.c > @@ -6840,6 +6840,10 @@ static bool pmd_sharing_possible(struct vm_area_struct *vma) > #ifdef CONFIG_USERFAULTFD > if (uffd_disable_huge_pmd_share(vma)) > return false; > +#endif > +#ifdef CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING > + if (hugetlb_hgm_enabled(vma)) > + return false; > #endif > /* > * Only shared VMAs can share PMDs. > @@ -7033,6 +7037,9 @@ static int hugetlb_vma_data_alloc(struct vm_area_struct *vma) > kref_init(&data->vma_lock.refs); > init_rwsem(&data->vma_lock.rw_sema); > data->vma_lock.vma = vma; > +#ifdef CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING > + data->hgm_enabled = false; > +#endif > vma->vm_private_data = data; > return 0; > } > @@ -7290,6 +7297,68 @@ __weak unsigned long hugetlb_mask_last_page(struct hstate *h) > > #endif /* CONFIG_ARCH_WANT_GENERAL_HUGETLB */ > > +#ifdef CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING > +bool hugetlb_hgm_eligible(struct vm_area_struct *vma) > +{ > + /* > + * All shared VMAs may have HGM. > + * > + * HGM requires using the VMA lock, which only exists for shared VMAs. > + * To make HGM work for private VMAs, we would need to use another > + * scheme to prevent collapsing/splitting from invalidating other > + * threads' page table walks. > + */ > + return vma && (vma->vm_flags & VM_MAYSHARE); I am not yet 100% convinced you can/will take care of all possible code paths where hugetlb_vma_data allocation may fail. If not, then you should be checking vm_private_data here as well. 
> +} > +bool hugetlb_hgm_enabled(struct vm_area_struct *vma) > +{ > + struct hugetlb_shared_vma_data *data = vma->vm_private_data; > + > + if (!vma || !(vma->vm_flags & VM_MAYSHARE)) > + return false; > + > + return data && data->hgm_enabled; > +} > + > +/* > + * Enable high-granularity mapping (HGM) for this VMA. Once enabled, HGM > + * cannot be turned off. > + * > + * PMDs cannot be shared in HGM VMAs. > + */ > +int enable_hugetlb_hgm(struct vm_area_struct *vma) > +{ > + int ret; > + struct hugetlb_shared_vma_data *data; > + > + if (!hugetlb_hgm_eligible(vma)) > + return -EINVAL; > + > + if (hugetlb_hgm_enabled(vma)) > + return 0; > + > + /* > + * We must hold the mmap lock for writing so that callers can rely on > + * hugetlb_hgm_enabled returning a consistent result while holding > + * the mmap lock for reading. > + */ > + mmap_assert_write_locked(vma->vm_mm); > + > + /* HugeTLB HGM requires the VMA lock to synchronize collapsing. */ > + ret = hugetlb_vma_data_alloc(vma); > + if (ret) > + return ret; > + > + data = vma->vm_private_data; > + BUG_ON(!data); Would rather have hugetlb_hgm_eligible check for vm_private_data as suggested above instead of the BUG here.
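A sketch of enable_hugetlb_hgm() with the BUG_ON() dropped, as suggested here and by Peter above; only the post-allocation check differs from the posted code, and the -ENOMEM value is an illustrative choice:

```c
int enable_hugetlb_hgm(struct vm_area_struct *vma)
{
	struct hugetlb_shared_vma_data *data;
	int ret;

	if (!hugetlb_hgm_eligible(vma))
		return -EINVAL;

	if (hugetlb_hgm_enabled(vma))
		return 0;

	/* Callers rely on hugetlb_hgm_enabled() while holding the mmap lock. */
	mmap_assert_write_locked(vma->vm_mm);

	/* HugeTLB HGM requires the VMA lock to synchronize collapsing. */
	ret = hugetlb_vma_data_alloc(vma);
	if (ret)
		return ret;

	data = vma->vm_private_data;
	if (WARN_ON_ONCE(!data))
		return -ENOMEM;	/* should not happen after a successful alloc */

	data->hgm_enabled = true;

	/* We don't support PMD sharing with HGM. */
	hugetlb_unshare_all_pmds(vma);
	return 0;
}
```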
On Mon, Dec 12, 2022 at 7:14 PM Mike Kravetz <mike.kravetz@oracle.com> wrote: > > On 10/21/22 16:36, James Houghton wrote: > > Currently it is possible for all shared VMAs to use HGM, but it must be > > enabled first. This is because with HGM, we lose PMD sharing, and page > > table walks require additional synchronization (we need to take the VMA > > lock). > > Not sure yet, but I expect Peter's series will help with locking for > hugetlb specific page table walks. It should make things a little bit cleaner in this series; I'll rebase HGM on top of those patches this week (and hopefully get a v1 out soon). I don't think it's possible to implement MADV_COLLAPSE with RCU alone (as implemented in Peter's series anyway); we still need the VMA lock. > > > > > Signed-off-by: James Houghton <jthoughton@google.com> > > --- > > include/linux/hugetlb.h | 22 +++++++++++++ > > mm/hugetlb.c | 69 +++++++++++++++++++++++++++++++++++++++++ > > 2 files changed, 91 insertions(+) > > > > diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h > > index 534958499ac4..6e0c36b08a0c 100644 > > --- a/include/linux/hugetlb.h > > +++ b/include/linux/hugetlb.h > > @@ -123,6 +123,9 @@ struct hugetlb_vma_lock { > > > > struct hugetlb_shared_vma_data { > > struct hugetlb_vma_lock vma_lock; > > +#ifdef CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING > > + bool hgm_enabled; > > +#endif > > }; > > > > extern struct resv_map *resv_map_alloc(void); > > @@ -1179,6 +1182,25 @@ static inline void hugetlb_unregister_node(struct node *node) > > } > > #endif /* CONFIG_HUGETLB_PAGE */ > > > > +#ifdef CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING > > +bool hugetlb_hgm_enabled(struct vm_area_struct *vma); > > +bool hugetlb_hgm_eligible(struct vm_area_struct *vma); > > +int enable_hugetlb_hgm(struct vm_area_struct *vma); > > +#else > > +static inline bool hugetlb_hgm_enabled(struct vm_area_struct *vma) > > +{ > > + return false; > > +} > > +static inline bool hugetlb_hgm_eligible(struct vm_area_struct *vma) > > +{ > > + return false; > > +} > > +static inline int enable_hugetlb_hgm(struct vm_area_struct *vma) > > +{ > > + return -EINVAL; > > +} > > +#endif > > + > > static inline spinlock_t *huge_pte_lock(struct hstate *h, > > struct mm_struct *mm, pte_t *pte) > > { > > diff --git a/mm/hugetlb.c b/mm/hugetlb.c > > index 5ae8bc8c928e..a18143add956 100644 > > --- a/mm/hugetlb.c > > +++ b/mm/hugetlb.c > > @@ -6840,6 +6840,10 @@ static bool pmd_sharing_possible(struct vm_area_struct *vma) > > #ifdef CONFIG_USERFAULTFD > > if (uffd_disable_huge_pmd_share(vma)) > > return false; > > +#endif > > +#ifdef CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING > > + if (hugetlb_hgm_enabled(vma)) > > + return false; > > #endif > > /* > > * Only shared VMAs can share PMDs. > > @@ -7033,6 +7037,9 @@ static int hugetlb_vma_data_alloc(struct vm_area_struct *vma) > > kref_init(&data->vma_lock.refs); > > init_rwsem(&data->vma_lock.rw_sema); > > data->vma_lock.vma = vma; > > +#ifdef CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING > > + data->hgm_enabled = false; > > +#endif > > vma->vm_private_data = data; > > return 0; > > } > > @@ -7290,6 +7297,68 @@ __weak unsigned long hugetlb_mask_last_page(struct hstate *h) > > > > #endif /* CONFIG_ARCH_WANT_GENERAL_HUGETLB */ > > > > +#ifdef CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING > > +bool hugetlb_hgm_eligible(struct vm_area_struct *vma) > > +{ > > + /* > > + * All shared VMAs may have HGM. > > + * > > + * HGM requires using the VMA lock, which only exists for shared VMAs. 
> > + * To make HGM work for private VMAs, we would need to use another > > + * scheme to prevent collapsing/splitting from invalidating other > > + * threads' page table walks. > > + */ > > + return vma && (vma->vm_flags & VM_MAYSHARE); > > I am not yet 100% convinced you can/will take care of all possible code > paths where hugetlb_vma_data allocation may fail. If not, then you > should be checking vm_private_data here as well. I think the check here makes sense -- if a VMA is shared, then it is eligible for HGM, but we might fail to enable it because we can't allocate the VMA lock. I'll reword the comment to clearly say this. There is the problem of splitting, though: if we have high-granularity mapped PTEs in a VMA and that VMA gets split, we need to remember that the VMA had HGM enabled even if allocating the VMA lock fails, otherwise things get out of sync. How does PMD sharing handle the splitting case? An easy way HGM could handle this is by disallowing splitting, but I think we can do better. If we fail to allocate the VMA lock, then we can no longer MADV_COLLAPSE safely, but everything else can proceed as normal, and so some "hugetlb_hgm_enabled" checks can be removed/changed. This should make things easier for when we have to handle (some bits of) HGM for private mappings, too. I'll make some improvements here for v1. > > > +} > > +bool hugetlb_hgm_enabled(struct vm_area_struct *vma) > > +{ > > + struct hugetlb_shared_vma_data *data = vma->vm_private_data; > > + > > + if (!vma || !(vma->vm_flags & VM_MAYSHARE)) > > + return false; > > + > > + return data && data->hgm_enabled; > > +} > > + > > +/* > > + * Enable high-granularity mapping (HGM) for this VMA. Once enabled, HGM > > + * cannot be turned off. > > + * > > + * PMDs cannot be shared in HGM VMAs. > > + */ > > +int enable_hugetlb_hgm(struct vm_area_struct *vma) > > +{ > > + int ret; > > + struct hugetlb_shared_vma_data *data; > > + > > + if (!hugetlb_hgm_eligible(vma)) > > + return -EINVAL; > > + > > + if (hugetlb_hgm_enabled(vma)) > > + return 0; > > + > > + /* > > + * We must hold the mmap lock for writing so that callers can rely on > > + * hugetlb_hgm_enabled returning a consistent result while holding > > + * the mmap lock for reading. > > + */ > > + mmap_assert_write_locked(vma->vm_mm); > > + > > + /* HugeTLB HGM requires the VMA lock to synchronize collapsing. */ > > + ret = hugetlb_vma_data_alloc(vma); > > + if (ret) > > + return ret; > > + > > + data = vma->vm_private_data; > > + BUG_ON(!data); > > Would rather have hugetlb_hgm_eligible check for vm_private_data as > suggested above instead of the BUG here. I don't think we'd ever actually BUG() here. Please correct me if I'm wrong, but if we are eligible for HGM, then hugetlb_vma_data_alloc() will only succeed if we actually allocated the VMA data/lock, so vma->vm_private_data should never be NULL (with the BUG_ON to inform the reader). Maybe I should just drop the BUG()? > > -- > Mike Kravetz > > > + data->hgm_enabled = true; > > + > > + /* We don't support PMD sharing with HGM. */ > > + hugetlb_unshare_all_pmds(vma); > > + return 0; > > +} > > +#endif /* CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING */ > > + > > /* > > * These functions are overwritable if your architecture needs its own > > * behavior. > > -- > > 2.38.0.135.g90850a2211-goog > >
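The "disallow splitting while HGM is enabled" option mentioned above could be prototyped in hugetlb's .may_split callback. A sketch, assuming the existing hugetlb_vm_op_split() alignment check and this patch's hugetlb_hgm_enabled(); this is one possible direction, not the series' chosen solution:

```c
static int hugetlb_vm_op_split(struct vm_area_struct *vma, unsigned long addr)
{
	/* Existing check: splits must be huge-page aligned. */
	if (addr & ~(huge_page_mask(hstate_vma(vma))))
		return -EINVAL;

	/*
	 * Sketch: refuse to split an HGM VMA so that a split can never
	 * leave part of the mapping without the vma_lock/hgm_enabled
	 * state it depends on.
	 */
	if (hugetlb_hgm_enabled(vma))
		return -EINVAL;

	return 0;
}
```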
On 12/13/22 10:49, James Houghton wrote: > On Mon, Dec 12, 2022 at 7:14 PM Mike Kravetz <mike.kravetz@oracle.com> wrote: > > > > On 10/21/22 16:36, James Houghton wrote: > > > Currently it is possible for all shared VMAs to use HGM, but it must be > > > enabled first. This is because with HGM, we lose PMD sharing, and page > > > table walks require additional synchronization (we need to take the VMA > > > lock). > > > > Not sure yet, but I expect Peter's series will help with locking for > > hugetlb specific page table walks. > > It should make things a little bit cleaner in this series; I'll rebase > HGM on top of those patches this week (and hopefully get a v1 out > soon). > > I don't think it's possible to implement MADV_COLLAPSE with RCU alone > (as implemented in Peter's series anyway); we still need the VMA lock. As I continue going through the series, I realize that I am not exactly sure what synchronization by the vma lock is required by HGM. As you are aware, it was originally designed to protect against someone doing a pmd_unshare and effectively removing part of the page table. However, since pmd sharing is disabled for vmas with HGM enabled (I think?), then it might be a good idea to explicitly say somewhere the reason for using the lock.
On Thu, Dec 15, 2022 at 12:52 PM Mike Kravetz <mike.kravetz@oracle.com> wrote: > > On 12/13/22 10:49, James Houghton wrote: > > On Mon, Dec 12, 2022 at 7:14 PM Mike Kravetz <mike.kravetz@oracle.com> wrote: > > > > > > On 10/21/22 16:36, James Houghton wrote: > > > > Currently it is possible for all shared VMAs to use HGM, but it must be > > > > enabled first. This is because with HGM, we lose PMD sharing, and page > > > > table walks require additional synchronization (we need to take the VMA > > > > lock). > > > > > > Not sure yet, but I expect Peter's series will help with locking for > > > hugetlb specific page table walks. > > > > It should make things a little bit cleaner in this series; I'll rebase > > HGM on top of those patches this week (and hopefully get a v1 out > > soon). > > > > I don't think it's possible to implement MADV_COLLAPSE with RCU alone > > (as implemented in Peter's series anyway); we still need the VMA lock. > > As I continue going through the series, I realize that I am not exactly > sure what synchronization by the vma lock is required by HGM. As you are > aware, it was originally designed to protect against someone doing a > pmd_unshare and effectively removing part of the page table. However, > since pmd sharing is disabled for vmas with HGM enabled (I think?), then > it might be a good idea to explicitly say somewhere the reason for using > the lock. It synchronizes MADV_COLLAPSE for hugetlb (hugetlb_collapse). MADV_COLLAPSE will take it for writing and free some page table pages, and high-granularity walks will generally take it for reading. I'll make this clear in a comment somewhere and in commit messages. It might be easier if hugetlb_collapse() had the exact same synchronization as huge_pmd_unshare, where we not only take the VMA lock for writing, we also take the i_mmap_rw_sem for writing, so anywhere where hugetlb_walk() is safe, high-granularity walks are also safe. I think I should just do that for the sake of simplicity. - James > -- > Mike Kravetz
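The synchronization described here reduces to a reader/writer pattern on the hugetlb VMA lock. A sketch of that pattern, assuming the hugetlb_vma_lock_read()/hugetlb_vma_lock_write() helpers from the VMA-lock work; hugetlb_collapse() and the high-granularity walk themselves are added in later patches of this series:

```c
/* Readers: page table pages cannot be freed while this is held. */
static void example_hgm_walk(struct vm_area_struct *vma)
{
	hugetlb_vma_lock_read(vma);
	/* ... walk or install high-granularity PTEs ... */
	hugetlb_vma_unlock_read(vma);
}

/* Writer (MADV_COLLAPSE): exclude all walkers before freeing HGM page tables. */
static void example_hgm_collapse(struct vm_area_struct *vma)
{
	hugetlb_vma_lock_write(vma);
	/* ... collapse mappings and free lower-level page table pages ... */
	hugetlb_vma_unlock_write(vma);
}
```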
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 534958499ac4..6e0c36b08a0c 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -123,6 +123,9 @@ struct hugetlb_vma_lock {
 
 struct hugetlb_shared_vma_data {
 	struct hugetlb_vma_lock vma_lock;
+#ifdef CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING
+	bool hgm_enabled;
+#endif
 };
 
 extern struct resv_map *resv_map_alloc(void);
@@ -1179,6 +1182,25 @@ static inline void hugetlb_unregister_node(struct node *node)
 }
 #endif /* CONFIG_HUGETLB_PAGE */
 
+#ifdef CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING
+bool hugetlb_hgm_enabled(struct vm_area_struct *vma);
+bool hugetlb_hgm_eligible(struct vm_area_struct *vma);
+int enable_hugetlb_hgm(struct vm_area_struct *vma);
+#else
+static inline bool hugetlb_hgm_enabled(struct vm_area_struct *vma)
+{
+	return false;
+}
+static inline bool hugetlb_hgm_eligible(struct vm_area_struct *vma)
+{
+	return false;
+}
+static inline int enable_hugetlb_hgm(struct vm_area_struct *vma)
+{
+	return -EINVAL;
+}
+#endif
+
 static inline spinlock_t *huge_pte_lock(struct hstate *h,
 					struct mm_struct *mm, pte_t *pte)
 {
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 5ae8bc8c928e..a18143add956 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -6840,6 +6840,10 @@ static bool pmd_sharing_possible(struct vm_area_struct *vma)
 #ifdef CONFIG_USERFAULTFD
 	if (uffd_disable_huge_pmd_share(vma))
 		return false;
+#endif
+#ifdef CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING
+	if (hugetlb_hgm_enabled(vma))
+		return false;
 #endif
 	/*
 	 * Only shared VMAs can share PMDs.
@@ -7033,6 +7037,9 @@ static int hugetlb_vma_data_alloc(struct vm_area_struct *vma)
 	kref_init(&data->vma_lock.refs);
 	init_rwsem(&data->vma_lock.rw_sema);
 	data->vma_lock.vma = vma;
+#ifdef CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING
+	data->hgm_enabled = false;
+#endif
 	vma->vm_private_data = data;
 	return 0;
 }
@@ -7290,6 +7297,68 @@ __weak unsigned long hugetlb_mask_last_page(struct hstate *h)
 
 #endif /* CONFIG_ARCH_WANT_GENERAL_HUGETLB */
 
+#ifdef CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING
+bool hugetlb_hgm_eligible(struct vm_area_struct *vma)
+{
+	/*
+	 * All shared VMAs may have HGM.
+	 *
+	 * HGM requires using the VMA lock, which only exists for shared VMAs.
+	 * To make HGM work for private VMAs, we would need to use another
+	 * scheme to prevent collapsing/splitting from invalidating other
+	 * threads' page table walks.
+	 */
+	return vma && (vma->vm_flags & VM_MAYSHARE);
+}
+bool hugetlb_hgm_enabled(struct vm_area_struct *vma)
+{
+	struct hugetlb_shared_vma_data *data = vma->vm_private_data;
+
+	if (!vma || !(vma->vm_flags & VM_MAYSHARE))
+		return false;
+
+	return data && data->hgm_enabled;
+}
+
+/*
+ * Enable high-granularity mapping (HGM) for this VMA. Once enabled, HGM
+ * cannot be turned off.
+ *
+ * PMDs cannot be shared in HGM VMAs.
+ */
+int enable_hugetlb_hgm(struct vm_area_struct *vma)
+{
+	int ret;
+	struct hugetlb_shared_vma_data *data;
+
+	if (!hugetlb_hgm_eligible(vma))
+		return -EINVAL;
+
+	if (hugetlb_hgm_enabled(vma))
+		return 0;
+
+	/*
+	 * We must hold the mmap lock for writing so that callers can rely on
+	 * hugetlb_hgm_enabled returning a consistent result while holding
+	 * the mmap lock for reading.
+	 */
+	mmap_assert_write_locked(vma->vm_mm);
+
+	/* HugeTLB HGM requires the VMA lock to synchronize collapsing. */
+	ret = hugetlb_vma_data_alloc(vma);
+	if (ret)
+		return ret;
+
+	data = vma->vm_private_data;
+	BUG_ON(!data);
+	data->hgm_enabled = true;
+
+	/* We don't support PMD sharing with HGM. */
+	hugetlb_unshare_all_pmds(vma);
+	return 0;
+}
+#endif /* CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING */
+
 /*
  * These functions are overwritable if your architecture needs its own
  * behavior.