Message ID | 20221021163703.3218176-6-jthoughton@google.com
---|---
State | New |
Series | hugetlb: introduce HugeTLB high-granularity mapping
Commit Message
James Houghton
Oct. 21, 2022, 4:36 p.m. UTC
Currently hugetlb_vma_lock_alloc doesn't return anything, as there is no
need: if it fails, PMD sharing won't be enabled. However, HGM requires
that the VMA lock exists, so we need to verify that
hugetlb_vma_lock_alloc actually succeeded. If hugetlb_vma_lock_alloc
fails, then we can pass that up to the caller that is attempting to
enable HGM.
Signed-off-by: James Houghton <jthoughton@google.com>
---
mm/hugetlb.c | 16 +++++++++-------
1 file changed, 9 insertions(+), 7 deletions(-)
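
To make the intent concrete, here is a minimal sketch of the kind of caller this change enables. The function name hugetlb_enable_hgm() is hypothetical, used only for illustration; the real HGM-enabling call site appears in a later patch of this series.

	/*
	 * Hypothetical caller sketch (not code from this patch): HGM
	 * requires the VMA lock, so an allocation failure is propagated
	 * instead of being silently ignored.
	 */
	static int hugetlb_enable_hgm(struct vm_area_struct *vma)
	{
		int ret;

		ret = hugetlb_vma_lock_alloc(vma);
		if (ret)
			return ret;	/* -EINVAL (not sharable) or -ENOMEM */

		/* ... continue with high-granularity mapping setup ... */
		return 0;
	}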
Comments
On Fri, Oct 21, 2022 at 04:36:21PM +0000, James Houghton wrote:
> Currently hugetlb_vma_lock_alloc doesn't return anything, as there is no
> need: if it fails, PMD sharing won't be enabled. However, HGM requires
> that the VMA lock exists, so we need to verify that
> hugetlb_vma_lock_alloc actually succeeded. If hugetlb_vma_lock_alloc
> fails, then we can pass that up to the caller that is attempting to
> enable HGM.
>
> Signed-off-by: James Houghton <jthoughton@google.com>
> ---
>  mm/hugetlb.c | 16 +++++++++-------
>  1 file changed, 9 insertions(+), 7 deletions(-)
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 52cec5b0789e..dc82256b89dd 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -92,7 +92,7 @@ struct mutex *hugetlb_fault_mutex_table ____cacheline_aligned_in_smp;
>  /* Forward declaration */
>  static int hugetlb_acct_memory(struct hstate *h, long delta);
>  static void hugetlb_vma_lock_free(struct vm_area_struct *vma);
> -static void hugetlb_vma_lock_alloc(struct vm_area_struct *vma);
> +static int hugetlb_vma_lock_alloc(struct vm_area_struct *vma);
>  static void __hugetlb_vma_unlock_write_free(struct vm_area_struct *vma);
>
>  static inline bool subpool_is_free(struct hugepage_subpool *spool)
> @@ -7001,17 +7001,17 @@ static void hugetlb_vma_lock_free(struct vm_area_struct *vma)
>  	}
>  }
>
> -static void hugetlb_vma_lock_alloc(struct vm_area_struct *vma)
> +static int hugetlb_vma_lock_alloc(struct vm_area_struct *vma)
>  {
>  	struct hugetlb_vma_lock *vma_lock;
>
>  	/* Only establish in (flags) sharable vmas */
>  	if (!vma || !(vma->vm_flags & VM_MAYSHARE))
> -		return;
> +		return -EINVAL;
>
> -	/* Should never get here with non-NULL vm_private_data */
> +	/* We've already allocated the lock. */
>  	if (vma->vm_private_data)
> -		return;
> +		return 0;

No objection on the patch itself, but I am just wondering what guarantees
thread-safety for this function to not leak vm_private_data when two
threads try to allocate at the same time.

I think it should be the write mmap lock. I saw that in your latest code
base there's:

	/*
	 * We must hold the mmap lock for writing so that callers can rely on
	 * hugetlb_hgm_enabled returning a consistent result while holding
	 * the mmap lock for reading.
	 */
	mmap_assert_write_locked(vma->vm_mm);

	/* HugeTLB HGM requires the VMA lock to synchronize collapsing. */
	ret = hugetlb_vma_data_alloc(vma);
	if (ret)
		return ret;

So that's covered there. The rest places are hugetlb_vm_op_open() and
hugetlb_reserve_pages() and they all seem fine too: hugetlb_vm_op_open() is
during mmap(), the latter has vma==NULL so allocation will be skipped.

I'm wondering whether it would make sense to move this assert to be inside
of hugetlb_vma_data_alloc() after the !vma check, or just add the same
assert too but for different reason.

>
>  	vma_lock = kmalloc(sizeof(*vma_lock), GFP_KERNEL);
>  	if (!vma_lock) {
> @@ -7026,13 +7026,14 @@ static void hugetlb_vma_lock_alloc(struct vm_area_struct *vma)
>  		 * allocation failure.
>  		 */
>  		pr_warn_once("HugeTLB: unable to allocate vma specific lock\n");
> -		return;
> +		return -ENOMEM;
>  	}
>
>  	kref_init(&vma_lock->refs);
>  	init_rwsem(&vma_lock->rw_sema);
>  	vma_lock->vma = vma;
>  	vma->vm_private_data = vma_lock;
> +	return 0;
>  }
>
>  /*
> @@ -7160,8 +7161,9 @@ static void hugetlb_vma_lock_free(struct vm_area_struct *vma)
>  {
>  }
>
> -static void hugetlb_vma_lock_alloc(struct vm_area_struct *vma)
> +static int hugetlb_vma_lock_alloc(struct vm_area_struct *vma)
>  {
> +	return 0;
>  }
>
>  pte_t *huge_pmd_share(struct mm_struct *mm, struct vm_area_struct *vma,
> --
> 2.38.0.135.g90850a2211-goog
>

--
Peter Xu
On Wed, Nov 16, 2022 at 9:08 AM Peter Xu <peterx@redhat.com> wrote:
>
> No objection on the patch itself, but I am just wondering what guarantees
> thread-safety for this function to not leak vm_private_data when two
> threads try to allocate at the same time.
>
> I think it should be the write mmap lock. I saw that in your latest code
> base there's:
>
>         /*
>          * We must hold the mmap lock for writing so that callers can rely on
>          * hugetlb_hgm_enabled returning a consistent result while holding
>          * the mmap lock for reading.
>          */
>         mmap_assert_write_locked(vma->vm_mm);
>
>         /* HugeTLB HGM requires the VMA lock to synchronize collapsing. */
>         ret = hugetlb_vma_data_alloc(vma);
>         if (ret)
>                 return ret;
>
> So that's covered there. The rest places are hugetlb_vm_op_open() and
> hugetlb_reserve_pages() and they all seem fine too: hugetlb_vm_op_open() is
> during mmap(), the latter has vma==NULL so allocation will be skipped.
>
> I'm wondering whether it would make sense to move this assert to be inside
> of hugetlb_vma_data_alloc() after the !vma check, or just add the same
> assert too but for different reason.

I think leaving the assert here and adding a new assert inside
hugetlb_vma_data_alloc() makes sense. Thanks Peter.

- James

> >
> >  	vma_lock = kmalloc(sizeof(*vma_lock), GFP_KERNEL);
> >  	if (!vma_lock) {
> > @@ -7026,13 +7026,14 @@ static void hugetlb_vma_lock_alloc(struct vm_area_struct *vma)
> >  		 * allocation failure.
> >  		 */
> >  		pr_warn_once("HugeTLB: unable to allocate vma specific lock\n");
> > -		return;
> > +		return -ENOMEM;
> >  	}
> >
> >  	kref_init(&vma_lock->refs);
> >  	init_rwsem(&vma_lock->rw_sema);
> >  	vma_lock->vma = vma;
> >  	vma->vm_private_data = vma_lock;
> > +	return 0;
> >  }
> >
> >  /*
> > @@ -7160,8 +7161,9 @@ static void hugetlb_vma_lock_free(struct vm_area_struct *vma)
> >  {
> >  }
> >
> > -static void hugetlb_vma_lock_alloc(struct vm_area_struct *vma)
> > +static int hugetlb_vma_lock_alloc(struct vm_area_struct *vma)
> >  {
> > +	return 0;
> >  }
> >
> >  pte_t *huge_pmd_share(struct mm_struct *mm, struct vm_area_struct *vma,
> > --
> > 2.38.0.135.g90850a2211-goog
> >
>
> --
> Peter Xu
>
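
Concretely, the arrangement agreed on above might look like the sketch below: the existing assert stays at the HGM call site, and a second assert goes inside the allocator after the !vma check, since hugetlb_reserve_pages() can legitimately pass vma == NULL. This uses the later hugetlb_vma_data_alloc() naming and is only an illustration, not code from the series.

	static int hugetlb_vma_data_alloc(struct vm_area_struct *vma)
	{
		/* Only establish in (flags) sharable vmas */
		if (!vma || !(vma->vm_flags & VM_MAYSHARE))
			return -EINVAL;

		/*
		 * The write mmap lock is what prevents two threads from
		 * racing on vm_private_data and leaking an allocation.
		 */
		mmap_assert_write_locked(vma->vm_mm);

		/* We've already allocated the lock. */
		if (vma->vm_private_data)
			return 0;

		/* ... kmalloc and initialize the lock as in the patch ... */
		return 0;
	}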
On Fri, Oct 21, 2022 at 9:37 AM James Houghton <jthoughton@google.com> wrote:
>
> Currently hugetlb_vma_lock_alloc doesn't return anything, as there is no
> need: if it fails, PMD sharing won't be enabled. However, HGM requires
> that the VMA lock exists, so we need to verify that
> hugetlb_vma_lock_alloc actually succeeded. If hugetlb_vma_lock_alloc
> fails, then we can pass that up to the caller that is attempting to
> enable HGM.
>
> Signed-off-by: James Houghton <jthoughton@google.com>
> ---
>  mm/hugetlb.c | 16 +++++++++-------
>  1 file changed, 9 insertions(+), 7 deletions(-)
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 52cec5b0789e..dc82256b89dd 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -92,7 +92,7 @@ struct mutex *hugetlb_fault_mutex_table ____cacheline_aligned_in_smp;
>  /* Forward declaration */
>  static int hugetlb_acct_memory(struct hstate *h, long delta);
>  static void hugetlb_vma_lock_free(struct vm_area_struct *vma);
> -static void hugetlb_vma_lock_alloc(struct vm_area_struct *vma);
> +static int hugetlb_vma_lock_alloc(struct vm_area_struct *vma);
>  static void __hugetlb_vma_unlock_write_free(struct vm_area_struct *vma);
>
>  static inline bool subpool_is_free(struct hugepage_subpool *spool)
> @@ -7001,17 +7001,17 @@ static void hugetlb_vma_lock_free(struct vm_area_struct *vma)
>  	}
>  }
>
> -static void hugetlb_vma_lock_alloc(struct vm_area_struct *vma)
> +static int hugetlb_vma_lock_alloc(struct vm_area_struct *vma)
>  {
>  	struct hugetlb_vma_lock *vma_lock;
>
>  	/* Only establish in (flags) sharable vmas */
>  	if (!vma || !(vma->vm_flags & VM_MAYSHARE))
> -		return;
> +		return -EINVAL;
>
> -	/* Should never get here with non-NULL vm_private_data */
> +	/* We've already allocated the lock. */
>  	if (vma->vm_private_data)
> -		return;
> +		return 0;

I would have expected -EEXIST here. Also even if the patch looks
generally fine it's hard to provide Acked-by now. I need to look at the
call site which is in another patch in the series. If there is an
opportunity to squash changes to helpers and their call sites please do.

>
>  	vma_lock = kmalloc(sizeof(*vma_lock), GFP_KERNEL);
>  	if (!vma_lock) {
> @@ -7026,13 +7026,14 @@ static void hugetlb_vma_lock_alloc(struct vm_area_struct *vma)
>  		 * allocation failure.
>  		 */
>  		pr_warn_once("HugeTLB: unable to allocate vma specific lock\n");
> -		return;
> +		return -ENOMEM;
>  	}
>
>  	kref_init(&vma_lock->refs);
>  	init_rwsem(&vma_lock->rw_sema);
>  	vma_lock->vma = vma;
>  	vma->vm_private_data = vma_lock;
> +	return 0;
>  }
>
>  /*
> @@ -7160,8 +7161,9 @@ static void hugetlb_vma_lock_free(struct vm_area_struct *vma)
>  {
>  }
>
> -static void hugetlb_vma_lock_alloc(struct vm_area_struct *vma)
> +static int hugetlb_vma_lock_alloc(struct vm_area_struct *vma)
>  {
> +	return 0;
>  }
>
>  pte_t *huge_pmd_share(struct mm_struct *mm, struct vm_area_struct *vma,
> --
> 2.38.0.135.g90850a2211-goog
>
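
A sketch of the suggested -EEXIST variant, for comparison (illustrative only; as posted, this patch returns 0 for the already-allocated case):

	/* In the allocator, report an existing lock distinctly: */
	if (vma->vm_private_data)
		return -EEXIST;

	/* A caller that only needs the lock to exist could then accept it: */
	ret = hugetlb_vma_lock_alloc(vma);
	if (ret && ret != -EEXIST)
		return ret;

Whether -EEXIST counts as an error then becomes an explicit caller-side decision instead of being folded into success inside the helper.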
On 10/21/22 16:36, James Houghton wrote:
> Currently hugetlb_vma_lock_alloc doesn't return anything, as there is no
> need: if it fails, PMD sharing won't be enabled. However, HGM requires
> that the VMA lock exists, so we need to verify that
> hugetlb_vma_lock_alloc actually succeeded. If hugetlb_vma_lock_alloc
> fails, then we can pass that up to the caller that is attempting to
> enable HGM.

No serious objections to this change ...

However, there are currently only two places today where
hugetlb_vma_lock_alloc is called: hugetlb_reserve_pages and
hugetlb_vm_op_open. hugetlb_reserve_pages is not an issue. Since
hugetlb_vm_op_open (as a defined vm_operation) returns void, I am not
sure how you plan to pass up an allocation failure. Suspect this will
become evident in subsequent patches.
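
The signature constraint referred to here is visible in vm_operations_struct (abridged from include/linux/mm.h): ->open returns void, so hugetlb_vm_op_open() has no channel through which to report an allocation failure.

	struct vm_operations_struct {
		void (*open)(struct vm_area_struct *area);	/* no return value */
		void (*close)(struct vm_area_struct *area);
		/* ... */
	};

Any propagation of the failure therefore has to happen at call sites that can themselves return an error, which is presumably what the later HGM patches in the series rely on.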
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 52cec5b0789e..dc82256b89dd 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -92,7 +92,7 @@ struct mutex *hugetlb_fault_mutex_table ____cacheline_aligned_in_smp;
 /* Forward declaration */
 static int hugetlb_acct_memory(struct hstate *h, long delta);
 static void hugetlb_vma_lock_free(struct vm_area_struct *vma);
-static void hugetlb_vma_lock_alloc(struct vm_area_struct *vma);
+static int hugetlb_vma_lock_alloc(struct vm_area_struct *vma);
 static void __hugetlb_vma_unlock_write_free(struct vm_area_struct *vma);
 
 static inline bool subpool_is_free(struct hugepage_subpool *spool)
@@ -7001,17 +7001,17 @@ static void hugetlb_vma_lock_free(struct vm_area_struct *vma)
 	}
 }
 
-static void hugetlb_vma_lock_alloc(struct vm_area_struct *vma)
+static int hugetlb_vma_lock_alloc(struct vm_area_struct *vma)
 {
 	struct hugetlb_vma_lock *vma_lock;
 
 	/* Only establish in (flags) sharable vmas */
 	if (!vma || !(vma->vm_flags & VM_MAYSHARE))
-		return;
+		return -EINVAL;
 
-	/* Should never get here with non-NULL vm_private_data */
+	/* We've already allocated the lock. */
 	if (vma->vm_private_data)
-		return;
+		return 0;
 
 	vma_lock = kmalloc(sizeof(*vma_lock), GFP_KERNEL);
 	if (!vma_lock) {
@@ -7026,13 +7026,14 @@ static void hugetlb_vma_lock_alloc(struct vm_area_struct *vma)
 		 * allocation failure.
 		 */
 		pr_warn_once("HugeTLB: unable to allocate vma specific lock\n");
-		return;
+		return -ENOMEM;
 	}
 
 	kref_init(&vma_lock->refs);
 	init_rwsem(&vma_lock->rw_sema);
 	vma_lock->vma = vma;
 	vma->vm_private_data = vma_lock;
+	return 0;
 }
 
 /*
@@ -7160,8 +7161,9 @@ static void hugetlb_vma_lock_free(struct vm_area_struct *vma)
 {
 }
 
-static void hugetlb_vma_lock_alloc(struct vm_area_struct *vma)
+static int hugetlb_vma_lock_alloc(struct vm_area_struct *vma)
 {
+	return 0;
 }
 
 pte_t *huge_pmd_share(struct mm_struct *mm, struct vm_area_struct *vma,