Message ID: 20221207203034.650899-6-peterx@redhat.com
State: New
From: Peter Xu <peterx@redhat.com>
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Muchun Song <songmuchun@bytedance.com>, John Hubbard <jhubbard@nvidia.com>, Andrea Arcangeli <aarcange@redhat.com>, James Houghton <jthoughton@google.com>, Jann Horn <jannh@google.com>, Rik van Riel <riel@surriel.com>, Miaohe Lin <linmiaohe@huawei.com>, Andrew Morton <akpm@linux-foundation.org>, Mike Kravetz <mike.kravetz@oracle.com>, peterx@redhat.com, David Hildenbrand <david@redhat.com>, Nadav Amit <nadav.amit@gmail.com>
Subject: [PATCH v2 05/10] mm/hugetlb: Make userfaultfd_huge_must_wait() safe to pmd unshare
Date: Wed, 7 Dec 2022 15:30:29 -0500
Message-Id: <20221207203034.650899-6-peterx@redhat.com>
In-Reply-To: <20221207203034.650899-1-peterx@redhat.com>
References: <20221207203034.650899-1-peterx@redhat.com>
Series: [v2,01/10] mm/hugetlb: Let vma_offset_start() to return start
Commit Message
Peter Xu
Dec. 7, 2022, 8:30 p.m. UTC
We can take the hugetlb walker lock, here taking vma lock directly.

Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 fs/userfaultfd.c | 18 ++++++++++++++----
 1 file changed, 14 insertions(+), 4 deletions(-)
Comments
On 12/7/22 12:30, Peter Xu wrote:
> We can take the hugetlb walker lock, here taking vma lock directly.
>
> Reviewed-by: David Hildenbrand <david@redhat.com>
> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  fs/userfaultfd.c | 18 ++++++++++++++----
>  1 file changed, 14 insertions(+), 4 deletions(-)
>
> diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
> index 07c81ab3fd4d..a602f008dde5 100644
> --- a/fs/userfaultfd.c
> +++ b/fs/userfaultfd.c
> @@ -376,7 +376,8 @@ static inline unsigned int userfaultfd_get_blocking_state(unsigned int flags)
>   */
>  vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned long reason)
>  {
> -	struct mm_struct *mm = vmf->vma->vm_mm;
> +	struct vm_area_struct *vma = vmf->vma;
> +	struct mm_struct *mm = vma->vm_mm;
>  	struct userfaultfd_ctx *ctx;
>  	struct userfaultfd_wait_queue uwq;
>  	vm_fault_t ret = VM_FAULT_SIGBUS;
> @@ -403,7 +404,7 @@ vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned long reason)
>  	 */
>  	mmap_assert_locked(mm);
>
> -	ctx = vmf->vma->vm_userfaultfd_ctx.ctx;
> +	ctx = vma->vm_userfaultfd_ctx.ctx;
>  	if (!ctx)
>  		goto out;
>
> @@ -493,6 +494,13 @@ vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned long reason)
>
>  	blocking_state = userfaultfd_get_blocking_state(vmf->flags);
>
> +	/*
> +	 * This stablizes pgtable for hugetlb on e.g. pmd unsharing.  Need
> +	 * to be before setting current state.
> +	 */

Looking at this code, I am not able to come up with a reason for why the
vma lock/unlock placement is exactly where it is. It looks quite arbitrary.

Why not, for example, take and drop the vma lock within
userfaultfd_huge_must_wait()? That makes more sense to me, but I'm not
familiar with userfaultfd so of course I'm missing something.

But the comment above certainly doesn't supply that something.

thanks,
On Wed, Dec 07, 2022 at 03:19:55PM -0800, John Hubbard wrote:
> On 12/7/22 12:30, Peter Xu wrote:
> > We can take the hugetlb walker lock, here taking vma lock directly.
> >
> > Reviewed-by: David Hildenbrand <david@redhat.com>
> > Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> > ---
> >  fs/userfaultfd.c | 18 ++++++++++++++----
> >  1 file changed, 14 insertions(+), 4 deletions(-)
> >
> > diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
> > index 07c81ab3fd4d..a602f008dde5 100644
> > --- a/fs/userfaultfd.c
> > +++ b/fs/userfaultfd.c
> > @@ -376,7 +376,8 @@ static inline unsigned int userfaultfd_get_blocking_state(unsigned int flags)
> >   */
> >  vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned long reason)
> >  {
> > -	struct mm_struct *mm = vmf->vma->vm_mm;
> > +	struct vm_area_struct *vma = vmf->vma;
> > +	struct mm_struct *mm = vma->vm_mm;
> >  	struct userfaultfd_ctx *ctx;
> >  	struct userfaultfd_wait_queue uwq;
> >  	vm_fault_t ret = VM_FAULT_SIGBUS;
> > @@ -403,7 +404,7 @@ vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned long reason)
> >  	 */
> >  	mmap_assert_locked(mm);
> >
> > -	ctx = vmf->vma->vm_userfaultfd_ctx.ctx;
> > +	ctx = vma->vm_userfaultfd_ctx.ctx;
> >  	if (!ctx)
> >  		goto out;
> >
> > @@ -493,6 +494,13 @@ vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned long reason)
> >
> >  	blocking_state = userfaultfd_get_blocking_state(vmf->flags);
> >
> > +	/*
> > +	 * This stablizes pgtable for hugetlb on e.g. pmd unsharing.  Need
> > +	 * to be before setting current state.
> > +	 */
>
> Looking at this code, I am not able to come up with a reason for why the
> vma lock/unlock placement is exactly where it is. It looks quite arbitrary.
>
> Why not, for example, take and drop the vma lock within
> userfaultfd_huge_must_wait()? That makes more sense to me, but I'm not
> familiar with userfaultfd so of course I'm missing something.
>
> But the comment above certainly doesn't supply that something.
The part that matters in the comment is "need to be before setting current
state".

	blocking_state = userfaultfd_get_blocking_state(vmf->flags);

	if (is_vm_hugetlb_page(vma))
		hugetlb_vma_lock_read(vma);
	set_current_state(blocking_state);

down_read() can sleep and also modify the task state, we cannot take the
lock after that point because otherwise the task state will be messed up.
On 12/7/22 15:44, Peter Xu wrote:
> The part that matters in the comment is "need to be before setting current
> state".
>
> 	blocking_state = userfaultfd_get_blocking_state(vmf->flags);
>
> 	if (is_vm_hugetlb_page(vma))
> 		hugetlb_vma_lock_read(vma);
> 	set_current_state(blocking_state);
>
> down_read() can sleep and also modify the task state, we cannot take the
> lock after that point because otherwise the task state will be messed up.

OK, how about:

	/*
	 * Take the vma lock now, in order to safely call
	 * userfaultfd_huge_must_wait() a little bit later. Because acquiring
	 * the (sleepable) vma lock potentially modifies the current task state,
	 * that must be before explicitly calling set_current_state().
	 */

Other than that, the patch looks good, so:

Reviewed-by: John Hubbard <jhubbard@nvidia.com>

thanks,
diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 07c81ab3fd4d..a602f008dde5 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -376,7 +376,8 @@ static inline unsigned int userfaultfd_get_blocking_state(unsigned int flags)
  */
 vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned long reason)
 {
-	struct mm_struct *mm = vmf->vma->vm_mm;
+	struct vm_area_struct *vma = vmf->vma;
+	struct mm_struct *mm = vma->vm_mm;
 	struct userfaultfd_ctx *ctx;
 	struct userfaultfd_wait_queue uwq;
 	vm_fault_t ret = VM_FAULT_SIGBUS;
@@ -403,7 +404,7 @@ vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned long reason)
 	 */
 	mmap_assert_locked(mm);
 
-	ctx = vmf->vma->vm_userfaultfd_ctx.ctx;
+	ctx = vma->vm_userfaultfd_ctx.ctx;
 	if (!ctx)
 		goto out;
 
@@ -493,6 +494,13 @@ vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned long reason)
 
 	blocking_state = userfaultfd_get_blocking_state(vmf->flags);
 
+	/*
+	 * This stablizes pgtable for hugetlb on e.g. pmd unsharing.  Need
+	 * to be before setting current state.
+	 */
+	if (is_vm_hugetlb_page(vma))
+		hugetlb_vma_lock_read(vma);
+
 	spin_lock_irq(&ctx->fault_pending_wqh.lock);
 	/*
 	 * After the __add_wait_queue the uwq is visible to userland
@@ -507,13 +515,15 @@ vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned long reason)
 	set_current_state(blocking_state);
 	spin_unlock_irq(&ctx->fault_pending_wqh.lock);
 
-	if (!is_vm_hugetlb_page(vmf->vma))
+	if (!is_vm_hugetlb_page(vma))
 		must_wait = userfaultfd_must_wait(ctx, vmf->address,
 						  vmf->flags, reason);
 	else
-		must_wait = userfaultfd_huge_must_wait(ctx, vmf->vma,
+		must_wait = userfaultfd_huge_must_wait(ctx, vma,
 						       vmf->address,
 						       vmf->flags, reason);
+	if (is_vm_hugetlb_page(vma))
+		hugetlb_vma_unlock_read(vma);
 	mmap_read_unlock(mm);
 
 	if (likely(must_wait && !READ_ONCE(ctx->released))) {