Message ID | b9da41bb-b7b6-2fc6-caac-b01b6719334@google.com |
---|---|
State | New |
Headers |
Date: Sun, 21 May 2023 22:22:13 -0700 (PDT)
From: Hugh Dickins <hughd@google.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Mike Kravetz <mike.kravetz@oracle.com>, Mike Rapoport <rppt@kernel.org>, "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>, Matthew Wilcox <willy@infradead.org>, David Hildenbrand <david@redhat.com>, Suren Baghdasaryan <surenb@google.com>, Qi Zheng <zhengqi.arch@bytedance.com>, Yang Shi <shy828301@gmail.com>, Mel Gorman <mgorman@techsingularity.net>, Peter Xu <peterx@redhat.com>, Peter Zijlstra <peterz@infradead.org>, Will Deacon <will@kernel.org>, Yu Zhao <yuzhao@google.com>, Alistair Popple <apopple@nvidia.com>, Ralph Campbell <rcampbell@nvidia.com>, Ira Weiny <ira.weiny@intel.com>, Steven Price <steven.price@arm.com>, SeongJae Park <sj@kernel.org>, Naoya Horiguchi <naoya.horiguchi@nec.com>, Christophe Leroy <christophe.leroy@csgroup.eu>, Zack Rusin <zackr@vmware.com>, Jason Gunthorpe <jgg@ziepe.ca>, Axel Rasmussen <axelrasmussen@google.com>, Anshuman Khandual <anshuman.khandual@arm.com>, Pasha Tatashin <pasha.tatashin@soleen.com>, Miaohe Lin <linmiaohe@huawei.com>, Minchan Kim <minchan@kernel.org>, Christoph Hellwig <hch@infradead.org>, Song Liu <song@kernel.org>, Thomas Hellstrom <thomas.hellstrom@linux.intel.com>, linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: [PATCH 25/31] mm/gup: remove FOLL_SPLIT_PMD use of pmd_trans_unstable()
In-Reply-To: <68a97fbe-5c1e-7ac6-72c-7b9c6290b370@google.com>
Message-ID: <b9da41bb-b7b6-2fc6-caac-b01b6719334@google.com>
References: <68a97fbe-5c1e-7ac6-72c-7b9c6290b370@google.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII |
Series |
mm: allow pte_offset_map[_lock]() to fail
|
|
Commit Message
Hugh Dickins
May 22, 2023, 5:22 a.m. UTC
There is now no reason for follow_pmd_mask()'s FOLL_SPLIT_PMD block to
distinguish huge_zero_page from a normal THP: follow_page_pte() handles
any instability, and here it's a good idea to replace any pmd_none(*pmd)
by a page table a.s.a.p, in the huge_zero_page case as for a normal THP.
(Hmm, couldn't the normal THP case have hit an unstably refaulted THP
before? But there are only two, exceptional, users of FOLL_SPLIT_PMD.)
Signed-off-by: Hugh Dickins <hughd@google.com>
---
mm/gup.c | 19 ++++---------------
1 file changed, 4 insertions(+), 15 deletions(-)
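For reference, this is how the FOLL_SPLIT_PMD block reads once the patch is applied (assembled from the diff at the bottom of this page; context lines as in mm/gup.c at the time):

	if (flags & FOLL_SPLIT_PMD) {
		spin_unlock(ptl);
		split_huge_pmd(vma, pmd, address);
		/* If pmd was left empty, stuff a page table in there quickly */
		return pte_alloc(mm, pmd) ? ERR_PTR(-ENOMEM) :
			follow_page_pte(vma, address, pmd, flags, &ctx->pgmap);
	}

Whatever split_huge_pmd() leaves behind, every outcome is now funnelled through pte_alloc() and follow_page_pte(), instead of special-casing the huge zero page.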
Comments
On Sun, May 21, 2023 at 10:22 PM Hugh Dickins <hughd@google.com> wrote:
>
> There is now no reason for follow_pmd_mask()'s FOLL_SPLIT_PMD block to
> distinguish huge_zero_page from a normal THP: follow_page_pte() handles
> any instability, and here it's a good idea to replace any pmd_none(*pmd)
> by a page table a.s.a.p, in the huge_zero_page case as for a normal THP.
> (Hmm, couldn't the normal THP case have hit an unstably refaulted THP
> before? But there are only two, exceptional, users of FOLL_SPLIT_PMD.)
>
> Signed-off-by: Hugh Dickins <hughd@google.com>
> ---
>  mm/gup.c | 19 ++++---------------
>  1 file changed, 4 insertions(+), 15 deletions(-)
>
> diff --git a/mm/gup.c b/mm/gup.c
> index bb67193c5460..4ad50a59897f 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -681,21 +681,10 @@ static struct page *follow_pmd_mask(struct vm_area_struct *vma,
>  		return follow_page_pte(vma, address, pmd, flags, &ctx->pgmap);
>  	}
>  	if (flags & FOLL_SPLIT_PMD) {
> -		int ret;
> -		page = pmd_page(*pmd);
> -		if (is_huge_zero_page(page)) {
> -			spin_unlock(ptl);
> -			ret = 0;
> -			split_huge_pmd(vma, pmd, address);
> -			if (pmd_trans_unstable(pmd))
> -				ret = -EBUSY;

IIUC the pmd_trans_unstable() check was transferred to the implicit
pmd_none() in pte_alloc(). But it will return -ENOMEM instead of
-EBUSY. Won't it break some userspace? Or the pmd_trans_unstable() is
never true? If so it seems worth mentioning in the commit log about
this return value change.

> -		} else {
> -			spin_unlock(ptl);
> -			split_huge_pmd(vma, pmd, address);
> -			ret = pte_alloc(mm, pmd) ? -ENOMEM : 0;
> -		}
> -
> -		return ret ? ERR_PTR(ret) :
> +		spin_unlock(ptl);
> +		split_huge_pmd(vma, pmd, address);
> +		/* If pmd was left empty, stuff a page table in there quickly */
> +		return pte_alloc(mm, pmd) ? ERR_PTR(-ENOMEM) :
>  			follow_page_pte(vma, address, pmd, flags, &ctx->pgmap);
>  	}
>  	page = follow_trans_huge_pmd(vma, address, pmd, flags);
> --
> 2.35.3
>
On Mon, May 22, 2023 at 7:26 PM Yang Shi <shy828301@gmail.com> wrote:
>
> On Sun, May 21, 2023 at 10:22 PM Hugh Dickins <hughd@google.com> wrote:
> >
> > There is now no reason for follow_pmd_mask()'s FOLL_SPLIT_PMD block to
> > distinguish huge_zero_page from a normal THP: follow_page_pte() handles
> > any instability, and here it's a good idea to replace any pmd_none(*pmd)
> > by a page table a.s.a.p, in the huge_zero_page case as for a normal THP.
> > (Hmm, couldn't the normal THP case have hit an unstably refaulted THP
> > before? But there are only two, exceptional, users of FOLL_SPLIT_PMD.)
> >
> > Signed-off-by: Hugh Dickins <hughd@google.com>
> > ---
> >  mm/gup.c | 19 ++++---------------
> >  1 file changed, 4 insertions(+), 15 deletions(-)
> >
> > diff --git a/mm/gup.c b/mm/gup.c
> > index bb67193c5460..4ad50a59897f 100644
> > --- a/mm/gup.c
> > +++ b/mm/gup.c
> > @@ -681,21 +681,10 @@ static struct page *follow_pmd_mask(struct vm_area_struct *vma,
> >  		return follow_page_pte(vma, address, pmd, flags, &ctx->pgmap);
> >  	}
> >  	if (flags & FOLL_SPLIT_PMD) {
> > -		int ret;
> > -		page = pmd_page(*pmd);
> > -		if (is_huge_zero_page(page)) {
> > -			spin_unlock(ptl);
> > -			ret = 0;
> > -			split_huge_pmd(vma, pmd, address);
> > -			if (pmd_trans_unstable(pmd))
> > -				ret = -EBUSY;
>
> IIUC the pmd_trans_unstable() check was transferred to the implicit
> pmd_none() in pte_alloc(). But it will return -ENOMEM instead of
> -EBUSY. Won't it break some userspace? Or the pmd_trans_unstable() is
> never true? If so it seems worth mentioning in the commit log about
> this return value change.

Oops, the above comment is not accurate. It will call
follow_page_pte() instead of returning -EBUSY if pmd is none. For
other unstable cases, it will return -ENOMEM instead of -EBUSY.

> > -		} else {
> > -			spin_unlock(ptl);
> > -			split_huge_pmd(vma, pmd, address);
> > -			ret = pte_alloc(mm, pmd) ? -ENOMEM : 0;
> > -		}
> > -
> > -		return ret ? ERR_PTR(ret) :
> > +		spin_unlock(ptl);
> > +		split_huge_pmd(vma, pmd, address);
> > +		/* If pmd was left empty, stuff a page table in there quickly */
> > +		return pte_alloc(mm, pmd) ? ERR_PTR(-ENOMEM) :
> >  			follow_page_pte(vma, address, pmd, flags, &ctx->pgmap);
> >  	}
> >  	page = follow_trans_huge_pmd(vma, address, pmd, flags);
> > --
> > 2.35.3
> >
On Mon, 22 May 2023, Yang Shi wrote:
> On Mon, May 22, 2023 at 7:26 PM Yang Shi <shy828301@gmail.com> wrote:
> > On Sun, May 21, 2023 at 10:22 PM Hugh Dickins <hughd@google.com> wrote:
> > >
> > > There is now no reason for follow_pmd_mask()'s FOLL_SPLIT_PMD block to
> > > distinguish huge_zero_page from a normal THP: follow_page_pte() handles
> > > any instability, and here it's a good idea to replace any pmd_none(*pmd)
> > > by a page table a.s.a.p, in the huge_zero_page case as for a normal THP.
> > > (Hmm, couldn't the normal THP case have hit an unstably refaulted THP
> > > before? But there are only two, exceptional, users of FOLL_SPLIT_PMD.)
> > >
> > > Signed-off-by: Hugh Dickins <hughd@google.com>
> > > ---
> > >  mm/gup.c | 19 ++++---------------
> > >  1 file changed, 4 insertions(+), 15 deletions(-)
> > >
> > > diff --git a/mm/gup.c b/mm/gup.c
> > > index bb67193c5460..4ad50a59897f 100644
> > > --- a/mm/gup.c
> > > +++ b/mm/gup.c
> > > @@ -681,21 +681,10 @@ static struct page *follow_pmd_mask(struct vm_area_struct *vma,
> > >  		return follow_page_pte(vma, address, pmd, flags, &ctx->pgmap);
> > >  	}
> > >  	if (flags & FOLL_SPLIT_PMD) {
> > > -		int ret;
> > > -		page = pmd_page(*pmd);
> > > -		if (is_huge_zero_page(page)) {
> > > -			spin_unlock(ptl);
> > > -			ret = 0;
> > > -			split_huge_pmd(vma, pmd, address);
> > > -			if (pmd_trans_unstable(pmd))
> > > -				ret = -EBUSY;
> >
> > IIUC the pmd_trans_unstable() check was transferred to the implicit
> > pmd_none() in pte_alloc(). But it will return -ENOMEM instead of
> > -EBUSY. Won't it break some userspace? Or the pmd_trans_unstable() is
> > never true? If so it seems worth mentioning in the commit log about
> > this return value change.

Thanks a lot for looking at these, but I disagree here.

>
> Oops, the above comment is not accurate. It will call
> follow_page_pte() instead of returning -EBUSY if pmd is none.

Yes. Ignoring secondary races, if pmd is none, pte_alloc() will allocate
an empty page table there, follow_page_pte() find !pte_present and return
NULL; or if pmd is not none, follow_page_pte() will return no_page_table()
i.e. NULL. And page NULL ends up with __get_user_pages() having another
go round, instead of failing with -EBUSY.

Which I'd say is better handling for such a transient case - remember,
it's split_huge_pmd() (which should always succeed, but might be raced)
in use there, not split_huge_page() (which might take years for pins to
be removed before it can succeed).

> For other unstable cases, it will return -ENOMEM instead of -EBUSY.

I don't think so: the possibly-failing __pte_alloc() only gets called
in the pmd_none() case.

Hugh

> >
> > > -		} else {
> > > -			spin_unlock(ptl);
> > > -			split_huge_pmd(vma, pmd, address);
> > > -			ret = pte_alloc(mm, pmd) ? -ENOMEM : 0;
> > > -		}
> > > -
> > > -		return ret ? ERR_PTR(ret) :
> > > +		spin_unlock(ptl);
> > > +		split_huge_pmd(vma, pmd, address);
> > > +		/* If pmd was left empty, stuff a page table in there quickly */
> > > +		return pte_alloc(mm, pmd) ? ERR_PTR(-ENOMEM) :
> > >  			follow_page_pte(vma, address, pmd, flags, &ctx->pgmap);
> > >  	}
> > >  	page = follow_trans_huge_pmd(vma, address, pmd, flags);
> > > --
> > > 2.35.3
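Hugh's point about NULL versus ERR_PTR() can be sketched in plain user-space C. This is a toy model only: the names echo the kernel's __get_user_pages() loop and follow_page_mask(), but every body below is a stand-in invented for illustration, not kernel code.

	#include <stdio.h>

	#define MAX_ERRNO	4095
	#define ERR_PTR(err)	((void *)(long)(err))
	#define PTR_ERR(ptr)	((long)(ptr))
	#define IS_ERR(ptr)	((unsigned long)(ptr) >= (unsigned long)-MAX_ERRNO)

	static int rounds;

	/* Stand-in for follow_page_mask(): the page table is transiently
	 * unstable for two lookups (e.g. racing split_huge_pmd()), then good. */
	static void *toy_follow_page_mask(void)
	{
		static char page[] = "page";

		return rounds < 2 ? NULL : (void *)page;
	}

	int main(void)
	{
		void *page;

		/* Skeleton of the __get_user_pages() retry loop: NULL means
		 * "fault in and go round again"; ERR_PTR() fails the call. */
		for (;;) {
			page = toy_follow_page_mask();
			if (IS_ERR(page))
				return (int)-PTR_ERR(page);	/* -EBUSY would end here */
			if (page)
				break;
			rounds++;	/* the kernel would call faultin_page() here */
		}
		printf("got page after %d extra rounds\n", rounds);
		return 0;
	}

The point of the sketch: a NULL from the lookup is absorbed by the loop and retried, while an ERR_PTR() such as the old -EBUSY would have surfaced to the caller.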
On Tue, May 23, 2023 at 9:26 PM Hugh Dickins <hughd@google.com> wrote:
>
> On Mon, 22 May 2023, Yang Shi wrote:
> > On Mon, May 22, 2023 at 7:26 PM Yang Shi <shy828301@gmail.com> wrote:
> > > On Sun, May 21, 2023 at 10:22 PM Hugh Dickins <hughd@google.com> wrote:
> > > >
> > > > There is now no reason for follow_pmd_mask()'s FOLL_SPLIT_PMD block to
> > > > distinguish huge_zero_page from a normal THP: follow_page_pte() handles
> > > > any instability, and here it's a good idea to replace any pmd_none(*pmd)
> > > > by a page table a.s.a.p, in the huge_zero_page case as for a normal THP.
> > > > (Hmm, couldn't the normal THP case have hit an unstably refaulted THP
> > > > before? But there are only two, exceptional, users of FOLL_SPLIT_PMD.)
> > > >
> > > > Signed-off-by: Hugh Dickins <hughd@google.com>
> > > > ---
> > > >  mm/gup.c | 19 ++++---------------
> > > >  1 file changed, 4 insertions(+), 15 deletions(-)
> > > >
> > > > diff --git a/mm/gup.c b/mm/gup.c
> > > > index bb67193c5460..4ad50a59897f 100644
> > > > --- a/mm/gup.c
> > > > +++ b/mm/gup.c
> > > > @@ -681,21 +681,10 @@ static struct page *follow_pmd_mask(struct vm_area_struct *vma,
> > > >  		return follow_page_pte(vma, address, pmd, flags, &ctx->pgmap);
> > > >  	}
> > > >  	if (flags & FOLL_SPLIT_PMD) {
> > > > -		int ret;
> > > > -		page = pmd_page(*pmd);
> > > > -		if (is_huge_zero_page(page)) {
> > > > -			spin_unlock(ptl);
> > > > -			ret = 0;
> > > > -			split_huge_pmd(vma, pmd, address);
> > > > -			if (pmd_trans_unstable(pmd))
> > > > -				ret = -EBUSY;
> > >
> > > IIUC the pmd_trans_unstable() check was transferred to the implicit
> > > pmd_none() in pte_alloc(). But it will return -ENOMEM instead of
> > > -EBUSY. Won't it break some userspace? Or the pmd_trans_unstable() is
> > > never true? If so it seems worth mentioning in the commit log about
> > > this return value change.
>
> Thanks a lot for looking at these, but I disagree here.
>
> >
> > Oops, the above comment is not accurate. It will call
> > follow_page_pte() instead of returning -EBUSY if pmd is none.
>
> Yes. Ignoring secondary races, if pmd is none, pte_alloc() will allocate
> an empty page table there, follow_page_pte() find !pte_present and return
> NULL; or if pmd is not none, follow_page_pte() will return no_page_table()
> i.e. NULL. And page NULL ends up with __get_user_pages() having another
> go round, instead of failing with -EBUSY.
>
> Which I'd say is better handling for such a transient case - remember,
> it's split_huge_pmd() (which should always succeed, but might be raced)
> in use there, not split_huge_page() (which might take years for pins to
> be removed before it can succeed).

It sounds like an improvement.

>
> > For other unstable cases, it will return -ENOMEM instead of -EBUSY.
>
> I don't think so: the possibly-failing __pte_alloc() only gets called
> in the pmd_none() case.

I mean what if pmd is not none for huge zero page. If it is not
pmd_none, pte_alloc() just returns 0, then returns -ENOMEM instead of
-EBUSY. Or is it impossible that pmd ends up being pmd_trans_huge or
!pmd_present? It should be very unlikely - for example, migration does
skip the huge zero page - but I'm not sure whether there is any corner
case that I missed.

>
> Hugh
>
> >
> > > > -		} else {
> > > > -			spin_unlock(ptl);
> > > > -			split_huge_pmd(vma, pmd, address);
> > > > -			ret = pte_alloc(mm, pmd) ? -ENOMEM : 0;
> > > > -		}
> > > > -
> > > > -		return ret ? ERR_PTR(ret) :
> > > > +		spin_unlock(ptl);
> > > > +		split_huge_pmd(vma, pmd, address);
> > > > +		/* If pmd was left empty, stuff a page table in there quickly */
> > > > +		return pte_alloc(mm, pmd) ? ERR_PTR(-ENOMEM) :
> > > >  			follow_page_pte(vma, address, pmd, flags, &ctx->pgmap);
> > > >  	}
> > > >  	page = follow_trans_huge_pmd(vma, address, pmd, flags);
> > > > --
> > > > 2.35.3
On Wed, 24 May 2023, Yang Shi wrote:
> On Tue, May 23, 2023 at 9:26 PM Hugh Dickins <hughd@google.com> wrote:
> > On Mon, 22 May 2023, Yang Shi wrote:
> >
> > > For other unstable cases, it will return -ENOMEM instead of -EBUSY.
> >
> > I don't think so: the possibly-failing __pte_alloc() only gets called
> > in the pmd_none() case.
>
> I mean what if pmd is not none for huge zero page. If it is not
> pmd_none, pte_alloc() just returns 0,

Yes, I agree with you on that.

> then returns -ENOMEM instead of -EBUSY.

But disagree with you on that.

	return pte_alloc(mm, pmd) ? ERR_PTR(-ENOMEM) :
		follow_page_pte(vma, address, pmd, flags, &ctx->pgmap);

Doesn't that say that if pte_alloc() returns 0, then follow_page_mask()
will call follow_page_pte() and return whatever that returns?

> Or is it impossible that pmd ends up being pmd_trans_huge or
> !pmd_present? It should be very unlikely - for example, migration does
> skip the huge zero page - but I'm not sure whether there is any corner
> case that I missed.

I'm assuming both are possible there (but not asserting that they are).

Hugh
On Thu, May 25, 2023 at 2:16 PM Hugh Dickins <hughd@google.com> wrote:
>
> On Wed, 24 May 2023, Yang Shi wrote:
> > On Tue, May 23, 2023 at 9:26 PM Hugh Dickins <hughd@google.com> wrote:
> > > On Mon, 22 May 2023, Yang Shi wrote:
> > >
> > > > For other unstable cases, it will return -ENOMEM instead of -EBUSY.
> > >
> > > I don't think so: the possibly-failing __pte_alloc() only gets called
> > > in the pmd_none() case.
> >
> > I mean what if pmd is not none for huge zero page. If it is not
> > pmd_none, pte_alloc() just returns 0,
>
> Yes, I agree with you on that.
>
> > then returns -ENOMEM instead of -EBUSY.
>
> But disagree with you on that.
>
> 	return pte_alloc(mm, pmd) ? ERR_PTR(-ENOMEM) :
> 		follow_page_pte(vma, address, pmd, flags, &ctx->pgmap);
>
> Doesn't that say that if pte_alloc() returns 0, then follow_page_mask()
> will call follow_page_pte() and return whatever that returns?

Err... you are right. I misread the code. Anyway it returns -ENOMEM
instead of -EBUSY when pmd is none and pte_alloc() fails. Returning
-ENOMEM does make sense for this case. Is it worth some words in the
commit log for the slight behavior change?

> > Or is it impossible that pmd ends up being pmd_trans_huge or
> > !pmd_present? It should be very unlikely - for example, migration does
> > skip the huge zero page - but I'm not sure whether there is any corner
> > case that I missed.
>
> I'm assuming both are possible there (but not asserting that they are).
>
> Hugh
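Spelled out, the ternary Hugh quotes is equivalent to the following if/else (an illustrative expansion, not code from the patch):

	if (pte_alloc(mm, pmd))
		/* only reachable when pmd was none and __pte_alloc() failed */
		return ERR_PTR(-ENOMEM);
	/* pmd now (or already) points to a page table: take the pte path */
	return follow_page_pte(vma, address, pmd, flags, &ctx->pgmap);

So a pte_alloc() of 0 always falls through to follow_page_pte(), whatever state the pmd is in; -ENOMEM is only possible when the pmd was none and the page table allocation itself failed.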
diff --git a/mm/gup.c b/mm/gup.c
index bb67193c5460..4ad50a59897f 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -681,21 +681,10 @@ static struct page *follow_pmd_mask(struct vm_area_struct *vma,
 		return follow_page_pte(vma, address, pmd, flags, &ctx->pgmap);
 	}
 	if (flags & FOLL_SPLIT_PMD) {
-		int ret;
-		page = pmd_page(*pmd);
-		if (is_huge_zero_page(page)) {
-			spin_unlock(ptl);
-			ret = 0;
-			split_huge_pmd(vma, pmd, address);
-			if (pmd_trans_unstable(pmd))
-				ret = -EBUSY;
-		} else {
-			spin_unlock(ptl);
-			split_huge_pmd(vma, pmd, address);
-			ret = pte_alloc(mm, pmd) ? -ENOMEM : 0;
-		}
-
-		return ret ? ERR_PTR(ret) :
+		spin_unlock(ptl);
+		split_huge_pmd(vma, pmd, address);
+		/* If pmd was left empty, stuff a page table in there quickly */
+		return pte_alloc(mm, pmd) ? ERR_PTR(-ENOMEM) :
 			follow_page_pte(vma, address, pmd, flags, &ctx->pgmap);
 	}
 	page = follow_trans_huge_pmd(vma, address, pmd, flags);