[PATCH v3 04/13] powerpc: assert_pte_locked() use pte_offset_map_nolock()

Message ID | e8d56c95-c132-a82e-5f5f-7bb1b738b057@google.com
---|---
State | New
Series | mm: free retracted page table by RCU
Commit Message
Hugh Dickins
July 12, 2023, 4:34 a.m. UTC
Instead of pte_lockptr(), use the recently added pte_offset_map_nolock()
in assert_pte_locked(). BUG if pte_offset_map_nolock() fails: this is
stricter than the previous implementation, which skipped when pmd_none()
(with a comment on khugepaged collapse transitions): but wouldn't we want
to know, if an assert_pte_locked() caller can be racing such transitions?
This mod might cause new crashes: which either expose my ignorance, or
indicate issues to be fixed, or limit the usage of assert_pte_locked().
Signed-off-by: Hugh Dickins <hughd@google.com>
---
arch/powerpc/mm/pgtable.c | 16 ++++++----------
1 file changed, 6 insertions(+), 10 deletions(-)
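
For context: pte_offset_map_nolock(mm, pmd, addr, &ptl) maps the page table
page for addr and returns a pointer to its PTE, setting *ptl to the lock that
would guard that page table, without taking the lock; it returns NULL when no
page table is there (for example, when *pmd is none because a collapse is in
flight). A minimal sketch of the behavioural change, condensed from the diff
at the end of this page (kernel context assumed, not a standalone unit):

        /* Old: a cleared pmd silently skips the assertion. */
        if (pmd_none(*pmd))
                return;
        BUG_ON(!pmd_present(*pmd));
        assert_spin_locked(pte_lockptr(mm, pmd));

        /* New: pte_offset_map_nolock() returns NULL on that same cleared
         * pmd, so the assertion now BUGs instead of skipping. */
        pte = pte_offset_map_nolock(mm, pmd, addr, &ptl);
        BUG_ON(!pte);
        assert_spin_locked(ptl);
        pte_unmap(pte);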
Comments
Hugh Dickins <hughd@google.com> writes:

> Instead of pte_lockptr(), use the recently added pte_offset_map_nolock()
> in assert_pte_locked(). BUG if pte_offset_map_nolock() fails: this is
> stricter than the previous implementation, which skipped when pmd_none()
> (with a comment on khugepaged collapse transitions): but wouldn't we want
> to know, if an assert_pte_locked() caller can be racing such transitions?
>

The reason we had that pmd_none check there was to handle khugepaged. In
case of khugepaged we do pmdp_collapse_flush and then do a ptep_clear.
ppc64 had the assert_pte_locked check inside that ptep_clear.

_pmd = pmdp_collapse_flush(vma, address, pmd);
..
ptep_clear()
-> assert_pte_locked()
---> pmd_none
-----> BUG

The problem is how assert_pte_locked() verifies whether we are holding the
ptl. It does that by walking the page table again, and in this specific
case, by the time we call the function we already had cleared the pmd.

> This mod might cause new crashes: which either expose my ignorance, or
> indicate issues to be fixed, or limit the usage of assert_pte_locked().
>
> Signed-off-by: Hugh Dickins <hughd@google.com>
> ---
>  arch/powerpc/mm/pgtable.c | 16 ++++++----------
>  1 file changed, 6 insertions(+), 10 deletions(-)
>
> diff --git a/arch/powerpc/mm/pgtable.c b/arch/powerpc/mm/pgtable.c
> index cb2dcdb18f8e..16b061af86d7 100644
> --- a/arch/powerpc/mm/pgtable.c
> +++ b/arch/powerpc/mm/pgtable.c
> @@ -311,6 +311,8 @@ void assert_pte_locked(struct mm_struct *mm, unsigned long addr)
>  	p4d_t *p4d;
>  	pud_t *pud;
>  	pmd_t *pmd;
> +	pte_t *pte;
> +	spinlock_t *ptl;
>
>  	if (mm == &init_mm)
>  		return;
> @@ -321,16 +323,10 @@ void assert_pte_locked(struct mm_struct *mm, unsigned long addr)
>  	pud = pud_offset(p4d, addr);
>  	BUG_ON(pud_none(*pud));
>  	pmd = pmd_offset(pud, addr);
> -	/*
> -	 * khugepaged to collapse normal pages to hugepage, first set
> -	 * pmd to none to force page fault/gup to take mmap_lock. After
> -	 * pmd is set to none, we do a pte_clear which does this assertion
> -	 * so if we find pmd none, return.
> -	 */
> -	if (pmd_none(*pmd))
> -		return;
> -	BUG_ON(!pmd_present(*pmd));
> -	assert_spin_locked(pte_lockptr(mm, pmd));
> +	pte = pte_offset_map_nolock(mm, pmd, addr, &ptl);
> +	BUG_ON(!pte);
> +	assert_spin_locked(ptl);
> +	pte_unmap(pte);
>  }
>  #endif /* CONFIG_DEBUG_VM */
>
> --
> 2.35.3
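
To make the ordering concrete, here is a sketch of the collapse sequence
described above (simplified from the call chain Aneesh gives; the copying,
TLB and locking details in mm/khugepaged.c are elided):

        /* Sketch only: khugepaged collapsing a PTE-mapped range. */
        pmd_t _pmd;

        /* 1. Clear *pmd, forcing page fault/GUP to take mmap_lock. */
        _pmd = pmdp_collapse_flush(vma, address, pmd);

        /* ... the small pages are copied into the huge page ... */

        /* 2. The old PTEs are cleared one by one.  On ppc64 with
         * CONFIG_DEBUG_VM, ptep_clear() reaches assert_pte_locked(),
         * which re-walks the page table.  *pmd is already none here, so:
         *   old code: if (pmd_none(*pmd)) return;        silent skip
         *   new code: pte_offset_map_nolock() == NULL -> BUG_ON(!pte)
         */
        ptep_clear(mm, address, pte);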
On Tue, 18 Jul 2023, Aneesh Kumar K.V wrote:
> Hugh Dickins <hughd@google.com> writes:
>
> > Instead of pte_lockptr(), use the recently added pte_offset_map_nolock()
> > in assert_pte_locked(). BUG if pte_offset_map_nolock() fails: this is
> > stricter than the previous implementation, which skipped when pmd_none()
> > (with a comment on khugepaged collapse transitions): but wouldn't we want
> > to know, if an assert_pte_locked() caller can be racing such transitions?
> >
>
> The reason we had that pmd_none check there was to handle khugepaged. In
> case of khugepaged we do pmdp_collapse_flush and then do a ptep_clear.
> ppc64 had the assert_pte_locked check inside that ptep_clear.
>
> _pmd = pmdp_collapse_flush(vma, address, pmd);
> ..
> ptep_clear()
> -> assert_pte_locked()
> ---> pmd_none
> -----> BUG
>
> The problem is how assert_pte_locked() verifies whether we are holding the
> ptl. It does that by walking the page table again, and in this specific
> case, by the time we call the function we already had cleared the pmd.

Aneesh, please clarify, I've spent hours on this.

From all your use of past tense ("had"), I thought you were Acking my
patch; but now, after looking again at v3.11 source and today's,
I think you are NAKing my patch in its present form.

You are pointing out that anon THP's __collapse_huge_page_copy_succeeded()
uses ptep_clear() at a point after pmdp_collapse_flush() already cleared
*pmd, so my patch now leads that one use of assert_pte_locked() to BUG.
Is that your point?

I can easily restore that khugepaged comment (which had appeared to me
out of date at the time, but now looks still relevant) and pmd_none(*pmd)
check: but please clarify.

Thanks,
Hugh

> > This mod might cause new crashes: which either expose my ignorance, or
> > indicate issues to be fixed, or limit the usage of assert_pte_locked().
> >
> > Signed-off-by: Hugh Dickins <hughd@google.com>
> > ---
> >  arch/powerpc/mm/pgtable.c | 16 ++++++----------
> >  1 file changed, 6 insertions(+), 10 deletions(-)
> >
> > diff --git a/arch/powerpc/mm/pgtable.c b/arch/powerpc/mm/pgtable.c
> > index cb2dcdb18f8e..16b061af86d7 100644
> > --- a/arch/powerpc/mm/pgtable.c
> > +++ b/arch/powerpc/mm/pgtable.c
> > @@ -311,6 +311,8 @@ void assert_pte_locked(struct mm_struct *mm, unsigned long addr)
> >  	p4d_t *p4d;
> >  	pud_t *pud;
> >  	pmd_t *pmd;
> > +	pte_t *pte;
> > +	spinlock_t *ptl;
> >
> >  	if (mm == &init_mm)
> >  		return;
> > @@ -321,16 +323,10 @@ void assert_pte_locked(struct mm_struct *mm, unsigned long addr)
> >  	pud = pud_offset(p4d, addr);
> >  	BUG_ON(pud_none(*pud));
> >  	pmd = pmd_offset(pud, addr);
> > -	/*
> > -	 * khugepaged to collapse normal pages to hugepage, first set
> > -	 * pmd to none to force page fault/gup to take mmap_lock. After
> > -	 * pmd is set to none, we do a pte_clear which does this assertion
> > -	 * so if we find pmd none, return.
> > -	 */
> > -	if (pmd_none(*pmd))
> > -		return;
> > -	BUG_ON(!pmd_present(*pmd));
> > -	assert_spin_locked(pte_lockptr(mm, pmd));
> > +	pte = pte_offset_map_nolock(mm, pmd, addr, &ptl);
> > +	BUG_ON(!pte);
> > +	assert_spin_locked(ptl);
> > +	pte_unmap(pte);
> >  }
> >  #endif /* CONFIG_DEBUG_VM */
> >
> > --
> > 2.35.3
On 7/19/23 10:34 AM, Hugh Dickins wrote:
> On Tue, 18 Jul 2023, Aneesh Kumar K.V wrote:
>> Hugh Dickins <hughd@google.com> writes:
>>
>>> Instead of pte_lockptr(), use the recently added pte_offset_map_nolock()
>>> in assert_pte_locked(). BUG if pte_offset_map_nolock() fails: this is
>>> stricter than the previous implementation, which skipped when pmd_none()
>>> (with a comment on khugepaged collapse transitions): but wouldn't we want
>>> to know, if an assert_pte_locked() caller can be racing such transitions?
>>>
>>
>> The reason we had that pmd_none check there was to handle khugepaged. In
>> case of khugepaged we do pmdp_collapse_flush and then do a ptep_clear.
>> ppc64 had the assert_pte_locked check inside that ptep_clear.
>>
>> _pmd = pmdp_collapse_flush(vma, address, pmd);
>> ..
>> ptep_clear()
>> -> assert_pte_locked()
>> ---> pmd_none
>> -----> BUG
>>
>> The problem is how assert_pte_locked() verifies whether we are holding the
>> ptl. It does that by walking the page table again, and in this specific
>> case, by the time we call the function we already had cleared the pmd.
>
> Aneesh, please clarify, I've spent hours on this.
>
> From all your use of past tense ("had"), I thought you were Acking my
> patch; but now, after looking again at v3.11 source and today's,
> I think you are NAKing my patch in its present form.

Sorry for the confusion my reply created.

> You are pointing out that anon THP's __collapse_huge_page_copy_succeeded()
> uses ptep_clear() at a point after pmdp_collapse_flush() already cleared
> *pmd, so my patch now leads that one use of assert_pte_locked() to BUG.
> Is that your point?

Yes. I haven't tested this yet to verify that it is indeed hitting that
BUG. But a code inspection tells me we will hit that BUG on powerpc
because of the above details.

> I can easily restore that khugepaged comment (which had appeared to me
> out of date at the time, but now looks still relevant) and pmd_none(*pmd)
> check: but please clarify.

That is correct. If we add that pmd_none check back we should be good here.

-aneesh
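
Putting the thread's conclusion together, the amended assertion would
presumably keep the stricter pte_offset_map_nolock() form while restoring
the pmd_none() escape, roughly as below. This is a sketch of the agreed
direction, not the follow-up actually posted; the pgd line is reconstructed
from the surrounding context and may differ from the real source.

        void assert_pte_locked(struct mm_struct *mm, unsigned long addr)
        {
                pgd_t *pgd;
                p4d_t *p4d;
                pud_t *pud;
                pmd_t *pmd;
                pte_t *pte;
                spinlock_t *ptl;

                if (mm == &init_mm)
                        return;
                pgd = mm->pgd + pgd_index(addr);
                BUG_ON(pgd_none(*pgd));
                p4d = p4d_offset(pgd, addr);
                BUG_ON(p4d_none(*p4d));
                pud = pud_offset(p4d, addr);
                BUG_ON(pud_none(*pud));
                pmd = pmd_offset(pud, addr);
                /*
                 * khugepaged first sets pmd to none (to force page
                 * fault/gup to take mmap_lock), then clears each pte,
                 * which runs this assertion: tolerate the none pmd.
                 */
                if (pmd_none(*pmd))
                        return;
                pte = pte_offset_map_nolock(mm, pmd, addr, &ptl);
                BUG_ON(!pte);   /* anything else missing is still a bug */
                assert_spin_locked(ptl);
                pte_unmap(pte);
        }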
On Jul 19 2023, Aneesh Kumar K V wrote:
> On 7/19/23 10:34 AM, Hugh Dickins wrote:
> > On Tue, 18 Jul 2023, Aneesh Kumar K.V wrote:
> >> Hugh Dickins <hughd@google.com> writes:
> >>
> >>> Instead of pte_lockptr(), use the recently added pte_offset_map_nolock()
> >>> in assert_pte_locked(). BUG if pte_offset_map_nolock() fails: this is
> >>> stricter than the previous implementation, which skipped when pmd_none()
> >>> (with a comment on khugepaged collapse transitions): but wouldn't we want
> >>> to know, if an assert_pte_locked() caller can be racing such transitions?
> >>>
> >>
> >> The reason we had that pmd_none check there was to handle khugepaged. In
> >> case of khugepaged we do pmdp_collapse_flush and then do a ptep_clear.
> >> ppc64 had the assert_pte_locked check inside that ptep_clear.
> >>
> >> _pmd = pmdp_collapse_flush(vma, address, pmd);
> >> ..
> >> ptep_clear()
> >> -> assert_pte_locked()
> >> ---> pmd_none
> >> -----> BUG
> >>
> >> The problem is how assert_pte_locked() verifies whether we are holding the
> >> ptl. It does that by walking the page table again, and in this specific
> >> case, by the time we call the function we already had cleared the pmd.
> >
> > Aneesh, please clarify, I've spent hours on this.
> >
> > From all your use of past tense ("had"), I thought you were Acking my
> > patch; but now, after looking again at v3.11 source and today's,
> > I think you are NAKing my patch in its present form.
>
> Sorry for the confusion my reply created.
>
> > You are pointing out that anon THP's __collapse_huge_page_copy_succeeded()
> > uses ptep_clear() at a point after pmdp_collapse_flush() already cleared
> > *pmd, so my patch now leads that one use of assert_pte_locked() to BUG.
> > Is that your point?
>
> Yes. I haven't tested this yet to verify that it is indeed hitting that BUG.
> But a code inspection tells me we will hit that BUG on powerpc because of
> the above details.
>

Hi Aneesh,

After testing it, I can confirm that it encountered a BUG on powerpc.
Log report attached.

Thanks,
Jay Patel

> > I can easily restore that khugepaged comment (which had appeared to me
> > out of date at the time, but now looks still relevant) and pmd_none(*pmd)
> > check: but please clarify.
>
> That is correct. If we add that pmd_none check back we should be good here.
>
> -aneesh

[ 53.513058][ T105] ------------[ cut here ]------------
[ 53.513080][ T105] kernel BUG at arch/powerpc/mm/pgtable.c:327!
[ 53.513090][ T105] Oops: Exception in kernel mode, sig: 5 [#1]
[ 53.513099][ T105] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
[ 53.513109][ T105] Modules linked in: bonding pseries_rng rng_core vmx_crypto gf128mul ibmveth crc32c_vpmsum fuse autofs4
[ 53.513135][ T105] CPU: 3 PID: 105 Comm: khugepaged Not tainted 6.5.0-rc1-gebfaf626e99f-dirty #1
[ 53.513146][ T105] Hardware name: IBM,9009-42G POWER9 (raw) 0x4e0202 0xf000005 of:IBM,FW950.80 (VL950_131) hv:phyp pSeries
[ 53.513156][ T105] NIP: c000000000079478 LR: c00000000007946c CTR: 0000000000000000
[ 53.513165][ T105] REGS: c000000008e9b930 TRAP: 0700 Not tainted (6.5.0-rc1-gebfaf626e99f-dirty)
[ 53.513175][ T105] MSR: 800000000282b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE> CR: 24002882 XER: 20040000
[ 53.513202][ T105] CFAR: c000000000412a30 IRQMASK: 0
[ 53.513202][ T105] GPR00: c00000000007946c c000000008e9bbd0 c0000000012d3500 0000000000000001
[ 53.513202][ T105] GPR04: 0000000011000000 c000000008e9bbb0 c000000008e9bbf0 ffffffffffffffff
[ 53.513202][ T105] GPR08: 00000000000003ff 0000000000000000 0000000000000000 000000000000000a
[ 53.513202][ T105] GPR12: 00000000497b0000 c00000001ec9d480 c00c00000016fe00 c000000051455000
[ 53.513202][ T105] GPR16: 0000000000000000 ffffffffffffffff 0000000000000001 0000000000000001
[ 53.513202][ T105] GPR20: c000000002912430 c000000051455000 0000000000000000 c00000000946e650
[ 53.513202][ T105] GPR24: c0000000029800e8 0000000011000000 c00c000000145168 c000000002980180
[ 53.513202][ T105] GPR28: 0000000011000000 8603f85b000080c0 c000000008e9bc70 c00000001bd0d680
[ 53.513304][ T105] NIP [c000000000079478] assert_pte_locked.part.18+0x168/0x1b0
[ 53.513318][ T105] LR [c00000000007946c] assert_pte_locked.part.18+0x15c/0x1b0
[ 53.513329][ T105] Call Trace:
[ 53.513335][ T105] [c000000008e9bbd0] [c00000000007946c] assert_pte_locked.part.18+0x15c/0x1b0 (unreliable)
[ 53.513350][ T105] [c000000008e9bc00] [c00000000048e10c] collapse_huge_page+0x11dc/0x1700
[ 53.513362][ T105] [c000000008e9bd40] [c00000000048ed18] hpage_collapse_scan_pmd+0x6e8/0x850
[ 53.513374][ T105] [c000000008e9be30] [c000000000492544] khugepaged+0x7e4/0xb70
[ 53.513386][ T105] [c000000008e9bf90] [c00000000015f038] kthread+0x138/0x140
[ 53.513399][ T105] [c000000008e9bfe0] [c00000000000dd58] start_kernel_thread+0x14/0x18
[ 53.513411][ T105] Code: 7c852378 7c844c36 794a1564 7c894038 794af082 79241f24 78eaf00e 7c8a2214 48399541 60000000 7c630074 7863d182 <0b030000> e9210020 81290000 7d290074
[ 53.513448][ T105] ---[ end trace 0000000000000000 ]---
[ 53.516544][ T105]
[ 53.516551][ T105] note: khugepaged[105] exited with irqs disabled
[ 182.648447][ T1952] mconf[1952]: segfault (11) at 110efa38 nip 1001e97c lr 1001e8a8 code 1
[ 182.648482][ T1952] mconf[1952]: code: 60420000 4bfffd59 4bffffd0 60000000 60420000 e93f0070 2fa90000 409e0014
[ 182.648494][ T1952] mconf[1952]: code: 480000cc e9290000 2fa90000 419e00c0 <81490008> 2f8a0005 409effec e9490028
[ 962.694079][T39811] sda2: Can't mount, would change RO state
diff --git a/arch/powerpc/mm/pgtable.c b/arch/powerpc/mm/pgtable.c
index cb2dcdb18f8e..16b061af86d7 100644
--- a/arch/powerpc/mm/pgtable.c
+++ b/arch/powerpc/mm/pgtable.c
@@ -311,6 +311,8 @@ void assert_pte_locked(struct mm_struct *mm, unsigned long addr)
 	p4d_t *p4d;
 	pud_t *pud;
 	pmd_t *pmd;
+	pte_t *pte;
+	spinlock_t *ptl;
 
 	if (mm == &init_mm)
 		return;
@@ -321,16 +323,10 @@ void assert_pte_locked(struct mm_struct *mm, unsigned long addr)
 	pud = pud_offset(p4d, addr);
 	BUG_ON(pud_none(*pud));
 	pmd = pmd_offset(pud, addr);
-	/*
-	 * khugepaged to collapse normal pages to hugepage, first set
-	 * pmd to none to force page fault/gup to take mmap_lock. After
-	 * pmd is set to none, we do a pte_clear which does this assertion
-	 * so if we find pmd none, return.
-	 */
-	if (pmd_none(*pmd))
-		return;
-	BUG_ON(!pmd_present(*pmd));
-	assert_spin_locked(pte_lockptr(mm, pmd));
+	pte = pte_offset_map_nolock(mm, pmd, addr, &ptl);
+	BUG_ON(!pte);
+	assert_spin_locked(ptl);
+	pte_unmap(pte);
 }
 #endif /* CONFIG_DEBUG_VM */
 