Message ID | 20230921162007.1630149-9-ryan.roberts@arm.com |
---|---|
State | New |
Headers | From: Ryan Roberts <ryan.roberts@arm.com> Subject: [PATCH v1 8/8] arm64: hugetlb: Fix set_huge_pte_at() to work with all swap entries Date: Thu, 21 Sep 2023 17:20:07 +0100 In-Reply-To: <20230921162007.1630149-1-ryan.roberts@arm.com> |
Series | Fix set_huge_pte_at() panic on arm64 |
Commit Message
Ryan Roberts
Sept. 21, 2023, 4:20 p.m. UTC
When called with a swap entry that does not embed a PFN (e.g.
PTE_MARKER_POISONED or PTE_MARKER_UFFD_WP), the previous implementation
of set_huge_pte_at() would either cause a BUG() to fire (if
CONFIG_DEBUG_VM is enabled) or cause a dereference of an invalid address
and subsequent panic.
arm64's huge pte implementation supports multiple huge page sizes, some
of which are implemented in the page table with contiguous mappings. So
set_huge_pte_at() needs to work out how big the logical pte is, so that
it can also work out how many physical ptes (or pmds) need to be
written. It does this by grabbing the folio out of the pte and querying
its size.
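To make the "logical pte versus physical ptes" point concrete, here is a minimal userspace sketch of the size-to-entry-count mapping. It is not the kernel's code: the `sketch_` names are hypothetical, and the constants assume a 4K granule (16 contiguous entries at both the pte and pmd level); other granules use different values.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed 4K-granule geometry, for illustration only. */
#define SKETCH_PAGE_SIZE	(1UL << 12)	/* 4 KiB pte mapping */
#define SKETCH_PMD_SIZE		(1UL << 21)	/* 2 MiB pmd block   */
#define SKETCH_PUD_SIZE		(1UL << 30)	/* 1 GiB pud block   */
#define SKETCH_CONT_PTES	16		/* 16 x 4 KiB = 64 KiB */
#define SKETCH_CONT_PMDS	16		/* 16 x 2 MiB = 32 MiB */

/*
 * Hypothetical helper: map a huge page size to the number of page
 * table entries that back it, and report the size each entry maps
 * through *pgsize. Returns 0 for a size this sketch does not know.
 */
static int sketch_num_contig(unsigned long size, size_t *pgsize)
{
	assert(pgsize != NULL);

	switch (size) {
	case SKETCH_PUD_SIZE:
		*pgsize = SKETCH_PUD_SIZE;
		return 1;
	case SKETCH_CONT_PMDS * SKETCH_PMD_SIZE:
		*pgsize = SKETCH_PMD_SIZE;
		return SKETCH_CONT_PMDS;
	case SKETCH_PMD_SIZE:
		*pgsize = SKETCH_PMD_SIZE;
		return 1;
	case SKETCH_CONT_PTES * SKETCH_PAGE_SIZE:
		*pgsize = SKETCH_PAGE_SIZE;
		return SKETCH_CONT_PTES;
	default:
		*pgsize = 0;
		return 0;
	}
}
```

Under these assumptions a 64 KiB huge page needs 16 entries of 4 KiB each, while a 2 MiB huge page needs a single pmd entry, so the caller must know the huge page size before it can decide how many entries to write.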
However, there are cases when the pte being set is actually a swap
entry. But this also used to work fine, because for huge ptes, we only
ever saw migration entries and hwpoison entries. And both of these types
of swap entries have a PFN embedded, so the code would grab that and
everything still worked out.
But over time, more calls to set_huge_pte_at() have been added that set
swap entry types that do not embed a PFN. And this causes the code to go
bang. The triggering case is for the uffd poison test, commit
99aa77215ad0 ("selftests/mm: add uffd unit test for UFFDIO_POISON"),
which sets a PTE_MARKER_POISONED swap entry. But review shows there are
other places too (PTE_MARKER_UFFD_WP).
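The distinction between swap entries that embed a PFN and those that do not can be modelled in a few lines. The sketch below is purely illustrative: the entry layout and every `sketch_` name are assumptions made for this example and do not match the kernel's real swap entry encoding. It only shows why a size lookup that goes through the PFN has nothing valid to work with for marker entries.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical entry kinds; the kernel's real encoding differs. */
enum sketch_swp_type {
	SKETCH_MIGRATION,	/* offset field holds a PFN */
	SKETCH_HWPOISON,	/* offset field holds a PFN */
	SKETCH_PTE_MARKER,	/* offset field holds marker flag bits */
};

struct sketch_swp_entry {
	enum sketch_swp_type type;
	uint64_t offset;
};

/* Only migration and hwpoison entries carry a PFN in this model. */
static bool sketch_entry_has_pfn(struct sketch_swp_entry e)
{
	return e.type == SKETCH_MIGRATION || e.type == SKETCH_HWPOISON;
}

/*
 * Store the PFN in *pfn and return true only when the entry embeds one.
 * For a marker entry the offset is a set of flags, so treating it as a
 * PFN (and then looking up a folio from it) would read unrelated memory.
 */
static bool sketch_entry_to_pfn(struct sketch_swp_entry e, uint64_t *pfn)
{
	assert(pfn != NULL);

	if (!sketch_entry_has_pfn(e))
		return false;

	*pfn = e.offset;
	return true;
}
```

In this model a marker entry simply reports "no PFN", which is the case the old folio-based size lookup could not handle.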
The root cause is commit 18f3962953e4 ("mm: hugetlb: kill
set_huge_swap_pte_at()"), which aimed to simplify the interface to the
core code by removing set_huge_swap_pte_at() (which took a page size
parameter) and replacing it with calls to set_huge_pte_at() where
the size was inferred from the folio, as described above. While that
commit didn't break anything at the time, it did break the interface
because it couldn't handle swap entries without PFNs. And since then new
callers have come along which rely on this working.
Now that we have modified the set_huge_pte_at() interface to pass the
vma, we can extract the huge page size from it and fix this issue.
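The shape of the fix for non-present entries can be sketched in userspace as follows. This is a simplified model under stated assumptions, not the patched kernel function: `struct sketch_vma`, the `sketch_` helpers and the 4K-granule constants are hypothetical, and the size is read from a plain field where the real code queries the vma's hstate.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed 4K-granule geometry, for illustration only. */
#define SKETCH_PAGE_SIZE	(1UL << 12)
#define SKETCH_PMD_SIZE		(1UL << 21)
#define SKETCH_CONT_PTES	16

/* Hypothetical stand-in for a vma that knows its huge page size. */
struct sketch_vma {
	unsigned long huge_page_size;
};

/* Entry count and per-entry size for the two sizes this sketch models. */
static int sketch_num_contig(unsigned long size, unsigned long *pgsize)
{
	if (size == SKETCH_CONT_PTES * SKETCH_PAGE_SIZE) {
		*pgsize = SKETCH_PAGE_SIZE;
		return SKETCH_CONT_PTES;
	}
	if (size == SKETCH_PMD_SIZE) {
		*pgsize = SKETCH_PMD_SIZE;
		return 1;
	}
	*pgsize = 0;
	return 0;
}

/*
 * Write the same non-present value into every entry backing the huge
 * page. The size comes from the vma, never from the entry itself, so
 * it works for entries that carry no PFN. The address advances by
 * pgsize per entry and is recorded in addrs_out for inspection.
 * Returns the number of entries written.
 */
static int sketch_set_huge_swap_pte(const struct sketch_vma *vma,
				    unsigned long addr, uint64_t *ptep,
				    uint64_t pte, unsigned long *addrs_out)
{
	unsigned long pgsize;
	int i, ncontig;

	assert(vma != NULL && ptep != NULL && addrs_out != NULL);

	ncontig = sketch_num_contig(vma->huge_page_size, &pgsize);
	for (i = 0; i < ncontig; i++, ptep++, addr += pgsize) {
		*ptep = pte;
		addrs_out[i] = addr;
	}
	return ncontig;
}
```

With a 64 KiB huge page under these assumptions, the loop writes 16 identical entries and hands each one its own 4 KiB-aligned address, regardless of what the entry value encodes.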
I'm tagging the commit that added the uffd poison feature, since that is
what exposed the problem, as well as the original change that broke the
interface. Hopefully this is valuable for people doing bisect.
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Fixes: 18f3962953e4 ("mm: hugetlb: kill set_huge_swap_pte_at()")
Fixes: 8a13897fb0da ("mm: userfaultfd: support UFFDIO_POISON for hugetlbfs")
---
arch/arm64/mm/hugetlbpage.c | 17 +++--------------
1 file changed, 3 insertions(+), 14 deletions(-)
Comments
Hi Ryan,

On 2023/9/22 00:20, Ryan Roberts wrote:
> [...]
> So the root cause is due to commit 18f3962953e4 ("mm: hugetlb: kill
> set_huge_swap_pte_at()"), which aimed to simplify the interface to the
> core code by removing set_huge_swap_pte_at() (which took a page size
> parameter) and replacing it with calls to set_huge_swap_pte_at() where
> the size was inferred from the folio, as descibed above. While that
> commit didn't break anything at the time,

If it didn't break anything at that time, then shouldn't the Fixes tag
be added to this commit?

> it did break the interface
> because it couldn't handle swap entries without PFNs. And since then new
> callers have come along which rely on this working.

So the Fixes tag should be added only to the commit that introduces the
first new callers?

Other than that, LGTM.

Thanks,
Qi

> [...]
On 22/09/2023 03:54, Qi Zheng wrote:
> Hi Ryan,
>
> On 2023/9/22 00:20, Ryan Roberts wrote:
>> [...]
>> the size was inferred from the folio, as descibed above. While that
>> commit didn't break anything at the time,
>
> If it didn't break anything at that time, then shouldn't the Fixes tag
> be added to this commit?
>
>> it did break the interface
>> because it couldn't handle swap entries without PFNs. And since then new
>> callers have come along which rely on this working.
>
> So the Fixes tag should be added only to the commit that introduces the
> first new callers?

Well I guess it's a matter of point of view; My view is that 18f3962953e4 is the
buggy change because it broke the interface to not be able to handle swap
entries which do not contain PFNs. The fact that there were no callers that used
the interface in this way at the time of the commit is irrelevant in my view.

But I already added 2 fixes tags; one for the buggy commit, and the other for
the commit containing the new user of the interface.

> Other than that, LGTM.

Thanks!

> [...]
Hi Ryan,

On 2023/9/22 15:40, Ryan Roberts wrote:
> On 22/09/2023 03:54, Qi Zheng wrote:
> [...]
>> So the Fixes tag should be added only to the commit that introduces the
>> first new callers?
>
> Well I guess it's a matter of point of view; My view is that 18f3962953e4 is the
> buggy change because it broke the interface to not be able to handle swap
> entries which do not contain PFNs. The fact that there were no callers that used
> the interface in this way at the time of the commit is irrelevant in my view.

I understand your point of view.

But IIUC, the Fixes tag is used to indicate the version that needs to
backport, but the version where the commit 18f3962953e4 is located
does not need to backport this bugfix patch.

> But I already added 2 fixes tags; one for the buggy commit, and the other for
> the commit containing the new user of the interface.

I think 2 fixes tags will cause inconvenience to the maintainers.

Thanks,
Qi

> [...]
On 22/09/2023 08:54, Qi Zheng wrote:
> [...]
> But IIUC, the Fixes tag is used to indicate the version that needs to
> backport, but the version where the commit 18f3962953e4 is located
> does not need to backport this bugfix patch.
>
>> But I already added 2 fixes tags; one for the buggy commit, and the other for
>> the commit containing the new user of the interface.
>
> I think 2 fixes tags will cause inconvenience to the maintainers.

I did some archaeology:

$ git rev-list --no-walk=sorted --pretty=oneline \
    05e90bd05eea33fc77d6b11e121e2da01fee379f \
    60dfaad65aa97fb6755b9798a6b3c9e79bcd5930 \
    8a13897fb0daa8f56821f263f0c63661e1c6acae \
    18f3962953e40401b7ed98e8524167282c3e626e \
    v6.5 v5.18 v5.17 v5.19 v6.5-rc6 v6.5-rc7

2dde18cd1d8fac735875f2e4987f11817cc0bc2c Linux 6.5
706a741595047797872e669b3101429ab8d378ef Linux 6.5-rc7
8a13897fb0daa8f56821f263f0c63661e1c6acae mm: userfaultfd: support UFFDIO_POISON for hugetlbfs
2ccdd1b13c591d306f0401d98dedc4bdcd02b421 Linux 6.5-rc6
3d7cb6b04c3f3115719235cc6866b10326de34cd Linux 5.19
18f3962953e40401b7ed98e8524167282c3e626e mm: hugetlb: kill set_huge_swap_pte_at()
4b0986a3613c92f4ec1bdc7f60ec66fea135991f Linux 5.18
05e90bd05eea33fc77d6b11e121e2da01fee379f mm/hugetlb: only drop uffd-wp special pte if required
60dfaad65aa97fb6755b9798a6b3c9e79bcd5930 mm/hugetlb: allow uffd wr-protect none ptes
f443e374ae131c168a065ea1748feac6b2e76613 Linux 5.17

So it turns out that the PTE_MARKER_UFFD_WP markers were added first, using
set_huge_pte_at(). At the time, this should have used set_huge_swap_pte_at(), so
was arguably buggy for that reason. However, arm64 does not support UFFD_WP so
none of the call sites that set the PTE_MARKER_UFFD_WP marker to the pte ever
trigger on arm64.

Then "mm: hugetlb: kill set_huge_swap_pte_at()" came along and "broke" the
interface, but there were no callers relying on the behaviour that was broken.

Then "mm: userfaultfd: support UFFDIO_POISON for hugetlbfs" came along in
v6.5-rc7 and started relying on the broken behaviour of set_huge_pte_at().

So on that basis, I agree that the first commit where broken behaviour is
observable is "mm: userfaultfd: support UFFDIO_POISON for hugetlbfs". So I will
tag that one as "Fixes". (Although if set_huge_pte_at() was an exported symbol,
then we would want to mark "mm: hugetlb: kill set_huge_swap_pte_at()").

Thanks,
Ryan

> Thanks,
> Qi
>
> [...]
Hi Ryan,

On 2023/9/22 17:35, Ryan Roberts wrote:
> [...]
> I did some Archaeology:

Nice! Thanks for doing this.

> [...]
> Then "mm: userfaultfd: support UFFDIO_POISON for hugetlbfs" came along in
> v6.5-rc7 and started relying on the broken behaviour of set_huge_pte_at().

Got it.

> So on that basis, I agree that the first commit where broken behaviour is
> observable is "mm: userfaultfd: support UFFDIO_POISON for hugetlbfs". So I will
> tag that one as "Fixes". (Although if set_huge_pte_at() was an exported symbol,
> then we would want to mark "mm: hugetlb: kill set_huge_swap_pte_at()").

Agree. I just checked the time point when 18f3962953e4 was added, neither
set_huge_pte_at() nor set_huge_swap_pte_at() are exported symbols.

Thanks,
Qi

> Thanks,
> Ryan
>
> [...]
diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
index 844832511c1e..a08601a14689 100644
--- a/arch/arm64/mm/hugetlbpage.c
+++ b/arch/arm64/mm/hugetlbpage.c
@@ -241,13 +241,6 @@ static void clear_flush(struct mm_struct *mm,
 	flush_tlb_range(&vma, saddr, addr);
 }
 
-static inline struct folio *hugetlb_swap_entry_to_folio(swp_entry_t entry)
-{
-	VM_BUG_ON(!is_migration_entry(entry) && !is_hwpoison_entry(entry));
-
-	return page_folio(pfn_to_page(swp_offset_pfn(entry)));
-}
-
 void set_huge_pte_at(struct vm_area_struct *vma, unsigned long addr,
 			    pte_t *ptep, pte_t pte)
 {
@@ -258,13 +251,10 @@ void set_huge_pte_at(struct vm_area_struct *vma, unsigned long addr,
 	unsigned long pfn, dpfn;
 	pgprot_t hugeprot;
 
-	if (!pte_present(pte)) {
-		struct folio *folio;
-
-		folio = hugetlb_swap_entry_to_folio(pte_to_swp_entry(pte));
-		ncontig = num_contig_ptes(folio_size(folio), &pgsize);
+	ncontig = num_contig_ptes(huge_page_size(hstate_vma(vma)), &pgsize);
 
-		for (i = 0; i < ncontig; i++, ptep++)
+	if (!pte_present(pte)) {
+		for (i = 0; i < ncontig; i++, ptep++, addr += pgsize)
 			set_pte_at(mm, addr, ptep, pte);
 		return;
 	}
@@ -274,7 +264,6 @@ void set_huge_pte_at(struct vm_area_struct *vma, unsigned long addr,
 		return;
 	}
 
-	ncontig = find_num_contig(mm, addr, ptep, &pgsize);
 	pfn = pte_pfn(pte);
 	dpfn = pgsize >> PAGE_SHIFT;
 	hugeprot = pte_pgprot(pte);