Message ID: <20231204172646.2541916-1-jthoughton@google.com>
Headers:
  Date: Mon, 4 Dec 2023 17:26:44 +0000
  Message-ID: <20231204172646.2541916-1-jthoughton@google.com>
  Subject: [PATCH 0/2] arm64: hugetlb: Fix page fault loop for sw-dirty/hw-clean contiguous PTEs
  From: James Houghton <jthoughton@google.com>
  To: Steve Capper <steve.capper@arm.com>, Will Deacon <will@kernel.org>, Andrew Morton <akpm@linux-foundation.org>
  Cc: Mike Kravetz <mike.kravetz@oracle.com>, Muchun Song <songmuchun@bytedance.com>, Anshuman Khandual <anshuman.khandual@arm.com>, Catalin Marinas <catalin.marinas@arm.com>, Ryan Roberts <ryan.roberts@arm.com>, linux-mm@kvack.org, linux-kernel@vger.kernel.org, James Houghton <jthoughton@google.com>
Series: arm64: hugetlb: Fix page fault loop for sw-dirty/hw-clean contiguous PTEs
Message
James Houghton
Dec. 4, 2023, 5:26 p.m. UTC
It is currently possible for a userspace application to enter a page
fault loop when using HugeTLB pages implemented with contiguous PTEs
when HAFDBS is not available. This happens because:
1. The kernel may sometimes write PTEs that are sw-dirty but hw-clean
   (PTE_DIRTY | PTE_RDONLY | PTE_WRITE).
2. If, during a write, the CPU uses a sw-dirty, hw-clean PTE in handling
   the memory access on a system without HAFDBS, we will get a page
   fault.
3. HugeTLB will check if it needs to update the dirty bits on the PTE.
   For contiguous PTEs, it will check to see if the pgprot bits need
   updating. In this case, HugeTLB wants to write a sequence of
   sw-dirty, hw-dirty PTEs, but it finds that all the PTEs it is about
   to overwrite are all pte_dirty() (pte_sw_dirty() => pte_dirty()),
   so it thinks no update is necessary.

Please see this[1] reproducer.

I think (though I may be wrong) that both step (1) and step (3) are
buggy.

The first patch in this series fixes step (3); instead of checking if
pte_dirty is matching in __cont_access_flags_changed, check pte_hw_dirty
and pte_sw_dirty separately.

The second patch in this series makes step (1) less likely to occur.
Without this patch, we can get the kernel to write a sw-dirty, hw-clean
PTE with the following steps (showing the relevant VMA flags and pgprot
bits):
i.   Create a valid, writable contiguous PTE.
     VMA vmflags:     VM_SHARED | VM_READ | VM_WRITE
     VMA pgprot bits: PTE_RDONLY | PTE_WRITE
     PTE pgprot bits: PTE_DIRTY | PTE_WRITE
ii.  mprotect the VMA to PROT_NONE.
     VMA vmflags:     VM_SHARED
     VMA pgprot bits: PTE_RDONLY
     PTE pgprot bits: PTE_DIRTY | PTE_RDONLY
iii. mprotect the VMA back to PROT_READ | PROT_WRITE.
     VMA vmflags:     VM_SHARED | VM_READ | VM_WRITE
     VMA pgprot bits: PTE_RDONLY | PTE_WRITE
     PTE pgprot bits: PTE_DIRTY | PTE_WRITE | PTE_RDONLY

Applying either one of the two patches in this patchset will fix the
particular issue with HugeTLB pages implemented with contiguous PTEs.
It's possible that only one of these patches should be taken, or that
the right fix is something else entirely.

[1]: https://gist.github.com/48ca/11d1e466deee032cb35aa8c2280f93b0

James Houghton (2):
  arm64: hugetlb: Distinguish between hw and sw dirtiness in
    __cont_access_flags_changed
  arm64: mm: Always make sw-dirty PTEs hw-dirty in pte_modify

 arch/arm64/include/asm/pgtable.h | 6 ++++++
 arch/arm64/mm/hugetlbpage.c      | 5 ++++-
 2 files changed, 10 insertions(+), 1 deletion(-)

base-commit: 645a9a454fdb7e698a63a275edca6a17ef97afc4
Comments
On 04/12/2023 17:26, James Houghton wrote:
> It is currently possible for a userspace application to enter a page
> fault loop when using HugeTLB pages implemented with contiguous PTEs
> when HAFDBS is not available. This happens because:
> 1. The kernel may sometimes write PTEs that are sw-dirty but hw-clean
>    (PTE_DIRTY | PTE_RDONLY | PTE_WRITE).

Hi James,

Do you know how this happens? AFAIK, this is the set of valid bit
combinations, and PTE_RDONLY|PTE_WRITE|PTE_DIRTY is not one of them.
Perhaps the real solution is to understand how this is happening and
prevent it?

/*
 * PTE bits configuration in the presence of hardware Dirty Bit Management
 * (PTE_WRITE == PTE_DBM):
 *
 *   Dirty  Writable | PTE_RDONLY  PTE_WRITE  PTE_DIRTY (sw)
 *   0      0        | 1           0          0
 *   0      1        | 1           1          0
 *   1      0        | 1           0          1
 *   1      1        | 0           1          x
 *
 * When hardware DBM is not present, the sofware PTE_DIRTY bit is updated via
 * the page fault mechanism. Checking the dirty status of a pte becomes:
 *
 *   PTE_DIRTY || (PTE_WRITE && !PTE_RDONLY)
 */

Thanks,
Ryan

> 2. If, during a write, the CPU uses a sw-dirty, hw-clean PTE in handling
>    the memory access on a system without HAFDBS, we will get a page
>    fault.
> 3. HugeTLB will check if it needs to update the dirty bits on the PTE.
>    For contiguous PTEs, it will check to see if the pgprot bits need
>    updating. In this case, HugeTLB wants to write a sequence of
>    sw-dirty, hw-dirty PTEs, but it finds that all the PTEs it is about
>    to overwrite are all pte_dirty() (pte_sw_dirty() => pte_dirty()),
>    so it thinks no update is necessary.
>
> Please see this[1] reproducer.
>
> I think (though I may be wrong) that both step (1) and step (3) are
> buggy.
>
> The first patch in this series fixes step (3); instead of checking if
> pte_dirty is matching in __cont_access_flags_changed, check pte_hw_dirty
> and pte_sw_dirty separately.
>
> The second patch in this series makes step (1) less likely to occur.
> Without this patch, we can get the kernel to write a sw-dirty, hw-clean
> PTE with the following steps (showing the relevant VMA flags and pgprot
> bits):
> i.   Create a valid, writable contiguous PTE.
>      VMA vmflags:     VM_SHARED | VM_READ | VM_WRITE
>      VMA pgprot bits: PTE_RDONLY | PTE_WRITE
>      PTE pgprot bits: PTE_DIRTY | PTE_WRITE
> ii.  mprotect the VMA to PROT_NONE.
>      VMA vmflags:     VM_SHARED
>      VMA pgprot bits: PTE_RDONLY
>      PTE pgprot bits: PTE_DIRTY | PTE_RDONLY
> iii. mprotect the VMA back to PROT_READ | PROT_WRITE.
>      VMA vmflags:     VM_SHARED | VM_READ | VM_WRITE
>      VMA pgprot bits: PTE_RDONLY | PTE_WRITE
>      PTE pgprot bits: PTE_DIRTY | PTE_WRITE | PTE_RDONLY
>
> Applying either one of the two patches in this patchset will fix the
> particular issue with HugeTLB pages implemented with contiguous PTEs.
> It's possible that only one of these patches should be taken, or that
> the right fix is something else entirely.
>
> [1]: https://gist.github.com/48ca/11d1e466deee032cb35aa8c2280f93b0
>
> James Houghton (2):
>   arm64: hugetlb: Distinguish between hw and sw dirtiness in
>     __cont_access_flags_changed
>   arm64: mm: Always make sw-dirty PTEs hw-dirty in pte_modify
>
>  arch/arm64/include/asm/pgtable.h | 6 ++++++
>  arch/arm64/mm/hugetlbpage.c      | 5 ++++-
>  2 files changed, 10 insertions(+), 1 deletion(-)
>
> base-commit: 645a9a454fdb7e698a63a275edca6a17ef97afc4
On Tue, Dec 5, 2023 at 6:43 AM Ryan Roberts <ryan.roberts@arm.com> wrote:
>
> On 04/12/2023 17:26, James Houghton wrote:
> > It is currently possible for a userspace application to enter a page
> > fault loop when using HugeTLB pages implemented with contiguous PTEs
> > when HAFDBS is not available. This happens because:
> > 1. The kernel may sometimes write PTEs that are sw-dirty but hw-clean
> >    (PTE_DIRTY | PTE_RDONLY | PTE_WRITE).
>
> Hi James,
>
> Do you know how this happens?

Hi Ryan,

Thanks for taking a look! I do understand why this is happening. There
is an explanation in the reproducer[1] and also in this cover letter
(though I realize I could have been a little clearer). See below.

> AFAIK, this is the set of valid bit combinations, and
> PTE_RDONLY|PTE_WRITE|PTE_DIRTY is not one of them. Perhaps the real solution is
> to understand how this is happening and prevent it?
>
> /*
>  * PTE bits configuration in the presence of hardware Dirty Bit Management
>  * (PTE_WRITE == PTE_DBM):
>  *
>  *   Dirty  Writable | PTE_RDONLY  PTE_WRITE  PTE_DIRTY (sw)
>  *   0      0        | 1           0          0
>  *   0      1        | 1           1          0
>  *   1      0        | 1           0          1
>  *   1      1        | 0           1          x
>  *
>  * When hardware DBM is not present, the sofware PTE_DIRTY bit is updated via
>  * the page fault mechanism. Checking the dirty status of a pte becomes:
>  *
>  *   PTE_DIRTY || (PTE_WRITE && !PTE_RDONLY)
>  */

Thanks for pointing this out. So (1) is definitely a bug. The second
patch in this series makes it impossible to create such a PTE via
pte_modify (by forcing sw-dirty PTEs to be hw-dirty as well).

> > The second patch in this series makes step (1) less likely to occur.

It makes it impossible to create this invalid set of bits via
pte_modify(). Assuming all PTE pgprot updates are done via the proper
interfaces, patch #2 might actually make this invalid bit combination
impossible to produce (that's certainly the goal). So perhaps language
stronger than "less likely" is appropriate.

Here's the sequence of events to trigger this bug, via mprotect():

> > Without this patch, we can get the kernel to write a sw-dirty, hw-clean
> > PTE with the following steps (showing the relevant VMA flags and pgprot
> > bits):
> > i.   Create a valid, writable contiguous PTE.
> >      VMA vmflags:     VM_SHARED | VM_READ | VM_WRITE
> >      VMA pgprot bits: PTE_RDONLY | PTE_WRITE
> >      PTE pgprot bits: PTE_DIRTY | PTE_WRITE
> > ii.  mprotect the VMA to PROT_NONE.
> >      VMA vmflags:     VM_SHARED
> >      VMA pgprot bits: PTE_RDONLY
> >      PTE pgprot bits: PTE_DIRTY | PTE_RDONLY
> > iii. mprotect the VMA back to PROT_READ | PROT_WRITE.
> >      VMA vmflags:     VM_SHARED | VM_READ | VM_WRITE
> >      VMA pgprot bits: PTE_RDONLY | PTE_WRITE
> >      PTE pgprot bits: PTE_DIRTY | PTE_WRITE | PTE_RDONLY

With patch #2, the PTE pgprot bits in step iii become PTE_DIRTY |
PTE_WRITE (hw-dirtiness is set, as the PTE is sw-dirty).

Thanks!

> > [1]: https://gist.github.com/48ca/11d1e466deee032cb35aa8c2280f93b0
On 05/12/2023 17:54, James Houghton wrote:
> On Tue, Dec 5, 2023 at 6:43 AM Ryan Roberts <ryan.roberts@arm.com> wrote:
>>
>> On 04/12/2023 17:26, James Houghton wrote:
>>> It is currently possible for a userspace application to enter a page
>>> fault loop when using HugeTLB pages implemented with contiguous PTEs
>>> when HAFDBS is not available. This happens because:
>>> 1. The kernel may sometimes write PTEs that are sw-dirty but hw-clean
>>>    (PTE_DIRTY | PTE_RDONLY | PTE_WRITE).
>>
>> Hi James,
>>
>> Do you know how this happens?
>
> Hi Ryan,
>
> Thanks for taking a look! I do understand why this is happening. There
> is an explanation in the reproducer[1] and also in this cover letter
> (though I realize I could have been a little clearer). See below.

Sigh... sorry! I totally missed your (excellent) explanation.

>
>> AFAIK, this is the set of valid bit combinations, and
>> PTE_RDONLY|PTE_WRITE|PTE_DIRTY is not one of them. Perhaps the real solution is
>> to understand how this is happening and prevent it?
>>
>> /*
>>  * PTE bits configuration in the presence of hardware Dirty Bit Management
>>  * (PTE_WRITE == PTE_DBM):
>>  *
>>  *   Dirty  Writable | PTE_RDONLY  PTE_WRITE  PTE_DIRTY (sw)
>>  *   0      0        | 1           0          0
>>  *   0      1        | 1           1          0
>>  *   1      0        | 1           0          1
>>  *   1      1        | 0           1          x
>>  *
>>  * When hardware DBM is not present, the sofware PTE_DIRTY bit is updated via
>>  * the page fault mechanism. Checking the dirty status of a pte becomes:
>>  *
>>  *   PTE_DIRTY || (PTE_WRITE && !PTE_RDONLY)
>>  */
>
> Thanks for pointing this out. So (1) is definitely a bug. The second
> patch in this series makes it impossible to create such a PTE via
> pte_modify (by forcing sw-dirty PTEs to be hw-dirty as well).

Yes; I think the second patch should be sufficient; I took a quick look
at the other helpers and I don't see anything else that could get the
PTE to the invalid state.

I have a series that starts using the contpte bit for (multi-size) THP
opportunistically. This bug will affect that too I think. Your patch #2
will fix it for both hugetlb and my series. I'd rather not apply an
equivalent to your patch #1 because it's not quite as straightforward in
my code path. But I'm pretty confident that patch #2 is all that's
needed here.

Thanks,
Ryan

>
>>> The second patch in this series makes step (1) less likely to occur.
>
> It makes it impossible to create this invalid set of bits via
> pte_modify(). Assuming all PTE pgprot updates are done via the proper
> interfaces, patch #2 might actually make this invalid bit combination
> impossible to produce (that's certainly the goal). So perhaps language
> stronger than "less likely" is appropriate.
>
> Here's the sequence of events to trigger this bug, via mprotect():
>
>>> Without this patch, we can get the kernel to write a sw-dirty, hw-clean
>>> PTE with the following steps (showing the relevant VMA flags and pgprot
>>> bits):
>>> i.   Create a valid, writable contiguous PTE.
>>>      VMA vmflags:     VM_SHARED | VM_READ | VM_WRITE
>>>      VMA pgprot bits: PTE_RDONLY | PTE_WRITE
>>>      PTE pgprot bits: PTE_DIRTY | PTE_WRITE
>>> ii.  mprotect the VMA to PROT_NONE.
>>>      VMA vmflags:     VM_SHARED
>>>      VMA pgprot bits: PTE_RDONLY
>>>      PTE pgprot bits: PTE_DIRTY | PTE_RDONLY
>>> iii. mprotect the VMA back to PROT_READ | PROT_WRITE.
>>>      VMA vmflags:     VM_SHARED | VM_READ | VM_WRITE
>>>      VMA pgprot bits: PTE_RDONLY | PTE_WRITE
>>>      PTE pgprot bits: PTE_DIRTY | PTE_WRITE | PTE_RDONLY
>
> With patch #2, the PTE pgprot bits in step iii become PTE_DIRTY |
> PTE_WRITE (hw-dirtiness is set, as the PTE is sw-dirty).
>
> Thanks!
>
>>> [1]: https://gist.github.com/48ca/11d1e466deee032cb35aa8c2280f93b0
On Wed, Dec 6, 2023 at 2:24 AM Ryan Roberts <ryan.roberts@arm.com> wrote:
>
> On 05/12/2023 17:54, James Houghton wrote:
> > On Tue, Dec 5, 2023 at 6:43 AM Ryan Roberts <ryan.roberts@arm.com> wrote:
> > Thanks for pointing this out. So (1) is definitely a bug. The second
> > patch in this series makes it impossible to create such a PTE via
> > pte_modify (by forcing sw-dirty PTEs to be hw-dirty as well).
>
> Yes; I think the second patch should be sufficient; I took a quick look at the
> other helpers and I don't see anything else that could get the PTE to the
> invalid state.
>
> I have a series that starts using the contpte bit for (multi-size) THP
> opportunistically. This bug will affect that too I think. Your patch #2 will fix
> it for both hugetlb and my series. I'd rather not apply an equivalent to your patch
> #1 because it's not quite as straightforward in my code path. But I'm pretty
> confident that patch #2 is all that's needed here.

There is no need to apply a patch #1-equivalent for multi-size THPs. :)
If multi-size THP has the same problem as HugeTLB, patch #2 will fix it
too.

I don't think multi-size THP has the equivalent problem -- in fact, I'm
not sure how multi-size THP keeps the PTE_DIRTY, PTE_WRITE (DBM), and
the PTE_RDONLY bits in sync (they do need to be in-sync with each other
when the contiguous bit is being used, right?).

I included patch #1 (with cc:stable) because it's a more direct fix for
HugeTLB that might be slightly easier to backport. If you think that
patch #1 should be dropped and patch #2 should be backported, please let
me know.

Thanks for the review!
On Mon, 04 Dec 2023 17:26:44 +0000, James Houghton wrote:
> It is currently possible for a userspace application to enter a page
> fault loop when using HugeTLB pages implemented with contiguous PTEs
> when HAFDBS is not available. This happens because:
> 1. The kernel may sometimes write PTEs that are sw-dirty but hw-clean
>    (PTE_DIRTY | PTE_RDONLY | PTE_WRITE).
> 2. If, during a write, the CPU uses a sw-dirty, hw-clean PTE in handling
>    the memory access on a system without HAFDBS, we will get a page
>    fault.
> 3. HugeTLB will check if it needs to update the dirty bits on the PTE.
>    For contiguous PTEs, it will check to see if the pgprot bits need
>    updating. In this case, HugeTLB wants to write a sequence of
>    sw-dirty, hw-dirty PTEs, but it finds that all the PTEs it is about
>    to overwrite are all pte_dirty() (pte_sw_dirty() => pte_dirty()),
>    so it thinks no update is necessary.
>
> [...]

Applied to arm64 (for-next/fixes), thanks!

[2/2] arm64: mm: Always make sw-dirty PTEs hw-dirty in pte_modify
      https://git.kernel.org/arm64/c/3c0696076aad

I only picked up the second patch and added the description from the
cover letter into the commit log.