Message ID | 20230413231120.544685-2-peterx@redhat.com |
---|---|
State | New |
Headers |
From: Peter Xu <peterx@redhat.com>
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: Axel Rasmussen <axelrasmussen@google.com>, Andrew Morton <akpm@linux-foundation.org>, David Hildenbrand <david@redhat.com>, peterx@redhat.com, Mike Kravetz <mike.kravetz@oracle.com>, Nadav Amit <nadav.amit@gmail.com>, Andrea Arcangeli <aarcange@redhat.com>, linux-stable <stable@vger.kernel.org>
Subject: [PATCH 1/6] mm/hugetlb: Fix uffd-wp during fork()
Date: Thu, 13 Apr 2023 19:11:15 -0400
Message-Id: <20230413231120.544685-2-peterx@redhat.com>
In-Reply-To: <20230413231120.544685-1-peterx@redhat.com>
References: <20230413231120.544685-1-peterx@redhat.com> |
Series | mm/hugetlb: More fixes around uffd-wp vs fork() / RO pins |
Commit Message
Peter Xu
April 13, 2023, 11:11 p.m. UTC
There are a few things that were wrong:

- Reading the uffd-wp bit from a swap entry should use pte_swp_uffd_wp()
  rather than huge_pte_uffd_wp().

- When copying over a pte, the uffd-wp bit should be dropped when
  !EVENT_FORK (aka, when !userfaultfd_wp(dst_vma)).

- When doing early CoW for private hugetlb (e.g. when the parent page was
  pinned), the uffd-wp bit should be properly carried over if necessary.

No bugs were reported, probably because most people do not care about these
corner cases, but they are still bugs and can be exposed by the recently
introduced unit tests, so fix all of them in one shot.
Cc: linux-stable <stable@vger.kernel.org>
Fixes: bc70fbf269fd ("mm/hugetlb: handle uffd-wp during fork()")
Signed-off-by: Peter Xu <peterx@redhat.com>
---
mm/hugetlb.c | 26 ++++++++++++++++----------
1 file changed, 16 insertions(+), 10 deletions(-)
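For context on the scenario being fixed, below is a minimal userspace sketch (an illustration, not one of the unit tests the commit message refers to): the parent write-protects a populated private hugetlb page via uffd-wp, then forks without UFFD_FEATURE_EVENT_FORK, so the child's VMA carries no userfaultfd context and its copied pte must not keep the wp bit. It assumes a 2MB default hugepage size, a reserved hugetlb pool, and a kernel with hugetlb uffd-wp support (5.19+); error handling is omitted.

#include <fcntl.h>
#include <linux/userfaultfd.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
	size_t len = 2UL << 20;		/* one hugepage (2MB assumed default) */
	char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);

	memset(buf, 1, len);		/* populate, so there is a pte to wr-protect */

	int uffd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK);
	struct uffdio_api api = {
		.api = UFFD_API,
		.features = UFFD_FEATURE_PAGEFAULT_FLAG_WP,
	};
	ioctl(uffd, UFFDIO_API, &api);

	struct uffdio_register reg = {
		.range = { .start = (unsigned long)buf, .len = len },
		.mode = UFFDIO_REGISTER_MODE_WP,
	};
	ioctl(uffd, UFFDIO_REGISTER, &reg);

	struct uffdio_writeprotect wp = {
		.range = reg.range,
		.mode = UFFDIO_WRITEPROTECT_MODE_WP,
	};
	ioctl(uffd, UFFDIO_WRITEPROTECT, &wp);	/* parent's pte is now uffd-wp */

	if (fork() == 0) {
		/*
		 * No EVENT_FORK: the child's VMA is not uffd-registered, so
		 * the copied pte must have the wp bit cleared and this write
		 * must complete without generating any uffd event.
		 */
		buf[0] = 2;
		_exit(0);
	}
	wait(NULL);
	return 0;
}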
Comments
On 14.04.23 01:11, Peter Xu wrote:
[...]
>  static void
>  hugetlb_install_folio(struct vm_area_struct *vma, pte_t *ptep, unsigned long addr,
> -		       struct folio *new_folio)
> +		       struct folio *new_folio, pte_t old)
>  {

Nit: the function now expects "old" to be !swap_pte, which works perfectly
fine with the existing code -- but the function name is a bit generic and
misleading, unfortunately.

IMHO, instead of factoring that functionality out to desperately try to
keep copy_hugetlb_page_range() somewhat readable, we should just have
factored out the complete copy+replace into a copy_hugetlb_page() function
-- similar to the ordinary page handling -- which would eventually have
made copy_hugetlb_page_range() more readable. Anyhow, unrelated.

[...]

LGTM

Reviewed-by: David Hildenbrand <david@redhat.com>
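To make the suggested shape concrete, here is a rough, hypothetical sketch (an illustration composed from calls visible in this patch, not code from the thread or the tree) of a combined copy+replace helper; folio allocation, locking, and the retry path are assumed to stay in the caller:

static void copy_hugetlb_page(struct vm_area_struct *dst_vma, pte_t *dst_pte,
			      unsigned long addr, struct page *src_page,
			      pte_t src_pte_old, struct folio *new_folio)
{
	pte_t newpte = make_huge_pte(dst_vma, &new_folio->page, 1);

	/* Copy the parent's page contents into the child's new folio. */
	copy_user_huge_page(&new_folio->page, src_page, addr, dst_vma,
			    pages_per_huge_page(hstate_vma(dst_vma)));
	__folio_mark_uptodate(new_folio);
	hugepage_add_new_anon_rmap(new_folio, dst_vma, addr);
	/* Carry the uffd-wp bit over from the parent's pte if still wanted. */
	if (userfaultfd_wp(dst_vma) && huge_pte_uffd_wp(src_pte_old))
		newpte = huge_pte_mkuffd_wp(newpte);
	set_huge_pte_at(dst_vma->vm_mm, addr, dst_pte, newpte);
	hugetlb_count_add(pages_per_huge_page(hstate_vma(dst_vma)),
			  dst_vma->vm_mm);
	folio_set_hugetlb_migratable(new_folio);
}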
On 14.4.2023 2.11, Peter Xu wrote:
[...]
> @@ -5049,11 +5050,12 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
>  			swp_entry = make_readable_migration_entry(
>  						swp_offset(swp_entry));
>  			entry = swp_entry_to_pte(swp_entry);
> -			if (userfaultfd_wp(src_vma) && uffd_wp)
> -				entry = huge_pte_mkuffd_wp(entry);
> +			if (userfaultfd_wp(src_vma) &&
> +			    pte_swp_uffd_wp(entry))
> +				entry = pte_swp_mkuffd_wp(entry);

This looks interesting with pte_swp_uffd_wp and pte_swp_mkuffd_wp?

[...]

--Mika
On Fri, Apr 14, 2023 at 12:45:29PM +0300, Mika Penttilä wrote:
> >  	} else if (unlikely(is_hugetlb_entry_migration(entry))) {
> >  		swp_entry_t swp_entry = pte_to_swp_entry(entry);
> > -		bool uffd_wp = huge_pte_uffd_wp(entry);

[1]

> >  		if (!is_readable_migration_entry(swp_entry) && cow) {
> >  			/*
> > @@ -5049,11 +5050,12 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
> >  			swp_entry = make_readable_migration_entry(
> >  						swp_offset(swp_entry));
> >  			entry = swp_entry_to_pte(swp_entry);

[2]

> > -			if (userfaultfd_wp(src_vma) && uffd_wp)
> > -				entry = huge_pte_mkuffd_wp(entry);
> > +			if (userfaultfd_wp(src_vma) &&
> > +			    pte_swp_uffd_wp(entry))
> > +				entry = pte_swp_mkuffd_wp(entry);
>
> This looks interesting with pte_swp_uffd_wp and pte_swp_mkuffd_wp?

Could you explain what you mean?

I think these helpers are the right ones to use, as AFAICT hugetlb
migration entries should follow the same pte format as !hugetlb. However,
I noticed I did it wrong when dropping the temp var: at [1], "entry" still
points to the src entry, but at [2] it already points to the newly created
one, so I can't drop the var after all. A fixup should look like:

===8<===
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 083aae35bff8..cd3a9d8f4b70 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5041,6 +5041,7 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 		set_huge_pte_at(dst, addr, dst_pte, entry);
 	} else if (unlikely(is_hugetlb_entry_migration(entry))) {
 		swp_entry_t swp_entry = pte_to_swp_entry(entry);
+		bool uffd_wp = pte_swp_uffd_wp(entry);
 
 		if (!is_readable_migration_entry(swp_entry) && cow) {
 			/*
@@ -5050,8 +5051,7 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 			swp_entry = make_readable_migration_entry(
 						swp_offset(swp_entry));
 			entry = swp_entry_to_pte(swp_entry);
-			if (userfaultfd_wp(src_vma) &&
-			    pte_swp_uffd_wp(entry))
+			if (userfaultfd_wp(src_vma) && uffd_wp)
 				entry = pte_swp_mkuffd_wp(entry);
 			set_huge_pte_at(src, addr, src_pte, entry);
===8<===

Besides, did I miss something else?

Thanks,
On 14.4.2023 17.09, Peter Xu wrote:
> On Fri, Apr 14, 2023 at 12:45:29PM +0300, Mika Penttilä wrote:
>> This looks interesting with pte_swp_uffd_wp and pte_swp_mkuffd_wp?
>
> Could you explain what you mean?

Yes, like you also noticed, pte_swp_mkuffd_wp(entry) was called iff
pte_swp_uffd_wp(entry), which is of course a nop. But the fixup not
dropping the temp var should work.

> I think these helpers are the right ones to use, as AFAICT hugetlb
> migration entries should follow the same pte format as !hugetlb.
[...]
> Besides, did I miss something else?

--Mika
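Condensed, the nop Mika spotted is a plain order-of-operations slip; a fragment-only illustration using the same helpers as the patch:

	/* Broken: the bit is tested on the freshly rebuilt entry, whose
	 * uffd-wp bit is necessarily clear, so the branch can never fire. */
	entry = swp_entry_to_pte(swp_entry);
	if (userfaultfd_wp(src_vma) && pte_swp_uffd_wp(entry))	/* always false */
		entry = pte_swp_mkuffd_wp(entry);		/* dead code */

	/* Fixed: sample the bit from the original entry first, as in the fixup. */
	bool uffd_wp = pte_swp_uffd_wp(entry);
	entry = swp_entry_to_pte(swp_entry);
	if (userfaultfd_wp(src_vma) && uffd_wp)
		entry = pte_swp_mkuffd_wp(entry);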
On Fri, Apr 14, 2023 at 05:23:12PM +0300, Mika Penttilä wrote:
> But the fixup not dropping the temp var should work.
Ok I see. I'll wait for a few more days for a respin. Thanks,
On 04/13/23 19:11, Peter Xu wrote:
> There are a few things that were wrong:
>
> - Reading the uffd-wp bit from a swap entry should use pte_swp_uffd_wp()
>   rather than huge_pte_uffd_wp().

That was/is quite confusing, to me at least.

[...]
>  mm/hugetlb.c | 26 ++++++++++++++++----------
>  1 file changed, 16 insertions(+), 10 deletions(-)

No issues, except for the information lost from the pte entry, as pointed
out by Mika.
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index f16b25b1a6b9..7320e64aacc6 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -4953,11 +4953,15 @@ static bool is_hugetlb_entry_hwpoisoned(pte_t pte)
 
 static void
 hugetlb_install_folio(struct vm_area_struct *vma, pte_t *ptep, unsigned long addr,
-		       struct folio *new_folio)
+		       struct folio *new_folio, pte_t old)
 {
+	pte_t newpte = make_huge_pte(vma, &new_folio->page, 1);
+
 	__folio_mark_uptodate(new_folio);
 	hugepage_add_new_anon_rmap(new_folio, vma, addr);
-	set_huge_pte_at(vma->vm_mm, addr, ptep, make_huge_pte(vma, &new_folio->page, 1));
+	if (userfaultfd_wp(vma) && huge_pte_uffd_wp(old))
+		newpte = huge_pte_mkuffd_wp(newpte);
+	set_huge_pte_at(vma->vm_mm, addr, ptep, newpte);
 	hugetlb_count_add(pages_per_huge_page(hstate_vma(vma)), vma->vm_mm);
 	folio_set_hugetlb_migratable(new_folio);
 }
@@ -5032,14 +5036,11 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 			 */
 			;
 		} else if (unlikely(is_hugetlb_entry_hwpoisoned(entry))) {
-			bool uffd_wp = huge_pte_uffd_wp(entry);
-
-			if (!userfaultfd_wp(dst_vma) && uffd_wp)
+			if (!userfaultfd_wp(dst_vma))
 				entry = huge_pte_clear_uffd_wp(entry);
 			set_huge_pte_at(dst, addr, dst_pte, entry);
 		} else if (unlikely(is_hugetlb_entry_migration(entry))) {
 			swp_entry_t swp_entry = pte_to_swp_entry(entry);
-			bool uffd_wp = huge_pte_uffd_wp(entry);
 
 			if (!is_readable_migration_entry(swp_entry) && cow) {
 				/*
@@ -5049,11 +5050,12 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 				swp_entry = make_readable_migration_entry(
 							swp_offset(swp_entry));
 				entry = swp_entry_to_pte(swp_entry);
-				if (userfaultfd_wp(src_vma) && uffd_wp)
-					entry = huge_pte_mkuffd_wp(entry);
+				if (userfaultfd_wp(src_vma) &&
+				    pte_swp_uffd_wp(entry))
+					entry = pte_swp_mkuffd_wp(entry);
 				set_huge_pte_at(src, addr, src_pte, entry);
 			}
-			if (!userfaultfd_wp(dst_vma) && uffd_wp)
+			if (!userfaultfd_wp(dst_vma))
 				entry = huge_pte_clear_uffd_wp(entry);
 			set_huge_pte_at(dst, addr, dst_pte, entry);
 		} else if (unlikely(is_pte_marker(entry))) {
@@ -5114,7 +5116,8 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 				/* huge_ptep of dst_pte won't change as in child */
 				goto again;
 			}
-			hugetlb_install_folio(dst_vma, dst_pte, addr, new_folio);
+			hugetlb_install_folio(dst_vma, dst_pte, addr,
+					      new_folio, src_pte_old);
 			spin_unlock(src_ptl);
 			spin_unlock(dst_ptl);
 			continue;
@@ -5132,6 +5135,9 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 			entry = huge_pte_wrprotect(entry);
 		}
 
+		if (!userfaultfd_wp(dst_vma))
+			entry = huge_pte_clear_uffd_wp(entry);
+
 		set_huge_pte_at(dst, addr, dst_pte, entry);
 		hugetlb_count_add(npages, dst);
 	}
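As a closing aside, the uffd-wp bit this patch juggles is observable from userspace via /proc/pid/pagemap, which is how unit tests can catch regressions like the ones fixed here. A minimal sketch, assuming pagemap bit 57 ("pte is uffd-wp write-protected", documented since 5.13); error handling is omitted:

#include <fcntl.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Return the uffd-wp bit of the pte mapping "addr" in this process. */
static int pte_is_uffd_wp(void *addr)
{
	uint64_t ent = 0;
	long pagesize = sysconf(_SC_PAGESIZE);
	int fd = open("/proc/self/pagemap", O_RDONLY);

	/* One 64-bit entry per small page, even inside a hugetlb mapping. */
	pread(fd, &ent, sizeof(ent),
	      (off_t)((uintptr_t)addr / pagesize) * sizeof(ent));
	close(fd);
	return (int)((ent >> 57) & 1);	/* bit 57: uffd-wp write-protected */
}

Called on the same address in the parent before fork() and in the child afterwards, this is enough to distinguish a correctly cleared wp bit from one that leaked into the child.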