Message ID | 20221214200453.1772655-2-peterx@redhat.com |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:adf:e747:0:0:0:0:0 with SMTP id c7csp445503wrn; Wed, 14 Dec 2022 12:17:01 -0800 (PST) X-Google-Smtp-Source: AA0mqf6j8wz02YuTrpHEIrc8zpYmrvRQh5lm1wNn3rkXRDckbP818vciPYCz21zt9Goe0uEJmRdu X-Received: by 2002:a17:906:4687:b0:7c0:deb1:bb8a with SMTP id a7-20020a170906468700b007c0deb1bb8amr20038497ejr.28.1671049021491; Wed, 14 Dec 2022 12:17:01 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1671049021; cv=none; d=google.com; s=arc-20160816; b=Kk2BHKJgdf4C6YvOemNisrATbZydSzRQmXmmuihT0fYVBNBKyCxU7wig31oBbr5ukW fdyotDzzJkR+1WvoPyPWVE0XiOomm0Vp1BSTyIx4JsHPTyDwrdGtvZ3u4Og7dGuEtkZr c9tNnW6r5EGWnXAMpFJqDpK9gWIoca/tanAkXIKNAiaRCljr/cVNowqaxdpfx8T010B5 V79i5+oXxc5Gm7LyA3OxjcRkoNBtQOJ6jY6rICEA/LjllG9h8KWY6YjKcd8+piG4rYYx R5z85DuUhV3U0ISa70g3VABMExWDEzdw3nU5LdIf5QLx3gY/q2vueRjvALTOCKHJwGbk 9WlQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=OjnVrIvGwtoddEojvu7v3r8jpkJLbQV8Ta0gkZnxU7g=; b=QKTCe8Je9ptra4+rCg0cxX6ytphnZ5Lyvq59lr94ZRaIK6j0c0OEw7FV303fJkFNUv P42WmvoCqS/9XMBk3JbBfyif8tub/Qy2g40c/CxwqQ7LyKoLwYB7ea6iUl6C+yUFdTS7 9FZFYXJg+kaXlKRtxjpHl/ZJQkhHtE9D8QTxQNAkjkIH1GrWBcaeTTSXcj63Zu5ydT1z aPDz9jWXE+1UWm0WVlGBzQU3HXJhMPCt9WVWbCcCgKBJ4T/fRWQp14y0w2bcGSfRoNYv e8VT6+YoIG1jYBHuYy6B2iPjBGbvNDriTTbiKxHkSU1/DjzRY93HKDvJiEHiaerWPLQg b9YQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=dJMiGNc5; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id c13-20020a05640227cd00b0044e8fe826a0si14646666ede.156.2022.12.14.12.16.38; Wed, 14 Dec 2022 12:17:01 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=dJMiGNc5; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230136AbiLNUPc (ORCPT <rfc822;jeantsuru.cumc.mandola@gmail.com> + 99 others); Wed, 14 Dec 2022 15:15:32 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46902 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229866AbiLNUOh (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Wed, 14 Dec 2022 15:14:37 -0500 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 0072C2DAB7 for <linux-kernel@vger.kernel.org>; Wed, 14 Dec 2022 12:05:01 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1671048301; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=OjnVrIvGwtoddEojvu7v3r8jpkJLbQV8Ta0gkZnxU7g=; b=dJMiGNc5korgFkTz8aGwAU42B8noZ1NRT0OpwRv1k4ybyzgk8yBL2oPhRi4km9yHWJaNiM rNzbknOiCRylkAzY41gn+UOlUaOt9VYiIQWUw1RdtKevrljCVHcW8SkPfelMAVw/6sm//6 mQDJWWmjSPJWl8/sPK+QH1eErz3iHeU= Received: from mail-qv1-f69.google.com (mail-qv1-f69.google.com [209.85.219.69]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_128_GCM_SHA256) id us-mta-170-45_FoOZWMe65xOXVc74Q7A-1; Wed, 14 Dec 2022 15:05:00 -0500 X-MC-Unique: 45_FoOZWMe65xOXVc74Q7A-1 Received: by mail-qv1-f69.google.com with SMTP id 71-20020a0c804d000000b004b2fb260447so543160qva.10 for <linux-kernel@vger.kernel.org>; Wed, 14 Dec 2022 12:04:59 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=OjnVrIvGwtoddEojvu7v3r8jpkJLbQV8Ta0gkZnxU7g=; b=4mEXmQIgeuGlANXYDE2tfcuP2jGU9cxpyznOTGocsY9cy6LbAE3/PiK7whfYAzlm2h qF+FjGjh1ntZzTfoTO/tGoS6v+imLHdDOCUd4rtpOwWeIxk5TUBeaNxOcAP54sf7OMtZ Rm8d8uZxnS151NIkKOlDo6LVo4SjlSVUKQbSod4OlAxQ7GuL/Hb2zAeE3HsGyg60EatY JghgMyyzMxNWgtCAcami2km6RO5C4A6a9RUOdpP3S5q+M7SPRaabrLjDhwgW8UviIadw 2+BHzzICtfjElo2Xi3VCd7HLot9oFnOGd/VDKikBbrGQ+sP6vdNZTSFoHI0P7smcTasd sPOA== X-Gm-Message-State: ANoB5pmgNtx8mNyr1Oa8pRZDJti+O09AhvUtkTfEf9aWSoasVs6pBdUc dqP3s5TrvJ+QYRCFPaRW7K+HkpRqegbTI4WXRdSO1N+HqSBmPZ8l/xg4TVKFQmBf6+4BJxyIGRq LkJqkJgSqKkH5/4jHpMzTkPJQUJ+ss7uuWgLBC1b751obMzny7XiFNWGMetnNB8Mm3iCXqZlXIA == X-Received: by 2002:ac8:7a92:0:b0:3a8:234a:3204 with SMTP id x18-20020ac87a92000000b003a8234a3204mr11410979qtr.23.1671048298425; Wed, 14 Dec 2022 12:04:58 -0800 (PST) X-Received: by 2002:ac8:7a92:0:b0:3a8:234a:3204 with SMTP id x18-20020ac87a92000000b003a8234a3204mr11410949qtr.23.1671048298123; Wed, 14 Dec 2022 12:04:58 -0800 (PST) Received: from x1n.redhat.com (bras-base-aurron9127w-grc-45-70-31-26-132.dsl.bell.ca. [70.31.26.132]) by smtp.gmail.com with ESMTPSA id l11-20020ac848cb000000b003a689a5b177sm2199352qtr.8.2022.12.14.12.04.56 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 14 Dec 2022 12:04:57 -0800 (PST) From: Peter Xu <peterx@redhat.com> To: linux-kernel@vger.kernel.org, linux-mm@kvack.org Cc: Andrea Arcangeli <aarcange@redhat.com>, Pengfei Xu <pengfei.xu@intel.com>, peterx@redhat.com, Nadav Amit <nadav.amit@gmail.com>, David Hildenbrand <david@redhat.com>, Andrew Morton <akpm@linux-foundation.org>, Miaohe Lin <linmiaohe@huawei.com>, Huang Ying <ying.huang@intel.com>, stable@vger.kernel.org Subject: [PATCH 1/2] mm/uffd: Fix pte marker when fork() without fork event Date: Wed, 14 Dec 2022 15:04:52 -0500 Message-Id: <20221214200453.1772655-2-peterx@redhat.com> X-Mailer: git-send-email 2.37.3 In-Reply-To: <20221214200453.1772655-1-peterx@redhat.com> References: <20221214200453.1772655-1-peterx@redhat.com> MIME-Version: 1.0 Content-type: text/plain Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1752221898813941195?= X-GMAIL-MSGID: =?utf-8?q?1752221898813941195?= |
Series |
mm: Fixes on pte markers
|
|
Commit Message
Peter Xu
Dec. 14, 2022, 8:04 p.m. UTC
When fork(), dst_vma is not guaranteed to have VM_UFFD_WP even if src may
have it and has pte marker installed. The warning is improper along with
the comment. The right thing is to inherit the pte marker when needed, or
keep the dst pte empty.
A vague guess is this happened by an accident when there's the prior patch
to introduce src/dst vma into this helper during the uffd-wp feature got
developed and I probably messed up in the rebase, since if we replace
dst_vma with src_vma the warning & comment it all makes sense too.
Hugetlb did exactly the right here (copy_hugetlb_page_range()). Fix the
general path.
Reproducer:
https://github.com/xupengfe/syzkaller_logs/blob/main/221208_115556_copy_page_range/repro.c
Cc: <stable@vger.kernel.org> # 5.19+
Fixes: c56d1b62cce8 ("mm/shmem: handle uffd-wp during fork()")
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=216808
Reported-by: Pengfei Xu <pengfei.xu@intel.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
mm/memory.c | 8 ++------
1 file changed, 2 insertions(+), 6 deletions(-)
Comments
On 14.12.22 21:04, Peter Xu wrote: > When fork(), dst_vma is not guaranteed to have VM_UFFD_WP even if src may > have it and has pte marker installed. The warning is improper along with > the comment. The right thing is to inherit the pte marker when needed, or > keep the dst pte empty. > > A vague guess is this happened by an accident when there's the prior patch > to introduce src/dst vma into this helper during the uffd-wp feature got > developed and I probably messed up in the rebase, since if we replace > dst_vma with src_vma the warning & comment it all makes sense too. > > Hugetlb did exactly the right here (copy_hugetlb_page_range()). Fix the > general path. > > Reproducer: > > https://github.com/xupengfe/syzkaller_logs/blob/main/221208_115556_copy_page_range/repro.c > > Cc: <stable@vger.kernel.org> # 5.19+ > Fixes: c56d1b62cce8 ("mm/shmem: handle uffd-wp during fork()") > Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=216808 > Reported-by: Pengfei Xu <pengfei.xu@intel.com> > Signed-off-by: Peter Xu <peterx@redhat.com> > --- > mm/memory.c | 8 ++------ > 1 file changed, 2 insertions(+), 6 deletions(-) > > diff --git a/mm/memory.c b/mm/memory.c > index aad226daf41b..032ef700c3e8 100644 > --- a/mm/memory.c > +++ b/mm/memory.c > @@ -828,12 +828,8 @@ copy_nonpresent_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm, > return -EBUSY; > return -ENOENT; > } else if (is_pte_marker_entry(entry)) { > - /* > - * We're copying the pgtable should only because dst_vma has > - * uffd-wp enabled, do sanity check. > - */ > - WARN_ON_ONCE(!userfaultfd_wp(dst_vma)); > - set_pte_at(dst_mm, addr, dst_pte, pte); > + if (userfaultfd_wp(dst_vma)) > + set_pte_at(dst_mm, addr, dst_pte, pte); > return 0; > } > if (!userfaultfd_wp(dst_vma)) Staring at the code first made me go "what about other PTE markers". I then looked into the discussion in patch #2. The fix as is is suboptimal, because it 1) Removes the warning which is good, but 2) Silently drops swapin errors now So it silently breaks something else temporarily ... I remember, that theoretically we could have multiple markers stored in a single PTE marker. Wouldn't it be cleaner to be able to "clean" specific markers from a PTE marker without having to special case on each and everyone? I mean, only uffd-wp is really special such that it might disappear for the target. Something like (pseudocode): if (!userfaultfd_wp(dst_vma)) pte_marker_clear_uff_wp(entry); if (!pte_marker_empty(entry)) { pte = make_pte_marker(pte_marker_get(entry)); set_pte_at(dst_mm, addr, dst_pte, pte); } Then this fix would be correct and backport-able even without #2. And it would work for new types of markers :) I'd prefer a fix that doesn't break something else temporarily, even if the stable backport might require 5 additional minutes to do. So squashing #2 into #1 would also work.
On Fri, Dec 16, 2022 at 10:04:27AM +0100, David Hildenbrand wrote: > On 14.12.22 21:04, Peter Xu wrote: > > When fork(), dst_vma is not guaranteed to have VM_UFFD_WP even if src may > > have it and has pte marker installed. The warning is improper along with > > the comment. The right thing is to inherit the pte marker when needed, or > > keep the dst pte empty. > > > > A vague guess is this happened by an accident when there's the prior patch > > to introduce src/dst vma into this helper during the uffd-wp feature got > > developed and I probably messed up in the rebase, since if we replace > > dst_vma with src_vma the warning & comment it all makes sense too. > > > > Hugetlb did exactly the right here (copy_hugetlb_page_range()). Fix the > > general path. > > > > Reproducer: > > > > https://github.com/xupengfe/syzkaller_logs/blob/main/221208_115556_copy_page_range/repro.c > > > > Cc: <stable@vger.kernel.org> # 5.19+ > > Fixes: c56d1b62cce8 ("mm/shmem: handle uffd-wp during fork()") > > Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=216808 > > Reported-by: Pengfei Xu <pengfei.xu@intel.com> > > Signed-off-by: Peter Xu <peterx@redhat.com> > > --- > > mm/memory.c | 8 ++------ > > 1 file changed, 2 insertions(+), 6 deletions(-) > > > > diff --git a/mm/memory.c b/mm/memory.c > > index aad226daf41b..032ef700c3e8 100644 > > --- a/mm/memory.c > > +++ b/mm/memory.c > > @@ -828,12 +828,8 @@ copy_nonpresent_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm, > > return -EBUSY; > > return -ENOENT; > > } else if (is_pte_marker_entry(entry)) { > > - /* > > - * We're copying the pgtable should only because dst_vma has > > - * uffd-wp enabled, do sanity check. > > - */ > > - WARN_ON_ONCE(!userfaultfd_wp(dst_vma)); > > - set_pte_at(dst_mm, addr, dst_pte, pte); > > + if (userfaultfd_wp(dst_vma)) > > + set_pte_at(dst_mm, addr, dst_pte, pte); > > return 0; > > } > > if (!userfaultfd_wp(dst_vma)) > > Staring at the code first made me go "what about other PTE markers". I then > looked into the discussion in patch #2. The fix as is is suboptimal, because > it > > 1) Removes the warning which is good, but > 2) Silently drops swapin errors now > > So it silently breaks something else temporarily ... > > > I remember, that theoretically we could have multiple markers stored in a > single PTE marker. > > Wouldn't it be cleaner to be able to "clean" specific markers from a PTE > marker without having to special case on each and everyone? I mean, only > uffd-wp is really special such that it might disappear for the target. Quotting the commit message in patch 2: Currently there is a priority difference between the uffd-wp bit and the swapin error entry, in which the swapin error always has higher priority (e.g. we don't need to wr-protect a swapin error pte marker). If there will be a 3rd bit introduced, we'll probably need to consider a more involved approach so we may need to start operate on the bits. Let's leave that for later. I actually started the fix with something like that, but I noticed it's not needed to add more code if there's no 3rd bit introduced so I dropped that. I decided to go the simpler change approach and leave that for later. > > Something like (pseudocode): > > if (!userfaultfd_wp(dst_vma)) > pte_marker_clear_uff_wp(entry); > if (!pte_marker_empty(entry)) { > pte = make_pte_marker(pte_marker_get(entry)); > set_pte_at(dst_mm, addr, dst_pte, pte); > } > > Then this fix would be correct and backport-able even without #2. And it > would work for new types of markers :) When that comes, we may need one set_pte_marker_at() taking care of empty pte markers, otherwise there can be a lot of such check. > > > I'd prefer a fix that doesn't break something else temporarily, even if the > stable backport might require 5 additional minutes to do. So squashing #2 > into #1 would also work. The thing is whether do we care about someone: (1) explicitly checkout at the commit of patch 1, then (2) runs the kernel, hit a swapnin error, (3) fork(), and (4) access the swapin error page in the child. To me I don't care even starting from (1).. because it really shouldn't happen at all in any serious environment. The other reason is these are indeed two issues to solve. Even if by accident we kept the swapin error in old code we'll probably dump an warning which is not wanted either. It's not something someone will really get benefit from.. So like many other places, I don't have a strong opinion, but personally I prefer the current approach. Andrew, let me know if you want me to repost with a squashed version. Thanks,
>> >> Wouldn't it be cleaner to be able to "clean" specific markers from a PTE >> marker without having to special case on each and everyone? I mean, only >> uffd-wp is really special such that it might disappear for the target. > > Quotting the commit message in patch 2: > > Currently there is a priority difference between the uffd-wp bit and the > swapin error entry, in which the swapin error always has higher priority > (e.g. we don't need to wr-protect a swapin error pte marker). > > If there will be a 3rd bit introduced, we'll probably need to consider a > more involved approach so we may need to start operate on the bits. > Let's leave that for later. > > I actually started the fix with something like that, but I noticed it's not > needed to add more code if there's no 3rd bit introduced so I dropped that. > I decided to go the simpler change approach and leave that for later. Okay, makes sense. > >> >> Something like (pseudocode): >> >> if (!userfaultfd_wp(dst_vma)) >> pte_marker_clear_uff_wp(entry); >> if (!pte_marker_empty(entry)) { >> pte = make_pte_marker(pte_marker_get(entry)); >> set_pte_at(dst_mm, addr, dst_pte, pte); >> } >> >> Then this fix would be correct and backport-able even without #2. And it >> would work for new types of markers :) > > When that comes, we may need one set_pte_marker_at() taking care of empty > pte markers, otherwise there can be a lot of such check. Right. In the future it might be cleaner. > >> >> >> I'd prefer a fix that doesn't break something else temporarily, even if the >> stable backport might require 5 additional minutes to do. So squashing #2 >> into #1 would also work. > > The thing is whether do we care about someone: (1) explicitly checkout at > the commit of patch 1, then (2) runs the kernel, hit a swapnin error, (3) > fork(), and (4) access the swapin error page in the child. I'm more concerned about backports, when one backports #1 but not #2. In theory, patch #2 fixes patch #1, because that introduced IMHO a real regression -- a possible memory corruption when discarding a hwpoison marker. Warnings are not nice but at least indicate that something needs a second look. > > To me I don't care even starting from (1).. because it really shouldn't > happen at all in any serious environment. > > The other reason is these are indeed two issues to solve. Even if by > accident we kept the swapin error in old code we'll probably dump an > warning which is not wanted either. It's not something someone will really > get benefit from.. > > So like many other places, I don't have a strong opinion, but personally I > prefer the current approach. Me neither, two patches just felt more complicated than it should be. Anyhow, the final code change LGTM.
On Fri, Dec 16, 2022 at 04:57:33PM +0100, David Hildenbrand wrote: > I'm more concerned about backports, when one backports #1 but not #2. In > theory, patch #2 fixes patch #1, because that introduced IMHO a real > regression -- a possible memory corruption when discarding a hwpoison > marker. Warnings are not nice but at least indicate that something needs a > second look. Note that backporting patch 1 only is exactly what I wanted to do here - it means his/her tree should not have the swapin error pte markers at all. The swapin error pte marker change only existed for a few days in Linus's tree, so no one should be backporting patch 2. Thanks,
On 16.12.22 17:24, Peter Xu wrote: > On Fri, Dec 16, 2022 at 04:57:33PM +0100, David Hildenbrand wrote: >> I'm more concerned about backports, when one backports #1 but not #2. In >> theory, patch #2 fixes patch #1, because that introduced IMHO a real >> regression -- a possible memory corruption when discarding a hwpoison >> marker. Warnings are not nice but at least indicate that something needs a >> second look. > > Note that backporting patch 1 only is exactly what I wanted to do here - it > means his/her tree should not have the swapin error pte markers at all. > > The swapin error pte marker change only existed for a few days in Linus's > tree, so no one should be backporting patch 2. Right, and these patches are supposed to land in 6.2 as well. Makes sense to me then. Especially, the other parts in patch #2 are worth being in a separate patch. Acked-by: David Hildenbrand <david@redhat.com>
diff --git a/mm/memory.c b/mm/memory.c index aad226daf41b..032ef700c3e8 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -828,12 +828,8 @@ copy_nonpresent_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm, return -EBUSY; return -ENOENT; } else if (is_pte_marker_entry(entry)) { - /* - * We're copying the pgtable should only because dst_vma has - * uffd-wp enabled, do sanity check. - */ - WARN_ON_ONCE(!userfaultfd_wp(dst_vma)); - set_pte_at(dst_mm, addr, dst_pte, pte); + if (userfaultfd_wp(dst_vma)) + set_pte_at(dst_mm, addr, dst_pte, pte); return 0; } if (!userfaultfd_wp(dst_vma))