Message ID | 20230602230552.350731-1-peterx@redhat.com |
---|---|
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:994d:0:b0:3d9:f83d:47d9 with SMTP id k13csp1358265vqr; Fri, 2 Jun 2023 16:17:12 -0700 (PDT) X-Google-Smtp-Source: ACHHUZ7iKSmhd4IsyS+ohADZ2JpJ9FQXUIIcYZjACsf/yszskSCqCshDNs+q6tX+isOoOsRtkLIz X-Received: by 2002:a05:6a20:20d1:b0:10c:b9ed:6a38 with SMTP id t17-20020a056a2020d100b0010cb9ed6a38mr1288pza.28.1685747831997; Fri, 02 Jun 2023 16:17:11 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1685747831; cv=none; d=google.com; s=arc-20160816; b=MlbJqfUsV6VtcNIde1OMejo2AoWf14wVbLpmmD9xfwJg1uVUYXoQ2iMwtD4VNwYJfM PSvjslH/HW3FdDGSJNU2AkGvgh+pHclP+J8vczx1ijzt5G/4ri/pJPbidBXZr2Efek5x KsYcQVHXGEifoMxybc2eAHWW4gLTXTG6kRGej4B+2vBzVZ7xQ8BqhUKGmh4C1uulsWMO 0ImbTD8P+3q6Qe55vfHFvF5hMiJhkpLXIju9Bt5fFPDJTqFfPnNlTneD6BRgfDEU8aIu 5eSMt23+McuOTbe8mKVu8UgksQNC6s+mDmrJcr6EIuLZ/Q7ps3Qy9UyKkQ9BJUjVCLX5 /rFA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :message-id:date:subject:cc:to:from:dkim-signature; bh=5USYmygKDAknHQBixzUNP1oGMXwkEWqz0+1Nrqkoh28=; b=P1GIZ3mRDXlPKvWnvsrt2qdzKsmDSL4TZpgSOaxZuLqUXs5ji2LR7tqxTFLb+IGdJP jaA31EpwW+Wl9A9xkCfYy5mjuB2imz0Fv61iJXm2jqSYYx2LKPnpXLVQYaRz08JO0Z8q Vu5O9fYhmHtIrC89OXc7AorcmcIK8l/RVyaqJb196sxV4BREC1RtIqFsglX1z1/LvymK 22haETBEJIj/lMuzFb9Y10GRNzayFxatIdWvoVfDS5mbFDTk/k2kkSfG6Ss7b+R/uIvn IiA52DR6DxvWUmVJLrGt6GFDjhJgRzgbkVrKKMcUT0OoWAiaT6sQQ57W48ELDgWNZGg9 d+lw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=fcxfMIdv; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id x4-20020a636304000000b00542897dcc22si1697145pgb.516.2023.06.02.16.16.57; Fri, 02 Jun 2023 16:17:11 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=fcxfMIdv; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S236670AbjFBXGq (ORCPT <rfc822;limurcpp@gmail.com> + 99 others); Fri, 2 Jun 2023 19:06:46 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:44138 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S236190AbjFBXGo (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Fri, 2 Jun 2023 19:06:44 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id EEAB11B9 for <linux-kernel@vger.kernel.org>; Fri, 2 Jun 2023 16:05:58 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1685747158; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding; bh=5USYmygKDAknHQBixzUNP1oGMXwkEWqz0+1Nrqkoh28=; b=fcxfMIdviI1E2XOu7LsueCpPfQBiBwj8UHJl2gm5fZsj/GBi+BBLxHhHuVeuh339BPma0L NSCeilkuQNnKPUac3fn+6gW/AbF9J9N2l3Jj8UPTvjfJJMY6WFgNPpisAu0owCd7ZARLp6 O971LtKECPybK5EyguRBkJCdP1NxsOE= Received: from mail-qv1-f70.google.com (mail-qv1-f70.google.com [209.85.219.70]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-632-dhlwNUPgM0yPgTXrHbzW1w-1; Fri, 02 Jun 2023 19:05:57 -0400 X-MC-Unique: dhlwNUPgM0yPgTXrHbzW1w-1 Received: by mail-qv1-f70.google.com with SMTP id 6a1803df08f44-626380298d3so5604586d6.0 for <linux-kernel@vger.kernel.org>; Fri, 02 Jun 2023 16:05:56 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1685747155; x=1688339155; h=content-transfer-encoding:mime-version:message-id:date:subject:cc :to:from:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=5USYmygKDAknHQBixzUNP1oGMXwkEWqz0+1Nrqkoh28=; b=bWR9fyhGA3eo0jL8W5tKkUAvr/n/C1ZaTQps5Kgv5qqH/M3yl68tABnB/v8BcVe1nd mtynegSUUq2vGadnRdSwqxQgLvFFsKb5snk1WaomsEOPkP5icaypQWJCwVu8tu+/YmKp m2M6zjklyLIDSy/iBIs/Gh8ekLK3kHGBbWmg03cBdZyEx4oXugoF+PmRpGiskPnmMjLs RY3VCJnoINx7Z6AgrmNSEZEVYBqHBOwj6YC+iym5xohsGNcS5aQ9aj3OTL/QXYos4hTj /2Snz9PNmKq/NzTnh9Bp6X5KhsCzZ8tX4ONHcq3lVwFT8GK5PCujxMwrOfk8sF7gZazA 1f2Q== X-Gm-Message-State: AC+VfDzva3nAktN5UaK2TDKXY+lvWeLZtY7aFgT2NZssc/tlCcRbxadL REd1q/cdBIPANKLuJkdyvdsf3Cjp0askkHUKUO+nwOxe5pd+D23PVVLueY98MwyNgyOlWADObfY eIdE2dnwusWxmkqQ+YG3DMGLQUJAIRGqoVQXEHY2jn3n9w5t3MQlcV936VrNopq4ZeMx88yT5PQ mekx30Xg== X-Received: by 2002:a05:6214:765:b0:5ed:c96e:ca4a with SMTP id f5-20020a056214076500b005edc96eca4amr11522579qvz.1.1685747155520; Fri, 02 Jun 2023 16:05:55 -0700 (PDT) X-Received: by 2002:a05:6214:765:b0:5ed:c96e:ca4a with SMTP id f5-20020a056214076500b005edc96eca4amr11522545qvz.1.1685747155195; Fri, 02 Jun 2023 16:05:55 -0700 (PDT) Received: from x1n.. (cpe5c7695f3aee0-cm5c7695f3aede.cpe.net.cable.rogers.com. [99.254.144.39]) by smtp.gmail.com with ESMTPSA id px13-20020a056214050d00b0062607ea6d01sm1400792qvb.50.2023.06.02.16.05.53 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 02 Jun 2023 16:05:54 -0700 (PDT) From: Peter Xu <peterx@redhat.com> To: linux-kernel@vger.kernel.org, linux-mm@kvack.org Cc: David Hildenbrand <david@redhat.com>, Alistair Popple <apopple@nvidia.com>, Andrew Morton <akpm@linux-foundation.org>, Andrea Arcangeli <aarcange@redhat.com>, "Kirill A . Shutemov" <kirill@shutemov.name>, Johannes Weiner <hannes@cmpxchg.org>, John Hubbard <jhubbard@nvidia.com>, Naoya Horiguchi <naoya.horiguchi@nec.com>, peterx@redhat.com, Muhammad Usama Anjum <usama.anjum@collabora.com>, Hugh Dickins <hughd@google.com>, Mike Rapoport <rppt@kernel.org> Subject: [PATCH 0/4] mm: Fix pmd_trans_unstable() call sites on retry Date: Fri, 2 Jun 2023 19:05:48 -0400 Message-Id: <20230602230552.350731-1-peterx@redhat.com> X-Mailer: git-send-email 2.40.1 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-2.3 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, SPF_HELO_NONE,SPF_NONE,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1767634718833166613?= X-GMAIL-MSGID: =?utf-8?q?1767634718833166613?= |
Series |
mm: Fix pmd_trans_unstable() call sites on retry
|
|
Message
Peter Xu
June 2, 2023, 11:05 p.m. UTC
When hit pmd_trans_unstable() under mmap read lock, it means we raced with something else. Per the comment above the helper, we can definitely treat it as some pmd (none?) but the 100% correct way is always retry, and I don't think it should race again in most cases. Not taking care of that retry can mean different things on different paths. For example, for smaps it means inaccurate accountings when we skip those raced regions, but it's fine anyway because the accounting is not for 100% accurate. I think it's broken for pagemap OTOH, because we have the pagemap buffer linear to the VA we're scanning, it means if we skip some region the follow up scans can fill in the wrong slots, I think. It means the pagemap results returned to userapp will be wrong when very unlucky. This reminded me that I should have a look at all call sites of pmd_trans_unstable(), some of them are alright but I do see many of them may still be better to give another shot when hit. This series tries to resolve all call sites for it on that retry attempt. I really don't know whether I missed something, even if not, whether it matters a lot to anyone. Still, _if_ I'm correct may worth consider fixing. Happy to be prove wrong. Then Muhammad should know how to code his. The patchset is only smoke tested, nothing wrong I see. Please have a look, thanks. Peter Xu (4): mm/mprotect: Retry on pmd_trans_unstable() mm/migrate: Unify and retry an unstable pmd when hit mm: Warn for unstable pmd in move_page_tables() mm: Make most walk page paths with pmd_trans_unstable() to retry fs/proc/task_mmu.c | 17 +++++++++++++---- mm/madvise.c | 8 ++++++-- mm/memcontrol.c | 8 ++++++-- mm/memory-failure.c | 4 +++- mm/mempolicy.c | 4 +++- mm/migrate_device.c | 9 ++++----- mm/mprotect.c | 20 +++++++++++--------- mm/mremap.c | 4 ++-- 8 files changed, 48 insertions(+), 26 deletions(-)
Comments
On Fri, Jun 02, 2023 at 07:05:48PM -0400, Peter Xu wrote:
> Please have a look, thanks.
Hello, all,
This one seems to have more or less conflict with Hugh's rework on pmd
collapse. Please hold off review or merging until I prepare another one
(probably based on Hugh's, after I have a closer read).
Sorry for the noise.
On 07.06.23 15:49, Peter Xu wrote: > On Fri, Jun 02, 2023 at 07:05:48PM -0400, Peter Xu wrote: >> Please have a look, thanks. > > Hello, all, > > This one seems to have more or less conflict with Hugh's rework on pmd > collapse. Please hold off review or merging until I prepare another one > (probably based on Hugh's, after I have a closer read). > > Sorry for the noise. > [did not have time to look yet] Are there any fixes buried in there that we'd like to have in earlier? I skimmed over the patches and all read like "cleanup" + "consistency", correct?
On Wed, Jun 07, 2023 at 05:45:28PM +0200, David Hildenbrand wrote: > On 07.06.23 15:49, Peter Xu wrote: > > On Fri, Jun 02, 2023 at 07:05:48PM -0400, Peter Xu wrote: > > > Please have a look, thanks. > > > > Hello, all, > > > > This one seems to have more or less conflict with Hugh's rework on pmd > > collapse. Please hold off review or merging until I prepare another one > > (probably based on Hugh's, after I have a closer read). > > > > Sorry for the noise. > > > > [did not have time to look yet] > > Are there any fixes buried in there that we'd like to have in earlier? I > skimmed over the patches and all read like "cleanup" + "consistency", > correct? There are bug fixes when unluckily hitting unstable pmd I think, these ones worth mentioning: - pagemap can be broken, causing read to be shifted over to the next (wrong data read) - memcg wrong accounting, e.g., moving one task from memcg1 to memcg2, we can skip an unstable pmd while it could quickly contain something that can belong to memcg1, I think. This one needs some eyes from memcg developers. I don't rush on having them because these are all theoretical and no bug report I saw, no reproducer I wrote, only observed by my eyes. At least the pagemap issue should have been there for 10+ years without being noticed even if rightfully spot this time. Meanwhile this seems to have conflict with Hugh's series which should have been posted earlier - I still need to check on how that will affect this series, but not yet. Said that, let me know if any of you hit any (potential) issue with above or think that we should to move this in earlier. Thanks,
On Wed, Jun 7, 2023 at 9:21 AM Peter Xu <peterx@redhat.com> wrote: > > On Wed, Jun 07, 2023 at 05:45:28PM +0200, David Hildenbrand wrote: > > On 07.06.23 15:49, Peter Xu wrote: > > > On Fri, Jun 02, 2023 at 07:05:48PM -0400, Peter Xu wrote: > > > > Please have a look, thanks. > > > > > > Hello, all, > > > > > > This one seems to have more or less conflict with Hugh's rework on pmd > > > collapse. Please hold off review or merging until I prepare another one > > > (probably based on Hugh's, after I have a closer read). > > > > > > Sorry for the noise. > > > > > > > [did not have time to look yet] > > > > Are there any fixes buried in there that we'd like to have in earlier? I > > skimmed over the patches and all read like "cleanup" + "consistency", > > correct? > > There are bug fixes when unluckily hitting unstable pmd I think, these ones > worth mentioning: > > - pagemap can be broken, causing read to be shifted over to the next > (wrong data read) Yes, it may corrupt the pagemap data. But anyway it seems like nobody was busted by this one as you said. > > - memcg wrong accounting, e.g., moving one task from memcg1 to memcg2, we > can skip an unstable pmd while it could quickly contain something that > can belong to memcg1, I think. This one needs some eyes from memcg > developers. I don't think this is an important thing. There are plenty of other conditions that could make the accounting inaccurate, for example, isolating page from LRU fails, force charge, etc. And it seems like nobody was bothered by this either. > > I don't rush on having them because these are all theoretical and no bug > report I saw, no reproducer I wrote, only observed by my eyes. > > At least the pagemap issue should have been there for 10+ years without > being noticed even if rightfully spot this time. Meanwhile this seems to > have conflict with Hugh's series which should have been posted earlier - I > still need to check on how that will affect this series, but not yet. > > Said that, let me know if any of you hit any (potential) issue with above > or think that we should to move this in earlier. > > Thanks, > > -- > Peter Xu > >
On Wed, Jun 07, 2023 at 09:39:44AM -0700, Yang Shi wrote: > I don't think this is an important thing. There are plenty of other > conditions that could make the accounting inaccurate, for example, > isolating page from LRU fails, force charge, etc. And it seems like > nobody was bothered by this either. Yes, I read that a bit more and I agree. So let me summarize after I read Hugh's series just now.. With the pre-requisite of the new __pte_map_offset() that Hugh proposed here: [PATCH 04/31] mm/pgtable: allow pte_offset_map[_lock]() to fail https://lore.kernel.org/r/8218ffdc-8be-54e5-0a8-83f5542af283@google.com We should not need pmd_trans_unstable() anymore as Hugh pointed out, which I fully agree. I think Hugh has covered all the issues that this series wanted to address alongside, namely: Patch 1 (mprotect) is covered in: [PATCH 18/31] mm/mprotect: delete pmd_none_or_clear_bad_unless_trans_huge() https://lore.kernel.org/r/4a834932-9064-9ed7-3cd1-99466f549486@google.com No way to move spinlock outside, so one -EAGAIN needed which makes sense. Patch 2 (migrate_device) is covered in: [PATCH 24/31] mm/migrate_device: allow pte_offset_map_lock() to fail https://lore.kernel.org/r/ea51bb69-189c-229b-fc0-9d3e7be5d6b@google.com By a direct retry, and more code unified so even better. Patch 3 () is covered in: [PATCH 19/31] mm/mremap: retry if either pte_offset_map_*lock() fails https://lore.kernel.org/r/2d3fbfea-5884-8211-0cc-954afe25ae9c@google.com Instead of WARN_ON_ONCE(), it got dropped which looks all good, too. Most of patch 4's changes are done in: [PATCH 09/31] mm/pagewalkers: ACTION_AGAIN if pte_offset_map_lock() fails https://lore.kernel.org/r/6265ac58-6018-a8c6-cf38-69cba698471@google.com There're some different handling on memcg changes, where in Hugh's series it was put separately here: [PATCH 17/31] mm/various: give up if pte_offset_map[_lock]() fails https://lore.kernel.org/r/c299eba-4e17-c645-1115-ccd1fd9956bd@google.com After double check, I agree with Yang's comment and Hugh's approach, that no retry is needed for memcg. Let's ignore this series, not needed anymore. Thanks,