Message ID | 20230812210053.2325091-1-zokeefe@google.com |
---|---|
State | New |
Series | mm/thp: fix "mm: thp: kill __transhuge_page_enabled()" |
Commit Message
Zach O'Keefe
Aug. 12, 2023, 9 p.m. UTC
The 6.0 commits:
commit 9fec51689ff6 ("mm: thp: kill transparent_hugepage_active()")
commit 7da4e2cb8b1f ("mm: thp: kill __transhuge_page_enabled()")
merged "can we have THPs in this VMA?" logic that was previously done
separately by fault-path, khugepaged, and smaps "THPeligible".
During the process, the check on VM_NO_KHUGEPAGED from the khugepaged
path was accidentally added to fault and smaps paths. Certainly the
previous behavior for fault should be restored, and since smaps should
report the union of THP eligibility for fault and khugepaged, also opt
smaps out of this constraint.
Fixes: 7da4e2cb8b1f ("mm: thp: kill __transhuge_page_enabled()")
Reported-by: Saurabh Singh Sengar <ssengar@microsoft.com>
Signed-off-by: Zach O'Keefe <zokeefe@google.com>
Cc: Yang Shi <shy828301@gmail.com>
---
mm/huge_memory.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
Comments
On Sat, Aug 12, 2023 at 2:01 PM Zach O'Keefe <zokeefe@google.com> wrote:
>
> The 6.0 commits:
>
> commit 9fec51689ff6 ("mm: thp: kill transparent_hugepage_active()")
> commit 7da4e2cb8b1f ("mm: thp: kill __transhuge_page_enabled()")
>
> merged "can we have THPs in this VMA?" logic that was previously done
> separately by fault-path, khugepaged, and smaps "THPeligible".
>
> During the process, the check on VM_NO_KHUGEPAGED from the khugepaged
> path was accidentally added to fault and smaps paths. Certainly the
> previous behavior for fault should be restored, and since smaps should
> report the union of THP eligibility for fault and khugepaged, also opt
> smaps out of this constraint.
>
> Fixes: 7da4e2cb8b1f ("mm: thp: kill __transhuge_page_enabled()")
> Reported-by: Saurabh Singh Sengar <ssengar@microsoft.com>
> Signed-off-by: Zach O'Keefe <zokeefe@google.com>
> Cc: Yang Shi <shy828301@gmail.com>
> ---
>  mm/huge_memory.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index eb3678360b97..e098c26d5e2e 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -96,11 +96,11 @@ bool hugepage_vma_check(struct vm_area_struct *vma, unsigned long vm_flags,
>  		return in_pf;
>
>  	/*
> -	 * Special VMA and hugetlb VMA.
> +	 * khugepaged check for special VMA and hugetlb VMA.
>  	 * Must be checked after dax since some dax mappings may have
>  	 * VM_MIXEDMAP set.
>  	 */
> -	if (vm_flags & VM_NO_KHUGEPAGED)
> +	if (!in_pf && !smaps && (vm_flags & VM_NO_KHUGEPAGED))
>  		return false;
>
>  	/*
> --
> 2.41.0.694.ge786442a9b-goog

I should note that this was discussed before[1], and VM_MIXEDMAP was
called out then, but we didn't have any use cases. What was reported
broken by Saurabh was an out-of-tree driver that relies on being able
to fault in THPs over VM_HUGEPAGE|VM_MIXEDMAP VMAs. We mentioned back
then we could always opt fault-path out of this check in the future,
and it seems like we should.

To that extent, should this be added to stable?

Apologies, I should have added this context to the commit log.

Best,
Zach

[1] https://lore.kernel.org/linux-mm/YqdPmitColnzlXJ0@google.com/
> -----Original Message-----
> From: Zach O'Keefe <zokeefe@google.com>
> Sent: Sunday, August 13, 2023 2:31 AM
> To: linux-mm@kvack.org; Yang Shi <shy828301@gmail.com>
> Cc: linux-kernel@vger.kernel.org; Zach O'Keefe <zokeefe@google.com>;
> Saurabh Singh Sengar <ssengar@microsoft.com>
> Subject: [EXTERNAL] [PATCH] mm/thp: fix "mm: thp: kill
> __transhuge_page_enabled()"
>
> [snip]
>
> -	if (vm_flags & VM_NO_KHUGEPAGED)
> +	if (!in_pf && !smaps && (vm_flags & VM_NO_KHUGEPAGED))
>  		return false;
>
> [snip]

Thanks for the patch.

I realized that with commit 9fec51689ff60 the !vma_is_anonymous()
restriction was also introduced. To make the fault path work the same
as before, we need a relaxation for this check as well. Can we add the
below to this patch too:

-	if (!vma_is_anonymous(vma))
+	if (!in_pf && !vma_is_anonymous(vma))
 		return false;

- Saurabh
On Mon, Aug 14, 2023 at 05:04:47PM -0700, Zach O'Keefe wrote:
> > From a large folios perspective, filesystems do not implement a special
> > handler. They call filemap_fault() (directly or indirectly) from their
> > ->fault handler. If there is already a folio in the page cache which
> > satisfies this fault, we insert it into the page tables (no matter what
> > size it is). If there is no folio, we call readahead to populate that
> > index in the page cache, and probably some other indices around it.
> > That's do_sync_mmap_readahead().
> >
> > If you look at that, you'll see that we check the VM_HUGEPAGE flag, and
> > if set we align to a PMD boundary and read two PMD-size pages (so that we
> > can do async readahead for the second page, if we're doing a linear scan).
> > If the VM_HUGEPAGE flag isn't set, we'll use the readahead algorithm to
> > decide how large the folio should be that we're reading into; if it's a
> > random read workload, we'll stick to order-0 pages, but if we're getting
> > a good hit rate from the linear scan, we'll increase the size (although
> > we won't go past PMD size).
> >
> > There's also the ->map_pages() optimisation which handles page faults
> > locklessly, and will fall back to ->fault() if there's even a light
> > breeze. I don't think that's of any particular use in answering your
> > question, so I'm not going into details about it.
> >
> > I'm not sure I understand the code that's being modified well enough to
> > be able to give you a straight answer to your question, but hopefully
> > this is helpful to you.
>
> Thank you, this was great info. I had thought, incorrectly, that large
> folio work would eventually tie into that ->huge_fault() handler
> (should be dax_huge_fault() ?)
>
> If that's the case, then faulting file-backed, non-DAX memory as
> (pmd-mapped-)THPs isn't supported at all, and no fault lies with the
> aforementioned patches.

Ah, wait, hang on. You absolutely can get a PMD mapping by calling into
->fault. Look at how finish_fault() works:

	if (pmd_none(*vmf->pmd)) {
		if (PageTransCompound(page)) {
			ret = do_set_pmd(vmf, page);
			if (ret != VM_FAULT_FALLBACK)
				return ret;
		}

		if (vmf->prealloc_pte)
			pmd_install(vma->vm_mm, vmf->pmd, &vmf->prealloc_pte);

So if we find a large folio that is PMD mappable, and there's nothing at
vmf->pmd, we install a PMD-sized mapping at that spot. If that fails, we
install the preallocated PTE table at vmf->pmd and continue trying to set
one or more PTEs to satisfy this page fault.

So why, you may be asking, do we have ->huge_fault. Well, you should ask
the clown who did commit b96375f74a6d ... in fairness to me,
finish_fault() did not exist at the time, and the ability to return a
PMD-sized page was added later.
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index eb3678360b97..e098c26d5e2e 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -96,11 +96,11 @@ bool hugepage_vma_check(struct vm_area_struct *vma, unsigned long vm_flags,
 		return in_pf;
 
 	/*
-	 * Special VMA and hugetlb VMA.
+	 * khugepaged check for special VMA and hugetlb VMA.
	 * Must be checked after dax since some dax mappings may have
	 * VM_MIXEDMAP set.
	 */
-	if (vm_flags & VM_NO_KHUGEPAGED)
+	if (!in_pf && !smaps && (vm_flags & VM_NO_KHUGEPAGED))
		return false;
 
	/*