Message ID | 739964d-c535-4db4-90ec-2166285b4d47@google.com |
---|---|
State | New |
From | Hugh Dickins <hughd@google.com> |
Date | Sun, 28 May 2023 23:23:47 -0700 (PDT) |
Subject | [PATCH 08/12] mm/pgtable: add pte_free_defer() for pgtable as page |
In-Reply-To | <35e983f5-7ed3-b310-d949-9ae8b130cdab@google.com> |
Series | mm: free retracted page table by RCU |
Commit Message
Hugh Dickins
May 29, 2023, 6:23 a.m. UTC
Add the generic pte_free_defer(), to call pte_free() via call_rcu().
pte_free_defer() will be called inside khugepaged's retract_page_tables()
loop, where allocating extra memory cannot be relied upon. This version
suits all those architectures which use an unfragmented page for one page
table (none of whose pte_free()s use the mm arg which was passed to it).
Signed-off-by: Hugh Dickins <hughd@google.com>
---
 include/linux/pgtable.h |  2 ++
 mm/pgtable-generic.c    | 20 ++++++++++++++++++++
2 files changed, 22 insertions(+)
Comments
On Sun, May 28, 2023 at 11:23:47PM -0700, Hugh Dickins wrote:
> Add the generic pte_free_defer(), to call pte_free() via call_rcu().
> pte_free_defer() will be called inside khugepaged's retract_page_tables()
> loop, where allocating extra memory cannot be relied upon. This version
> suits all those architectures which use an unfragmented page for one page
> table (none of whose pte_free()s use the mm arg which was passed to it).
>
> Signed-off-by: Hugh Dickins <hughd@google.com>
> ---
>  include/linux/pgtable.h |  2 ++
>  mm/pgtable-generic.c    | 20 ++++++++++++++++++++
>  2 files changed, 22 insertions(+)
>
> diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
> index 8b0fc7fdc46f..62a8732d92f0 100644
> --- a/include/linux/pgtable.h
> +++ b/include/linux/pgtable.h
> @@ -112,6 +112,8 @@ static inline void pte_unmap(pte_t *pte)
>  }
>  #endif
>
> +void pte_free_defer(struct mm_struct *mm, pgtable_t pgtable);
> +
>  /* Find an entry in the second-level page table.. */
>  #ifndef pmd_offset
>  static inline pmd_t *pmd_offset(pud_t *pud, unsigned long address)
> diff --git a/mm/pgtable-generic.c b/mm/pgtable-generic.c
> index d28b63386cef..471697dcb244 100644
> --- a/mm/pgtable-generic.c
> +++ b/mm/pgtable-generic.c
> @@ -13,6 +13,7 @@
>  #include <linux/swap.h>
>  #include <linux/swapops.h>
>  #include <linux/mm_inline.h>
> +#include <asm/pgalloc.h>
>  #include <asm/tlb.h>
>
>  /*
> @@ -230,6 +231,25 @@ pmd_t pmdp_collapse_flush(struct vm_area_struct *vma, unsigned long address,
>  	return pmd;
>  }
>  #endif
> +
> +/* arch define pte_free_defer in asm/pgalloc.h for its own implementation */
> +#ifndef pte_free_defer
> +static void pte_free_now(struct rcu_head *head)
> +{
> +	struct page *page;
> +
> +	page = container_of(head, struct page, rcu_head);
> +	pte_free(NULL /* mm not passed and not used */, (pgtable_t)page);
> +}
> +
> +void pte_free_defer(struct mm_struct *mm, pgtable_t pgtable)
> +{
> +	struct page *page;
> +
> +	page = pgtable;
> +	call_rcu(&page->rcu_head, pte_free_now);

People have told me that we can't use the rcu_head on the struct page
backing page table blocks. I understood it was because PPC was using
that memory for something else.

I was hoping Matthew's folio conversion would help clarify this..

On the flip side, if we are able to use rcu_head here then we should
use it everywhere and also use it in mmu_gather.c instead of allocating
memory and having the smp_call_function() fallback. This would fix it
to be actual RCU.

There have been a few talks that it sure would be nice if the page
tables were always freed via RCU and every arch just turns on
CONFIG_MMU_GATHER_RCU_TABLE_FREE. It seems to me that patch 10 is kind
of half doing that by making this one path always use RCU on all
arches.

AFAIK the main reason it hasn't been done was the lack of a rcu_head..

Jason
On Mon, May 29, 2023 at 8:23 AM Hugh Dickins <hughd@google.com> wrote:
> Add the generic pte_free_defer(), to call pte_free() via call_rcu().
> pte_free_defer() will be called inside khugepaged's retract_page_tables()
> loop, where allocating extra memory cannot be relied upon. This version
> suits all those architectures which use an unfragmented page for one page
> table (none of whose pte_free()s use the mm arg which was passed to it).

Pages that have been scheduled for deferred freeing can still be locked,
right? So struct page's members "ptl" and "rcu_head" can now be in use
at the same time? If that's intended, it would probably be a good idea
to add comments in the "/* Page table pages */" part of struct page to
point out that the first two members can be used by the rcu_head while
the page is still used as a page table in some contexts, including use
of the ptl.
On Wed, 31 May 2023, Jason Gunthorpe wrote:
> On Sun, May 28, 2023 at 11:23:47PM -0700, Hugh Dickins wrote:
> > Add the generic pte_free_defer(), to call pte_free() via call_rcu().
> > pte_free_defer() will be called inside khugepaged's retract_page_tables()
> > loop, where allocating extra memory cannot be relied upon. This version
> > suits all those architectures which use an unfragmented page for one page
> > table (none of whose pte_free()s use the mm arg which was passed to it).
> >
> > Signed-off-by: Hugh Dickins <hughd@google.com>
> > ---
> > +	page = pgtable;
> > +	call_rcu(&page->rcu_head, pte_free_now);
>
> People have told me that we can't use the rcu_head on the struct page
> backing page table blocks. I understood it was because PPC was using
> that memory for something else.

In the 05/12 thread, Matthew pointed out that powerpc (and a few others)
use the one struct page for multiple page tables, and the lack of
multiple rcu_heads means I've got that patch and 06/12 sparc and 07/12
s390 embarrassingly wrong (whereas this generic 08/12 is okay).

I believe I know the extra grossness needed for powerpc and sparc: I had
it already for powerpc, but fooled myself into thinking not yet needed.
But (I haven't quite got there yet) it looks like Gerald is pointing out
that s390 is using lru which coincides with rcu_head: I already knew
s390 the most difficult, but that will be another layer of difficulty.
I expect it was s390 which people warned you of.

> I was hoping Matthew's folio conversion would help clarify this..

I doubt that: what we have for use today is pages, however they are
dressed up.

> On the flip side, if we are able to use rcu_head here then we should
> use it everywhere and also use it in mmu_gather.c instead of allocating
> memory and having the smp_call_function() fallback. This would fix it
> to be actual RCU.
>
> There have been a few talks that it sure would be nice if the page
> tables were always freed via RCU and every arch just turns on
> CONFIG_MMU_GATHER_RCU_TABLE_FREE. It seems to me that patch 10 is kind
> of half doing that by making this one path always use RCU on all
> arches.
>
> AFAIK the main reason it hasn't been done was the lack of a rcu_head..

I haven't paid attention to that part of the history, and won't be
competent to propagate this further, into MMU-Gather-World; but agree
that would be a satisfying conclusion.

Hugh
On Thu, Jun 01, 2023 at 11:03:11PM -0700, Hugh Dickins wrote:
> > I was hoping Matthew's folio conversion would help clarify this..
>
> I doubt that: what we have for use today is pages, however they are
> dressed up.

I mean the part where Matthew is going and splitting the types and
making it much clearer and type safe how the memory is laid out. eg no
more guessing if the arch code is overlaying something else onto the
rcu_head.

Then the hope against hope is that after doing all this we can find
enough space for everything including the rcu heads..

Jason
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index 8b0fc7fdc46f..62a8732d92f0 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -112,6 +112,8 @@ static inline void pte_unmap(pte_t *pte)
 }
 #endif
 
+void pte_free_defer(struct mm_struct *mm, pgtable_t pgtable);
+
 /* Find an entry in the second-level page table.. */
 #ifndef pmd_offset
 static inline pmd_t *pmd_offset(pud_t *pud, unsigned long address)
diff --git a/mm/pgtable-generic.c b/mm/pgtable-generic.c
index d28b63386cef..471697dcb244 100644
--- a/mm/pgtable-generic.c
+++ b/mm/pgtable-generic.c
@@ -13,6 +13,7 @@
 #include <linux/swap.h>
 #include <linux/swapops.h>
 #include <linux/mm_inline.h>
+#include <asm/pgalloc.h>
 #include <asm/tlb.h>
 
 /*
@@ -230,6 +231,25 @@ pmd_t pmdp_collapse_flush(struct vm_area_struct *vma, unsigned long address,
 	return pmd;
 }
 #endif
+
+/* arch define pte_free_defer in asm/pgalloc.h for its own implementation */
+#ifndef pte_free_defer
+static void pte_free_now(struct rcu_head *head)
+{
+	struct page *page;
+
+	page = container_of(head, struct page, rcu_head);
+	pte_free(NULL /* mm not passed and not used */, (pgtable_t)page);
+}
+
+void pte_free_defer(struct mm_struct *mm, pgtable_t pgtable)
+{
+	struct page *page;
+
+	page = pgtable;
+	call_rcu(&page->rcu_head, pte_free_now);
+}
+#endif /* pte_free_defer */
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
 
 #if defined(CONFIG_GUP_GET_PXX_LOW_HIGH) && \