[08/12] mm/pgtable: add pte_free_defer() for pgtable as page

Message ID 739964d-c535-4db4-90ec-2166285b4d47@google.com
State New
Headers
Series mm: free retracted page table by RCU |

Commit Message

Hugh Dickins May 29, 2023, 6:23 a.m. UTC
  Add the generic pte_free_defer(), to call pte_free() via call_rcu().
pte_free_defer() will be called inside khugepaged's retract_page_tables()
loop, where allocating extra memory cannot be relied upon.  This version
suits all those architectures which use an unfragmented page for one page
table (none of whose pte_free()s use the mm arg which was passed to it).

Signed-off-by: Hugh Dickins <hughd@google.com>
---
 include/linux/pgtable.h |  2 ++
 mm/pgtable-generic.c    | 20 ++++++++++++++++++++
 2 files changed, 22 insertions(+)
  

Comments

Jason Gunthorpe May 31, 2023, 7:48 p.m. UTC | #1
On Sun, May 28, 2023 at 11:23:47PM -0700, Hugh Dickins wrote:
> Add the generic pte_free_defer(), to call pte_free() via call_rcu().
> pte_free_defer() will be called inside khugepaged's retract_page_tables()
> loop, where allocating extra memory cannot be relied upon.  This version
> suits all those architectures which use an unfragmented page for one page
> table (none of whose pte_free()s use the mm arg which was passed to it).
> 
> Signed-off-by: Hugh Dickins <hughd@google.com>
> ---
>  include/linux/pgtable.h |  2 ++
>  mm/pgtable-generic.c    | 20 ++++++++++++++++++++
>  2 files changed, 22 insertions(+)
> 
> diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
> index 8b0fc7fdc46f..62a8732d92f0 100644
> --- a/include/linux/pgtable.h
> +++ b/include/linux/pgtable.h
> @@ -112,6 +112,8 @@ static inline void pte_unmap(pte_t *pte)
>  }
>  #endif
>  
> +void pte_free_defer(struct mm_struct *mm, pgtable_t pgtable);
> +
>  /* Find an entry in the second-level page table.. */
>  #ifndef pmd_offset
>  static inline pmd_t *pmd_offset(pud_t *pud, unsigned long address)
> diff --git a/mm/pgtable-generic.c b/mm/pgtable-generic.c
> index d28b63386cef..471697dcb244 100644
> --- a/mm/pgtable-generic.c
> +++ b/mm/pgtable-generic.c
> @@ -13,6 +13,7 @@
>  #include <linux/swap.h>
>  #include <linux/swapops.h>
>  #include <linux/mm_inline.h>
> +#include <asm/pgalloc.h>
>  #include <asm/tlb.h>
>  
>  /*
> @@ -230,6 +231,25 @@ pmd_t pmdp_collapse_flush(struct vm_area_struct *vma, unsigned long address,
>  	return pmd;
>  }
>  #endif
> +
> +/* arch define pte_free_defer in asm/pgalloc.h for its own implementation */
> +#ifndef pte_free_defer
> +static void pte_free_now(struct rcu_head *head)
> +{
> +	struct page *page;
> +
> +	page = container_of(head, struct page, rcu_head);
> +	pte_free(NULL /* mm not passed and not used */, (pgtable_t)page);
> +}
> +
> +void pte_free_defer(struct mm_struct *mm, pgtable_t pgtable)
> +{
> +	struct page *page;
> +
> +	page = pgtable;
> +	call_rcu(&page->rcu_head, pte_free_now);

People have told me that we can't use the rcu_head on the struct page
backing page table blocks. I understood it was because PPC was using
that memory for something else.

I was hoping Mathew's folio conversion would help clarify this..

On the flip side, if we are able to use rcu_head here then we should
use it everywhere and also use it mmu_gather.c instead of allocating
memory and having the smp_call_function() fallback. This would fix it
to be actual RCU.

There have been a few talks that it sure would be nice if the page
tables were always freed via RCU and every arch just turns on
CONFIG_MMU_GATHER_RCU_TABLE_FREE. It seems to me that patch 10 is kind
of half doing that by making this one path always use RCU on all
arches.

AFAIK the main reason it hasn't been done was the lack of a rcu_head..

Jason
  
Jann Horn June 1, 2023, 1:31 p.m. UTC | #2
On Mon, May 29, 2023 at 8:23 AM Hugh Dickins <hughd@google.com> wrote:
> Add the generic pte_free_defer(), to call pte_free() via call_rcu().
> pte_free_defer() will be called inside khugepaged's retract_page_tables()
> loop, where allocating extra memory cannot be relied upon.  This version
> suits all those architectures which use an unfragmented page for one page
> table (none of whose pte_free()s use the mm arg which was passed to it).

Pages that have been scheduled for deferred freeing can still be
locked, right? So struct page's members "ptl" and "rcu_head" can now
be in use at the same time? If that's intended, it would probably be a
good idea to add comments in the "/* Page table pages */" part of
struct page to point out that the first two members can be used by the
rcu_head while the page is still used as a page table in some
contexts, including use of the ptl.
  
Hugh Dickins June 2, 2023, 6:03 a.m. UTC | #3
On Wed, 31 May 2023, Jason Gunthorpe wrote:
> On Sun, May 28, 2023 at 11:23:47PM -0700, Hugh Dickins wrote:
> > Add the generic pte_free_defer(), to call pte_free() via call_rcu().
> > pte_free_defer() will be called inside khugepaged's retract_page_tables()
> > loop, where allocating extra memory cannot be relied upon.  This version
> > suits all those architectures which use an unfragmented page for one page
> > table (none of whose pte_free()s use the mm arg which was passed to it).
> > 
> > Signed-off-by: Hugh Dickins <hughd@google.com>
> > ---
> > +	page = pgtable;
> > +	call_rcu(&page->rcu_head, pte_free_now);
> 
> People have told me that we can't use the rcu_head on the struct page
> backing page table blocks. I understood it was because PPC was using
> that memory for something else.

In the 05/12 thread, Matthew pointed out that powerpc (and a few others)
use the one struct page for multiple page tables, and the lack of
multiple rcu_heads means I've got that patch and 06/12 sparc and
07/12 s390 embarrassingly wrong (whereas this generic 08/12 is okay).

I believe I know the extra grossness needed for powerpc and sparc: I had
it already for powerpc, but fooled myself into thinking not yet needed.

But (I haven't quite got there yet) it looks like Gerald is pointing
out that s390 is using lru which coincides with rcu_head: I already knew
s390 the most difficult, but that will be another layer of difficulty.

I expect it was s390 which people warned you of.

> 
> I was hoping Mathew's folio conversion would help clarify this..

I doubt that: what we have for use today is pages, however they are
dressed up.

> 
> On the flip side, if we are able to use rcu_head here then we should
> use it everywhere and also use it mmu_gather.c instead of allocating
> memory and having the smp_call_function() fallback. This would fix it
> to be actual RCU.
> 
> There have been a few talks that it sure would be nice if the page
> tables were always freed via RCU and every arch just turns on
> CONFIG_MMU_GATHER_RCU_TABLE_FREE. It seems to me that patch 10 is kind
> of half doing that by making this one path always use RCU on all
> arches.
> 
> AFAIK the main reason it hasn't been done was the lack of a rcu_head..

I haven't paid attention to that part of the history, and won't be
competent to propagate this further, into MMU-Gather-World; but agree
that would be a satisfying conclusion.

Hugh
  
Jason Gunthorpe June 2, 2023, 12:15 p.m. UTC | #4
On Thu, Jun 01, 2023 at 11:03:11PM -0700, Hugh Dickins wrote:
> > I was hoping Mathew's folio conversion would help clarify this..
> 
> I doubt that: what we have for use today is pages, however they are
> dressed up.

I mean the part where Matthew is going and splitting the types and
making it much clearer and type safe how the memory is layed out. eg
no more guessing if the arch code is overlaying something else onto
the rcu_head.

Then the hope against hope is that after doing all this we can find
enough space for everything including the rcu heads..

Jason
  

Patch

diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index 8b0fc7fdc46f..62a8732d92f0 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -112,6 +112,8 @@  static inline void pte_unmap(pte_t *pte)
 }
 #endif
 
+void pte_free_defer(struct mm_struct *mm, pgtable_t pgtable);
+
 /* Find an entry in the second-level page table.. */
 #ifndef pmd_offset
 static inline pmd_t *pmd_offset(pud_t *pud, unsigned long address)
diff --git a/mm/pgtable-generic.c b/mm/pgtable-generic.c
index d28b63386cef..471697dcb244 100644
--- a/mm/pgtable-generic.c
+++ b/mm/pgtable-generic.c
@@ -13,6 +13,7 @@ 
 #include <linux/swap.h>
 #include <linux/swapops.h>
 #include <linux/mm_inline.h>
+#include <asm/pgalloc.h>
 #include <asm/tlb.h>
 
 /*
@@ -230,6 +231,25 @@  pmd_t pmdp_collapse_flush(struct vm_area_struct *vma, unsigned long address,
 	return pmd;
 }
 #endif
+
+/* arch define pte_free_defer in asm/pgalloc.h for its own implementation */
+#ifndef pte_free_defer
+static void pte_free_now(struct rcu_head *head)
+{
+	struct page *page;
+
+	page = container_of(head, struct page, rcu_head);
+	pte_free(NULL /* mm not passed and not used */, (pgtable_t)page);
+}
+
+void pte_free_defer(struct mm_struct *mm, pgtable_t pgtable)
+{
+	struct page *page;
+
+	page = pgtable;
+	call_rcu(&page->rcu_head, pte_free_now);
+}
+#endif /* pte_free_defer */
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
 
 #if defined(CONFIG_GUP_GET_PXX_LOW_HIGH) && \