diff mbox series

[04/13] mm: Fix pmd_read_atomic()

Message ID	20221022114424.711181252@infradead.org
State	New
Headers	Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Message-ID: <20221022114424.711181252@infradead.org> User-Agent: quilt/0.66 Date: Sat, 22 Oct 2022 13:14:07 +0200 From: Peter Zijlstra <peterz@infradead.org> To: x86@kernel.org, willy@infradead.org, torvalds@linux-foundation.org, akpm@linux-foundation.org Cc: linux-kernel@vger.kernel.org, peterz@infradead.org, linux-mm@kvack.org, aarcange@redhat.com, kirill.shutemov@linux.intel.com, jroedel@suse.de, ubizjak@gmail.com Subject: [PATCH 04/13] mm: Fix pmd_read_atomic() References: <20221022111403.531902164@infradead.org> MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Precedence: bulk
Series	Clean up pmd_get_atomic() and i386-PAE \| [00/13] Clean up pmd_get_atomic() and i386-PAE [01/13] mm: Update ptep_get_lockless()s comment [02/13] x86/mm/pae: Make pmd_t similar to pte_t [03/13] sh/mm: Make pmd_t similar to pte_t [04/13] mm: Fix pmd_read_atomic() [05/13] mm: Rename GUP_GET_PTE_LOW_HIGH [06/13] mm: Rename pmd_read_atomic() [07/13] mm/gup: Fix the lockless PMD access [08/13] x86/mm/pae: Dont (ab)use atomic64 [09/13] x86/mm/pae: Use WRITE_ONCE() [10/13] x86/mm/pae: Be consistent with pXXp_get_and_clear() [11/13] x86_64: Remove pointless set_64bit() usage [12/13] x86/mm/pae: Get rid of set_64bit() [13/13] mm: Remove pointless barrier() after pmdp_get_lockless()

Commit Message

Peter Zijlstra Oct. 22, 2022, 11:14 a.m. UTC

  AFAICT there's no reason to do anything different than what we do for
PTEs. Make it so (also affects SH).

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/include/asm/pgtable-3level.h |   56 ----------------------------------
 include/linux/pgtable.h               |   49 +++++++++++++++++++++++------
 2 files changed, 39 insertions(+), 66 deletions(-)

Comments

Linus Torvalds Oct. 22, 2022, 5:30 p.m. UTC | #1

On Sat, Oct 22, 2022 at 4:48 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> --- a/include/linux/pgtable.h
> +++ b/include/linux/pgtable.h
> @@ -258,6 +258,13 @@ static inline pte_t ptep_get(pte_t *ptep
>  }
>  #endif
>
> +#ifndef __HAVE_ARCH_PMDP_GET
> +static inline pmd_t pmdp_get(pmd_t *pmdp)
> +{
> +       return READ_ONCE(*pmdp);
> +}
> +#endif

What, what, what?

Where did that __HAVE_ARCH_PMDP_GET come from?

I'm not seeing it #define'd anywhere, and we _really_ shouldn't be
doing this any more.

Please just do

    #ifndef pmdp_get
    static inline pmd_t pmdp_get(pmd_t *pmdp)
    ..

and have the architectures that do their own pmdp_get(), just have that

   #define pmdp_get pmdp_get

to let the generic code know about it. Instead of making up a new
__HAVE_ARCH_XYZ name.

That "use the same name for testing" pattern means that it shows up
much nicer when grepping for "where does this come from", but also
means that you really never need to make up new names for "does this
exist".

             Linus

Peter Zijlstra Oct. 24, 2022, 8:09 a.m. UTC | #2

On Sat, Oct 22, 2022 at 10:30:51AM -0700, Linus Torvalds wrote:
> On Sat, Oct 22, 2022 at 4:48 AM Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > --- a/include/linux/pgtable.h
> > +++ b/include/linux/pgtable.h
> > @@ -258,6 +258,13 @@ static inline pte_t ptep_get(pte_t *ptep
> >  }
> >  #endif
> >
> > +#ifndef __HAVE_ARCH_PMDP_GET
> > +static inline pmd_t pmdp_get(pmd_t *pmdp)
> > +{
> > +       return READ_ONCE(*pmdp);
> > +}
> > +#endif
> 
> What, what, what?
> 
> Where did that __HAVE_ARCH_PMDP_GET come from?

Copy/paste like from ptep_get(), that has __HAVE_ARCH_PTEP_GET (which
does appear to get used, once).

Do I break the pattern and simply leave this off, or do I stay
consistent even though we hate it a little? ;-)

Peter Zijlstra Nov. 1, 2022, 12:41 p.m. UTC | #3

On Sat, Oct 22, 2022 at 10:30:51AM -0700, Linus Torvalds wrote:
> On Sat, Oct 22, 2022 at 4:48 AM Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > --- a/include/linux/pgtable.h
> > +++ b/include/linux/pgtable.h
> > @@ -258,6 +258,13 @@ static inline pte_t ptep_get(pte_t *ptep
> >  }
> >  #endif
> >
> > +#ifndef __HAVE_ARCH_PMDP_GET
> > +static inline pmd_t pmdp_get(pmd_t *pmdp)
> > +{
> > +       return READ_ONCE(*pmdp);
> > +}
> > +#endif
> 
> What, what, what?
> 
> Where did that __HAVE_ARCH_PMDP_GET come from?
> 
> I'm not seeing it #define'd anywhere, and we _really_ shouldn't be
> doing this any more.
> 
> Please just do
> 
>     #ifndef pmdp_get
>     static inline pmd_t pmdp_get(pmd_t *pmdp)
>     ..
> 
> and have the architectures that do their own pmdp_get(), just have that
> 
>    #define pmdp_get pmdp_get
> 
> to let the generic code know about it. Instead of making up a new
> __HAVE_ARCH_XYZ name.

So I've stuck the below on. There's a *TON* more to convert and I'm not
going to be doing that just now (seems like a clever enough script
should be able to), but this gets rid of the new one I introduced.

---

Subject: mm: Convert __HAVE_ARCH_P..P_GET to the new style
From: Peter Zijlstra <peterz@infradead.org>
Date: Tue Nov  1 12:53:18 CET 2022

Since __HAVE_ARCH_* style guards have been depricated in favour of
defining the function name onto itself, convert pxxp_get().

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/powerpc/include/asm/nohash/32/pgtable.h |    2 +-
 include/linux/pgtable.h                      |    4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

--- a/arch/powerpc/include/asm/nohash/32/pgtable.h
+++ b/arch/powerpc/include/asm/nohash/32/pgtable.h
@@ -263,7 +263,7 @@ static inline pte_basic_t pte_update(str
 }
 
 #ifdef CONFIG_PPC_16K_PAGES
-#define __HAVE_ARCH_PTEP_GET
+#define ptep_get ptep_get
 static inline pte_t ptep_get(pte_t *ptep)
 {
	pte_basic_t val = READ_ONCE(ptep->pte);
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -291,14 +291,14 @@ static inline void ptep_clear(struct mm_
	ptep_get_and_clear(mm, addr, ptep);
 }
 
-#ifndef __HAVE_ARCH_PTEP_GET
+#ifndef ptep_get
 static inline pte_t ptep_get(pte_t *ptep)
 {
	return READ_ONCE(*ptep);
 }
 #endif
 
-#ifndef __HAVE_ARCH_PMDP_GET
+#ifndef pmdp_get
 static inline pmd_t pmdp_get(pmd_t *pmdp)
 {
	return READ_ONCE(*pmdp);

Linus Torvalds Nov. 1, 2022, 5:42 p.m. UTC | #4

On Tue, Nov 1, 2022 at 5:42 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> So I've stuck the below on. There's a *TON* more to convert and I'm not
> going to be doing that just now (seems like a clever enough script
> should be able to), but this gets rid of the new one I introduced.

Thanks.

And no, I don't think the churn of converting old cases is worth it. I
just want to discourage *more* of this.

Using the same name really helps when you do a "git grep" for a
symbol, the whole '#ifndef' patterns for alternate architecture
definitions shows up really clearly.

So that - together with not having the possibility of mixing up names
- is the main reason I don't like the ARCH_HAS_XYZ pattern, and much
prefer just using the name of whichever function gets an architecture
override.

                 Linus

PS. I'd love to get an ack/nak on the "mm: delay rmap removal until
after TLB flush" thing.

diff mbox series

Patch

--- a/arch/x86/include/asm/pgtable-3level.h
+++ b/arch/x86/include/asm/pgtable-3level.h
@@ -34,62 +34,6 @@  static inline void native_set_pte(pte_t
 	ptep->pte_low = pte.pte_low;
 }
 
-#define pmd_read_atomic pmd_read_atomic
-/*
- * pte_offset_map_lock() on 32-bit PAE kernels was reading the pmd_t with
- * a "*pmdp" dereference done by GCC. Problem is, in certain places
- * where pte_offset_map_lock() is called, concurrent page faults are
- * allowed, if the mmap_lock is hold for reading. An example is mincore
- * vs page faults vs MADV_DONTNEED. On the page fault side
- * pmd_populate() rightfully does a set_64bit(), but if we're reading the
- * pmd_t with a "*pmdp" on the mincore side, a SMP race can happen
- * because GCC will not read the 64-bit value of the pmd atomically.
- *
- * To fix this all places running pte_offset_map_lock() while holding the
- * mmap_lock in read mode, shall read the pmdp pointer using this
- * function to know if the pmd is null or not, and in turn to know if
- * they can run pte_offset_map_lock() or pmd_trans_huge() or other pmd
- * operations.
- *
- * Without THP if the mmap_lock is held for reading, the pmd can only
- * transition from null to not null while pmd_read_atomic() runs. So
- * we can always return atomic pmd values with this function.
- *
- * With THP if the mmap_lock is held for reading, the pmd can become
- * trans_huge or none or point to a pte (and in turn become "stable")
- * at any time under pmd_read_atomic(). We could read it truly
- * atomically here with an atomic64_read() for the THP enabled case (and
- * it would be a whole lot simpler), but to avoid using cmpxchg8b we
- * only return an atomic pmdval if the low part of the pmdval is later
- * found to be stable (i.e. pointing to a pte). We are also returning a
- * 'none' (zero) pmdval if the low part of the pmd is zero.
- *
- * In some cases the high and low part of the pmdval returned may not be
- * consistent if THP is enabled (the low part may point to previously
- * mapped hugepage, while the high part may point to a more recently
- * mapped hugepage), but pmd_none_or_trans_huge_or_clear_bad() only
- * needs the low part of the pmd to be read atomically to decide if the
- * pmd is unstable or not, with the only exception when the low part
- * of the pmd is zero, in which case we return a 'none' pmd.
- */
-static inline pmd_t pmd_read_atomic(pmd_t *pmdp)
-{
-	pmdval_t ret;
-	u32 *tmp = (u32 *)pmdp;
-
-	ret = (pmdval_t) (*tmp);
-	if (ret) {
-		/*
-		 * If the low part is null, we must not read the high part
-		 * or we can end up with a partial pmd.
-		 */
-		smp_rmb();
-		ret |= ((pmdval_t)*(tmp + 1)) << 32;
-	}
-
-	return (pmd_t) { .pmd = ret };
-}
-
 static inline void native_set_pte_atomic(pte_t *ptep, pte_t pte)
 {
 	set_64bit((unsigned long long *)(ptep), native_pte_val(pte));
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -258,6 +258,13 @@  static inline pte_t ptep_get(pte_t *ptep
 }
 #endif
 
+#ifndef __HAVE_ARCH_PMDP_GET
+static inline pmd_t pmdp_get(pmd_t *pmdp)
+{
+	return READ_ONCE(*pmdp);
+}
+#endif
+
 #ifdef CONFIG_GUP_GET_PTE_LOW_HIGH
 /*
  * For walking the pagetables without holding any locks.  Some architectures
@@ -302,15 +309,42 @@  static inline pte_t ptep_get_lockless(pt
 
 	return pte;
 }
-#else /* CONFIG_GUP_GET_PTE_LOW_HIGH */
+#define ptep_get_lockless ptep_get_lockless
+
+#if CONFIG_PGTABLE_LEVELS > 2
+static inline pmd_t pmdp_get_lockless(pmd_t *pmdp)
+{
+	pmd_t pmd;
+
+	do {
+		pmd.pmd_low = pmdp->pmd_low;
+		smp_rmb();
+		pmd.pmd_high = pmdp->pmd_high;
+		smp_rmb();
+	} while (unlikely(pmd.pmd_low != pmdp->pmd_low));
+
+	return pmd;
+}
+#define pmdp_get_lockless pmdp_get_lockless
+#endif /* CONFIG_PGTABLE_LEVELS > 2 */
+#endif /* CONFIG_GUP_GET_PTE_LOW_HIGH */
+
 /*
  * We require that the PTE can be read atomically.
  */
+#ifndef ptep_get_lockless
 static inline pte_t ptep_get_lockless(pte_t *ptep)
 {
 	return ptep_get(ptep);
 }
-#endif /* CONFIG_GUP_GET_PTE_LOW_HIGH */
+#endif
+
+#ifndef pmdp_get_lockless
+static inline pmd_t pmdp_get_lockless(pmd_t *pmdp)
+{
+	return pmdp_get(pmdp);
+}
+#endif
 
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 #ifndef __HAVE_ARCH_PMDP_HUGE_GET_AND_CLEAR
@@ -1211,17 +1247,10 @@  static inline int pud_trans_unstable(pud
 #endif
 }
 
-#ifndef pmd_read_atomic
 static inline pmd_t pmd_read_atomic(pmd_t *pmdp)
 {
-	/*
-	 * Depend on compiler for an atomic pmd read. NOTE: this is
-	 * only going to work, if the pmdval_t isn't larger than
-	 * an unsigned long.
-	 */
-	return *pmdp;
+	return pmdp_get_lockless(pmdp);
 }
-#endif
 
 #ifndef arch_needs_pgtable_deposit
 #define arch_needs_pgtable_deposit() (false)