[07/13] mm/gup: Fix the lockless PMD access

Message ID: 20221022114424.906110403@infradead.org
State: New
Series: Clean up pmd_get_atomic() and i386-PAE

Commit Message

Peter Zijlstra Oct. 22, 2022, 11:14 a.m. UTC
On architectures where the PTE/PMD is larger than the native word size
(i386-PAE for example), READ_ONCE() can do the wrong thing. Use
pmdp_get_lockless() just like we use ptep_get_lockless().

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 kernel/events/core.c |    2 +-
 mm/gup.c             |    2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)
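
For context, a minimal sketch of the low/high double-read scheme that
helpers like ptep_get_lockless()/pmdp_get_lockless() rely on when the
entry is wider than the native word (kernel types and barriers assumed;
the names pae_entry and entry_get_lockless are made up for this
illustration, not the kernel's exact definitions):

	/*
	 * Illustrative sketch: a 64-bit entry stored as two 32-bit
	 * halves, read without holding the page-table lock.
	 */
	typedef struct {
		u32 lo;		/* low half carries the present bit */
		u32 hi;
	} pae_entry;

	static inline u64 entry_get_lockless(pae_entry *p)
	{
		u32 lo, hi;

		do {
			lo = READ_ONCE(p->lo);
			smp_rmb();
			hi = READ_ONCE(p->hi);
			smp_rmb();
		} while (unlikely(lo != READ_ONCE(p->lo)));

		return ((u64)hi << 32) | lo;
	}

This depends on writers making the entry non-present (in the low half)
before changing it, so a racing update is caught by the retry loop. On
architectures whose entries fit in a single word, pmdp_get_lockless()
can presumably reduce to a plain READ_ONCE() of the entry, making the
change a no-op there.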
  

Comments

Hugh Dickins Oct. 23, 2022, 12:42 a.m. UTC | #1
On Sat, 22 Oct 2022, Peter Zijlstra wrote:

> On architectures where the PTE/PMD is larger than the native word size
> (i386-PAE for example), READ_ONCE() can do the wrong thing. Use
> pmdp_get_lockless() just like we use ptep_get_lockless().

I thought that was something Will Deacon put a lot of effort
into handling around 5.8 and 5.9: see "strong prevailing wind" in
include/asm-generic/rwonce.h, formerly in include/linux/compiler.h.

Was it too optimistic?  Did the wind drop?

I'm interested in the answer, but I've certainly no objection
to making this all more obviously robust - thanks.

Hugh

> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  kernel/events/core.c |    2 +-
>  mm/gup.c             |    2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -7186,7 +7186,7 @@ static u64 perf_get_pgtable_size(struct
>  		return pud_leaf_size(pud);
>  
>  	pmdp = pmd_offset_lockless(pudp, pud, addr);
> -	pmd = READ_ONCE(*pmdp);
> +	pmd = pmdp_get_lockless(pmdp);
>  	if (!pmd_present(pmd))
>  		return 0;
>  
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -2507,7 +2507,7 @@ static int gup_pmd_range(pud_t *pudp, pu
>  
>  	pmdp = pmd_offset_lockless(pudp, pud, addr);
>  	do {
> -		pmd_t pmd = READ_ONCE(*pmdp);
> +		pmd_t pmd = pmdp_get_lockless(pmdp);
>  
>  		next = pmd_addr_end(addr, end);
>  		if (!pmd_present(pmd))
  
Peter Zijlstra Oct. 24, 2022, 7:42 a.m. UTC | #2
On Sat, Oct 22, 2022 at 05:42:18PM -0700, Hugh Dickins wrote:
> On Sat, 22 Oct 2022, Peter Zijlstra wrote:
> 
> > On architectures where the PTE/PMD is larger than the native word size
> > (i386-PAE for example), READ_ONCE() can do the wrong thing. Use
> > pmdp_get_lockless() just like we use ptep_get_lockless().
> 
> I thought that was something Will Deacon put a lot of effort
> into handling around 5.8 and 5.9: see "strong prevailing wind" in
> include/asm-generic/rwonce.h, formerly in include/linux/compiler.h.
> 
> Was it too optimistic?  Did the wind drop?
> 
> I'm interested in the answer, but I've certainly no objection
> to making this all more obviously robust - thanks.

READ_ONCE() can't do what the hardware can't do. There is absolutely no
way i386 can do an atomic 64bit load without resorting to cmpxchg8b.

Also see the comment that goes with compiletime_assert_rwonce_type(). It
explicitly allows 64bit because there's just too much stuff that does
that (and there's actually 32bit hardware that *can* do it).

But it's still very wrong.
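
To make the cmpxchg8b point concrete: the only way to read 64 bits
atomically on i386 is a locked cmpxchg8b, roughly the following sketch
(using a GCC builtin for illustration; this is not kernel code):

	/*
	 * Sketch: atomic 64-bit load via cmpxchg8b. Comparing and
	 * swapping 0 -> 0 returns the old value atomically, but it
	 * still performs a locked read-modify-write cycle (and stores
	 * 0 back if the value happened to be 0), which is why it is
	 * far too heavy for the lockless page-table walkers.
	 */
	static inline u64 atomic_load64_i386(u64 *p)
	{
		return __sync_val_compare_and_swap(p, 0, 0);
	}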
  
Hugh Dickins Oct. 25, 2022, 3:58 a.m. UTC | #3
On Mon, 24 Oct 2022, Peter Zijlstra wrote:
> On Sat, Oct 22, 2022 at 05:42:18PM -0700, Hugh Dickins wrote:
> > On Sat, 22 Oct 2022, Peter Zijlstra wrote:
> > 
> > > On architectures where the PTE/PMD is larger than the native word size
> > > (i386-PAE for example), READ_ONCE() can do the wrong thing. Use
> > > pmdp_get_lockless() just like we use ptep_get_lockless().
> > 
> > I thought that was something Will Deacon put a lot of effort
> > into handling around 5.8 and 5.9: see "strong prevailing wind" in
> > include/asm-generic/rwonce.h, formerly in include/linux/compiler.h.
> > 
> > Was it too optimistic?  Did the wind drop?
> > 
> > I'm interested in the answer, but I've certainly no objection
> > to making this all more obviously robust - thanks.
> 
> READ_ONCE() can't do what the hardware can't do. There is absolutely no
> way i386 can do an atomic 64bit load without resorting to cmpxchg8b.

Right.

> 
> Also see the comment that goes with compiletime_assert_rwonce_type(). It
> explicitly allows 64bit because there's just too much stuff that does
> that (and there's actually 32bit hardware that *can* do it).

Yes, the "strong prevailing wind" comment. I think I've never read that
carefully enough, until you redirected me back there: it is in fact
quite clear, that it's only *atomic* in the Armv7 + LPAE case; but
READ_ONCEy (READ_EACH_HALF_ONCE I guess) for other 64-on-32 cases.

> 
> But it's still very wrong.

Somewhat clearer to me now, thanks.

Hugh
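
Concretely, the "READ_EACH_HALF_ONCE" behaviour: without a 64-bit
single-copy-atomic load instruction, the compiler must split a 64-bit
READ_ONCE() into two 32-bit loads, so a racing writer can be observed
half old / half new. A sketch of the hazard (little-endian layout and
kernel types assumed; the function name is made up):

	u64 read_once_64_on_32(u64 *p)
	{
		u32 *half = (u32 *)p;
		u32 lo = READ_ONCE(half[0]);
		/* a writer may update *p here ... */
		u32 hi = READ_ONCE(half[1]);
		/* ... so the combined value may never have been stored */
		return ((u64)hi << 32) | lo;
	}

Armv7 + LPAE is the exception Hugh refers to: there ldrd is
single-copy atomic, so a 64-bit READ_ONCE() genuinely is atomic.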
  

Patch

--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -7186,7 +7186,7 @@ static u64 perf_get_pgtable_size(struct
 		return pud_leaf_size(pud);
 
 	pmdp = pmd_offset_lockless(pudp, pud, addr);
-	pmd = READ_ONCE(*pmdp);
+	pmd = pmdp_get_lockless(pmdp);
 	if (!pmd_present(pmd))
 		return 0;
 
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -2507,7 +2507,7 @@ static int gup_pmd_range(pud_t *pudp, pu
 
 	pmdp = pmd_offset_lockless(pudp, pud, addr);
 	do {
-		pmd_t pmd = READ_ONCE(*pmdp);
+		pmd_t pmd = pmdp_get_lockless(pmdp);
 
 		next = pmd_addr_end(addr, end);
 		if (!pmd_present(pmd))