[17/31] mm/various: give up if pte_offset_map[_lock]() fails

Message ID c299eba-4e17-c645-1115-ccd1fd9956bd@google.com
State New
Series mm: allow pte_offset_map[_lock]() to fail

Commit Message

Hugh Dickins May 22, 2023, 5:10 a.m. UTC
  Following the examples of nearby code, various functions can just give
up if pte_offset_map() or pte_offset_map_lock() fails.  And there's no
need for a preliminary pmd_trans_unstable() or other such check, since
such cases are now safely handled inside.

Signed-off-by: Hugh Dickins <hughd@google.com>
---
 mm/gup.c            | 9 ++++++---
 mm/ksm.c            | 7 ++++---
 mm/memcontrol.c     | 8 ++++----
 mm/memory-failure.c | 8 +++++---
 mm/migrate.c        | 3 +++
 mm/swap_state.c     | 3 +++
 6 files changed, 25 insertions(+), 13 deletions(-)
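
The conversion each hunk makes follows a single pattern.  As a minimal
before/after sketch (the "return 0" error path is illustrative; the
helpers named are the real ones):

	/* Before: a preliminary check, since pte_offset_map_lock()
	 * assumed a stable page table underneath it. */
	if (pmd_trans_unstable(pmd))
		return 0;
	pte = pte_offset_map_lock(mm, pmd, addr, &ptl);

	/* After: pte_offset_map_lock() itself copes with a changing pmd
	 * and returns NULL when there is no page table to map, so the
	 * caller simply tests the result and gives up. */
	pte = pte_offset_map_lock(mm, pmd, addr, &ptl);
	if (!pte)
		return 0;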
  

Comments

Qi Zheng May 22, 2023, 12:24 p.m. UTC | #1
On 2023/5/22 13:10, Hugh Dickins wrote:
> Following the examples of nearby code, various functions can just give
> up if pte_offset_map() or pte_offset_map_lock() fails.  And there's no
> need for a preliminary pmd_trans_unstable() or other such check, since
> such cases are now safely handled inside.
> 
> Signed-off-by: Hugh Dickins <hughd@google.com>
> ---
>   mm/gup.c            | 9 ++++++---
>   mm/ksm.c            | 7 ++++---
>   mm/memcontrol.c     | 8 ++++----
>   mm/memory-failure.c | 8 +++++---
>   mm/migrate.c        | 3 +++
>   mm/swap_state.c     | 3 +++
>   6 files changed, 25 insertions(+), 13 deletions(-)
> 

[...]

> diff --git a/mm/migrate.c b/mm/migrate.c
> index 3ecb7a40075f..308a56f0b156 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -305,6 +305,9 @@ void migration_entry_wait(struct mm_struct *mm, pmd_t *pmd,
>   	swp_entry_t entry;
>   
>   	ptep = pte_offset_map_lock(mm, pmd, address, &ptl);
> +	if (!ptep)
> +		return;

Maybe we should return false and let the caller handle the failure.

> +
>   	pte = *ptep;
>   	pte_unmap(ptep);
>   
> diff --git a/mm/swap_state.c b/mm/swap_state.c
> index b76a65ac28b3..db2ec85ef332 100644
> --- a/mm/swap_state.c
> +++ b/mm/swap_state.c
> @@ -734,6 +734,9 @@ static void swap_ra_info(struct vm_fault *vmf,
>   
>   	/* Copy the PTEs because the page table may be unmapped */
>   	orig_pte = pte = pte_offset_map(vmf->pmd, faddr);
> +	if (!pte)
> +		return;

Ditto?

> +
>   	if (fpfn == pfn + 1) {
>   		lpfn = fpfn;
>   		rpfn = fpfn + win;
  
Qi Zheng May 22, 2023, 12:37 p.m. UTC | #2
On 2023/5/22 20:24, Qi Zheng wrote:
> 
> 
> On 2023/5/22 13:10, Hugh Dickins wrote:
>> Following the examples of nearby code, various functions can just give
>> up if pte_offset_map() or pte_offset_map_lock() fails.  And there's no
>> need for a preliminary pmd_trans_unstable() or other such check, since
>> such cases are now safely handled inside.
>>
>> Signed-off-by: Hugh Dickins <hughd@google.com>
>> ---
>>   mm/gup.c            | 9 ++++++---
>>   mm/ksm.c            | 7 ++++---
>>   mm/memcontrol.c     | 8 ++++----
>>   mm/memory-failure.c | 8 +++++---
>>   mm/migrate.c        | 3 +++
>>   mm/swap_state.c     | 3 +++
>>   6 files changed, 25 insertions(+), 13 deletions(-)
>>
> 
> [...]
> 
>> diff --git a/mm/migrate.c b/mm/migrate.c
>> index 3ecb7a40075f..308a56f0b156 100644
>> --- a/mm/migrate.c
>> +++ b/mm/migrate.c
>> @@ -305,6 +305,9 @@ void migration_entry_wait(struct mm_struct *mm, pmd_t *pmd,
>>       swp_entry_t entry;
>>       ptep = pte_offset_map_lock(mm, pmd, address, &ptl);
>> +    if (!ptep)
>> +        return;
> 
> Maybe we should return false and let the caller handle the failure.
> 
>> +
>>       pte = *ptep;
>>       pte_unmap(ptep);
>> diff --git a/mm/swap_state.c b/mm/swap_state.c
>> index b76a65ac28b3..db2ec85ef332 100644
>> --- a/mm/swap_state.c
>> +++ b/mm/swap_state.c
>> @@ -734,6 +734,9 @@ static void swap_ra_info(struct vm_fault *vmf,
>>       /* Copy the PTEs because the page table may be unmapped */
>>       orig_pte = pte = pte_offset_map(vmf->pmd, faddr);
>> +    if (!pte)
>> +        return;
> 
> Ditto?

Oh, I see that you handle it in PATCH [22/31].

> 
>> +
>>       if (fpfn == pfn + 1) {
>>           lpfn = fpfn;
>>           rpfn = fpfn + win;
>
  
Hugh Dickins May 24, 2023, 3:20 a.m. UTC | #3
On Mon, 22 May 2023, Qi Zheng wrote:
> On 2023/5/22 20:24, Qi Zheng wrote:
> > On 2023/5/22 13:10, Hugh Dickins wrote:
> >> Following the examples of nearby code, various functions can just give
> >> up if pte_offset_map() or pte_offset_map_lock() fails.  And there's no
> >> need for a preliminary pmd_trans_unstable() or other such check, since
> >> such cases are now safely handled inside.
> >>
> >> Signed-off-by: Hugh Dickins <hughd@google.com>
> >> ---
> >>   mm/gup.c            | 9 ++++++---
> >>   mm/ksm.c            | 7 ++++---
> >>   mm/memcontrol.c     | 8 ++++----
> >>   mm/memory-failure.c | 8 +++++---
> >>   mm/migrate.c        | 3 +++
> >>   mm/swap_state.c     | 3 +++
> >>   6 files changed, 25 insertions(+), 13 deletions(-)
> >>
> > 
> > [...]
> > 
> >> diff --git a/mm/migrate.c b/mm/migrate.c
> >> index 3ecb7a40075f..308a56f0b156 100644
> >> --- a/mm/migrate.c
> >> +++ b/mm/migrate.c
> >> @@ -305,6 +305,9 @@ void migration_entry_wait(struct mm_struct *mm, pmd_t *pmd,
> >>       swp_entry_t entry;
> >>       ptep = pte_offset_map_lock(mm, pmd, address, &ptl);
> >> +    if (!ptep)
> >> +        return;
> > 
> > Maybe we should return false and let the caller handle the failure.

We have not needed to do that before; it's normal for migration_entry_wait()
not to wait sometimes: it just goes back out to userspace to try again (by
which time the situation is usually resolved).  I don't think we want to
trouble the callers with a new case to handle in some other way.
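
A condensed sketch of the arrival path, paraphrased from do_swap_page()
in mm/memory.c of this era, shows why not waiting is harmless:

	entry = pte_to_swp_entry(vmf->orig_pte);
	if (unlikely(non_swap_entry(entry))) {
		if (is_migration_entry(entry))
			/* May block until the migration completes - or,
			 * with this patch, may occasionally not block. */
			migration_entry_wait(vma->vm_mm, vmf->pmd,
					     vmf->address);
		/* Either way the fault returns 0 and the faulting
		 * instruction is retried; by then the migration has
		 * usually finished. */
		goto out;
	}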

> > 
> >> +
> >>       pte = *ptep;
> >>       pte_unmap(ptep);
> >> diff --git a/mm/swap_state.c b/mm/swap_state.c
> >> index b76a65ac28b3..db2ec85ef332 100644
> >> --- a/mm/swap_state.c
> >> +++ b/mm/swap_state.c
> >> @@ -734,6 +734,9 @@ static void swap_ra_info(struct vm_fault *vmf,
> >>       /* Copy the PTEs because the page table may be unmapped */
> >>       orig_pte = pte = pte_offset_map(vmf->pmd, faddr);
> >> +    if (!pte)
> >> +        return;
> > 
> > Ditto?
> 
> Oh, I see that you handle it in the PATCH[22/31].

I don't think 22/31 (about swapoff "unuse") relates to this one.
Here swap_vma_readahead() is doing an interesting calculation of
how big the readaround window should be, and my thinking was: who cares?
Just read 1, in the rare case that the page table vanishes underneath us.

But thank you for making me look again: it looks like I was not careful
enough before, ra_info->win is definitely *not* 1 on this line, and I
wonder if something bad might result from not following through on the
ensuing calculations - see how !CONFIG_64BIT is copying ptes (and that
implies CONFIG_64BIT is accessing the page table after pte_unmap()).

This swap_ra_info() code looks like it will need a patch all its own:
I must come back to it.
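
For reference, the ensuing calculations end roughly like this
(paraphrased from mm/swap_state.c of the time; tpte is a local pte_t *):

	ra_info->nr_pte = end - start;
	ra_info->offset = fpfn - start;
	pte -= ra_info->offset;
#ifdef CONFIG_64BIT
	ra_info->ptes = pte;	/* 64-bit: point into the page table itself */
#else
	tpte = ra_info->ptes;	/* 32-bit: copy the ptes out of the table */
	for (pfn = start; pfn != end; pfn++)
		*tpte++ = *pte++;
#endif
	pte_unmap(orig_pte);

So an early return, with ra_info->win already set above 1, would leave
nr_pte and ptes uninitialized for swap_vma_readahead() to consume.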

Thanks!
Hugh
  

Patch

diff --git a/mm/gup.c b/mm/gup.c
index 3bd5d3854c51..bb67193c5460 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -544,10 +544,10 @@  static struct page *follow_page_pte(struct vm_area_struct *vma,
 	if (WARN_ON_ONCE((flags & (FOLL_PIN | FOLL_GET)) ==
 			 (FOLL_PIN | FOLL_GET)))
 		return ERR_PTR(-EINVAL);
-	if (unlikely(pmd_bad(*pmd)))
-		return no_page_table(vma, flags);
 
 	ptep = pte_offset_map_lock(mm, pmd, address, &ptl);
+	if (!ptep)
+		return no_page_table(vma, flags);
 	pte = *ptep;
 	if (!pte_present(pte))
 		goto no_page;
@@ -851,8 +851,9 @@  static int get_gate_page(struct mm_struct *mm, unsigned long address,
 	pmd = pmd_offset(pud, address);
 	if (!pmd_present(*pmd))
 		return -EFAULT;
-	VM_BUG_ON(pmd_trans_huge(*pmd));
 	pte = pte_offset_map(pmd, address);
+	if (!pte)
+		return -EFAULT;
 	if (pte_none(*pte))
 		goto unmap;
 	*vma = get_gate_vma(mm);
@@ -2377,6 +2378,8 @@  static int gup_pte_range(pmd_t pmd, pmd_t *pmdp, unsigned long addr,
 	pte_t *ptep, *ptem;
 
 	ptem = ptep = pte_offset_map(&pmd, addr);
+	if (!ptep)
+		return 0;
 	do {
 		pte_t pte = ptep_get_lockless(ptep);
 		struct page *page;
diff --git a/mm/ksm.c b/mm/ksm.c
index df2aa281d49d..3dc15459dd20 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -431,10 +431,9 @@  static int break_ksm_pmd_entry(pmd_t *pmd, unsigned long addr, unsigned long nex
 	pte_t *pte;
 	int ret;
 
-	if (pmd_leaf(*pmd) || !pmd_present(*pmd))
-		return 0;
-
 	pte = pte_offset_map_lock(walk->mm, pmd, addr, &ptl);
+	if (!pte)
+		return 0;
 	if (pte_present(*pte)) {
 		page = vm_normal_page(walk->vma, addr, *pte);
 	} else if (!pte_none(*pte)) {
@@ -1203,6 +1202,8 @@  static int replace_page(struct vm_area_struct *vma, struct page *page,
 	mmu_notifier_invalidate_range_start(&range);
 
 	ptep = pte_offset_map_lock(mm, pmd, addr, &ptl);
+	if (!ptep)
+		goto out_mn;
 	if (!pte_same(*ptep, orig_pte)) {
 		pte_unmap_unlock(ptep, ptl);
 		goto out_mn;
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 4b27e245a055..fdd953655fe1 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -6057,9 +6057,9 @@  static int mem_cgroup_count_precharge_pte_range(pmd_t *pmd,
 		return 0;
 	}
 
-	if (pmd_trans_unstable(pmd))
-		return 0;
 	pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
+	if (!pte)
+		return 0;
 	for (; addr != end; pte++, addr += PAGE_SIZE)
 		if (get_mctgt_type(vma, addr, *pte, NULL))
 			mc.precharge++;	/* increment precharge temporarily */
@@ -6277,10 +6277,10 @@  static int mem_cgroup_move_charge_pte_range(pmd_t *pmd,
 		return 0;
 	}
 
-	if (pmd_trans_unstable(pmd))
-		return 0;
 retry:
 	pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
+	if (!pte)
+		return 0;
 	for (; addr != end; addr += PAGE_SIZE) {
 		pte_t ptent = *(pte++);
 		bool device = false;
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 5b663eca1f29..b3cc8f213fe3 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -414,6 +414,8 @@  static unsigned long dev_pagemap_mapping_shift(struct vm_area_struct *vma,
 	if (pmd_devmap(*pmd))
 		return PMD_SHIFT;
 	pte = pte_offset_map(pmd, address);
+	if (!pte)
+		return 0;
 	if (pte_present(*pte) && pte_devmap(*pte))
 		ret = PAGE_SHIFT;
 	pte_unmap(pte);
@@ -800,11 +802,11 @@  static int hwpoison_pte_range(pmd_t *pmdp, unsigned long addr,
 		goto out;
 	}
 
-	if (pmd_trans_unstable(pmdp))
-		goto out;
-
 	mapped_pte = ptep = pte_offset_map_lock(walk->vma->vm_mm, pmdp,
 						addr, &ptl);
+	if (!ptep)
+		goto out;
+
 	for (; addr != end; ptep++, addr += PAGE_SIZE) {
 		ret = check_hwpoisoned_entry(*ptep, addr, PAGE_SHIFT,
 					     hwp->pfn, &hwp->tk);
diff --git a/mm/migrate.c b/mm/migrate.c
index 3ecb7a40075f..308a56f0b156 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -305,6 +305,9 @@  void migration_entry_wait(struct mm_struct *mm, pmd_t *pmd,
 	swp_entry_t entry;
 
 	ptep = pte_offset_map_lock(mm, pmd, address, &ptl);
+	if (!ptep)
+		return;
+
 	pte = *ptep;
 	pte_unmap(ptep);
 
diff --git a/mm/swap_state.c b/mm/swap_state.c
index b76a65ac28b3..db2ec85ef332 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -734,6 +734,9 @@  static void swap_ra_info(struct vm_fault *vmf,
 
 	/* Copy the PTEs because the page table may be unmapped */
 	orig_pte = pte = pte_offset_map(vmf->pmd, faddr);
+	if (!pte)
+		return;
+
 	if (fpfn == pfn + 1) {
 		lpfn = fpfn;
 		rpfn = fpfn + win;