[13/31] mm/hmm: retry if pte_offset_map() fails

Message ID 2edc4657-b6ff-3d6e-2342-6b60bfccc5b@google.com
State New
Series mm: allow pte_offset_map[_lock]() to fail

Commit Message

Hugh Dickins May 22, 2023, 5:05 a.m. UTC
  hmm_vma_walk_pmd() is called through mm_walk, but already has a goto
again loop of its own, so take part in that if pte_offset_map() fails.

Signed-off-by: Hugh Dickins <hughd@google.com>
---
 mm/hmm.c | 2 ++
 1 file changed, 2 insertions(+)
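
For context, a simplified sketch of the retry structure in hmm_vma_walk_pmd() with this patch applied (paraphrased from mm/hmm.c, with most of the body elided; not the verbatim function):

static int hmm_vma_walk_pmd(pmd_t *pmdp, unsigned long start,
			    unsigned long end, struct mm_walk *walk)
{
	/* ... */
again:
	pmd = pmdp_get_lockless(pmdp);
	if (pmd_none(pmd))
		return hmm_vma_walk_hole(start, end, -1, walk);
	/* ... pmd migration entries and THPs may also "goto again" ... */

	ptep = pte_offset_map(pmdp, addr);
	if (!ptep)
		goto again;	/* the new check: page table gone, rewalk the pmd */
	for (; addr < end; addr += PAGE_SIZE, ptep++, hmm_pfns++) {
		/* ... hmm_vma_handle_pte() fills in hmm_pfns[] ... */
	}
	pte_unmap(ptep - 1);
	return 0;
}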
  

Comments

Qi Zheng May 22, 2023, 12:11 p.m. UTC | #1
On 2023/5/22 13:05, Hugh Dickins wrote:
> hmm_vma_walk_pmd() is called through mm_walk, but already has a goto
> again loop of its own, so take part in that if pte_offset_map() fails.
> 
> Signed-off-by: Hugh Dickins <hughd@google.com>
> ---
>   mm/hmm.c | 2 ++
>   1 file changed, 2 insertions(+)
> 
> diff --git a/mm/hmm.c b/mm/hmm.c
> index e23043345615..b1a9159d7c92 100644
> --- a/mm/hmm.c
> +++ b/mm/hmm.c
> @@ -381,6 +381,8 @@ static int hmm_vma_walk_pmd(pmd_t *pmdp,
>   	}
>   
>   	ptep = pte_offset_map(pmdp, addr);
> +	if (!ptep)
> +		goto again;
>   	for (; addr < end; addr += PAGE_SIZE, ptep++, hmm_pfns++) {
>   		int r;
>   

I haven't read the entire patch set yet, but taking a note here.
hmm_vma_handle_pte() will unmap the pte and then call
migration_entry_wait() to remap it; that remap may now fail, so we
need to handle this case like below:

diff --git a/mm/hmm.c b/mm/hmm.c
index 6a151c09de5e..eb726ff0981c 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -276,7 +276,8 @@ static int hmm_vma_handle_pte(struct mm_walk *walk, unsigned long addr,
                 if (is_migration_entry(entry)) {
                         pte_unmap(ptep);
                         hmm_vma_walk->last = addr;
-                       migration_entry_wait(walk->mm, pmdp, addr);
+                       if (!migration_entry_wait(walk->mm, pmdp, addr))
+                               return -EAGAIN;
                         return -EBUSY;
                 }

@@ -386,6 +387,8 @@ static int hmm_vma_walk_pmd(pmd_t *pmdp,

                r = hmm_vma_handle_pte(walk, addr, end, pmdp, ptep, hmm_pfns);
                 if (r) {
+                       if (r == -EAGAIN)
+                               goto again;
                         /* hmm_vma_handle_pte() did pte_unmap() */
                         return r;
                 }

Of course, migration_entry_wait() also needs to be modified.
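
For illustration, a minimal sketch of that callee-side change, assuming migration_entry_wait() were made to report whether it found a pte to wait on (hypothetical; as discussed below, this is not what the series ends up doing):

bool migration_entry_wait(struct mm_struct *mm, pmd_t *pmd,
			  unsigned long address)
{
	spinlock_t *ptl;
	pte_t *ptep;

	ptep = pte_offset_map_lock(mm, pmd, address, &ptl);
	if (!ptep)
		return false;	/* page table changed under us: caller retries */
	/* ... re-check the migration entry and sleep until it is gone ... */
	return true;
}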
  
Alistair Popple May 23, 2023, 2:39 a.m. UTC | #2
Qi Zheng <qi.zheng@linux.dev> writes:

> On 2023/5/22 13:05, Hugh Dickins wrote:
>> hmm_vma_walk_pmd() is called through mm_walk, but already has a goto
>> again loop of its own, so take part in that if pte_offset_map() fails.
>> Signed-off-by: Hugh Dickins <hughd@google.com>
>> ---
>>   mm/hmm.c | 2 ++
>>   1 file changed, 2 insertions(+)
>> diff --git a/mm/hmm.c b/mm/hmm.c
>> index e23043345615..b1a9159d7c92 100644
>> --- a/mm/hmm.c
>> +++ b/mm/hmm.c
>> @@ -381,6 +381,8 @@ static int hmm_vma_walk_pmd(pmd_t *pmdp,
>>   	}
>>     	ptep = pte_offset_map(pmdp, addr);
>> +	if (!ptep)
>> +		goto again;
>>   	for (; addr < end; addr += PAGE_SIZE, ptep++, hmm_pfns++) {
>>   		int r;
>>   
>
> I haven't read the entire patch set yet, but taking a note here.
> hmm_vma_handle_pte() will unmap the pte and then call
> migration_entry_wait() to remap it; that remap may now fail, so we
> need to handle this case like below:

I don't see a problem here. Sure, hmm_vma_handle_pte() might return
-EBUSY but that will get returned up to hmm_range_fault() which will
retry the whole thing again and presumably fail when looking at the PMD.
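
For reference, that retry happens in hmm_range_fault() itself, which loops while the walk returns -EBUSY (lightly condensed from mm/hmm.c):

int hmm_range_fault(struct hmm_range *range)
{
	struct hmm_vma_walk hmm_vma_walk = {
		.range = range,
		.last = range->start,
	};
	struct mm_struct *mm = range->notifier->mm;
	int ret;

	mmap_assert_locked(mm);

	do {
		/* If range is no longer valid, force retry. */
		if (mmu_interval_check_retry(range->notifier,
					     range->notifier_seq))
			return -EBUSY;
		ret = walk_page_range(mm, hmm_vma_walk.last, range->end,
				      &hmm_walk_ops, &hmm_vma_walk);
	} while (ret == -EBUSY);

	return ret;
}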

> diff --git a/mm/hmm.c b/mm/hmm.c
> index 6a151c09de5e..eb726ff0981c 100644
> --- a/mm/hmm.c
> +++ b/mm/hmm.c
> @@ -276,7 +276,8 @@ static int hmm_vma_handle_pte(struct mm_walk *walk, unsigned long addr,
>                 if (is_migration_entry(entry)) {
>                         pte_unmap(ptep);
>                         hmm_vma_walk->last = addr;
> -                       migration_entry_wait(walk->mm, pmdp, addr);
> +                       if (!migration_entry_wait(walk->mm, pmdp, addr))
> +                               return -EAGAIN;
>                         return -EBUSY;
>                 }
>
> @@ -386,6 +387,8 @@ static int hmm_vma_walk_pmd(pmd_t *pmdp,
>
>                 r = hmm_vma_handle_pte(walk, addr, end, pmdp, ptep, hmm_pfns);
>                 if (r) {
> +                       if (r == -EAGAIN)
> +                               goto again;
>                         /* hmm_vma_handle_pte() did pte_unmap() */
>                         return r;
>                 }
>
> Of course, migration_entry_wait() also needs to be modified.
  
Qi Zheng May 23, 2023, 6:06 a.m. UTC | #3
On 2023/5/23 10:39, Alistair Popple wrote:
> 
> Qi Zheng <qi.zheng@linux.dev> writes:
> 
>> On 2023/5/22 13:05, Hugh Dickins wrote:
>>> hmm_vma_walk_pmd() is called through mm_walk, but already has a goto
>>> again loop of its own, so take part in that if pte_offset_map() fails.
>>> Signed-off-by: Hugh Dickins <hughd@google.com>
>>> ---
>>>    mm/hmm.c | 2 ++
>>>    1 file changed, 2 insertions(+)
>>> diff --git a/mm/hmm.c b/mm/hmm.c
>>> index e23043345615..b1a9159d7c92 100644
>>> --- a/mm/hmm.c
>>> +++ b/mm/hmm.c
>>> @@ -381,6 +381,8 @@ static int hmm_vma_walk_pmd(pmd_t *pmdp,
>>>    	}
>>>      	ptep = pte_offset_map(pmdp, addr);
>>> +	if (!ptep)
>>> +		goto again;
>>>    	for (; addr < end; addr += PAGE_SIZE, ptep++, hmm_pfns++) {
>>>    		int r;
>>>    
>>
>> I haven't read the entire patch set yet, but taking a note here.
>> hmm_vma_handle_pte() will unmap the pte and then call
>> migration_entry_wait() to remap it; that remap may now fail, so we
>> need to handle this case like below:
> 
> I don't see a problem here. Sure, hmm_vma_handle_pte() might return
> -EBUSY but that will get returned up to hmm_range_fault() which will
> retry the whole thing again and presumably fail when looking at the PMD.

Yeah. There is no problem with this and the modification to
migration_entry_wait() can be simplified. My previous thought was that
we can finish the retry logic in hmm_vma_walk_pmd() without handing it
over to the caller. :)

> 
>> diff --git a/mm/hmm.c b/mm/hmm.c
>> index 6a151c09de5e..eb726ff0981c 100644
>> --- a/mm/hmm.c
>> +++ b/mm/hmm.c
>> @@ -276,7 +276,8 @@ static int hmm_vma_handle_pte(struct mm_walk *walk, unsigned long addr,
>>                  if (is_migration_entry(entry)) {
>>                          pte_unmap(ptep);
>>                          hmm_vma_walk->last = addr;
>> -                       migration_entry_wait(walk->mm, pmdp, addr);
>> +                       if (!migration_entry_wait(walk->mm, pmdp, addr))
>> +                               return -EAGAIN;
>>                          return -EBUSY;
>>                  }
>>
>> @@ -386,6 +387,8 @@ static int hmm_vma_walk_pmd(pmd_t *pmdp,
>>
>>                  r = hmm_vma_handle_pte(walk, addr, end, pmdp, ptep, hmm_pfns);
>>                  if (r) {
>> +                       if (r == -EAGAIN)
>> +                               goto again;
>>                          /* hmm_vma_handle_pte() did pte_unmap() */
>>                          return r;
>>                  }
>>
>> Of course, migration_entry_wait() also needs to be modified.
>
  
Hugh Dickins May 24, 2023, 2:50 a.m. UTC | #4
On Tue, 23 May 2023, Qi Zheng wrote:
> On 2023/5/23 10:39, Alistair Popple wrote:
> > Qi Zheng <qi.zheng@linux.dev> writes:
> >> On 2023/5/22 13:05, Hugh Dickins wrote:
> >>> hmm_vma_walk_pmd() is called through mm_walk, but already has a goto
> >>> again loop of its own, so take part in that if pte_offset_map() fails.
> >>> Signed-off-by: Hugh Dickins <hughd@google.com>
> >>> ---
> >>>    mm/hmm.c | 2 ++
> >>>    1 file changed, 2 insertions(+)
> >>> diff --git a/mm/hmm.c b/mm/hmm.c
> >>> index e23043345615..b1a9159d7c92 100644
> >>> --- a/mm/hmm.c
> >>> +++ b/mm/hmm.c
> >>> @@ -381,6 +381,8 @@ static int hmm_vma_walk_pmd(pmd_t *pmdp,
> >>>     }
> >>>      	ptep = pte_offset_map(pmdp, addr);
> >>> +	if (!ptep)
> >>> +		goto again;
> >>>     for (; addr < end; addr += PAGE_SIZE, ptep++, hmm_pfns++) {
> >>>      int r;
> >>>    
> >>
> >> I haven't read the entire patch set yet, but taking a note here.
> >> hmm_vma_handle_pte() will unmap the pte and then call
> >> migration_entry_wait() to remap it; that remap may now fail, so we
> >> need to handle this case like below:
> > 
> > I don't see a problem here. Sure, hmm_vma_handle_pte() might return
> > -EBUSY but that will get returned up to hmm_range_fault() which will
> > retry the whole thing again and presumably fail when looking at the PMD.
> 
> Yeah. There is no problem with this and the modification to
> migration_entry_wait() can be simplified. My previous thought was that
> we can finish the retry logic in hmm_vma_walk_pmd() without handing it
> over to the caller. :)

Okay, Alistair has resolved this one, thanks, I agree; but what is
"the modification to migration_entry_wait()" that you refer to there?

I don't think there's any need to make it a bool, it's normal for there
to be races on entry to migration_entry_wait(), and we're used to just
returning to caller (and back up to userspace) when it does not wait.

Hugh
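
In other words, with the series applied migration_entry_wait() can stay void and simply return early when the map fails; a minimal sketch of that shape (not the exact mm/migrate.c code):

void migration_entry_wait(struct mm_struct *mm, pmd_t *pmd,
			  unsigned long address)
{
	spinlock_t *ptl;
	pte_t *ptep;

	ptep = pte_offset_map_lock(mm, pmd, address, &ptl);
	if (!ptep)
		return;	/* raced with unmap: caller just refaults */
	/* ... wait for the migration to complete, dropping the lock ... */
}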
  
Alistair Popple May 24, 2023, 5:16 a.m. UTC | #5
Hugh Dickins <hughd@google.com> writes:

> On Tue, 23 May 2023, Qi Zheng wrote:
>> On 2023/5/23 10:39, Alistair Popple wrote:
>> > Qi Zheng <qi.zheng@linux.dev> writes:
>> >> On 2023/5/22 13:05, Hugh Dickins wrote:
>> >>> hmm_vma_walk_pmd() is called through mm_walk, but already has a goto
>> >>> again loop of its own, so take part in that if pte_offset_map() fails.
>> >>> Signed-off-by: Hugh Dickins <hughd@google.com>
>> >>> ---
>> >>>    mm/hmm.c | 2 ++
>> >>>    1 file changed, 2 insertions(+)
>> >>> diff --git a/mm/hmm.c b/mm/hmm.c
>> >>> index e23043345615..b1a9159d7c92 100644
>> >>> --- a/mm/hmm.c
>> >>> +++ b/mm/hmm.c
>> >>> @@ -381,6 +381,8 @@ static int hmm_vma_walk_pmd(pmd_t *pmdp,
>> >>>     }
>> >>>      	ptep = pte_offset_map(pmdp, addr);
>> >>> +	if (!ptep)
>> >>> +		goto again;
>> >>>     for (; addr < end; addr += PAGE_SIZE, ptep++, hmm_pfns++) {
>> >>>      int r;
>> >>>    
>> >>
>> >> I haven't read the entire patch set yet, but taking a note here.
>> >> The hmm_vma_handle_pte() will unmap pte and then call
>> >> migration_entry_wait() to remap pte, so this may fail, we need to
>> >> handle this case like below:
>> > 
>> > I don't see a problem here. Sure, hmm_vma_handle_pte() might return
>> > -EBUSY but that will get returned up to hmm_range_fault() which will
>> > retry the whole thing again and presumably fail when looking at the PMD.
>> 
>> Yeah. There is no problem with this and the modification to
>> migration_entry_wait() can be simplified. My previous thought was that
>> we can finish the retry logic in hmm_vma_walk_pmd() without handing it
>> over to the caller. :)
>
> Okay, Alistair has resolved this one, thanks, I agree; but what is
> "the modification to migration_entry_wait()" that you refer to there?
>
> I don't think there's any need to make it a bool, it's normal for there
> to be races on entry to migration_entry_wait(), and we're used to just
> returning to caller (and back up to userspace) when it does not wait.

Agreed. I didn't spot any places where returning to the caller without
actually waiting would cause looping. I assume any retries or refaults
will find the cleared PMD and fault/error out in some other manner
anyway.

hmm_range_fault() is the only place that might have been a bit special,
but it looks fine to me so:

Reviewed-by: Alistair Popple <apopple@nvidia.com>

> Hugh
  

Patch

diff --git a/mm/hmm.c b/mm/hmm.c
index e23043345615..b1a9159d7c92 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -381,6 +381,8 @@ static int hmm_vma_walk_pmd(pmd_t *pmdp,
 	}
 
 	ptep = pte_offset_map(pmdp, addr);
+	if (!ptep)
+		goto again;
 	for (; addr < end; addr += PAGE_SIZE, ptep++, hmm_pfns++) {
 		int r;