[13/31] mm/hmm: retry if pte_offset_map() fails

Message ID 2edc4657-b6ff-3d6e-2342-6b60bfccc5b@google.com
State New
Series mm: allow pte_offset_map[_lock]() to fail

Commit Message

Hugh Dickins May 22, 2023, 5:05 a.m. UTC
  hmm_vma_walk_pmd() is called through mm_walk, but already has a goto
again loop of its own, so take part in that if pte_offset_map() fails.

Signed-off-by: Hugh Dickins <hughd@google.com>
---
 mm/hmm.c | 2 ++
 1 file changed, 2 insertions(+)
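
For context, a simplified sketch of the retry structure in hmm_vma_walk_pmd() with this patch applied (paraphrased from mm/hmm.c, with most of the body elided; not the verbatim function):

static int hmm_vma_walk_pmd(pmd_t *pmdp, unsigned long start,
			    unsigned long end, struct mm_walk *walk)
{
	/* ... */
again:
	pmd = pmdp_get_lockless(pmdp);
	if (pmd_none(pmd))
		return hmm_vma_walk_hole(start, end, -1, walk);
	/* ... pmd migration entries and THPs may also "goto again" ... */

	ptep = pte_offset_map(pmdp, addr);
	if (!ptep)
		goto again;	/* the new check: page table gone, rewalk the pmd */
	for (; addr < end; addr += PAGE_SIZE, ptep++, hmm_pfns++) {
		/* ... hmm_vma_handle_pte() fills in hmm_pfns[] ... */
	}
	pte_unmap(ptep - 1);
	return 0;
}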
  

Comments

Qi Zheng May 22, 2023, 12:11 p.m. UTC | #1
On 2023/5/22 13:05, Hugh Dickins wrote:
> hmm_vma_walk_pmd() is called through mm_walk, but already has a goto
> again loop of its own, so take part in that if pte_offset_map() fails.
> 
> Signed-off-by: Hugh Dickins <hughd@google.com>
> ---
>   mm/hmm.c | 2 ++
>   1 file changed, 2 insertions(+)
> 
> diff --git a/mm/hmm.c b/mm/hmm.c
> index e23043345615..b1a9159d7c92 100644
> --- a/mm/hmm.c
> +++ b/mm/hmm.c
> @@ -381,6 +381,8 @@ static int hmm_vma_walk_pmd(pmd_t *pmdp,
>   	}
>   
>   	ptep = pte_offset_map(pmdp, addr);
> +	if (!ptep)
> +		goto again;
>   	for (; addr < end; addr += PAGE_SIZE, ptep++, hmm_pfns++) {
>   		int r;
>   

I haven't read the entire patch set yet, but taking a note here.
hmm_vma_handle_pte() will unmap the pte and then call
migration_entry_wait() to remap it; that remap may now fail, so we
need to handle this case like below:

diff --git a/mm/hmm.c b/mm/hmm.c
index 6a151c09de5e..eb726ff0981c 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -276,7 +276,8 @@ static int hmm_vma_handle_pte(struct mm_walk *walk, unsigned long addr,
                 if (is_migration_entry(entry)) {
                         pte_unmap(ptep);
                         hmm_vma_walk->last = addr;
-                       migration_entry_wait(walk->mm, pmdp, addr);
+                       if (!migration_entry_wait(walk->mm, pmdp, addr))
+                               return -EAGAIN;
                         return -EBUSY;
                 }

@@ -386,6 +387,8 @@ static int hmm_vma_walk_pmd(pmd_t *pmdp,

                r = hmm_vma_handle_pte(walk, addr, end, pmdp, ptep, hmm_pfns);
                 if (r) {
+                       if (r == -EAGAIN)
+                               goto again;
                         /* hmm_vma_handle_pte() did pte_unmap() */
                         return r;
                 }

Of course, migration_entry_wait() also needs to be modified.
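
For illustration, a minimal sketch of that callee-side change, assuming migration_entry_wait() were made to report whether it found a pte to wait on (hypothetical; as discussed below, this is not what the series ends up doing):

bool migration_entry_wait(struct mm_struct *mm, pmd_t *pmd,
			  unsigned long address)
{
	spinlock_t *ptl;
	pte_t *ptep;

	ptep = pte_offset_map_lock(mm, pmd, address, &ptl);
	if (!ptep)
		return false;	/* page table changed under us: caller retries */
	/* ... re-check the migration entry and sleep until it is gone ... */
	return true;
}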
  
Alistair Popple May 23, 2023, 2:39 a.m. UTC | #2
Qi Zheng <qi.zheng@linux.dev> writes:

> On 2023/5/22 13:05, Hugh Dickins wrote:
>> hmm_vma_walk_pmd() is called through mm_walk, but already has a goto
>> again loop of its own, so take part in that if pte_offset_map() fails.
>> Signed-off-by: Hugh Dickins <hughd@google.com>
>> ---
>>   mm/hmm.c | 2 ++
>>   1 file changed, 2 insertions(+)
>> diff --git a/mm/hmm.c b/mm/hmm.c
>> index e23043345615..b1a9159d7c92 100644
>> --- a/mm/hmm.c
>> +++ b/mm/hmm.c
>> @@ -381,6 +381,8 @@ static int hmm_vma_walk_pmd(pmd_t *pmdp,
>>   	}
>>     	ptep = pte_offset_map(pmdp, addr);
>> +	if (!ptep)
>> +		goto again;
>>   	for (; addr < end; addr += PAGE_SIZE, ptep++, hmm_pfns++) {
>>   		int r;
>>   
>
> I haven't read the entire patch set yet, but taking a note here.
> hmm_vma_handle_pte() will unmap the pte and then call
> migration_entry_wait() to remap it; that remap may now fail, so we
> need to handle this case like below:

I don't see a problem here. Sure, hmm_vma_handle_pte() might return
-EBUSY but that will get returned up to hmm_range_fault() which will
retry the whole thing again and presumably fail when looking at the PMD.
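
For reference, that retry happens in hmm_range_fault() itself, which loops while the walk returns -EBUSY (lightly condensed from mm/hmm.c):

int hmm_range_fault(struct hmm_range *range)
{
	struct hmm_vma_walk hmm_vma_walk = {
		.range = range,
		.last = range->start,
	};
	struct mm_struct *mm = range->notifier->mm;
	int ret;

	mmap_assert_locked(mm);

	do {
		/* If range is no longer valid, force retry. */
		if (mmu_interval_check_retry(range->notifier,
					     range->notifier_seq))
			return -EBUSY;
		ret = walk_page_range(mm, hmm_vma_walk.last, range->end,
				      &hmm_walk_ops, &hmm_vma_walk);
	} while (ret == -EBUSY);

	return ret;
}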

> diff --git a/mm/hmm.c b/mm/hmm.c
> index 6a151c09de5e..eb726ff0981c 100644
> --- a/mm/hmm.c
> +++ b/mm/hmm.c
> @@ -276,7 +276,8 @@ static int hmm_vma_handle_pte(struct mm_walk *walk, unsigned long addr,
>                 if (is_migration_entry(entry)) {
>                         pte_unmap(ptep);
>                         hmm_vma_walk->last = addr;
> -                       migration_entry_wait(walk->mm, pmdp, addr);
> +                       if (!migration_entry_wait(walk->mm, pmdp, addr))
> +                               return -EAGAIN;
>                         return -EBUSY;
>                 }
>
> @@ -386,6 +387,8 @@ static int hmm_vma_walk_pmd(pmd_t *pmdp,
>
>                 r = hmm_vma_handle_pte(walk, addr, end, pmdp, ptep, hmm_pfns);
>                 if (r) {
> +                       if (r == -EAGAIN)
> +                               goto again;
>                         /* hmm_vma_handle_pte() did pte_unmap() */
>                         return r;
>                 }
>
> Of course, migration_entry_wait() also needs to be modified.
  
Qi Zheng May 23, 2023, 6:06 a.m. UTC | #3
On 2023/5/23 10:39, Alistair Popple wrote:
> 
> Qi Zheng <qi.zheng@linux.dev> writes:
> 
>> On 2023/5/22 13:05, Hugh Dickins wrote:
>>> hmm_vma_walk_pmd() is called through mm_walk, but already has a goto
>>> again loop of its own, so take part in that if pte_offset_map() fails.
>>> Signed-off-by: Hugh Dickins <hughd@google.com>
>>> ---
>>>    mm/hmm.c | 2 ++
>>>    1 file changed, 2 insertions(+)
>>> diff --git a/mm/hmm.c b/mm/hmm.c
>>> index e23043345615..b1a9159d7c92 100644
>>> --- a/mm/hmm.c
>>> +++ b/mm/hmm.c
>>> @@ -381,6 +381,8 @@ static int hmm_vma_walk_pmd(pmd_t *pmdp,
>>>    	}
>>>      	ptep = pte_offset_map(pmdp, addr);
>>> +	if (!ptep)
>>> +		goto again;
>>>    	for (; addr < end; addr += PAGE_SIZE, ptep++, hmm_pfns++) {
>>>    		int r;
>>>    
>>
>> I haven't read the entire patch set yet, but taking a note here.
>> hmm_vma_handle_pte() will unmap the pte and then call
>> migration_entry_wait() to remap it; that remap may now fail, so we
>> need to handle this case like below:
> 
> I don't see a problem here. Sure, hmm_vma_handle_pte() might return
> -EBUSY but that will get returned up to hmm_range_fault() which will
> retry the whole thing again and presumably fail when looking at the PMD.

Yeah. There is no problem with this and the modification to
migration_entry_wait() can be simplified. My previous thought was that
we can finish the retry logic in hmm_vma_walk_pmd() without handing it
over to the caller. :)

> 
>> diff --git a/mm/hmm.c b/mm/hmm.c
>> index 6a151c09de5e..eb726ff0981c 100644
>> --- a/mm/hmm.c
>> +++ b/mm/hmm.c
>> @@ -276,7 +276,8 @@ static int hmm_vma_handle_pte(struct mm_walk *walk, unsigned long addr,
>>                  if (is_migration_entry(entry)) {
>>                          pte_unmap(ptep);
>>                          hmm_vma_walk->last = addr;
>> -                       migration_entry_wait(walk->mm, pmdp, addr);
>> +                       if (!migration_entry_wait(walk->mm, pmdp, addr))
>> +                               return -EAGAIN;
>>                          return -EBUSY;
>>                  }
>>
>> @@ -386,6 +387,8 @@ static int hmm_vma_walk_pmd(pmd_t *pmdp,
>>
>>                  r = hmm_vma_handle_pte(walk, addr, end, pmdp, ptep, hmm_pfns);
>>                  if (r) {
>> +                       if (r == -EAGAIN)
>> +                               goto again;
>>                          /* hmm_vma_handle_pte() did pte_unmap() */
>>                          return r;
>>                  }
>>
>> Of course, migration_entry_wait() also needs to be modified.
>
  
Hugh Dickins May 24, 2023, 2:50 a.m. UTC | #4
On Tue, 23 May 2023, Qi Zheng wrote:
> On 2023/5/23 10:39, Alistair Popple wrote:
> > Qi Zheng <qi.zheng@linux.dev> writes:
> >> On 2023/5/22 13:05, Hugh Dickins wrote:
> >>> hmm_vma_walk_pmd() is called through mm_walk, but already has a goto
> >>> again loop of its own, so take part in that if pte_offset_map() fails.
> >>> Signed-off-by: Hugh Dickins <hughd@google.com>
> >>> ---
> >>>    mm/hmm.c | 2 ++
> >>>    1 file changed, 2 insertions(+)
> >>> diff --git a/mm/hmm.c b/mm/hmm.c
> >>> index e23043345615..b1a9159d7c92 100644
> >>> --- a/mm/hmm.c
> >>> +++ b/mm/hmm.c
> >>> @@ -381,6 +381,8 @@ static int hmm_vma_walk_pmd(pmd_t *pmdp,
> >>>     }
> >>>      	ptep = pte_offset_map(pmdp, addr);
> >>> +	if (!ptep)
> >>> +		goto again;
> >>>     for (; addr < end; addr += PAGE_SIZE, ptep++, hmm_pfns++) {
> >>>      int r;
> >>>    
> >>
> >> I haven't read the entire patch set yet, but taking a note here.
> >> hmm_vma_handle_pte() will unmap the pte and then call
> >> migration_entry_wait() to remap it; that remap may now fail, so we
> >> need to handle this case like below:
> > 
> > I don't see a problem here. Sure, hmm_vma_handle_pte() might return
> > -EBUSY but that will get returned up to hmm_range_fault() which will
> > retry the whole thing again and presumably fail when looking at the PMD.
> 
> Yeah. There is no problem with this and the modification to
> migration_entry_wait() can be simplified. My previous thought was that
> we can finish the retry logic in hmm_vma_walk_pmd() without handing it
> over to the caller. :)

Okay, Alistair has resolved this one, thanks, I agree; but what is
"the modification to migration_entry_wait()" that you refer to there?

I don't think there's any need to make it a bool, it's normal for there
to be races on entry to migration_entry_wait(), and we're used to just
returning to caller (and back up to userspace) when it does not wait.

Hugh
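
In other words, with the series applied migration_entry_wait() can stay void and simply return early when the map fails; a minimal sketch of that shape (not the exact mm/migrate.c code):

void migration_entry_wait(struct mm_struct *mm, pmd_t *pmd,
			  unsigned long address)
{
	spinlock_t *ptl;
	pte_t *ptep;

	ptep = pte_offset_map_lock(mm, pmd, address, &ptl);
	if (!ptep)
		return;	/* raced with unmap: caller just refaults */
	/* ... wait for the migration to complete, dropping the lock ... */
}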
  
Alistair Popple May 24, 2023, 5:16 a.m. UTC | #5
Hugh Dickins <hughd@google.com> writes:

> On Tue, 23 May 2023, Qi Zheng wrote:
>> On 2023/5/23 10:39, Alistair Popple wrote:
>> > Qi Zheng <qi.zheng@linux.dev> writes:
>> >> On 2023/5/22 13:05, Hugh Dickins wrote:
>> >>> hmm_vma_walk_pmd() is called through mm_walk, but already has a goto
>> >>> again loop of its own, so take part in that if pte_offset_map() fails.
>> >>> Signed-off-by: Hugh Dickins <hughd@google.com>
>> >>> ---
>> >>>    mm/hmm.c | 2 ++
>> >>>    1 file changed, 2 insertions(+)
>> >>> diff --git a/mm/hmm.c b/mm/hmm.c
>> >>> index e23043345615..b1a9159d7c92 100644
>> >>> --- a/mm/hmm.c
>> >>> +++ b/mm/hmm.c
>> >>> @@ -381,6 +381,8 @@ static int hmm_vma_walk_pmd(pmd_t *pmdp,
>> >>>     }
>> >>>      	ptep = pte_offset_map(pmdp, addr);
>> >>> +	if (!ptep)
>> >>> +		goto again;
>> >>>     for (; addr < end; addr += PAGE_SIZE, ptep++, hmm_pfns++) {
>> >>>      int r;
>> >>>    
>> >>
>> >> I haven't read the entire patch set yet, but taking a note here.
>> >> The hmm_vma_handle_pte() will unmap pte and then call
>> >> migration_entry_wait() to remap pte, so this may fail, we need to
>> >> handle this case like below:
>> > 
>> > I don't see a problem here. Sure, hmm_vma_handle_pte() might return
>> > -EBUSY but that will get returned up to hmm_range_fault() which will
>> > retry the whole thing again and presumably fail when looking at the PMD.
>> 
>> Yeah. There is no problem with this and the modification to
>> migration_entry_wait() can be simplified. My previous thought was that
>> we can finish the retry logic in hmm_vma_walk_pmd() without handing it
>> over to the caller. :)
>
> Okay, Alistair has resolved this one, thanks, I agree; but what is
> "the modification to migration_entry_wait()" that you refer to there?
>
> I don't think there's any need to make it a bool, it's normal for there
> to be races on entry to migration_entry_wait(), and we're used to just
> returning to caller (and back up to userspace) when it does not wait.

Agreed. I didn't spot any places where returning to the caller without
actually waiting would cause looping. I assume any retries or refaults
will find the cleared PMD and fault/error out in some other manner
anyway.

hmm_range_fault() is the only place that might have been a bit special,
but it looks fine to me so:

Reviewed-by: Alistair Popple <apopple@nvidia.com>

> Hugh
  

Patch

diff --git a/mm/hmm.c b/mm/hmm.c
index e23043345615..b1a9159d7c92 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -381,6 +381,8 @@ static int hmm_vma_walk_pmd(pmd_t *pmdp,
 	}
 
 	ptep = pte_offset_map(pmdp, addr);
+	if (!ptep)
+		goto again;
 	for (; addr < end; addr += PAGE_SIZE, ptep++, hmm_pfns++) {
 		int r;