[v2] mm: remove redundant lru_add_drain() prior to unmapping pages

Message ID 20231214222717.50277-1-jianfeng.w.wang@oracle.com
State New
Series [v2] mm: remove redundant lru_add_drain() prior to unmapping pages

Commit Message

Jianfeng Wang Dec. 14, 2023, 10:27 p.m. UTC
  When unmapping VMA pages, pages are gathered in batches and released by
tlb_finish_mmu() if CONFIG_MMU_GATHER_NO_GATHER is not set. The function
tlb_finish_mmu() is responsible for calling free_pages_and_swap_cache(),
which calls lru_add_drain() to drain pages cached in the per-CPU
folio_batches before releasing the gathered pages. Thus, it is redundant
to call lru_add_drain() before gathering pages if
CONFIG_MMU_GATHER_NO_GATHER is not set.
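
On such configurations the relevant call graph is roughly the following
(a simplified sketch; intermediate helpers such as tlb_flush_mmu() are
omitted):

    unmap_region() / exit_mmap()
        lru_add_drain()                /* redundant when gathering */
        tlb_gather_mmu()
        unmap_vmas()                   /* pages accumulate in the mmu_gather */
        tlb_finish_mmu()
            free_pages_and_swap_cache()
                lru_add_drain()        /* drains anyway */
                release_pages()        /* frees the gathered pages */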

Remove lru_add_drain() prior to gathering and unmapping pages in
exit_mmap() and unmap_region() if CONFIG_MMU_GATHER_NO_GATHER is not set.

Note that the page-unmapping path in the OOM killer (e.g.,
__oom_reap_task_mm()) also uses tlb_finish_mmu() without a prior
lru_add_drain(), so this commit also makes the code more consistent.

Signed-off-by: Jianfeng Wang <jianfeng.w.wang@oracle.com>
---
 mm/mmap.c | 6 ++++++
 1 file changed, 6 insertions(+)
  

Comments

Matthew Wilcox Dec. 14, 2023, 11 p.m. UTC | #1
On Thu, Dec 14, 2023 at 02:27:17PM -0800, Jianfeng Wang wrote:
> When unmapping VMA pages, pages will be gathered in batch and released by
> tlb_finish_mmu() if CONFIG_MMU_GATHER_NO_GATHER is not set. The function
> tlb_finish_mmu() is responsible for calling free_pages_and_swap_cache(),
> which calls lru_add_drain() to drain cached pages in folio_batch before
> releasing gathered pages. Thus, it is redundant to call lru_add_drain()
> before gathering pages, if CONFIG_MMU_GATHER_NO_GATHER is not set.
> 
> Remove lru_add_drain() prior to gathering and unmapping pages in
> exit_mmap() and unmap_region() if CONFIG_MMU_GATHER_NO_GATHER is not set.
> 
> Note that the page unmapping process in oom_killer (e.g., in
> __oom_reap_task_mm()) also uses tlb_finish_mmu() and does not have
> redundant lru_add_drain(). So, this commit makes the code more consistent.

Shouldn't we put this in __tlb_gather_mmu() which already has the
CONFIG_MMU_GATHER_NO_GATHER ifdefs?  That would presumably help with, e.g.,
zap_page_range_single() too.

> Signed-off-by: Jianfeng Wang <jianfeng.w.wang@oracle.com>
> ---
>  mm/mmap.c | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/mm/mmap.c b/mm/mmap.c
> index 1971bfffcc03..da0308eef435 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -2330,7 +2330,10 @@ static void unmap_region(struct mm_struct *mm, struct ma_state *mas,
>  	struct mmu_gather tlb;
>  	unsigned long mt_start = mas->index;
>  
> +	/* Defer lru_add_drain() to tlb_finish_mmu() for the ifndef case. */
> +#ifdef CONFIG_MMU_GATHER_NO_GATHER
>  	lru_add_drain();
> +#endif
>  	tlb_gather_mmu(&tlb, mm);
>  	update_hiwater_rss(mm);
>  	unmap_vmas(&tlb, mas, vma, start, end, tree_end, mm_wr_locked);
> @@ -3300,7 +3303,10 @@ void exit_mmap(struct mm_struct *mm)
>  		return;
>  	}
>  
> +	/* Defer lru_add_drain() to tlb_finish_mmu() for the ifndef case. */
> +#ifdef CONFIG_MMU_GATHER_NO_GATHER
>  	lru_add_drain();
> +#endif
>  	flush_cache_mm(mm);
>  	tlb_gather_mmu_fullmm(&tlb, mm);
>  	/* update_hiwater_rss(mm) here? but nobody should be looking */
> -- 
> 2.42.1
> 
>
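
Concretely, the suggestion would amount to something like the hypothetical
sketch below (not merged code; the real __tlb_gather_mmu() in
mm/mmu_gather.c carries more initialization, abridged here to a comment):

    static void __tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm,
                                 bool fullmm)
    {
            tlb->mm = mm;
            tlb->fullmm = fullmm;

    #ifdef CONFIG_MMU_GATHER_NO_GATHER
            /*
             * Pages are freed as they are unmapped rather than gathered,
             * so tlb_finish_mmu() will not drain for us; drain the per-CPU
             * LRU-add batches up front instead.
             */
            lru_add_drain();
    #endif
            /* ... existing batch and rmap initialization ... */
    }

With the drain placed here, callers such as unmap_region(), exit_mmap(),
and zap_page_range_single() would not need their own lru_add_drain().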
  
Jianfeng Wang Dec. 14, 2023, 11:59 p.m. UTC | #3
On 12/14/23 3:00 PM, Matthew Wilcox wrote:
> On Thu, Dec 14, 2023 at 02:27:17PM -0800, Jianfeng Wang wrote:
>> When unmapping VMA pages, pages will be gathered in batch and released by
>> tlb_finish_mmu() if CONFIG_MMU_GATHER_NO_GATHER is not set. The function
>> tlb_finish_mmu() is responsible for calling free_pages_and_swap_cache(),
>> which calls lru_add_drain() to drain cached pages in folio_batch before
>> releasing gathered pages. Thus, it is redundant to call lru_add_drain()
>> before gathering pages, if CONFIG_MMU_GATHER_NO_GATHER is not set.
>>
>> Remove lru_add_drain() prior to gathering and unmapping pages in
>> exit_mmap() and unmap_region() if CONFIG_MMU_GATHER_NO_GATHER is not set.
>>
>> Note that the page unmapping process in oom_killer (e.g., in
>> __oom_reap_task_mm()) also uses tlb_finish_mmu() and does not have
>> redundant lru_add_drain(). So, this commit makes the code more consistent.
> 
> Shouldn't we put this in __tlb_gather_mmu() which already has the
> CONFIG_MMU_GATHER_NO_GATHER ifdefs?  That would presumably help with, e.g.,
> zap_page_range_single() too.
> 

Thanks. It makes sense to me.
This commit is motivated by a workload that uses mmap/munmap heavily.
Since the mmu_gather feature is also used by hugetlb, madvise, mprotect,
etc., I prefer to have another standalone commit (following this one)
that moves lru_add_drain() into __tlb_gather_mmu() to unify these cases
and avoid redundant lru_add_drain() calls when using mmu_gather.

>> Signed-off-by: Jianfeng Wang <jianfeng.w.wang@oracle.com>
>> ---
>>  mm/mmap.c | 6 ++++++
>>  1 file changed, 6 insertions(+)
>>
>> diff --git a/mm/mmap.c b/mm/mmap.c
>> index 1971bfffcc03..da0308eef435 100644
>> --- a/mm/mmap.c
>> +++ b/mm/mmap.c
>> @@ -2330,7 +2330,10 @@ static void unmap_region(struct mm_struct *mm, struct ma_state *mas,
>>  	struct mmu_gather tlb;
>>  	unsigned long mt_start = mas->index;
>>  
>> +	/* Defer lru_add_drain() to tlb_finish_mmu() for the ifndef case. */
>> +#ifdef CONFIG_MMU_GATHER_NO_GATHER
>>  	lru_add_drain();
>> +#endif
>>  	tlb_gather_mmu(&tlb, mm);
>>  	update_hiwater_rss(mm);
>>  	unmap_vmas(&tlb, mas, vma, start, end, tree_end, mm_wr_locked);
>> @@ -3300,7 +3303,10 @@ void exit_mmap(struct mm_struct *mm)
>>  		return;
>>  	}
>>  
>> +	/* Defer lru_add_drain() to tlb_finish_mmu() for the ifndef case. */
>> +#ifdef CONFIG_MMU_GATHER_NO_GATHER
>>  	lru_add_drain();
>> +#endif
>>  	flush_cache_mm(mm);
>>  	tlb_gather_mmu_fullmm(&tlb, mm);
>>  	/* update_hiwater_rss(mm) here? but nobody should be looking */
>> -- 
>> 2.42.1
>>
>>
  
Matthew Wilcox Dec. 15, 2023, 3:06 a.m. UTC | #4
On Thu, Dec 14, 2023 at 03:59:00PM -0800, Jianfeng Wang wrote:
> On 12/14/23 3:00 PM, Matthew Wilcox wrote:
> > On Thu, Dec 14, 2023 at 02:27:17PM -0800, Jianfeng Wang wrote:
> >> When unmapping VMA pages, pages will be gathered in batch and released by
> >> tlb_finish_mmu() if CONFIG_MMU_GATHER_NO_GATHER is not set. The function
> >> tlb_finish_mmu() is responsible for calling free_pages_and_swap_cache(),
> >> which calls lru_add_drain() to drain cached pages in folio_batch before
> >> releasing gathered pages. Thus, it is redundant to call lru_add_drain()
> >> before gathering pages, if CONFIG_MMU_GATHER_NO_GATHER is not set.
> >>
> >> Remove lru_add_drain() prior to gathering and unmapping pages in
> >> exit_mmap() and unmap_region() if CONFIG_MMU_GATHER_NO_GATHER is not set.
> >>
> >> Note that the page unmapping process in oom_killer (e.g., in
> >> __oom_reap_task_mm()) also uses tlb_finish_mmu() and does not have
> >> redundant lru_add_drain(). So, this commit makes the code more consistent.
> > 
> > Shouldn't we put this in __tlb_gather_mmu() which already has the
> > CONFIG_MMU_GATHER_NO_GATHER ifdefs?  That would presumably help with, e.g.,
> > zap_page_range_single() too.
> > 
> 
> Thanks. It makes sense to me.
> This commit is motivated by a workload that uses mmap/munmap heavily.
> Since the mmu_gather feature is also used by hugetlb, madvise, mprotect,
> etc., I prefer to have another standalone commit (following this one)
> that moves lru_add_drain() into __tlb_gather_mmu() to unify these cases
> and avoid redundant lru_add_drain() calls when using mmu_gather.

That's not normally the approach we take.
  
Jianfeng Wang Dec. 15, 2023, 3:57 a.m. UTC | #5
On 12/14/23 7:06 PM, Matthew Wilcox wrote:
> On Thu, Dec 14, 2023 at 03:59:00PM -0800, Jianfeng Wang wrote:
>> On 12/14/23 3:00 PM, Matthew Wilcox wrote:
>>> On Thu, Dec 14, 2023 at 02:27:17PM -0800, Jianfeng Wang wrote:
>>>> When unmapping VMA pages, pages will be gathered in batch and released by
>>>> tlb_finish_mmu() if CONFIG_MMU_GATHER_NO_GATHER is not set. The function
>>>> tlb_finish_mmu() is responsible for calling free_pages_and_swap_cache(),
>>>> which calls lru_add_drain() to drain cached pages in folio_batch before
>>>> releasing gathered pages. Thus, it is redundant to call lru_add_drain()
>>>> before gathering pages, if CONFIG_MMU_GATHER_NO_GATHER is not set.
>>>>
>>>> Remove lru_add_drain() prior to gathering and unmapping pages in
>>>> exit_mmap() and unmap_region() if CONFIG_MMU_GATHER_NO_GATHER is not set.
>>>>
>>>> Note that the page unmapping process in oom_killer (e.g., in
>>>> __oom_reap_task_mm()) also uses tlb_finish_mmu() and does not have
>>>> redundant lru_add_drain(). So, this commit makes the code more consistent.
>>>
>>> Shouldn't we put this in __tlb_gather_mmu() which already has the
>>> CONFIG_MMU_GATHER_NO_GATHER ifdefs?  That would presumably help with, e.g.,
>>> zap_page_range_single() too.
>>>
>>
>> Thanks. It makes sense to me.
>> This commit is motivated by a workload that uses mmap/munmap heavily.
>> Since the mmu_gather feature is also used by hugetlb, madvise, mprotect,
>> etc., I prefer to have another standalone commit (following this one)
>> that moves lru_add_drain() into __tlb_gather_mmu() to unify these cases
>> and avoid redundant lru_add_drain() calls when using mmu_gather.
> 
> That's not normally the approach we take.

Okay, understood. Thanks for pointing it out.
Let me send a new patch that aligns with your suggestion.
  
Jianfeng Wang Dec. 15, 2023, 8:48 a.m. UTC | #6
On 12/14/23 3:00 PM, Matthew Wilcox wrote:
> On Thu, Dec 14, 2023 at 02:27:17PM -0800, Jianfeng Wang wrote:
>> When unmapping VMA pages, pages will be gathered in batch and released by
>> tlb_finish_mmu() if CONFIG_MMU_GATHER_NO_GATHER is not set. The function
>> tlb_finish_mmu() is responsible for calling free_pages_and_swap_cache(),
>> which calls lru_add_drain() to drain cached pages in folio_batch before
>> releasing gathered pages. Thus, it is redundant to call lru_add_drain()
>> before gathering pages, if CONFIG_MMU_GATHER_NO_GATHER is not set.
>>
>> Remove lru_add_drain() prior to gathering and unmapping pages in
>> exit_mmap() and unmap_region() if CONFIG_MMU_GATHER_NO_GATHER is not set.
>>
>> Note that the page unmapping process in oom_killer (e.g., in
>> __oom_reap_task_mm()) also uses tlb_finish_mmu() and does not have
>> redundant lru_add_drain(). So, this commit makes the code more consistent.
> 
> Shouldn't we put this in __tlb_gather_mmu() which already has the
> CONFIG_MMU_GATHER_NO_GATHER ifdefs?  That would presumably help with, e.g.,
> zap_page_range_single() too.
> 

After looking at different use cases of tlb_gather_mmu(), I feel it is
questionable to move lru_add_drain() into __tlb_gather_mmu(). There are
two use cases of tlb_gather_mmu(): one unmaps and releases pages (e.g.,
the two cases in mmap.c); the other updates page table entries and
flushes the TLB without releasing pages (e.g., together with
mprotect_fixup()). For the latter use case, it is reasonable not to call
lru_add_drain() prior to or within tlb_gather_mmu().

Of course, we may update tlb_gather_mmu()'s API to take this into account.
For example, we can have tlb_gather_mmu_for_release() for the first case
and tlb_gather_mmu() for the latter. I'd like to have your opinion on this.
Thanks!

>> Signed-off-by: Jianfeng Wang <jianfeng.w.wang@oracle.com>
>> ---
>>  mm/mmap.c | 6 ++++++
>>  1 file changed, 6 insertions(+)
>>
>> diff --git a/mm/mmap.c b/mm/mmap.c
>> index 1971bfffcc03..da0308eef435 100644
>> --- a/mm/mmap.c
>> +++ b/mm/mmap.c
>> @@ -2330,7 +2330,10 @@ static void unmap_region(struct mm_struct *mm, struct ma_state *mas,
>>  	struct mmu_gather tlb;
>>  	unsigned long mt_start = mas->index;
>>  
>> +	/* Defer lru_add_drain() to tlb_finish_mmu() for the ifndef case. */
>> +#ifdef CONFIG_MMU_GATHER_NO_GATHER
>>  	lru_add_drain();
>> +#endif
>>  	tlb_gather_mmu(&tlb, mm);
>>  	update_hiwater_rss(mm);
>>  	unmap_vmas(&tlb, mas, vma, start, end, tree_end, mm_wr_locked);
>> @@ -3300,7 +3303,10 @@ void exit_mmap(struct mm_struct *mm)
>>  		return;
>>  	}
>>  
>> +	/* Defer lru_add_drain() to tlb_finish_mmu() for the ifndef case. */
>> +#ifdef CONFIG_MMU_GATHER_NO_GATHER
>>  	lru_add_drain();
>> +#endif
>>  	flush_cache_mm(mm);
>>  	tlb_gather_mmu_fullmm(&tlb, mm);
>>  	/* update_hiwater_rss(mm) here? but nobody should be looking */
>> -- 
>> 2.42.1
>>
>>
  
Matthew Wilcox Dec. 15, 2023, 1:45 p.m. UTC | #7
On Fri, Dec 15, 2023 at 12:48:10AM -0800, Jianfeng Wang wrote:
> On 12/14/23 3:00 PM, Matthew Wilcox wrote:
> > Shouldn't we put this in __tlb_gather_mmu() which already has the
> > CONFIG_MMU_GATHER_NO_GATHER ifdefs?  That would presumably help with, e.g.,
> > zap_page_range_single() too.
> 
> After looking at different use cases of tlb_gather_mmu(), I feel it is
> questionable to move lru_add_drain() into __tlb_gather_mmu(). There are
> two use cases of tlb_gather_mmu(): one for unmapping and releasing pages
> (e.g., the two cases in mmap.c); the other one is to update page table
> entries and flush TLB without releasing pages (e.g., together with
> mprotect_fixup()). For the latter use case, it is reasonable to not call
> lru_add_drain() prior to or within tlb_gather_mmu().
> 
> Of course, we may update tlb_gather_mmu()'s API to take this into account.
> For example, we can have tlb_gather_mmu_for_release() for the first case
> and tlb_gather_mmu() for the latter. I'd like to have your opinion on this.

Yes, I like this idea.  You're right that there's no need to drain the
lru lists for the other use case, so it makes sense to have two APIs.
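
A minimal sketch of that split (hypothetical: tlb_gather_mmu_for_release()
is the name proposed above, not an existing kernel API, and the bodies are
illustrative):

    /* For paths that only rewrite PTEs (e.g., mprotect): no drain needed. */
    void tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm)
    {
            __tlb_gather_mmu(tlb, mm, false);
    }

    /* For paths that free the gathered pages (munmap, exit_mmap, ...). */
    void tlb_gather_mmu_for_release(struct mmu_gather *tlb, struct mm_struct *mm)
    {
    #ifdef CONFIG_MMU_GATHER_NO_GATHER
            /* Pages are freed immediately; drain per-CPU batches up front. */
            lru_add_drain();
    #endif
            __tlb_gather_mmu(tlb, mm, false);
    }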
  
kernel test robot Dec. 22, 2023, 4:01 a.m. UTC | #8
Hello,

kernel test robot noticed "kernel_BUG_at_mm/page_alloc.c" on:

commit: 5e3b1fa97b04faab8760b62ddab2cd2d62110400 ("[PATCH v2] mm: remove redundant lru_add_drain() prior to unmapping pages")
url: https://github.com/intel-lab-lkp/linux/commits/Jianfeng-Wang/mm-remove-redundant-lru_add_drain-prior-to-unmapping-pages/20231215-062828
base: v6.7-rc5
patch link: https://lore.kernel.org/all/20231214222717.50277-1-jianfeng.w.wang@oracle.com/
patch subject: [PATCH v2] mm: remove redundant lru_add_drain() prior to unmapping pages

in testcase: trinity
version: trinity-i386-abe9de86-1_20230429
with the following parameters:

	runtime: 600s

test-description: Trinity is a linux system call fuzz tester.
test-url: http://codemonkey.org.uk/projects/trinity/


compiler: clang-16
test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G

(please refer to attached dmesg/kmsg for entire log/backtrace)


+--------------------------------------------------------------------------------------+----------+------------+
|                                                                                      | v6.7-rc5 | 5e3b1fa97b |
+--------------------------------------------------------------------------------------+----------+------------+
| kernel_BUG_at_mm/page_alloc.c                                                        | 0        | 10         |
| invalid_opcode:#[##]                                                                 | 0        | 10         |
| EIP:free_unref_page_prepare                                                          | 0        | 10         |
+--------------------------------------------------------------------------------------+----------+------------+


If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202312221154.85f3c4a3-oliver.sang@intel.com


[  127.901286][ T2652] ------------[ cut here ]------------
[  127.901579][ T2652] kernel BUG at mm/page_alloc.c:1084!
[  127.901870][ T2652] invalid opcode: 0000 [#1] PREEMPT SMP
[  127.902168][ T2652] CPU: 1 PID: 2652 Comm: trinity-c3 Tainted: G        W        N 6.7.0-rc5-00001-g5e3b1fa97b04 #1 e678fecaf3798641ec130c7b4b31f90b64a8d53d
[ 127.902921][ T2652] EIP: free_unref_page_prepare (mm/page_alloc.c:1084) 
[ 127.903243][ T2652] Code: 8b 55 e4 b9 07 00 00 00 e8 a3 de ff ff 89 47 10 b0 01 83 c4 10 5e 5f 5b 5d 31 c9 31 d2 c3 89 f8 ba 4c 7c 68 c2 e8 86 76 fd ff <0f> 0b 68 c8 21 b2 c2 e8 8a f4 1f 00 0f 0b 0f a3 05 1c e4 c5 c2 0f
All code
========
   0:	8b 55 e4             	mov    -0x1c(%rbp),%edx
   3:	b9 07 00 00 00       	mov    $0x7,%ecx
   8:	e8 a3 de ff ff       	call   0xffffffffffffdeb0
   d:	89 47 10             	mov    %eax,0x10(%rdi)
  10:	b0 01                	mov    $0x1,%al
  12:	83 c4 10             	add    $0x10,%esp
  15:	5e                   	pop    %rsi
  16:	5f                   	pop    %rdi
  17:	5b                   	pop    %rbx
  18:	5d                   	pop    %rbp
  19:	31 c9                	xor    %ecx,%ecx
  1b:	31 d2                	xor    %edx,%edx
  1d:	c3                   	ret
  1e:	89 f8                	mov    %edi,%eax
  20:	ba 4c 7c 68 c2       	mov    $0xc2687c4c,%edx
  25:	e8 86 76 fd ff       	call   0xfffffffffffd76b0
  2a:*	0f 0b                	ud2		<-- trapping instruction
  2c:	68 c8 21 b2 c2       	push   $0xffffffffc2b221c8
  31:	e8 8a f4 1f 00       	call   0x1ff4c0
  36:	0f 0b                	ud2
  38:	0f a3 05 1c e4 c5 c2 	bt     %eax,-0x3d3a1be4(%rip)        # 0xffffffffc2c5e45b
  3f:	0f                   	.byte 0xf

Code starting with the faulting instruction
===========================================
   0:	0f 0b                	ud2
   2:	68 c8 21 b2 c2       	push   $0xffffffffc2b221c8
   7:	e8 8a f4 1f 00       	call   0x1ff496
   c:	0f 0b                	ud2
   e:	0f a3 05 1c e4 c5 c2 	bt     %eax,-0x3d3a1be4(%rip)        # 0xffffffffc2c5e431
  15:	0f                   	.byte 0xf
[  127.904296][ T2652] EAX: 00000000 EBX: e869ad40 ECX: 00000000 EDX: 00000000
[  127.904681][ T2652] ESI: 00000001 EDI: e869ad40 EBP: ed779b3b ESP: ed779b1f
[  127.905066][ T2652] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 EFLAGS: 00010246
[  127.905480][ T2652] CR0: 80050033 CR2: b69e1000 CR3: 2cd56000 CR4: 00040690
[  127.905867][ T2652] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[  127.906253][ T2652] DR6: fffe0ff0 DR7: 00000400
[  127.906508][ T2652] Call Trace:
[ 127.906711][ T2652] ? fault_in_iov_iter_readable (lib/iov_iter.c:109) 
[ 127.907028][ T2652] ? generic_perform_write (mm/filemap.c:?) 
[ 127.907339][ T2652] ? shmem_file_write_iter (mm/shmem.c:?) 
[ 127.907637][ T2652] ? do_iter_write (fs/read_write.c:736) 
[ 127.907899][ T2652] ? vfs_writev (fs/read_write.c:933) 
[ 127.908144][ T2652] ? do_writev (fs/read_write.c:976) 
[ 127.908376][ T2652] ? __ia32_sys_writev (fs/read_write.c:1049 fs/read_write.c:1046 fs/read_write.c:1046) 
[ 127.908644][ T2652] ? __do_fast_syscall_32 (arch/x86/entry/common.c:165) 
[ 127.908927][ T2652] ? irqentry_exit_to_user_mode (kernel/entry/common.c:312) 
[ 127.909235][ T2652] ? irqentry_exit (kernel/entry/common.c:445) 
[ 127.909485][ T2652] ? do_fast_syscall_32 (arch/x86/entry/common.c:346) 
[ 127.909757][ T2652] ? do_SYSENTER_32 (arch/x86/entry/common.c:384) 
[ 127.910012][ T2652] ? entry_SYSENTER_32 (arch/x86/entry/entry_32.S:840) 
[  127.910282][ T2652] Modules linked in:
[  127.910512][ T2652] ---[ end trace 0000000000000000 ]---
[ 127.910806][ T2652] EIP: free_unref_page_prepare (mm/page_alloc.c:1084) 
[ 127.911140][ T2652] Code: 8b 55 e4 b9 07 00 00 00 e8 a3 de ff ff 89 47 10 b0 01 83 c4 10 5e 5f 5b 5d 31 c9 31 d2 c3 89 f8 ba 4c 7c 68 c2 e8 86 76 fd ff <0f> 0b 68 c8 21 b2 c2 e8 8a f4 1f 00 0f 0b 0f a3 05 1c e4 c5 c2 0f
All code
========
   0:	8b 55 e4             	mov    -0x1c(%rbp),%edx
   3:	b9 07 00 00 00       	mov    $0x7,%ecx
   8:	e8 a3 de ff ff       	call   0xffffffffffffdeb0
   d:	89 47 10             	mov    %eax,0x10(%rdi)
  10:	b0 01                	mov    $0x1,%al
  12:	83 c4 10             	add    $0x10,%esp
  15:	5e                   	pop    %rsi
  16:	5f                   	pop    %rdi
  17:	5b                   	pop    %rbx
  18:	5d                   	pop    %rbp
  19:	31 c9                	xor    %ecx,%ecx
  1b:	31 d2                	xor    %edx,%edx
  1d:	c3                   	ret
  1e:	89 f8                	mov    %edi,%eax
  20:	ba 4c 7c 68 c2       	mov    $0xc2687c4c,%edx
  25:	e8 86 76 fd ff       	call   0xfffffffffffd76b0
  2a:*	0f 0b                	ud2		<-- trapping instruction
  2c:	68 c8 21 b2 c2       	push   $0xffffffffc2b221c8
  31:	e8 8a f4 1f 00       	call   0x1ff4c0
  36:	0f 0b                	ud2
  38:	0f a3 05 1c e4 c5 c2 	bt     %eax,-0x3d3a1be4(%rip)        # 0xffffffffc2c5e45b
  3f:	0f                   	.byte 0xf

Code starting with the faulting instruction
===========================================
   0:	0f 0b                	ud2
   2:	68 c8 21 b2 c2       	push   $0xffffffffc2b221c8
   7:	e8 8a f4 1f 00       	call   0x1ff496
   c:	0f 0b                	ud2
   e:	0f a3 05 1c e4 c5 c2 	bt     %eax,-0x3d3a1be4(%rip)        # 0xffffffffc2c5e431
  15:	0f                   	.byte 0xf


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20231222/202312221154.85f3c4a3-oliver.sang@intel.com
  

Patch

diff --git a/mm/mmap.c b/mm/mmap.c
index 1971bfffcc03..da0308eef435 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2330,7 +2330,10 @@  static void unmap_region(struct mm_struct *mm, struct ma_state *mas,
 	struct mmu_gather tlb;
 	unsigned long mt_start = mas->index;
 
+	/* Defer lru_add_drain() to tlb_finish_mmu() for the ifndef case. */
+#ifdef CONFIG_MMU_GATHER_NO_GATHER
 	lru_add_drain();
+#endif
 	tlb_gather_mmu(&tlb, mm);
 	update_hiwater_rss(mm);
 	unmap_vmas(&tlb, mas, vma, start, end, tree_end, mm_wr_locked);
@@ -3300,7 +3303,10 @@  void exit_mmap(struct mm_struct *mm)
 		return;
 	}
 
+	/* Defer lru_add_drain() to tlb_finish_mmu() for the ifndef case. */
+#ifdef CONFIG_MMU_GATHER_NO_GATHER
 	lru_add_drain();
+#endif
 	flush_cache_mm(mm);
 	tlb_gather_mmu_fullmm(&tlb, mm);
 	/* update_hiwater_rss(mm) here? but nobody should be looking */