mm: fix draining remote pageset

Message ID 20230811090819.60845-1-ying.huang@intel.com
State New

Commit Message

Huang, Ying Aug. 11, 2023, 9:08 a.m. UTC
If no pages are allocated from or freed to a remote pageset for some
time (3 seconds for now), the remote pageset is drained to avoid
wasting memory.

But in the current implementation, the vmstat updater worker may not
be re-queued while we are waiting for the timeout (pcp->expire != 0)
if there are no vmstat changes, for example, when the CPU goes idle.

Fix this by guaranteeing that the vmstat updater worker is always
re-queued while we are waiting for the timeout.

The bug can be reproduced by allocating/freeing pages from a remote
node and then going idle.  With the patch applied, the remote pageset
is drained as expected.

Fixes: 7cc36bbddde5 ("vmstat: on-demand vmstat workers V8")
Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Michal Hocko <mhocko@kernel.org>
---
 mm/vmstat.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)
  

Comments

Michal Hocko Aug. 11, 2023, 9:35 a.m. UTC | #1
On Fri 11-08-23 17:08:19, Huang Ying wrote:
> If there is no memory allocation/freeing in the remote pageset after
> some time (3 seconds for now), the remote pageset will be drained to
> avoid memory wastage.
> 
> But in the current implementation, vmstat updater worker may not be
> re-queued when we are waiting for the timeout (pcp->expire != 0) if
> there are no vmstat changes, for example, when CPU goes idle.

Why is that a problem?

> This is fixed via guaranteeing that the vmstat updater worker will
> always be re-queued when we are waiting for the timeout.
> 
> We can reproduce the bug via allocating/freeing pages from remote
> node, then go idle.  And the patch can fix it.
> 
> Fixes: 7cc36bbddde5 ("vmstat: on-demand vmstat workers V8")
> Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
> Cc: Christoph Lameter <cl@linux.com>
> Cc: Mel Gorman <mgorman@techsingularity.net>
> Cc: Vlastimil Babka <vbabka@suse.cz>
> Cc: Michal Hocko <mhocko@kernel.org>
> ---
>  mm/vmstat.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/vmstat.c b/mm/vmstat.c
> index b731d57996c5..111118741abf 100644
> --- a/mm/vmstat.c
> +++ b/mm/vmstat.c
> @@ -856,8 +856,10 @@ static int refresh_cpu_vm_stats(bool do_pagesets)
>  				continue;
>  			}
>  
> -			if (__this_cpu_dec_return(pcp->expire))
> +			if (__this_cpu_dec_return(pcp->expire)) {
> +				changes++;
>  				continue;
> +			}
>  
>  			if (__this_cpu_read(pcp->count)) {
>  				drain_zone_pages(zone, this_cpu_ptr(pcp));
> -- 
> 2.39.2
  
Huang, Ying Aug. 14, 2023, 1:59 a.m. UTC | #2
Hi, Michal,

Michal Hocko <mhocko@suse.com> writes:

> On Fri 11-08-23 17:08:19, Huang Ying wrote:
>> If there is no memory allocation/freeing in the remote pageset after
>> some time (3 seconds for now), the remote pageset will be drained to
>> avoid memory wastage.
>> 
>> But in the current implementation, vmstat updater worker may not be
>> re-queued when we are waiting for the timeout (pcp->expire != 0) if
>> there are no vmstat changes, for example, when CPU goes idle.
>
> Why is that a problem?

The pages of the remote zone may be kept in the local per-CPU pageset
for a long time, as long as there is no page allocation/freeing on the
logical CPU.  Besides the logical CPU going idle, this can also happen
when the logical CPU is busy in user space.

I will update the change log to include this.

--
Best Regards,
Huang, Ying

  

Patch

diff --git a/mm/vmstat.c b/mm/vmstat.c
index b731d57996c5..111118741abf 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -856,8 +856,10 @@ static int refresh_cpu_vm_stats(bool do_pagesets)
 				continue;
 			}
 
-			if (__this_cpu_dec_return(pcp->expire))
+			if (__this_cpu_dec_return(pcp->expire)) {
+				changes++;
 				continue;
+			}
 
 			if (__this_cpu_read(pcp->count)) {
 				drain_zone_pages(zone, this_cpu_ptr(pcp));