[1/2] mm/page_alloc: remove track of active PCP lists range in bulk free

Message ID 20230809100754.3094517-2-shikemeng@huaweicloud.com
State New
Series: Two minor cleanups for pcp list in page_alloc

Commit Message

Kemeng Shi Aug. 9, 2023, 10:07 a.m. UTC
  After commit fd56eef258a17 ("mm/page_alloc: simplify how many pages are
selected per pcp list during bulk free"), we drain all pages in the
selected pcp list, and we ensure the passed count is < pcp->count. The
search for a non-empty list therefore finishes before wrapping around,
so the tracking of the active PCP lists range, intended for the
wrap-around case, is no longer needed.

Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
---
 mm/page_alloc.c | 15 +++------------
 1 file changed, 3 insertions(+), 12 deletions(-)
  

Comments

Chris Li Aug. 15, 2023, 5:45 p.m. UTC | #1
Hi Kemeng,

Can you confirm this patch has no intended functional change?

I have had a patch sitting in my tree for a while related to this
count vs pcp->count issue. A BPF function hook can potentially change
pcp->count and make count out of sync with pcp->count, which causes
a dead loop.

Maybe I can send mine out alongside yours for discussion?
I don't mind my patch being combined with yours.

Your change looks fine to me. There is more that can be done
on the cleanup.

Chris

On Wed, Aug 09, 2023 at 06:07:53PM +0800, Kemeng Shi wrote:
> After commit fd56eef258a17 ("mm/page_alloc: simplify how many pages are
> selected per pcp list during bulk free"), we drain all pages in the
> selected pcp list, and we ensure the passed count is < pcp->count. The
> search for a non-empty list therefore finishes before wrapping around,
> so the tracking of the active PCP lists range, intended for the
> wrap-around case, is no longer needed.

> 
> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
> ---
>  mm/page_alloc.c | 15 +++------------
>  1 file changed, 3 insertions(+), 12 deletions(-)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 96b7c1a7d1f2..1ddcb2707d05 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1207,8 +1207,6 @@ static void free_pcppages_bulk(struct zone *zone, int count,
>  					int pindex)
>  {
>  	unsigned long flags;
> -	int min_pindex = 0;
> -	int max_pindex = NR_PCP_LISTS - 1;
>  	unsigned int order;
>  	bool isolated_pageblocks;
>  	struct page *page;
> @@ -1231,17 +1229,10 @@ static void free_pcppages_bulk(struct zone *zone, int count,
>  
>  		/* Remove pages from lists in a round-robin fashion. */
>  		do {
> -			if (++pindex > max_pindex)
> -				pindex = min_pindex;
> +			if (++pindex > NR_PCP_LISTS - 1)
> +				pindex = 0;
>  			list = &pcp->lists[pindex];
> -			if (!list_empty(list))
> -				break;
> -
> -			if (pindex == max_pindex)
> -				max_pindex--;
> -			if (pindex == min_pindex)
> -				min_pindex++;
> -		} while (1);
> +		} while (list_empty(list));
>  
>  		order = pindex_to_order(pindex);
>  		nr_pages = 1 << order;
> -- 
> 2.30.0
>
  
Kemeng Shi Aug. 17, 2023, 2:22 a.m. UTC | #2
on 8/16/2023 1:45 AM, Chris Li wrote:
> Hi Kemeng,
> 
> Can you confirm this patch has no intended functional change?
> 
Hi Chris, there is no functional change intended in this patch. As
I mentioned in the changelog, the list iteration never wraps around,
so the active PCP lists range will never be used.
> I have a patch sitting in my tree for a while related to this
> count vs pcp->count.  The BPF function hook can potentially change
> pcp->count and make count out of sync with pcp->count which causes
> a dead loop.
> 
I guess pcp->count is set bigger than it should be. In that case, we
will keep trying to get pages while all pages in the pcp list have
already been taken off, and a dead loop will happen. That dead loop
happens with or without this patch, as the root cause is that we try
to get more pages than the pcp list owns.
> Maybe I can send mine out alongside yours for discussion?
> I don't mind my patch combined with yours.
>
Either way is acceptable to me; just feel free to choose the one you
like, and I'd like to see what more we can do here.

> Your change looks fine to me. There is more can be done
> on the clean up.
>
Thanks for the feedback; more cleanup is welcome.
> Chris
> 
> On Wed, Aug 09, 2023 at 06:07:53PM +0800, Kemeng Shi wrote:
>> After commit fd56eef258a17 ("mm/page_alloc: simplify how many pages are
>> selected per pcp list during bulk free"), we drain all pages in the
>> selected pcp list, and we ensure the passed count is < pcp->count. The
>> search for a non-empty list therefore finishes before wrapping around,
>> so the tracking of the active PCP lists range, intended for the
>> wrap-around case, is no longer needed.
> 
>>
>> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
>> ---
>>  mm/page_alloc.c | 15 +++------------
>>  1 file changed, 3 insertions(+), 12 deletions(-)
>>
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index 96b7c1a7d1f2..1ddcb2707d05 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -1207,8 +1207,6 @@ static void free_pcppages_bulk(struct zone *zone, int count,
>>  					int pindex)
>>  {
>>  	unsigned long flags;
>> -	int min_pindex = 0;
>> -	int max_pindex = NR_PCP_LISTS - 1;
>>  	unsigned int order;
>>  	bool isolated_pageblocks;
>>  	struct page *page;
>> @@ -1231,17 +1229,10 @@ static void free_pcppages_bulk(struct zone *zone, int count,
>>  
>>  		/* Remove pages from lists in a round-robin fashion. */
>>  		do {
>> -			if (++pindex > max_pindex)
>> -				pindex = min_pindex;
>> +			if (++pindex > NR_PCP_LISTS - 1)
>> +				pindex = 0;
>>  			list = &pcp->lists[pindex];
>> -			if (!list_empty(list))
>> -				break;
>> -
>> -			if (pindex == max_pindex)
>> -				max_pindex--;
>> -			if (pindex == min_pindex)
>> -				min_pindex++;
>> -		} while (1);
>> +		} while (list_empty(list));
>>  
>>  		order = pindex_to_order(pindex);
>>  		nr_pages = 1 << order;
>> -- 
>> 2.30.0
>>
>
  
Chris Li Aug. 17, 2023, 7:51 a.m. UTC | #3
Hi Kemeng,

On Wed, Aug 16, 2023 at 7:22 PM Kemeng Shi <shikemeng@huaweicloud.com> wrote:
> Hi Chris, there is no functional change intended in this patch. As
> I menthioned in changelog, there is no wrap for list iteration, so
> that the active PCP lists range will never be used.
> > I have a patch sitting in my tree for a while related to this
> > count vs pcp->count.  The BPF function hook can potentially change
> > pcp->count and make count out of sync with pcp->count which causes
> > a dead loop.

In this case the BPF program allocates a page inside the spin_lock, so
"pcp->count" becomes smaller than "count". The loop condition only
checks "count > 0", but all pages on the pcp lists have already been
freed: pcp->count is 0 while "count" is still 1. After a few
wrap-arounds, pindex_max drops below pindex_min, the index eventually
reaches -1, and an invalid page fault follows.

> I guess pcp->count is set bigger than it should be. In that case, we
> will keep trying to get pages while all pages in the pcp list have
> already been taken off, and a dead loop will happen. That dead loop
> happens with or without this patch, as the root cause is that we try
> to get more pages than the pcp list owns.
> > Maybe I can send mine out alongside yours for discussion?

My patch is split into two parts. The first patch is a functional
change that allows pcp->count to drop below "count".
The other patch is just a cleanup and should be functionally equivalent.

Sure, I will send it out and CC you for discussion.

> > I don't mind my patch combined with yours.
> >
> Either way is acceptable to me, just feel free to choose one you like
> and I'd like to see if more we could do to this.

Thanks

Chris
  

Patch

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 96b7c1a7d1f2..1ddcb2707d05 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1207,8 +1207,6 @@ static void free_pcppages_bulk(struct zone *zone, int count,
 					int pindex)
 {
 	unsigned long flags;
-	int min_pindex = 0;
-	int max_pindex = NR_PCP_LISTS - 1;
 	unsigned int order;
 	bool isolated_pageblocks;
 	struct page *page;
@@ -1231,17 +1229,10 @@ static void free_pcppages_bulk(struct zone *zone, int count,
 
 		/* Remove pages from lists in a round-robin fashion. */
 		do {
-			if (++pindex > max_pindex)
-				pindex = min_pindex;
+			if (++pindex > NR_PCP_LISTS - 1)
+				pindex = 0;
 			list = &pcp->lists[pindex];
-			if (!list_empty(list))
-				break;
-
-			if (pindex == max_pindex)
-				max_pindex--;
-			if (pindex == min_pindex)
-				min_pindex++;
-		} while (1);
+		} while (list_empty(list));
 
 		order = pindex_to_order(pindex);
 		nr_pages = 1 << order;