[PATCHv2] mm: skip CMA pages when they are not available

Message ID 1681979577-11360-1-git-send-email-zhaoyang.huang@unisoc.com
State New
Series [PATCHv2] mm: skip CMA pages when they are not available

Commit Message

zhaoyang.huang April 20, 2023, 8:32 a.m. UTC
  From: Zhaoyang Huang <zhaoyang.huang@unisoc.com>

This patch fixes unproductive reclaiming of CMA pages by skipping them when they
are not available to the current allocation context. It arises from the OOM issue
below, which is caused by a large proportion of MIGRATE_CMA pages among the free
pages. There is already a commit (168676649) that fixes this by trying CMA pages
first instead of falling back in rmqueue. I would like to propose another fix
from the reclaim perspective.

04166 < 4> [   36.172486] [03-19 10:05:52.172] ActivityManager: page allocation failure: order:0, mode:0xc00(GFP_NOIO), nodemask=(null),cpuset=foreground,mems_allowed=0
0419C < 4> [   36.189447] [03-19 10:05:52.189] DMA32: 0*4kB 447*8kB (C) 217*16kB (C) 124*32kB (C) 136*64kB (C) 70*128kB (C) 22*256kB (C) 3*512kB (C) 0*1024kB 0*2048kB 0*4096kB = 35848kB
0419D < 4> [   36.193125] [03-19 10:05:52.193] Normal: 231*4kB (UMEH) 49*8kB (MEH) 14*16kB (H) 13*32kB (H) 8*64kB (H) 2*128kB (H) 0*256kB 1*512kB (H) 0*1024kB 0*2048kB 0*4096kB = 3236kB
	......
041EA < 4> [   36.234447] [03-19 10:05:52.234] SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC)
041EB < 4> [   36.234455] [03-19 10:05:52.234] cache: ext4_io_end, object size: 64, buffer size: 64, default order: 0, min order: 0
041EC < 4> [   36.234459] [03-19 10:05:52.234] node 0: slabs: 53,objs: 3392, free: 0

Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
---
v2: update commit message and fix build error when CONFIG_CMA is not set
---
---
 mm/vmscan.c | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)
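
In short, the condition the diff adds (quoted in full in the comments below) amounts to
the sketch here. skip_cma_folio() is a hypothetical wrapper used only for illustration;
gfp_migratetype(), current_is_kswapd() and get_pageblock_migratetype() are the existing
helpers the patch actually calls:

/*
 * Sketch only, not the literal diff: a CMA folio is worth isolating for
 * a direct reclaimer only if the reclaiming allocation is MIGRATE_MOVABLE,
 * the only migratetype that may be served from CMA pageblocks.  kswapd
 * reclaims on behalf of every context, so it never skips.  MIGRATE_CMA is
 * only defined when CONFIG_CMA is set, which is why the real patch wraps
 * the pageblock test in #ifdef CONFIG_CMA.
 */
#ifdef CONFIG_CMA
static bool skip_cma_folio(struct folio *folio, struct scan_control *sc)
{
	return !current_is_kswapd() &&
	       gfp_migratetype(sc->gfp_mask) != MIGRATE_MOVABLE &&
	       get_pageblock_migratetype(&folio->page) == MIGRATE_CMA;
}
#else
static bool skip_cma_folio(struct folio *folio, struct scan_control *sc)
{
	return false;
}
#endif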
  

Comments

Zhaoyang Huang April 20, 2023, 8:55 a.m. UTC | #1
This patch is most helpful when a large CMA reserved area resides in a
single zone (e.g. 500MB out of a 2GB zone), where many CMA pages end up
on the LRU after GFP_MOVABLE allocations fall back into CMA pageblocks.
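
For context, the preference that steers movable allocations into CMA pageblocks is
roughly the heuristic sketched below. use_cma_first() is a made-up name and the
threshold is an assumption based on the commit referenced in the changelog, so read
it as a sketch of the idea rather than the upstream code:

/*
 * Rough sketch: only requests allowed to use CMA (ALLOC_CMA, i.e.
 * movable allocations) may be served from CMA pageblocks, and the
 * allocator prefers CMA once free CMA pages exceed half of the zone's
 * free pages.  Pages allocated this way are ordinary movable pages and
 * later show up on the LRU, which is where this patch finds them.
 */
static bool use_cma_first(struct zone *zone, unsigned int alloc_flags)
{
	if (!(alloc_flags & ALLOC_CMA))
		return false;

	return zone_page_state(zone, NR_FREE_CMA_PAGES) >
	       zone_page_state(zone, NR_FREE_PAGES) / 2;
}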

On Thu, Apr 20, 2023 at 4:33 PM zhaoyang.huang
<zhaoyang.huang@unisoc.com> wrote:
>
> From: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
>
> This patch fixes unproductive reclaiming of CMA pages by skipping them when they
> are not available to the current allocation context. It arises from the OOM issue
> below, which is caused by a large proportion of MIGRATE_CMA pages among the free
> pages. There is already a commit (168676649) that fixes this by trying CMA pages
> first instead of falling back in rmqueue. I would like to propose another fix
> from the reclaim perspective.
>
> 04166 < 4> [   36.172486] [03-19 10:05:52.172] ActivityManager: page allocation failure: order:0, mode:0xc00(GFP_NOIO), nodemask=(null),cpuset=foreground,mems_allowed=0
> 0419C < 4> [   36.189447] [03-19 10:05:52.189] DMA32: 0*4kB 447*8kB (C) 217*16kB (C) 124*32kB (C) 136*64kB (C) 70*128kB (C) 22*256kB (C) 3*512kB (C) 0*1024kB 0*2048kB 0*4096kB = 35848kB
> 0419D < 4> [   36.193125] [03-19 10:05:52.193] Normal: 231*4kB (UMEH) 49*8kB (MEH) 14*16kB (H) 13*32kB (H) 8*64kB (H) 2*128kB (H) 0*256kB 1*512kB (H) 0*1024kB 0*2048kB 0*4096kB = 3236kB
>         ......
> 041EA < 4> [   36.234447] [03-19 10:05:52.234] SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC)
> 041EB < 4> [   36.234455] [03-19 10:05:52.234] cache: ext4_io_end, object size: 64, buffer size: 64, default order: 0, min order: 0
> 041EC < 4> [   36.234459] [03-19 10:05:52.234] node 0: slabs: 53,objs: 3392, free: 0
>
> Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
> ---
> v2: update commit message and fix build error when CONFIG_CMA is not set
> ---
> ---
>  mm/vmscan.c | 15 +++++++++++++--
>  1 file changed, 13 insertions(+), 2 deletions(-)
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index bd6637f..19fb445 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -2225,10 +2225,16 @@ static unsigned long isolate_lru_folios(unsigned long nr_to_scan,
>         unsigned long nr_skipped[MAX_NR_ZONES] = { 0, };
>         unsigned long skipped = 0;
>         unsigned long scan, total_scan, nr_pages;
> +       bool cma_cap = true;
> +       struct page *page;
>         LIST_HEAD(folios_skipped);
>
>         total_scan = 0;
>         scan = 0;
> +       if ((IS_ENABLED(CONFIG_CMA)) && !current_is_kswapd()
> +               && (gfp_migratetype(sc->gfp_mask) != MIGRATE_MOVABLE))
> +               cma_cap = false;
> +
>         while (scan < nr_to_scan && !list_empty(src)) {
>                 struct list_head *move_to = src;
>                 struct folio *folio;
> @@ -2239,12 +2245,17 @@ static unsigned long isolate_lru_folios(unsigned long nr_to_scan,
>                 nr_pages = folio_nr_pages(folio);
>                 total_scan += nr_pages;
>
> -               if (folio_zonenum(folio) > sc->reclaim_idx) {
> +               page = &folio->page;
> +
> +               if ((folio_zonenum(folio) > sc->reclaim_idx)
> +#ifdef CONFIG_CMA
> +                       || (get_pageblock_migratetype(page) == MIGRATE_CMA && !cma_cap)
> +#endif
> +               ) {
>                         nr_skipped[folio_zonenum(folio)] += nr_pages;
>                         move_to = &folios_skipped;
>                         goto move;
>                 }
> -
>                 /*
>                  * Do not count skipped folios because that makes the function
>                  * return with no isolated folios if the LRU mostly contains
> --
> 1.9.1
>
  
kernel test robot April 20, 2023, 9:44 a.m. UTC | #2
Hi zhaoyang.huang,

kernel test robot noticed the following build warnings:

[auto build test WARNING on akpm-mm/mm-everything]

url:    https://github.com/intel-lab-lkp/linux/commits/zhaoyang-huang/mm-skip-CMA-pages-when-they-are-not-available/20230420-163443
base:   https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
patch link:    https://lore.kernel.org/r/1681979577-11360-1-git-send-email-zhaoyang.huang%40unisoc.com
patch subject: [PATCHv2] mm: skip CMA pages when they are not available
config: powerpc-allnoconfig (https://download.01.org/0day-ci/archive/20230420/202304201725.pa3nMNWa-lkp@intel.com/config)
compiler: powerpc-linux-gcc (GCC) 12.1.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/intel-lab-lkp/linux/commit/2227e95350088b79da6d6b9e6c95a67474593852
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review zhaoyang-huang/mm-skip-CMA-pages-when-they-are-not-available/20230420-163443
        git checkout 2227e95350088b79da6d6b9e6c95a67474593852
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=powerpc olddefconfig
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=powerpc SHELL=/bin/bash

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <lkp@intel.com>
| Link: https://lore.kernel.org/oe-kbuild-all/202304201725.pa3nMNWa-lkp@intel.com/

All warnings (new ones prefixed by >>):

   mm/vmscan.c: In function 'isolate_lru_folios':
>> mm/vmscan.c:2295:22: warning: variable 'page' set but not used [-Wunused-but-set-variable]
    2295 |         struct page *page;
         |                      ^~~~
>> mm/vmscan.c:2294:14: warning: variable 'cma_cap' set but not used [-Wunused-but-set-variable]
    2294 |         bool cma_cap = true;
         |              ^~~~~~~


vim +/page +2295 mm/vmscan.c

  2261	
  2262	/*
  2263	 * Isolating page from the lruvec to fill in @dst list by nr_to_scan times.
  2264	 *
  2265	 * lruvec->lru_lock is heavily contended.  Some of the functions that
  2266	 * shrink the lists perform better by taking out a batch of pages
  2267	 * and working on them outside the LRU lock.
  2268	 *
  2269	 * For pagecache intensive workloads, this function is the hottest
  2270	 * spot in the kernel (apart from copy_*_user functions).
  2271	 *
  2272	 * Lru_lock must be held before calling this function.
  2273	 *
  2274	 * @nr_to_scan:	The number of eligible pages to look through on the list.
  2275	 * @lruvec:	The LRU vector to pull pages from.
  2276	 * @dst:	The temp list to put pages on to.
  2277	 * @nr_scanned:	The number of pages that were scanned.
  2278	 * @sc:		The scan_control struct for this reclaim session
  2279	 * @lru:	LRU list id for isolating
  2280	 *
  2281	 * returns how many pages were moved onto *@dst.
  2282	 */
  2283	static unsigned long isolate_lru_folios(unsigned long nr_to_scan,
  2284			struct lruvec *lruvec, struct list_head *dst,
  2285			unsigned long *nr_scanned, struct scan_control *sc,
  2286			enum lru_list lru)
  2287	{
  2288		struct list_head *src = &lruvec->lists[lru];
  2289		unsigned long nr_taken = 0;
  2290		unsigned long nr_zone_taken[MAX_NR_ZONES] = { 0 };
  2291		unsigned long nr_skipped[MAX_NR_ZONES] = { 0, };
  2292		unsigned long skipped = 0;
  2293		unsigned long scan, total_scan, nr_pages;
> 2294		bool cma_cap = true;
> 2295		struct page *page;
  2296		LIST_HEAD(folios_skipped);
  2297	
  2298		total_scan = 0;
  2299		scan = 0;
  2300		if ((IS_ENABLED(CONFIG_CMA)) && !current_is_kswapd()
  2301			&& (gfp_migratetype(sc->gfp_mask) != MIGRATE_MOVABLE))
  2302			cma_cap = false;
  2303	
  2304		while (scan < nr_to_scan && !list_empty(src)) {
  2305			struct list_head *move_to = src;
  2306			struct folio *folio;
  2307	
  2308			folio = lru_to_folio(src);
  2309			prefetchw_prev_lru_folio(folio, src, flags);
  2310	
  2311			nr_pages = folio_nr_pages(folio);
  2312			total_scan += nr_pages;
  2313	
  2314			page = &folio->page;
  2315	
  2316			if ((folio_zonenum(folio) > sc->reclaim_idx)
  2317	#ifdef CONFIG_CMA
  2318				|| (get_pageblock_migratetype(page) == MIGRATE_CMA && !cma_cap)
  2319	#endif
  2320			) {
  2321				nr_skipped[folio_zonenum(folio)] += nr_pages;
  2322				move_to = &folios_skipped;
  2323				goto move;
  2324			}
  2325			/*
  2326			 * Do not count skipped folios because that makes the function
  2327			 * return with no isolated folios if the LRU mostly contains
  2328			 * ineligible folios.  This causes the VM to not reclaim any
  2329			 * folios, triggering a premature OOM.
  2330			 * Account all pages in a folio.
  2331			 */
  2332			scan += nr_pages;
  2333	
  2334			if (!folio_test_lru(folio))
  2335				goto move;
  2336			if (!sc->may_unmap && folio_mapped(folio))
  2337				goto move;
  2338	
  2339			/*
  2340			 * Be careful not to clear the lru flag until after we're
  2341			 * sure the folio is not being freed elsewhere -- the
  2342			 * folio release code relies on it.
  2343			 */
  2344			if (unlikely(!folio_try_get(folio)))
  2345				goto move;
  2346	
  2347			if (!folio_test_clear_lru(folio)) {
  2348				/* Another thread is already isolating this folio */
  2349				folio_put(folio);
  2350				goto move;
  2351			}
  2352	
  2353			nr_taken += nr_pages;
  2354			nr_zone_taken[folio_zonenum(folio)] += nr_pages;
  2355			move_to = dst;
  2356	move:
  2357			list_move(&folio->lru, move_to);
  2358		}
  2359	
  2360		/*
  2361		 * Splice any skipped folios to the start of the LRU list. Note that
  2362		 * this disrupts the LRU order when reclaiming for lower zones but
  2363		 * we cannot splice to the tail. If we did then the SWAP_CLUSTER_MAX
  2364		 * scanning would soon rescan the same folios to skip and waste lots
  2365		 * of cpu cycles.
  2366		 */
  2367		if (!list_empty(&folios_skipped)) {
  2368			int zid;
  2369	
  2370			list_splice(&folios_skipped, src);
  2371			for (zid = 0; zid < MAX_NR_ZONES; zid++) {
  2372				if (!nr_skipped[zid])
  2373					continue;
  2374	
  2375				__count_zid_vm_events(PGSCAN_SKIP, zid, nr_skipped[zid]);
  2376				skipped += nr_skipped[zid];
  2377			}
  2378		}
  2379		*nr_scanned = total_scan;
  2380		trace_mm_vmscan_lru_isolate(sc->reclaim_idx, sc->order, nr_to_scan,
  2381					    total_scan, skipped, nr_taken,
  2382					    sc->may_unmap ? 0 : ISOLATE_UNMAPPED, lru);
  2383		update_lru_sizes(lruvec, lru, nr_zone_taken);
  2384		return nr_taken;
  2385	}
  2386
  
Huang, Ying April 21, 2023, 6:45 a.m. UTC | #3
"zhaoyang.huang" <zhaoyang.huang@unisoc.com> writes:

> From: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
>
> This patch fixes unproductive reclaiming of CMA pages by skipping them when they
> are not available to the current allocation context. It arises from the OOM issue
> below, which is caused by a large proportion of MIGRATE_CMA pages among the free
> pages. There is already a commit (168676649) that fixes this by trying CMA pages
> first instead of falling back in rmqueue. I would like to propose another fix
> from the reclaim perspective.
>
> 04166 < 4> [   36.172486] [03-19 10:05:52.172] ActivityManager: page allocation failure: order:0, mode:0xc00(GFP_NOIO), nodemask=(null),cpuset=foreground,mems_allowed=0
> 0419C < 4> [   36.189447] [03-19 10:05:52.189] DMA32: 0*4kB 447*8kB (C) 217*16kB (C) 124*32kB (C) 136*64kB (C) 70*128kB (C) 22*256kB (C) 3*512kB (C) 0*1024kB 0*2048kB 0*4096kB = 35848kB
> 0419D < 4> [   36.193125] [03-19 10:05:52.193] Normal: 231*4kB (UMEH) 49*8kB (MEH) 14*16kB (H) 13*32kB (H) 8*64kB (H) 2*128kB (H) 0*256kB 1*512kB (H) 0*1024kB 0*2048kB 0*4096kB = 3236kB
> 	......
> 041EA < 4> [   36.234447] [03-19 10:05:52.234] SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC)
> 041EB < 4> [   36.234455] [03-19 10:05:52.234] cache: ext4_io_end, object size: 64, buffer size: 64, default order: 0, min order: 0
> 041EC < 4> [   36.234459] [03-19 10:05:52.234] node 0: slabs: 53,objs: 3392, free: 0

From the above description, you are trying to resolve an issue that has
been resolved already.  If so, why do we need your patch?  What is the
issue it tries to resolve in the current upstream kernel?

At first glance, your patch does not seem unreasonable.  But you really
need to show the value of the patch.

Best Regards,
Huang, Ying

> Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
> ---
> v2: update commit message and fix build error when CONFIG_CMA is not set
> ---
> ---
>  mm/vmscan.c | 15 +++++++++++++--
>  1 file changed, 13 insertions(+), 2 deletions(-)
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index bd6637f..19fb445 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -2225,10 +2225,16 @@ static unsigned long isolate_lru_folios(unsigned long nr_to_scan,
>  	unsigned long nr_skipped[MAX_NR_ZONES] = { 0, };
>  	unsigned long skipped = 0;
>  	unsigned long scan, total_scan, nr_pages;
> +	bool cma_cap = true;
> +	struct page *page;
>  	LIST_HEAD(folios_skipped);
>  
>  	total_scan = 0;
>  	scan = 0;
> +	if ((IS_ENABLED(CONFIG_CMA)) && !current_is_kswapd()
> +		&& (gfp_migratetype(sc->gfp_mask) != MIGRATE_MOVABLE))
> +		cma_cap = false;
> +
>  	while (scan < nr_to_scan && !list_empty(src)) {
>  		struct list_head *move_to = src;
>  		struct folio *folio;
> @@ -2239,12 +2245,17 @@ static unsigned long isolate_lru_folios(unsigned long nr_to_scan,
>  		nr_pages = folio_nr_pages(folio);
>  		total_scan += nr_pages;
>  
> -		if (folio_zonenum(folio) > sc->reclaim_idx) {
> +		page = &folio->page;
> +
> +		if ((folio_zonenum(folio) > sc->reclaim_idx)
> +#ifdef CONFIG_CMA
> +			|| (get_pageblock_migratetype(page) == MIGRATE_CMA && !cma_cap)
> +#endif
> +		) {
>  			nr_skipped[folio_zonenum(folio)] += nr_pages;
>  			move_to = &folios_skipped;
>  			goto move;
>  		}
> -
>  		/*
>  		 * Do not count skipped folios because that makes the function
>  		 * return with no isolated folios if the LRU mostly contains
  
Zhaoyang Huang April 21, 2023, 9 a.m. UTC | #4
On Fri, Apr 21, 2023 at 2:47 PM Huang, Ying <ying.huang@intel.com> wrote:
>
> "zhaoyang.huang" <zhaoyang.huang@unisoc.com> writes:
>
> > From: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
> >
> > This patch fixes unproductive reclaiming of CMA pages by skipping them when they
> > are not available to the current allocation context. It arises from the OOM issue
> > below, which is caused by a large proportion of MIGRATE_CMA pages among the free
> > pages. There is already a commit (168676649) that fixes this by trying CMA pages
> > first instead of falling back in rmqueue. I would like to propose another fix
> > from the reclaim perspective.
> >
> > 04166 < 4> [   36.172486] [03-19 10:05:52.172] ActivityManager: page allocation failure: order:0, mode:0xc00(GFP_NOIO), nodemask=(null),cpuset=foreground,mems_allowed=0
> > 0419C < 4> [   36.189447] [03-19 10:05:52.189] DMA32: 0*4kB 447*8kB (C) 217*16kB (C) 124*32kB (C) 136*64kB (C) 70*128kB (C) 22*256kB (C) 3*512kB (C) 0*1024kB 0*2048kB 0*4096kB = 35848kB
> > 0419D < 4> [   36.193125] [03-19 10:05:52.193] Normal: 231*4kB (UMEH) 49*8kB (MEH) 14*16kB (H) 13*32kB (H) 8*64kB (H) 2*128kB (H) 0*256kB 1*512kB (H) 0*1024kB 0*2048kB 0*4096kB = 3236kB
> >       ......
> > 041EA < 4> [   36.234447] [03-19 10:05:52.234] SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC)
> > 041EB < 4> [   36.234455] [03-19 10:05:52.234] cache: ext4_io_end, object size: 64, buffer size: 64, default order: 0, min order: 0
> > 041EC < 4> [   36.234459] [03-19 10:05:52.234] node 0: slabs: 53,objs: 3392, free: 0
>
> From the above description, you are trying to resolve an issue that has
> been resolved already.  If so, why do we need your patch?  What is the
> issue it tries to resolve in the current upstream kernel?

Please consider the sequence below: __perform_reclaim() returns after
successfully reclaiming 32 CMA pages, yet get_page_from_freelist() still
fails when MIGRATE_CMA pages are NOT over 1/2 of the free pages, which
then unreserves highatomic pageblocks and drains the percpu pagesets.
Right? Furthermore, this could also introduce an OOM, as direct reclaim
is the final guard for alloc_pages.

	*did_some_progress = __perform_reclaim(gfp_mask, order, ac);

retry:
	page = get_page_from_freelist(gfp_mask, order, alloc_flags, ac);

	if (!page && !drained) {
		unreserve_highatomic_pageblock(ac, false);
		drain_all_pages(NULL);
		drained = true;
		goto retry;
	}

	return page;
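
A sketch of why the retry can still fail even though CMA pages were reclaimed,
assuming the request does not carry ALLOC_CMA; watermark_ok_sketch() is a made-up
simplification of what __zone_watermark_ok() does with free CMA pages:

/*
 * Simplified sketch: when the allocation is not entitled to CMA, the
 * watermark check discounts free CMA pages first, so freeing more CMA
 * pages via direct reclaim cannot make get_page_from_freelist() pass.
 */
static bool watermark_ok_sketch(unsigned long free_pages,
				unsigned long free_cma_pages,
				unsigned long min_wmark,
				unsigned int alloc_flags)
{
	if (!(alloc_flags & ALLOC_CMA))
		free_pages -= free_cma_pages;	/* CMA is unusable for this request */

	return free_pages > min_wmark;
}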
>
> At first glance, your patch does not seem unreasonable.  But you really
> need to show the value of the patch.
>
> Best Regards,
> Huang, Ying
>
> > Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
> > ---
> > v2: update commit message and fix build error when CONFIG_CMA is not set
> > ---
> > ---
> >  mm/vmscan.c | 15 +++++++++++++--
> >  1 file changed, 13 insertions(+), 2 deletions(-)
> >
> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index bd6637f..19fb445 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -2225,10 +2225,16 @@ static unsigned long isolate_lru_folios(unsigned long nr_to_scan,
> >       unsigned long nr_skipped[MAX_NR_ZONES] = { 0, };
> >       unsigned long skipped = 0;
> >       unsigned long scan, total_scan, nr_pages;
> > +     bool cma_cap = true;
> > +     struct page *page;
> >       LIST_HEAD(folios_skipped);
> >
> >       total_scan = 0;
> >       scan = 0;
> > +     if ((IS_ENABLED(CONFIG_CMA)) && !current_is_kswapd()
> > +             && (gfp_migratetype(sc->gfp_mask) != MIGRATE_MOVABLE))
> > +             cma_cap = false;
> > +
> >       while (scan < nr_to_scan && !list_empty(src)) {
> >               struct list_head *move_to = src;
> >               struct folio *folio;
> > @@ -2239,12 +2245,17 @@ static unsigned long isolate_lru_folios(unsigned long nr_to_scan,
> >               nr_pages = folio_nr_pages(folio);
> >               total_scan += nr_pages;
> >
> > -             if (folio_zonenum(folio) > sc->reclaim_idx) {
> > +             page = &folio->page;
> > +
> > +             if ((folio_zonenum(folio) > sc->reclaim_idx)
> > +#ifdef CONFIG_CMA
> > +                     || (get_pageblock_migratetype(page) == MIGRATE_CMA && !cma_cap)
> > +#endif
> > +             ) {
> >                       nr_skipped[folio_zonenum(folio)] += nr_pages;
> >                       move_to = &folios_skipped;
> >                       goto move;
> >               }
> > -
> >               /*
> >                * Do not count skipped folios because that makes the function
> >                * return with no isolated folios if the LRU mostly contains
  
Huang, Ying April 21, 2023, 9:02 a.m. UTC | #5
Zhaoyang Huang <huangzhaoyang@gmail.com> writes:

> On Fri, Apr 21, 2023 at 2:47 PM Huang, Ying <ying.huang@intel.com> wrote:
>>
>> "zhaoyang.huang" <zhaoyang.huang@unisoc.com> writes:
>>
>> > From: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
>> >
>> > This patch fixes unproductive reclaiming of CMA pages by skipping them when they
>> > are not available to the current allocation context. It arises from the OOM issue
>> > below, which is caused by a large proportion of MIGRATE_CMA pages among the free
>> > pages. There is already a commit (168676649) that fixes this by trying CMA pages
>> > first instead of falling back in rmqueue. I would like to propose another fix
>> > from the reclaim perspective.
>> >
>> > 04166 < 4> [   36.172486] [03-19 10:05:52.172] ActivityManager: page allocation failure: order:0, mode:0xc00(GFP_NOIO), nodemask=(null),cpuset=foreground,mems_allowed=0
>> > 0419C < 4> [   36.189447] [03-19 10:05:52.189] DMA32: 0*4kB 447*8kB (C) 217*16kB (C) 124*32kB (C) 136*64kB (C) 70*128kB (C) 22*256kB (C) 3*512kB (C) 0*1024kB 0*2048kB 0*4096kB = 35848kB
>> > 0419D < 4> [   36.193125] [03-19 10:05:52.193] Normal: 231*4kB (UMEH) 49*8kB (MEH) 14*16kB (H) 13*32kB (H) 8*64kB (H) 2*128kB (H) 0*256kB 1*512kB (H) 0*1024kB 0*2048kB 0*4096kB = 3236kB
>> >       ......
>> > 041EA < 4> [   36.234447] [03-19 10:05:52.234] SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC)
>> > 041EB < 4> [   36.234455] [03-19 10:05:52.234] cache: ext4_io_end, object size: 64, buffer size: 64, default order: 0, min order: 0
>> > 041EC < 4> [   36.234459] [03-19 10:05:52.234] node 0: slabs: 53,objs: 3392, free: 0
>>
>> From the above description, you are trying to resolve an issue that has
>> been resolved already.  If so, why do we need your patch?  What is the
>> issue it tries to resolve in the current upstream kernel?
>
> Please consider the sequence below: __perform_reclaim() returns after
> successfully reclaiming 32 CMA pages, yet get_page_from_freelist() still
> fails when MIGRATE_CMA pages are NOT over 1/2 of the free pages, which
> then unreserves highatomic pageblocks and drains the percpu pagesets.
> Right? Furthermore, this could also introduce an OOM, as direct reclaim
> is the final guard for alloc_pages.
>
> 	*did_some_progress = __perform_reclaim(gfp_mask, order, ac);
>
> retry:
> 	page = get_page_from_freelist(gfp_mask, order, alloc_flags, ac);
>
> 	if (!page && !drained) {
> 		unreserve_highatomic_pageblock(ac, false);
> 		drain_all_pages(NULL);
> 		drained = true;
> 		goto retry;
> 	}

If you think OOM can be triggered, please try to reproduce it.

Best Regards,
Huang, Ying

> 	return page;
>>
>> At first glance, your patch does not seem unreasonable.  But you really
>> need to show the value of the patch.
>>
>> Best Regards,
>> Huang, Ying
>>
>> > Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
>> > ---
>> > v2: update commit message and fix build error when CONFIG_CMA is not set
>> > ---
>> > ---
>> >  mm/vmscan.c | 15 +++++++++++++--
>> >  1 file changed, 13 insertions(+), 2 deletions(-)
>> >
>> > diff --git a/mm/vmscan.c b/mm/vmscan.c
>> > index bd6637f..19fb445 100644
>> > --- a/mm/vmscan.c
>> > +++ b/mm/vmscan.c
>> > @@ -2225,10 +2225,16 @@ static unsigned long isolate_lru_folios(unsigned long nr_to_scan,
>> >       unsigned long nr_skipped[MAX_NR_ZONES] = { 0, };
>> >       unsigned long skipped = 0;
>> >       unsigned long scan, total_scan, nr_pages;
>> > +     bool cma_cap = true;
>> > +     struct page *page;
>> >       LIST_HEAD(folios_skipped);
>> >
>> >       total_scan = 0;
>> >       scan = 0;
>> > +     if ((IS_ENABLED(CONFIG_CMA)) && !current_is_kswapd()
>> > +             && (gfp_migratetype(sc->gfp_mask) != MIGRATE_MOVABLE))
>> > +             cma_cap = false;
>> > +
>> >       while (scan < nr_to_scan && !list_empty(src)) {
>> >               struct list_head *move_to = src;
>> >               struct folio *folio;
>> > @@ -2239,12 +2245,17 @@ static unsigned long isolate_lru_folios(unsigned long nr_to_scan,
>> >               nr_pages = folio_nr_pages(folio);
>> >               total_scan += nr_pages;
>> >
>> > -             if (folio_zonenum(folio) > sc->reclaim_idx) {
>> > +             page = &folio->page;
>> > +
>> > +             if ((folio_zonenum(folio) > sc->reclaim_idx)
>> > +#ifdef CONFIG_CMA
>> > +                     || (get_pageblock_migratetype(page) == MIGRATE_CMA && !cma_cap)
>> > +#endif
>> > +             ) {
>> >                       nr_skipped[folio_zonenum(folio)] += nr_pages;
>> >                       move_to = &folios_skipped;
>> >                       goto move;
>> >               }
>> > -
>> >               /*
>> >                * Do not count skipped folios because that makes the function
>> >                * return with no isolated folios if the LRU mostly contains
  

Patch

diff --git a/mm/vmscan.c b/mm/vmscan.c
index bd6637f..19fb445 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2225,10 +2225,16 @@  static unsigned long isolate_lru_folios(unsigned long nr_to_scan,
 	unsigned long nr_skipped[MAX_NR_ZONES] = { 0, };
 	unsigned long skipped = 0;
 	unsigned long scan, total_scan, nr_pages;
+	bool cma_cap = true;
+	struct page *page;
 	LIST_HEAD(folios_skipped);
 
 	total_scan = 0;
 	scan = 0;
+	if ((IS_ENABLED(CONFIG_CMA)) && !current_is_kswapd()
+		&& (gfp_migratetype(sc->gfp_mask) != MIGRATE_MOVABLE))
+		cma_cap = false;
+
 	while (scan < nr_to_scan && !list_empty(src)) {
 		struct list_head *move_to = src;
 		struct folio *folio;
@@ -2239,12 +2245,17 @@  static unsigned long isolate_lru_folios(unsigned long nr_to_scan,
 		nr_pages = folio_nr_pages(folio);
 		total_scan += nr_pages;
 
-		if (folio_zonenum(folio) > sc->reclaim_idx) {
+		page = &folio->page;
+
+		if ((folio_zonenum(folio) > sc->reclaim_idx)
+#ifdef CONFIG_CMA
+			|| (get_pageblock_migratetype(page) == MIGRATE_CMA && !cma_cap)
+#endif
+		) {
 			nr_skipped[folio_zonenum(folio)] += nr_pages;
 			move_to = &folios_skipped;
 			goto move;
 		}
-
 		/*
 		 * Do not count skipped folios because that makes the function
 		 * return with no isolated folios if the LRU mostly contains