Message ID | 20231016053002.756205-9-ying.huang@intel.com |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:a05:612c:2908:b0:403:3b70:6f57 with SMTP id ib8csp3251399vqb; Sun, 15 Oct 2023 22:32:55 -0700 (PDT) X-Google-Smtp-Source: AGHT+IGiSZHnWVOIOCA3lDhB94DWYdddbJt0jmC8fgeACwj1sPB7fEqfCk9+HyqtEOsxOIg+g6d6 X-Received: by 2002:a17:902:ef04:b0:1ca:85b4:b962 with SMTP id d4-20020a170902ef0400b001ca85b4b962mr868404plx.4.1697434374944; Sun, 15 Oct 2023 22:32:54 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1697434374; cv=none; d=google.com; s=arc-20160816; b=vGMNKyhQI56rIVHkz3AgeD/E6HovYZygf4snwJt2py0rkTdrujmVnNybUUXZp0lJwe R5mYnk4hhEn4lv1uWwTrttYz/eA38GepWWlQb403bDDHfv1Oc5au0gxabNksiGiKgiIP 51UHeW3YkuoFcLjHZMiAqqArI2Qw0VKKkZeIY88Scsudw3KnVmtH2HMstQlYXH8qQZei eTIDHs1dsB12A4nHEAMgn41kLRJgJbc2brlLP9x3DO57cL1uDzWziHL3k+ELvKSnpvhU r7NulJNFx45RJR41IIt4bcEF4kYOgxv7tbwGh0IkKFXNG7sVYJX9nNe09jIlG2tdGfEu U3cA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=EsUyHh8Yv1F4HMKZPnyBlZFhM2fGhvzjiN4YX7RaOzY=; fh=rOqdWm0xLtwhY96CBVlHZJCtqAZkONVUDvFazfYuxhM=; b=SAO56ONdhPae4RChnYSLphCDMOdga+FtCXKsKC3f6x1GRhLJ5weAEpT6mEXm9FfYEg 7+sS6UZEyuPMFcFr3H03op1B/YwpTtz0Hg3q4hRt0sJNuqE7rSUL8fg3vTEM8Gmerz+o QmoGiGr2zU0RJgm46VBEYJpjGcJ+Cg52D3hsTKhvRTQT+iwzkP8FLUoE386jdlaR2XsL 0003Ls9MwcjHVenUXKQOUfeBzpYRRgfX5m6ePUzBT7Fgt/iXq/T4T+oxNZJdn91uRKhd Qx02mzquWLtH0Oa2KJ9yzvX286JZaT9WENVXqQhy+6+vlNPD7YN5lKNEg9LzZtjLtvr5 8meQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=aNwA0sNO; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:3 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from lipwig.vger.email (lipwig.vger.email. 
[2620:137:e000::3:3]) by mx.google.com with ESMTPS id h14-20020a170902680e00b001c9ca0a03dcsi9287464plk.86.2023.10.15.22.32.54 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 15 Oct 2023 22:32:54 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:3 as permitted sender) client-ip=2620:137:e000::3:3; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=aNwA0sNO; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:3 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from out1.vger.email (depot.vger.email [IPv6:2620:137:e000::3:0]) by lipwig.vger.email (Postfix) with ESMTP id 703208080C76; Sun, 15 Oct 2023 22:31:55 -0700 (PDT) X-Virus-Status: Clean X-Virus-Scanned: clamav-milter 0.103.10 at lipwig.vger.email Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231511AbjJPFbc (ORCPT <rfc822;hjfbswb@gmail.com> + 18 others); Mon, 16 Oct 2023 01:31:32 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:49620 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231925AbjJPFbF (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Mon, 16 Oct 2023 01:31:05 -0400 Received: from mgamail.intel.com (mgamail.intel.com [134.134.136.65]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 5767D1A3 for <linux-kernel@vger.kernel.org>; Sun, 15 Oct 2023 22:30:49 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1697434249; x=1728970249; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=oayBWHmLvZPHkilBpaQJzT0ghr/vVONoAKNZEgSFLCA=; b=aNwA0sNOFRBXky4E/8jKhyFIQyuK3bIzNwKHV6tvtr/BucBLexbY+Azk YeErJoSG87Ir9TqdE1CF8nkhTBc16Ugn8MQhOXz61TmYm6gd49hQnun2C 
OygZO/ejcP6O2ZNH4fZ07eE1qre5fFMG2B8KteDoRBbz7FghPkWZt3sy2 yC4ZtEsYR2XHjQjSwuAIQXWQiS7vbS4CuJzoF1JwdvTX7z8Qlgsk+nmp9 X9MtGHSsIZk+SEOHWV2jDKh9jTQMl/vb5UPmmdjqaVtzcXfpI78MVIDct o32XXQCn1fVIrY5HdpTyblYwvDIxmQxI8peU0ikyFz6s2naiVL+CDpLhn g==; X-IronPort-AV: E=McAfee;i="6600,9927,10863"; a="389308119" X-IronPort-AV: E=Sophos;i="6.03,228,1694761200"; d="scan'208";a="389308119" Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 15 Oct 2023 22:30:48 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10863"; a="899356750" X-IronPort-AV: E=Sophos;i="6.03,228,1694761200"; d="scan'208";a="899356750" Received: from yhuang6-mobl2.sh.intel.com ([10.238.6.133]) by fmsmga001-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 15 Oct 2023 22:28:47 -0700 From: Huang Ying <ying.huang@intel.com> To: Andrew Morton <akpm@linux-foundation.org> Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Arjan Van De Ven <arjan@linux.intel.com>, Huang Ying <ying.huang@intel.com>, Mel Gorman <mgorman@techsingularity.net>, Vlastimil Babka <vbabka@suse.cz>, David Hildenbrand <david@redhat.com>, Johannes Weiner <jweiner@redhat.com>, Dave Hansen <dave.hansen@linux.intel.com>, Michal Hocko <mhocko@suse.com>, Pavel Tatashin <pasha.tatashin@soleen.com>, Matthew Wilcox <willy@infradead.org>, Christoph Lameter <cl@linux.com> Subject: [PATCH -V3 8/9] mm, pcp: decrease PCP high if free pages < high watermark Date: Mon, 16 Oct 2023 13:30:01 +0800 Message-Id: <20231016053002.756205-9-ying.huang@intel.com> X-Mailer: git-send-email 2.39.2 In-Reply-To: <20231016053002.756205-1-ying.huang@intel.com> References: <20231016053002.756205-1-ying.huang@intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-0.9 required=5.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=unavailable autolearn_force=no 
version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lipwig.vger.email Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.6.4 (lipwig.vger.email [0.0.0.0]); Sun, 15 Oct 2023 22:31:55 -0700 (PDT) X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1779888946988130694 X-GMAIL-MSGID: 1779888946988130694 |
Series |
mm: PCP high auto-tuning
Commit Message
Huang, Ying
Oct. 16, 2023, 5:30 a.m. UTC
One goal of the PCP (per-CPU pages) design is to minimize the number of
pages held in the PCP lists when the system is short on free pages. To
reach that goal, when page reclaim is active for a zone
(ZONE_RECLAIM_ACTIVE), we stop increasing PCP high in the allocation
path, and decrease PCP high and free some pages in the freeing path.
But this may be too late, because the background page reclaim itself
may introduce latency for some workloads. So, with this patch, during
page allocation we detect whether the number of free pages in the zone
is below the high watermark. If so, we stop increasing PCP high in the
allocation path, and decrease PCP high and free some pages in the
freeing path. This reduces the possibility of premature background
page reclaim caused by a too-large PCP.
The high watermark check is done in the allocation path to keep the
overhead out of the hotter freeing path.
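The shrink-when-below-high behavior described above can be sketched as a
standalone model. This is a hypothetical, simplified function mirroring
the decision structure the patch adds to nr_pcp_high() in mm/page_alloc.c,
not the kernel implementation itself; parameter names follow the kernel
fields they stand in for.

```c
#include <stdbool.h>

/* Simplified model of the nr_pcp_high() decision in this patch.
 * count/high/free_factor model the pcp fields; zone_below_high models
 * test_bit(ZONE_BELOW_HIGH, &zone->flags).  Illustrative only. */
static int model_nr_pcp_high(int count, int high, int high_min, int high_max,
                             int batch, int free_factor, bool zone_below_high)
{
    if (high_min == high_max)
        return high;            /* pcp->high is fixed; never auto-tuned */

    if (zone_below_high) {
        /* Zone is short on free pages: shrink the stored pcp->high
         * toward high_min ... */
        int new_high = high - (batch << free_factor);
        if (new_high < high_min)
            new_high = high_min;
        (void)new_high;         /* the kernel stores this in pcp->high */
        /* ... and report a free target that drains pages above it. */
        return count > high_min ? count : high_min;
    }

    /* Growth path (pcp->count >= high) elided in this sketch. */
    return high;
}
```

With a fixed high (high_min == high_max) the value is simply returned; when
the zone is flagged below the high watermark, the returned target caps the
PCP at its current count (but never below high_min), so subsequent frees go
to the buddy allocator instead of accumulating in the PCP.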
Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: David Hildenbrand <david@redhat.com>
Cc: Johannes Weiner <jweiner@redhat.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Christoph Lameter <cl@linux.com>
---
include/linux/mmzone.h | 1 +
mm/page_alloc.c | 33 +++++++++++++++++++++++++++++++--
2 files changed, 32 insertions(+), 2 deletions(-)
Comments
On Mon, Oct 16, 2023 at 01:30:01PM +0800, Huang Ying wrote:
> One target of PCP is to minimize pages in PCP if the system free pages
> is too few.
[...]
> @@ -2457,6 +2463,10 @@ static void free_unref_page_commit(struct zone *zone, struct per_cpu_pages *pcp,
>  	if (pcp->count >= high) {
>  		free_pcppages_bulk(zone, nr_pcp_free(pcp, batch, high, free_high),
>  				   pcp, pindex);
> +		if (test_bit(ZONE_BELOW_HIGH, &zone->flags) &&
> +		    zone_watermark_ok(zone, 0, high_wmark_pages(zone),
> +				      ZONE_MOVABLE, 0))
> +			clear_bit(ZONE_BELOW_HIGH, &zone->flags);
>  	}
>  }

This is a relatively fast path and freeing pages should not need to check
watermarks. While the overhead is mitigated because it applies only when
the watermark is below high, that is also potentially an unbounded condition
if a workload is sized precisely enough. Why not clear this bit when kswapd
is going to sleep after reclaiming enough pages in a zone?

If you agree then a follow-up patch classed as a micro-optimisation is
sufficient to avoid redoing all the results again. For most of your
tests, it should be performance-neutral or borderline noise.
Mel Gorman <mgorman@techsingularity.net> writes:

> This is a relatively fast path and freeing pages should not need to check
> watermarks.

Another thing that mitigates the overhead is that the watermark check only
occurs when we free pages from the PCP to buddy. That is, in most cases,
once every 63 page freeings.

> While the overhead is mitigated because it applies only when
> the watermark is below high, that is also potentially an unbounded condition
> if a workload is sized precisely enough. Why not clear this bit when kswapd
> is going to sleep after reclaiming enough pages in a zone?

IIUC, if the number of free pages is kept larger than the low watermark,
then kswapd will have no opportunity to be woken up, even if the number
of free pages was once smaller than the high watermark.

> If you agree then a follow-up patch classed as a micro-optimisation is
> sufficient to avoid redoing all the results again. For most of your
> tests, it should be performance-neutral or borderline noise.

--
Best Regards,
Huang, Ying
On Fri, Oct 20, 2023 at 11:30:53AM +0800, Huang, Ying wrote:
> Another thing that mitigates the overhead is that the watermark check only
> occurs when we free pages from the PCP to buddy. That is, in most cases,
> once every 63 page freeings.

True

> > While the overhead is mitigated because it applies only when
> > the watermark is below high, that is also potentially an unbounded condition
> > if a workload is sized precisely enough. Why not clear this bit when kswapd
> > is going to sleep after reclaiming enough pages in a zone?
>
> IIUC, if the number of free pages is kept larger than the low watermark,
> then kswapd will have no opportunity to be woken up, even if the number
> of free pages was once smaller than the high watermark.

Also true, and I did think of that later. I guess it's ok; the chances are
that the series overall offsets any micro-costs like this, so I'm happy.
If, for some reason, this overhead is noticeable (doubtful), then it can
be revisited.

Thanks.
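The watermark-ordering argument in the exchange above (kswapd is woken only
when free pages drop below the low watermark, so a zone can sit between the
low and high watermarks indefinitely without kswapd ever running) can be
illustrated with a toy model. The numbers and helper names below are made
up for illustration; this is not kernel code.

```c
#include <stdbool.h>

/* Toy model of the watermark ordering min < low < high.
 * kswapd is woken only below the low watermark, so a zone whose free
 * page count stays between low and high never wakes kswapd, and a
 * "clear ZONE_BELOW_HIGH when kswapd sleeps" scheme would never run. */
static bool kswapd_would_wake(long free_pages, long low_wmark)
{
    return free_pages < low_wmark;
}

static bool zone_below_high(long free_pages, long high_wmark)
{
    return free_pages < high_wmark;
}
```

For example, with low = 1000 and high = 3000 pages, a zone holding 2000 free
pages is below the high watermark, yet kswapd is never woken to clear the flag.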
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index ec3f7daedcc7..c88770381aaf 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -1018,6 +1018,7 @@ enum zone_flags {
 	 * Cleared when kswapd is woken.
 	 */
 	ZONE_RECLAIM_ACTIVE,		/* kswapd may be scanning the zone. */
+	ZONE_BELOW_HIGH,		/* zone is below high watermark. */
 };
 
 static inline unsigned long zone_managed_pages(struct zone *zone)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 8382ad2cdfd4..253fc7d0498e 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2407,7 +2407,13 @@ static int nr_pcp_high(struct per_cpu_pages *pcp, struct zone *zone,
 		return min(batch << 2, pcp->high);
 	}
 
-	if (pcp->count >= high && high_min != high_max) {
+	if (high_min == high_max)
+		return high;
+
+	if (test_bit(ZONE_BELOW_HIGH, &zone->flags)) {
+		pcp->high = max(high - (batch << pcp->free_factor), high_min);
+		high = max(pcp->count, high_min);
+	} else if (pcp->count >= high) {
 		int need_high = (batch << pcp->free_factor) + batch;
 
 		/* pcp->high should be large enough to hold batch freed pages */
@@ -2457,6 +2463,10 @@ static void free_unref_page_commit(struct zone *zone, struct per_cpu_pages *pcp,
 	if (pcp->count >= high) {
 		free_pcppages_bulk(zone, nr_pcp_free(pcp, batch, high, free_high),
 				   pcp, pindex);
+		if (test_bit(ZONE_BELOW_HIGH, &zone->flags) &&
+		    zone_watermark_ok(zone, 0, high_wmark_pages(zone),
+				      ZONE_MOVABLE, 0))
+			clear_bit(ZONE_BELOW_HIGH, &zone->flags);
 	}
 }
 
@@ -2763,7 +2773,7 @@ static int nr_pcp_alloc(struct per_cpu_pages *pcp, struct zone *zone, int order)
 	 * If we had larger pcp->high, we could avoid to allocate from
 	 * zone.
 	 */
-	if (high_min != high_max && !test_bit(ZONE_RECLAIM_ACTIVE, &zone->flags))
+	if (high_min != high_max && !test_bit(ZONE_BELOW_HIGH, &zone->flags))
 		high = pcp->high = min(high + batch, high_max);
 
 	if (!order) {
@@ -3225,6 +3235,25 @@ get_page_from_freelist(gfp_t gfp_mask, unsigned int order, int alloc_flags,
 			}
 		}
 
+		/*
+		 * Detect whether the number of free pages is below high
+		 * watermark.  If so, we will decrease pcp->high and free
+		 * PCP pages in free path to reduce the possibility of
+		 * premature page reclaiming.  Detection is done here to
+		 * avoid to do that in hotter free path.
+		 */
+		if (test_bit(ZONE_BELOW_HIGH, &zone->flags))
+			goto check_alloc_wmark;
+
+		mark = high_wmark_pages(zone);
+		if (zone_watermark_fast(zone, order, mark,
+					ac->highest_zoneidx, alloc_flags,
+					gfp_mask))
+			goto try_this_zone;
+		else
+			set_bit(ZONE_BELOW_HIGH, &zone->flags);
+
+check_alloc_wmark:
 		mark = wmark_pages(zone, alloc_flags & ALLOC_WMARK_MASK);
 		if (!zone_watermark_fast(zone, order, mark,
 					 ac->highest_zoneidx, alloc_flags,
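The allocation-path detection that the last hunk adds to
get_page_from_freelist() can be modeled in isolation. This is a standalone
sketch under assumed names (`flags` stands in for zone->flags, and the enum
value is invented); the real code uses test_bit()/set_bit() on the zone flag
word and zone_watermark_fast() for the comparison.

```c
#include <stdbool.h>

enum { MODEL_ZONE_BELOW_HIGH = 1u << 0 };  /* stand-in for ZONE_BELOW_HIGH */

/* Model of the detection added to get_page_from_freelist(): check the
 * high watermark only while the below-high bit is clear, and set the
 * bit once the zone drops below it.  The bit stays set (sticky) until
 * the free path observes the zone back above the high watermark. */
static bool model_detect_below_high(long free_pages, long high_wmark,
                                    unsigned int *flags)
{
    if (*flags & MODEL_ZONE_BELOW_HIGH)
        return true;                     /* already known to be low */
    if (free_pages >= high_wmark)
        return false;                    /* fast path: plenty of memory */
    *flags |= MODEL_ZONE_BELOW_HIGH;     /* remember for the free path */
    return true;
}
```

Because the bit is sticky, later allocations skip the extra watermark
comparison entirely (the `goto check_alloc_wmark` in the patch), which keeps
the added cost of the detection to roughly one check per below-high episode
rather than one per allocation.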