[3/4] mm: compaction: Update pageblock skip when first migration candidate is not at the start
Message ID: 20230515113344.6869-4-mgorman@techsingularity.net
State: New
Headers:
From: Mel Gorman <mgorman@techsingularity.net>
To: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrew Morton <akpm@linux-foundation.org>, Jiri Slaby <jirislaby@kernel.org>, Maxim Levitsky <mlevitsk@redhat.com>, Michal Hocko <mhocko@kernel.org>, Pedro Falcato <pedro.falcato@gmail.com>, Paolo Bonzini <pbonzini@redhat.com>, Chuyi Zhou <zhouchuyi@bytedance.com>, Linux-MM <linux-mm@kvack.org>, LKML <linux-kernel@vger.kernel.org>
Subject: [PATCH 3/4] mm: compaction: Update pageblock skip when first migration candidate is not at the start
Date: Mon, 15 May 2023 12:33:43 +0100
Message-Id: <20230515113344.6869-4-mgorman@techsingularity.net>
In-Reply-To: <20230515113344.6869-1-mgorman@techsingularity.net>
References: <20230515113344.6869-1-mgorman@techsingularity.net>
Series: Follow-up "Fix excessive CPU usage during compaction"
Commit Message
Mel Gorman
May 15, 2023, 11:33 a.m. UTC
isolate_migratepages_block should mark a pageblock as skip if scanning
started on an aligned pageblock boundary but it only updates the skip
flag if the first migration candidate is also aligned. Tracing during
a compaction stress load (mmtests: workload-usemem-stress-numa-compact)
showed that many pageblocks are not marked skip, causing excessive
scanning of blocks that had been recently checked. Update pageblock skip
based on "valid_page", which is set if scanning started on a pageblock
boundary.
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
mm/compaction.c | 15 +++++----------
1 file changed, 5 insertions(+), 10 deletions(-)
Comments
On 5/15/23 13:33, Mel Gorman wrote:
> isolate_migratepages_block should mark a pageblock as skip if scanning
> started on an aligned pageblock boundary but it only updates the skip
> flag if the first migration candidate is also aligned. Tracing during
> a compaction stress load (mmtests: workload-usemem-stress-numa-compact)
> that many pageblocks are not marked skip causing excessive scanning of
> blocks that had been recently checked. Update pageblock skip based on
> "valid_page" which is set if scanning started on a pageblock boundary.
>
> Signed-off-by: Mel Gorman <mgorman@techsingularity.net>

I wonder if this has an unintended side-effect that if we resume
isolate_migratepages_block() of a partially compacted pageblock to finish
it, test_and_set_skip() will now tell us to abort, because we already set
the skip bit in the previous call. This would include the
cc->finish_pageblock rescan cases.

So unless I miss something that already prevents that, I agree we should
not tie setting the skip bit to pageblock_aligned(pfn), but maybe if we
are not pageblock aligned, we should ignore the already-set skip bit, as
it was most likely being set by us in the previous iteration and should
not prevent us from finishing the pageblock?

> ---
>  mm/compaction.c | 15 +++++----------
>  1 file changed, 5 insertions(+), 10 deletions(-)
>
> diff --git a/mm/compaction.c b/mm/compaction.c
> index accc6568091a..d7be990b1d60 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -392,18 +392,14 @@ void reset_isolation_suitable(pg_data_t *pgdat)
>   * Sets the pageblock skip bit if it was clear. Note that this is a hint as
>   * locks are not required for read/writers. Returns true if it was already set.
>   */
> -static bool test_and_set_skip(struct compact_control *cc, struct page *page,
> -					unsigned long pfn)
> +static bool test_and_set_skip(struct compact_control *cc, struct page *page)
>  {
>  	bool skip;
>
> -	/* Do no update if skip hint is being ignored */
> +	/* Do not update if skip hint is being ignored */
>  	if (cc->ignore_skip_hint)
>  		return false;
>
> -	if (!pageblock_aligned(pfn))
> -		return false;
> -
>  	skip = get_pageblock_skip(page);
>  	if (!skip && !cc->no_set_skip_hint)
>  		set_pageblock_skip(page);
> @@ -470,8 +466,7 @@ static void update_cached_migrate(struct compact_control *cc, unsigned long pfn)
>  {
>  }
>
> -static bool test_and_set_skip(struct compact_control *cc, struct page *page,
> -					unsigned long pfn)
> +static bool test_and_set_skip(struct compact_control *cc, struct page *page)
>  {
>  	return false;
>  }
> @@ -1075,9 +1070,9 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
>  		lruvec_memcg_debug(lruvec, page_folio(page));
>
>  		/* Try get exclusive access under lock */
> -		if (!skip_updated) {
> +		if (!skip_updated && valid_page) {
>  			skip_updated = true;
> -			if (test_and_set_skip(cc, page, low_pfn))
> +			if (test_and_set_skip(cc, valid_page))
>  				goto isolate_abort;
>  		}
>
On Thu, May 25, 2023 at 03:37:43PM +0200, Vlastimil Babka wrote:
> On 5/15/23 13:33, Mel Gorman wrote:
> > [...]
>
> I wonder if this has an unintended side-effect that if we resume
> isolate_migratepages_block() of a partially compacted pageblock to finish
> it, test_and_set_skip() will now tell us to abort, because we already set
> the skip bit in the previous call. This would include the
> cc->finish_pageblock rescan cases.
>
> So unless I miss something that already prevents that, I agree we should not
> tie setting the skip bit to pageblock_aligned(pfn), but maybe if we are not
> pageblock aligned, we should ignore the already-set skip bit, as it was most
> likely being set by us in the previous iteration and should not prevent us
> from finishing the pageblock?
>

Hmm, I think you're right. While it should not hit the original bug,
migration candidates are missed until the next compaction scan which
could be tricky to detect. Something like this as a separate patch?
Build tested only but the intent is for an unaligned start to set the
skip bit if already unset but otherwise complete the scan. Like earlier
fixes, this might overscan some pageblocks in a given context but we are
probably hitting the limits on how compaction can run efficiently in the
current scheme without causing other side-effects :(

diff --git a/mm/compaction.c b/mm/compaction.c
index 91af6a8b7a98..761a2dd7d78a 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -792,6 +792,7 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 	bool skip_on_failure = false;
 	unsigned long next_skip_pfn = 0;
 	bool skip_updated = false;
+	bool start_aligned;
 	int ret = 0;
 
 	cc->migrate_pfn = low_pfn;
@@ -824,6 +825,7 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 	}
 
 	/* Time to isolate some pages for migration */
+	start_aligned = pageblock_aligned(start_pfn);
 	for (; low_pfn < end_pfn; low_pfn++) {
 
 		if (skip_on_failure && low_pfn >= next_skip_pfn) {
@@ -1069,10 +1071,15 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 
 		lruvec_memcg_debug(lruvec, page_folio(page));
 
-		/* Try get exclusive access under lock */
+		/* Try get exclusive access under lock. Isolation is
+		 * only aborted if the start was pageblock aligned
+		 * as this may be a partial resumed scan that set
+		 * the bit on a recent scan but the scan must reach
+		 * the end of the pageblock.
+		 */
 		if (!skip_updated && valid_page) {
 			skip_updated = true;
-			if (test_and_set_skip(cc, valid_page))
+			if (test_and_set_skip(cc, valid_page) && start_aligned)
 				goto isolate_abort;
 		}
On 5/29/23 12:33, Mel Gorman wrote:
> On Thu, May 25, 2023 at 03:37:43PM +0200, Vlastimil Babka wrote:
> > [...]
>
> Hmm, I think you're right. While it should not hit the original bug,
> migration candidates are missed until the next compaction scan which
> could be tricky to detect. Something like this as a separate patch?
> Build tested only but the intent is for an unaligned start to set the skip
> bet if already unset but otherwise complete the scan. Like earlier fixes,
> this might overscan some pageblocks in a given context but we are probably
> hitting the limits on how compaction can run efficiently in the current
> scheme without causing other side-effects :(

Yeah that should work! I think it should be even folded to 3/4 but if you
want separate, fine too.

> [...]
On Mon, May 29, 2023 at 02:43:48PM +0200, Vlastimil Babka wrote:
> On 5/29/23 12:33, Mel Gorman wrote:
> > [...]
>
> Yeah that should work! I think it should be even folded to 3/4 but if you
> want separate, fine too.
>

I was not happy with the test results so limited the scope of the patch
which performed much better both in terms of absolute performance and
compaction activity. Are you still ok with this version?

Thanks

--8<--
mm: compaction: Update pageblock skip when first migration candidate is not at the start -fix

Vlastimil Babka pointed out the following problem with
mm-compaction-update-pageblock-skip-when-first-migration-candidate-is-not-at-the-start.patch

	I wonder if this has an unintended side-effect that if we resume
	isolate_migratepages_block() of a partially compacted pageblock
	to finish it, test_and_set_skip() will now tell us to abort,
	because we already set the skip bit in the previous call. This
	would include the cc->finish_pageblock rescan cases.

He is correct and a partial rescan as implemented in "mm, compaction:
finish pageblocks on complete migration failure" would abort
prematurely.

Test and set the skip bit when acquiring "exclusive access" to a pageblock
for migration but only abort if the calling context is rescanning to
finish a pageblock.

This is a fix for the mmotm patch
mm-compaction-update-pageblock-skip-when-first-migration-candidate-is-not-at-the-start.patch

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
 mm/compaction.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index 91af6a8b7a98..300aa968a0cf 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1069,11 +1069,17 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 
 		lruvec_memcg_debug(lruvec, page_folio(page));
 
-		/* Try get exclusive access under lock */
+		/*
+		 * Try get exclusive access under lock. If marked for
+		 * skip, the scan is aborted unless the current context
+		 * is a rescan to reach the end of the pageblock.
+		 */
 		if (!skip_updated && valid_page) {
 			skip_updated = true;
-			if (test_and_set_skip(cc, valid_page))
+			if (test_and_set_skip(cc, valid_page) &&
+			    !cc->finish_pageblock) {
 				goto isolate_abort;
+			}
 		}
 
 		/*
On 6/2/23 13:16, Mel Gorman wrote:
> On Mon, May 29, 2023 at 02:43:48PM +0200, Vlastimil Babka wrote:
> > [...]
>
> I was not happy with the test results so limited the scope of the patch
> which performed much better both in terms of absolute performance and
> compaction activity.

That's surprising. Does that mean that if we isolate COMPACT_CLUSTER_MAX
pages, migrate them without failing, but it's not enough to succeed (i.e.
there are more pages we need to migrate to free up a whole pageblock), it's
better to give up on the rest of the pageblock rather than continue? As
that's AFAIU the scenario where cc->finish_pageblock is false when we
revisit an unfinished pageblock.

> Are you still ok with this version?

It's better than previously in that the cc->finish_pageblock == true case
is not sabotaged anymore. But the result as described above seems to be a
weird non-intuitive and non-obvious heuristic. How did the test differences
look like?

> Test and set the skip bit when acquiring "exclusive access" to a pageblock
> for migration but only abort if the calling context is rescanning to
> finish a pageblock.

Should it say NOT rescanning to finish a pageblock?

> [...]
On Fri, Jun 02, 2023 at 02:19:00PM +0200, Vlastimil Babka wrote:
> On 6/2/23 13:16, Mel Gorman wrote:
> > [...]
>
> That's surprising. Does that mean that if we isolate COMPACT_CLUSTER_MAX
> pages, migrate them without failing, but it's not enough to succeed (i.e.
> there are more pages we need to migrate to free up a whole pageblock), it's
> better to give up on the rest of the pageblock rather than continue?

I don't have precise enough data to answer that with certainty but probably
yes, at least in terms of scan activity. The first version had spikes of
pages scanned for migration that are not always reproducible and not on
all machines.

> As that's AFAIU the scenario where cc->finish_pageblock is false when we
> revisit an unfinished pageblock.
>
> > Are you still ok with this version?
>
> It's better than previously in that cc->finish_pageblock == true case is not
> sabotaged anymore. But the result as described above seems to be a weird
> non-intuitive and non-obvious heuristic. How did the test differences look
> like?
>

5 machines were tested in all (different Intel and AMD generations),
2 showed unexpected spikes in scan activity. 1 showed high migration
scan counts for the workloads workload-thpchallenge-kernel-build-xfs
and workload-usemem-stress-numa-compact. They are both basically
compaction stressors with the first one using a kernel build
in the background to generate noise. For the second machine, only
workload-usemem-stress-numa-compact was affected. In all other test cases
and machines, the patches were equivalent in terms of behaviour but the
extreme counter-examples led me to conclude the fix should be as constrained
as possible unless there is a good reason to do otherwise.

> Should it say NOT rescanning to finish a pageblock?

Yep, it should. The sentence was a last minute update before pushing
send :(
On 6/2/23 14:48, Mel Gorman wrote: > On Fri, Jun 02, 2023 at 02:19:00PM +0200, Vlastimil Babka wrote: >> On 6/2/23 13:16, Mel Gorman wrote: >> > On Mon, May 29, 2023 at 02:43:48PM +0200, Vlastimil Babka wrote: >> >> On 5/29/23 12:33, Mel Gorman wrote: >> >> > On Thu, May 25, 2023 at 03:37:43PM +0200, Vlastimil Babka wrote: >> >> >> On 5/15/23 13:33, Mel Gorman wrote: >> >> >> > isolate_migratepages_block should mark a pageblock as skip if scanning >> >> >> > started on an aligned pageblock boundary but it only updates the skip >> >> >> > flag if the first migration candidate is also aligned. Tracing during >> >> >> > a compaction stress load (mmtests: workload-usemem-stress-numa-compact) >> >> >> > that many pageblocks are not marked skip causing excessive scanning of >> >> >> > blocks that had been recently checked. Update pageblock skip based on >> >> >> > "valid_page" which is set if scanning started on a pageblock boundary. >> >> >> > >> >> >> > Signed-off-by: Mel Gorman <mgorman@techsingularity.net> >> >> >> >> >> >> I wonder if this has an unintended side-effect that if we resume >> >> >> isolate_migratepages_block() of a partially compacted pageblock to finish >> >> >> it, test_and_set_skip() will now tell us to abort, because we already set >> >> >> the skip bit in the previous call. This would include the >> >> >> cc->finish_pageblock rescan cases. >> >> >> >> >> >> So unless I miss something that already prevents that, I agree we should not >> >> >> tie setting the skip bit to pageblock_aligned(pfn), but maybe if we are not >> >> >> pageblock aligned, we should ignore the already-set skip bit, as it was most >> >> >> likely being set by us in the previous iteration and should not prevent us >> >> >> from finishing the pageblock? >> >> >> >> >> > >> >> > Hmm, I think you're right. While it should not hit the original bug, >> >> > migration candidates are missed until the next compaction scan which >> >> > could be tricky to detect. 
Something like this as a separate patch? >> >> > Build tested only but the intent is for an unaligned start to set the skip >> >> > bet if already unset but otherwise complete the scan. Like earlier fixes, >> >> > this might overscan some pageblocks in a given context but we are probably >> >> > hitting the limits on how compaction can run efficiently in the current >> >> > scheme without causing other side-effects :( >> >> >> >> Yeah that should work! I think it should be even folded to 3/4 but if you >> >> want separate, fine too. >> >> >> > >> > I was not happy with the test results so limited the scope of the patch >> > which performed much better both in terms of absolute performance and >> > compaction activity. >> >> That's surprising. Does that mean that if we isolate COMPACT_CLUSTER_MAX >> pages, migrate them without failing, but it's not enough to succeed (i.e. >> there are more pages we need to migrate to free up a whole pageblock), it's >> better to give up on the rest of the pageblock rather than continue? > > I don't have precise enough data to answer that with certainty but probably > yes, at least in terms of scan activity. The first version had spikes of > pages scanned for migration that are not always reproducible and not on > all machines. Well, that kinda sucks. So for the patch (with adding the missing NOT below). Acked-by: Vlastimil Babka <vbabka@suse.cz> But in raises a question whether we should terminate compaction under the right conditions after a successful migration immediately, rather than invoke another iteration of isolate_migratepages_block() where we could skip over some pages uselessly only to abort at first valid page due to the skip bit. It would save some cycles and be much more obvious than now, where anyone trying to understand how it works in detail might conclude it's an oversight? >> As >> that's AFAIU the scenario where cc->finish_pageblock is false when we >> revisit an unfinished pageblock. 
>>
>> > Are you still ok with this version?
>>
>> It's better than previously in that cc->finish_pageblock == true case is not
>> sabotaged anymore. But the result as described above seems to be a weird
>> non-intuitive and non-obvious heuristic. What did the test differences look
>> like?
>>
>
> 5 machines were tested in all (different Intel and AMD generations),
> 2 showed unexpected spikes in scan activity. 1 showed high migration
> scan counts for workloads workload-thpchallenge-kernel-build-xfs
> and workload-usemem-stress-numa-compact. They are both basically
> compaction stressors with the first one using a kernel build
> in the background to generate noise. For the second machine, only
> workload-usemem-stress-numa-compact was affected. In all other test cases
> and machines, the patches were equivalent in terms of behaviour but the
> extreme counter-examples led me to conclude the fix should be as constrained
> as possible unless there is a good reason to do otherwise.
>
>> > Vlastimil Babka pointed out the following problem with
>> > mm-compaction-update-pageblock-skip-when-first-migration-candidate-is-not-at-the-start.patch
>> >
>> > I wonder if this has an unintended side-effect that if we resume
>> > isolate_migratepages_block() of a partially compacted pageblock
>> > to finish it, test_and_set_skip() will now tell us to abort,
>> > because we already set the skip bit in the previous call. This
>> > would include the cc->finish_pageblock rescan cases.
>> >
>> > He is correct and a partial rescan as implemented in "mm, compaction:
>> > finish pageblocks on complete migration failure" would abort
>> > prematurely.
>> >
>> > Test and set the skip bit when acquiring "exclusive access" to a pageblock
>> > for migration but only abort if the calling context is rescanning to
>> > finish a pageblock.
>>
>> Should it say NOT rescanning to finish a pageblock?
>>
>
> Yep, it should. The sentence was a last minute update before pushing
> send :(
>
On 6/6/2023 9:11 PM, Vlastimil Babka wrote:
> On 6/2/23 14:48, Mel Gorman wrote:
>> On Fri, Jun 02, 2023 at 02:19:00PM +0200, Vlastimil Babka wrote:
>>> On 6/2/23 13:16, Mel Gorman wrote:
>>>> On Mon, May 29, 2023 at 02:43:48PM +0200, Vlastimil Babka wrote:
>>>>> On 5/29/23 12:33, Mel Gorman wrote:
>>>>>> On Thu, May 25, 2023 at 03:37:43PM +0200, Vlastimil Babka wrote:
>>>>>>> On 5/15/23 13:33, Mel Gorman wrote:
>>>>>>>> isolate_migratepages_block should mark a pageblock as skip if scanning
>>>>>>>> started on an aligned pageblock boundary but it only updates the skip
>>>>>>>> flag if the first migration candidate is also aligned. Tracing during
>>>>>>>> a compaction stress load (mmtests: workload-usemem-stress-numa-compact)
>>>>>>>> found that many pageblocks are not marked skip, causing excessive scanning
>>>>>>>> of blocks that had been recently checked. Update pageblock skip based on
>>>>>>>> "valid_page" which is set if scanning started on a pageblock boundary.
>>>>>>>>
>>>>>>>> Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
>>>>>>>
>>>>>>> I wonder if this has an unintended side-effect that if we resume
>>>>>>> isolate_migratepages_block() of a partially compacted pageblock to finish
>>>>>>> it, test_and_set_skip() will now tell us to abort, because we already set
>>>>>>> the skip bit in the previous call. This would include the
>>>>>>> cc->finish_pageblock rescan cases.
>>>>>>>
>>>>>>> So unless I miss something that already prevents that, I agree we should not
>>>>>>> tie setting the skip bit to pageblock_aligned(pfn), but maybe if we are not
>>>>>>> pageblock aligned, we should ignore the already-set skip bit, as it was most
>>>>>>> likely being set by us in the previous iteration and should not prevent us
>>>>>>> from finishing the pageblock?
>>>>>>>
>>>>>>
>>>>>> Hmm, I think you're right. While it should not hit the original bug,
>>>>>> migration candidates are missed until the next compaction scan which
>>>>>> could be tricky to detect.
>>>>>> Something like this as a separate patch?
>>>>>> Build tested only but the intent is for an unaligned start to set the skip
>>>>>> bit if already unset but otherwise complete the scan. Like earlier fixes,
>>>>>> this might overscan some pageblocks in a given context but we are probably
>>>>>> hitting the limits on how compaction can run efficiently in the current
>>>>>> scheme without causing other side-effects :(
>>>>>
>>>>> Yeah that should work! I think it should be even folded to 3/4 but if you
>>>>> want separate, fine too.
>>>>>
>>>>
>>>> I was not happy with the test results so limited the scope of the patch
>>>> which performed much better both in terms of absolute performance and
>>>> compaction activity.
>>>
>>> That's surprising. Does that mean that if we isolate COMPACT_CLUSTER_MAX
>>> pages, migrate them without failing, but it's not enough to succeed (i.e.
>>> there are more pages we need to migrate to free up a whole pageblock), it's
>>> better to give up on the rest of the pageblock rather than continue?
>>
>> I don't have precise enough data to answer that with certainty but probably
>> yes, at least in terms of scan activity. The first version had spikes of
>> pages scanned for migration that are not always reproducible and not on
>> all machines.
>
> Well, that kinda sucks. So for the patch (with adding the missing NOT below).
>
> Acked-by: Vlastimil Babka <vbabka@suse.cz>
>
> But it raises a question whether we should terminate compaction under the
> right conditions after a successful migration immediately, rather than
> invoke another iteration of isolate_migratepages_block() where we could skip
> over some pages uselessly only to abort at first valid page due to the skip bit.
> It would save some cycles and be much more obvious than now, where anyone
> trying to understand how it works in detail might conclude it's an oversight?
IIUC, for a direct compactor we can terminate compaction once there is a
suitable free page; that means scanning the remaining pages in this
pageblock is useless, like you said, and we can skip them to save CPU
cycles. So we could remove the COMPACT_CONTINUE validation below in
__compact_finished():

	/*
	 * Always finish scanning a pageblock to reduce the possibility of
	 * fallbacks in the future. This is particularly important when
	 * migration source is unmovable/reclaimable but it's not worth
	 * special casing.
	 */
	if (!pageblock_aligned(cc->migrate_pfn))
		return COMPACT_CONTINUE;

And in this case, we should also call update_cached_migrate() to record the
partial cc->migrate_pfn, which is the start position for the next scan.
Please correct me if I missed something. Thanks.

>>> As that's AFAIU the scenario where cc->finish_pageblock is false when we
>>> revisit an unfinished pageblock.
>>>
>>>> Are you still ok with this version?
>>>
>>> It's better than previously in that cc->finish_pageblock == true case is not
>>> sabotaged anymore. But the result as described above seems to be a weird
>>> non-intuitive and non-obvious heuristic. What did the test differences look
>>> like?
>>>
>>
>> 5 machines were tested in all (different Intel and AMD generations),
>> 2 showed unexpected spikes in scan activity. 1 showed high migration
>> scan counts for workloads workload-thpchallenge-kernel-build-xfs
>> and workload-usemem-stress-numa-compact. They are both basically
>> compaction stressors with the first one using a kernel build
>> in the background to generate noise. For the second machine, only
>> workload-usemem-stress-numa-compact was affected. In all other test cases
>> and machines, the patches were equivalent in terms of behaviour but the
>> extreme counter-examples led me to conclude the fix should be as constrained
>> as possible unless there is a good reason to do otherwise.
>>
>>>> Vlastimil Babka pointed out the following problem with
>>>> mm-compaction-update-pageblock-skip-when-first-migration-candidate-is-not-at-the-start.patch
>>>>
>>>> I wonder if this has an unintended side-effect that if we resume
>>>> isolate_migratepages_block() of a partially compacted pageblock
>>>> to finish it, test_and_set_skip() will now tell us to abort,
>>>> because we already set the skip bit in the previous call. This
>>>> would include the cc->finish_pageblock rescan cases.
>>>>
>>>> He is correct and a partial rescan as implemented in "mm, compaction:
>>>> finish pageblocks on complete migration failure" would abort
>>>> prematurely.
>>>>
>>>> Test and set the skip bit when acquiring "exclusive access" to a pageblock
>>>> for migration but only abort if the calling context is rescanning to
>>>> finish a pageblock.
>>>
>>> Should it say NOT rescanning to finish a pageblock?
>>>
>>
>> Yep, it should. The sentence was a last minute update before pushing
>> send :(
>>
>
On Tue, Jun 06, 2023 at 03:11:27PM +0200, Vlastimil Babka wrote:
> On 6/2/23 14:48, Mel Gorman wrote:
> > On Fri, Jun 02, 2023 at 02:19:00PM +0200, Vlastimil Babka wrote:
> >> On 6/2/23 13:16, Mel Gorman wrote:
> >> > On Mon, May 29, 2023 at 02:43:48PM +0200, Vlastimil Babka wrote:
> >> >> On 5/29/23 12:33, Mel Gorman wrote:
> >> >> > On Thu, May 25, 2023 at 03:37:43PM +0200, Vlastimil Babka wrote:
> >> >> >> On 5/15/23 13:33, Mel Gorman wrote:
> >> >> >> > isolate_migratepages_block should mark a pageblock as skip if scanning
> >> >> >> > started on an aligned pageblock boundary but it only updates the skip
> >> >> >> > flag if the first migration candidate is also aligned. Tracing during
> >> >> >> > a compaction stress load (mmtests: workload-usemem-stress-numa-compact)
> >> >> >> > found that many pageblocks are not marked skip, causing excessive scanning
> >> >> >> > of blocks that had been recently checked. Update pageblock skip based on
> >> >> >> > "valid_page" which is set if scanning started on a pageblock boundary.
> >> >> >> >
> >> >> >> > Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
> >> >> >>
> >> >> >> I wonder if this has an unintended side-effect that if we resume
> >> >> >> isolate_migratepages_block() of a partially compacted pageblock to finish
> >> >> >> it, test_and_set_skip() will now tell us to abort, because we already set
> >> >> >> the skip bit in the previous call. This would include the
> >> >> >> cc->finish_pageblock rescan cases.
> >> >> >>
> >> >> >> So unless I miss something that already prevents that, I agree we should not
> >> >> >> tie setting the skip bit to pageblock_aligned(pfn), but maybe if we are not
> >> >> >> pageblock aligned, we should ignore the already-set skip bit, as it was most
> >> >> >> likely being set by us in the previous iteration and should not prevent us
> >> >> >> from finishing the pageblock?
> >> >> >>
> >> >> >
> >> >> > Hmm, I think you're right.
> >> >> > While it should not hit the original bug,
> >> >> > migration candidates are missed until the next compaction scan which
> >> >> > could be tricky to detect. Something like this as a separate patch?
> >> >> > Build tested only but the intent is for an unaligned start to set the skip
> >> >> > bit if already unset but otherwise complete the scan. Like earlier fixes,
> >> >> > this might overscan some pageblocks in a given context but we are probably
> >> >> > hitting the limits on how compaction can run efficiently in the current
> >> >> > scheme without causing other side-effects :(
> >> >>
> >> >> Yeah that should work! I think it should be even folded to 3/4 but if you
> >> >> want separate, fine too.
> >> >>
> >> >
> >> > I was not happy with the test results so limited the scope of the patch
> >> > which performed much better both in terms of absolute performance and
> >> > compaction activity.
> >>
> >> That's surprising. Does that mean that if we isolate COMPACT_CLUSTER_MAX
> >> pages, migrate them without failing, but it's not enough to succeed (i.e.
> >> there are more pages we need to migrate to free up a whole pageblock), it's
> >> better to give up on the rest of the pageblock rather than continue?
> >
> > I don't have precise enough data to answer that with certainty but probably
> > yes, at least in terms of scan activity. The first version had spikes of
> > pages scanned for migration that are not always reproducible and not on
> > all machines.
>
> Well, that kinda sucks. So for the patch (with adding the missing NOT below).
>
> Acked-by: Vlastimil Babka <vbabka@suse.cz>
>
> But it raises a question whether we should terminate compaction under the
> right conditions after a successful migration immediately, rather than
> invoke another iteration of isolate_migratepages_block() where we could skip
> over some pages uselessly only to abort at first valid page due to the skip bit.
> It would save some cycles and be much more obvious than now, where anyone
> trying to understand how it works in detail might conclude it's an oversight?
>

It sounds like a solid idea and would be a good standalone patch with the
usual supporting data. At a quick glance, the check for a page stored on
compact_control happens surprisingly late, which makes me think we probably
over-compact in direct compaction in particular. It would need supporting
data because it probably means that compaction cost gets spread over
multiple tasks requiring high-order pages instead of one unlucky task doing
compaction work that unrelated tasks indirectly benefit from. It's probably
more sensible behaviour that tasks requiring high-order pages pay the cost
if kcompactd cannot keep up, but supporting data would tell us one way or
the other.
diff --git a/mm/compaction.c b/mm/compaction.c
index accc6568091a..d7be990b1d60 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -392,18 +392,14 @@ void reset_isolation_suitable(pg_data_t *pgdat)
  * Sets the pageblock skip bit if it was clear. Note that this is a hint as
  * locks are not required for read/writers. Returns true if it was already set.
  */
-static bool test_and_set_skip(struct compact_control *cc, struct page *page,
-					unsigned long pfn)
+static bool test_and_set_skip(struct compact_control *cc, struct page *page)
 {
 	bool skip;
 
-	/* Do no update if skip hint is being ignored */
+	/* Do not update if skip hint is being ignored */
 	if (cc->ignore_skip_hint)
 		return false;
 
-	if (!pageblock_aligned(pfn))
-		return false;
-
 	skip = get_pageblock_skip(page);
 	if (!skip && !cc->no_set_skip_hint)
 		set_pageblock_skip(page);
@@ -470,8 +466,7 @@ static void update_cached_migrate(struct compact_control *cc, unsigned long pfn)
 {
 }
 
-static bool test_and_set_skip(struct compact_control *cc, struct page *page,
-					unsigned long pfn)
+static bool test_and_set_skip(struct compact_control *cc, struct page *page)
 {
 	return false;
 }
@@ -1075,9 +1070,9 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 			lruvec_memcg_debug(lruvec, page_folio(page));
 
 			/* Try get exclusive access under lock */
-			if (!skip_updated) {
+			if (!skip_updated && valid_page) {
 				skip_updated = true;
-				if (test_and_set_skip(cc, page, low_pfn))
+				if (test_and_set_skip(cc, valid_page))
 					goto isolate_abort;
 			}