Message ID | 20221021061558.34767-2-jiahao.os@bytedance.com |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:4242:0:0:0:0:0 with SMTP id s2csp521408wrr; Thu, 20 Oct 2022 23:32:54 -0700 (PDT) X-Google-Smtp-Source: AMsMyM6ly3MZ678ZV+y0wvMQgV/SKtfUxryDdtTzJyDomIk+fNT/dhe+/yrDhGVtYVcQvGjTxzE2 X-Received: by 2002:a05:6402:520f:b0:45f:b7a0:a31 with SMTP id s15-20020a056402520f00b0045fb7a00a31mr8138731edd.379.1666333974651; Thu, 20 Oct 2022 23:32:54 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1666333974; cv=none; d=google.com; s=arc-20160816; b=xun0AzmeLLr0n3vQm6N0WiHzWh2w+6Z4joQEK1XXo1456pD/+WSAbjIFc9XkyMPV5c YruH/im4HPfk5E9AfXxXaY/vnEo6HQwmbGEljWkHHjdnaJdbTr2EybcAdETnYE8/4Xxc +6HoTJn/3fcAP46Us/ioPKrtDBv58CxG4kpCWya2X+rjvxsExs1xd3EVFYaQVfiG2tJP OwOzMK643txHh3o35VUkkWJY2Y3vjxrf/+VB30Oi/WCX/jLGfBuZMxknWJSnnzeQzFO1 G5NILRjGUf2nx/u7XTq+1oNVPyeJr/GXJVpHba2qgOlrGz/9PzwXrdERK4RVKH02WM/W 43VQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=i19y5QR9mtVVOPA0wEofADzlXW2aN27SSyV8zYQJio8=; b=fQUbKjFWGz1dySatZfGKI6SmALzFq1AWlOrXvs9frBPvpuwzbincfGolEXnCtibXEm u9o4ekpV/xzLtvVD0znLLB1x4BDDIakK/lgVOuSPX+1G4gJ382ohxbJH4Uuy6qnAAow1 1GOP9budJz6EtlkJhDBuEnpTa+s0HKnE2y57goutHowW27kdgrX0Ui6Wh3Qac81WUjpI KyZsF5NLq1waIlpom2GPn5xrWghihGAar2ePpf9Mo8018S8Ps2ryWJ7zN6VY4joYxnGL AtGQRAGbnEi3uNaGyKQV+WBVbQb1Q8KC3IqwvmK9+RSuQCoIizgYPsXvGvUqbP2UdDX0 NyWA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@bytedance-com.20210112.gappssmtp.com header.s=20210112 header.b=u1ImNYfd; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=bytedance.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id qb7-20020a1709077e8700b007878f030816si18886607ejc.109.2022.10.20.23.32.30; Thu, 20 Oct 2022 23:32:54 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@bytedance-com.20210112.gappssmtp.com header.s=20210112 header.b=u1ImNYfd; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=bytedance.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229763AbiJUGQ0 (ORCPT <rfc822;pwkd43@gmail.com> + 99 others); Fri, 21 Oct 2022 02:16:26 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:60188 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230123AbiJUGQU (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Fri, 21 Oct 2022 02:16:20 -0400 Received: from mail-pj1-x1036.google.com (mail-pj1-x1036.google.com [IPv6:2607:f8b0:4864:20::1036]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 0EA7D237960 for <linux-kernel@vger.kernel.org>; Thu, 20 Oct 2022 23:16:19 -0700 (PDT) Received: by mail-pj1-x1036.google.com with SMTP id g8-20020a17090a128800b0020c79f987ceso5711938pja.5 for <linux-kernel@vger.kernel.org>; Thu, 20 Oct 2022 23:16:18 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance-com.20210112.gappssmtp.com; s=20210112; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=i19y5QR9mtVVOPA0wEofADzlXW2aN27SSyV8zYQJio8=; b=u1ImNYfdUbWLriSGwT46wNN9Zle5dKFToW2gR/VcF1g2jLV1fk/3baGCIQJH1A/qHW 1JTbFXqdbvsDobztBPBy4yKK94U7TPmgYBPa1J+PTLkBOGD/KGHvzoQjyViTsLqPNBIM cC6L3nXkvhkY/jfftsf1rvrJoUGSg1MKMfmLEF5Lz1Kvn7o/h9VICAGgRrgUQ1uZTdIO Zce3wdZYmxrSGz4IC9ZI9BVPfk+gcnA7Wv0C+tH+zctYJ3FYSkc/qK/uTqve9BR6qLtb mnSIxAin35S7/lAr565wadFXn88M9dJqvIshpGqhVrBI/QSgp2Xw8VQBHZ4TDEnm9buC dQig== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=i19y5QR9mtVVOPA0wEofADzlXW2aN27SSyV8zYQJio8=; b=qSKgqhBxPSF9LxS4/GKgIN8T/WvbJrh6uqkQRW+4MRaPf4CnFsdXRMzirPJjtSOftl tGgrR8/yVLl0TsLZgjLKp/j3Ig3bScNr7i7XCu9cMezHqOMDiQkHgFS+NyhIdjUIliK7 wcIO5jDKpZYfLgXjlzdCWuQeFuTh6pnNSrGmeZFo5fv/VHsk365VixH3xvwOix35ihuk uT6VymAC4aUw8GYwlpBRCnP9tqNvhVwaeOL3ysdkOjZg9XsCHhlHZGvmuf3y2H1xVUvb Cox/wp9AbT6wkqe3Sjp9boClnlXhzj84pfEI2o5Tis2kKFCFq1W3EHMfPEw8QwN96trg 1A/A== X-Gm-Message-State: ACrzQf1aoFa2sSAgtEETnshYDvQYI/O4pVg+48YGTlyps74g7aBurJW7 5dBvEk/Thg7f+BOl6NjoeqD4m+Cu21pAJmGD X-Received: by 2002:a17:90b:384e:b0:20d:4b52:ddf0 with SMTP id nl14-20020a17090b384e00b0020d4b52ddf0mr20742540pjb.113.1666332978010; Thu, 20 Oct 2022 23:16:18 -0700 (PDT) Received: from C02G87K0MD6R.bytedance.net ([139.177.225.244]) by smtp.gmail.com with ESMTPSA id o11-20020a17090a4b4b00b0020d45a155d9sm327532pjl.35.2022.10.20.23.16.12 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 20 Oct 2022 23:16:17 -0700 (PDT) From: Hao Jia <jiahao.os@bytedance.com> To: mingo@redhat.com, peterz@infradead.org, mingo@kernel.org, juri.lelli@redhat.com, vincent.guittot@linaro.org, dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de, bristot@redhat.com, vschneid@redhat.com, mgorman@techsingularity.net Cc: linux-kernel@vger.kernel.org, Hao Jia <jiahao.os@bytedance.com> Subject: [PATCH 1/2] sched/numa: Stop an exhastive search if an idle core is found Date: Fri, 21 Oct 2022 14:15:57 +0800 Message-Id: <20221021061558.34767-2-jiahao.os@bytedance.com> X-Mailer: git-send-email 2.37.0 (Apple Git-136) In-Reply-To: <20221021061558.34767-1-jiahao.os@bytedance.com> References: <20221021061558.34767-1-jiahao.os@bytedance.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1747277814025412699?= X-GMAIL-MSGID: =?utf-8?q?1747277814025412699?= |
Series |
Optimize the process of scanning CPU for some functions
|
|
Commit Message
Hao Jia
Oct. 21, 2022, 6:15 a.m. UTC
In update_numa_stats() we try to find an idle cpu on the NUMA node,
preferably an idle core. When we find an idle core,
we can stop searching.
Signed-off-by: Hao Jia <jiahao.os@bytedance.com>
---
kernel/sched/fair.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
Comments
On Fri, Oct 21, 2022 at 02:15:57PM +0800, Hao Jia wrote: > In update_numa_stats() we try to find an idle cpu on the NUMA node, > preferably an idle core. When we find an idle core, > we can stop searching. > > Signed-off-by: Hao Jia <jiahao.os@bytedance.com> > --- > kernel/sched/fair.c | 8 ++++---- > 1 file changed, 4 insertions(+), 4 deletions(-) > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c > index e4a0b8bd941c..b7cbec539c77 100644 > --- a/kernel/sched/fair.c > +++ b/kernel/sched/fair.c > @@ -1792,7 +1792,7 @@ static void update_numa_stats(struct task_numa_env *env, > ns->nr_running += rq->cfs.h_nr_running; > ns->compute_capacity += capacity_of(cpu); > > - if (find_idle && !rq->nr_running && idle_cpu(cpu)) { > + if (find_idle && idle_core < 0 && !rq->nr_running && idle_cpu(cpu)) { > if (READ_ONCE(rq->numa_migrate_on) || > !cpumask_test_cpu(cpu, env->p->cpus_ptr)) > continue; > @@ -1801,6 +1801,9 @@ static void update_numa_stats(struct task_numa_env *env, > ns->idle_cpu = cpu; > > idle_core = numa_idle_core(idle_core, cpu); > + /* If we find an idle core, stop searching. */ > + if (idle_core >= 0) > + ns->idle_cpu = idle_core; > } > } > rcu_read_unlock(); > @@ -1808,9 +1811,6 @@ static void update_numa_stats(struct task_numa_env *env, > ns->weight = cpumask_weight(cpumask_of_node(nid)); > > ns->node_type = numa_classify(env->imbalance_pct, ns); > - > - if (idle_core >= 0) > - ns->idle_cpu = idle_core; > } > Remove the change in the first hunk and call break in the second hunk after updating ns->idle_cpu.
On 2022/10/24 Mel Gorman wrote: > On Fri, Oct 21, 2022 at 02:15:57PM +0800, Hao Jia wrote: >> In update_numa_stats() we try to find an idle cpu on the NUMA node, >> preferably an idle core. When we find an idle core, >> we can stop searching. >> >> Signed-off-by: Hao Jia <jiahao.os@bytedance.com> >> --- >> kernel/sched/fair.c | 8 ++++---- >> 1 file changed, 4 insertions(+), 4 deletions(-) >> >> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c >> index e4a0b8bd941c..b7cbec539c77 100644 >> --- a/kernel/sched/fair.c >> +++ b/kernel/sched/fair.c >> @@ -1792,7 +1792,7 @@ static void update_numa_stats(struct task_numa_env *env, >> ns->nr_running += rq->cfs.h_nr_running; >> ns->compute_capacity += capacity_of(cpu); >> >> - if (find_idle && !rq->nr_running && idle_cpu(cpu)) { >> + if (find_idle && idle_core < 0 && !rq->nr_running && idle_cpu(cpu)) { >> if (READ_ONCE(rq->numa_migrate_on) || >> !cpumask_test_cpu(cpu, env->p->cpus_ptr)) >> continue; >> @@ -1801,6 +1801,9 @@ static void update_numa_stats(struct task_numa_env *env, >> ns->idle_cpu = cpu; >> >> idle_core = numa_idle_core(idle_core, cpu); >> + /* If we find an idle core, stop searching. */ >> + if (idle_core >= 0) >> + ns->idle_cpu = idle_core; >> } >> } >> rcu_read_unlock(); >> @@ -1808,9 +1811,6 @@ static void update_numa_stats(struct task_numa_env *env, >> ns->weight = cpumask_weight(cpumask_of_node(nid)); >> >> ns->node_type = numa_classify(env->imbalance_pct, ns); >> - >> - if (idle_core >= 0) >> - ns->idle_cpu = idle_core; >> } >> > > Remove the change in the first hunk and call break in the second hunk > after updating ns->idle_cpu. > Yes, thanks for your review. If I understand correctly, some things might look like this. diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index e4a0b8bd941c..dfcb620bfe50 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -1792,7 +1792,7 @@ static void update_numa_stats(struct task_numa_env *env, ns->nr_running += rq->cfs.h_nr_running; ns->compute_capacity += capacity_of(cpu); - if (find_idle && !rq->nr_running && idle_cpu(cpu)) { + if (find_idle && idle_core < 0 && !rq->nr_running && idle_cpu(cpu)) { if (READ_ONCE(rq->numa_migrate_on) || !cpumask_test_cpu(cpu, env->p->cpus_ptr)) continue; Thanks, Hao
On Tue, Oct 25, 2022 at 11:16:29AM +0800, Hao Jia wrote: > > Remove the change in the first hunk and call break in the second hunk > > after updating ns->idle_cpu. > > > > Yes, thanks for your review. > If I understand correctly, some things might look like this. > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c > index e4a0b8bd941c..dfcb620bfe50 100644 > --- a/kernel/sched/fair.c > +++ b/kernel/sched/fair.c > @@ -1792,7 +1792,7 @@ static void update_numa_stats(struct task_numa_env > *env, > ns->nr_running += rq->cfs.h_nr_running; > ns->compute_capacity += capacity_of(cpu); > > - if (find_idle && !rq->nr_running && idle_cpu(cpu)) { > + if (find_idle && idle_core < 0 && !rq->nr_running && > idle_cpu(cpu)) { > if (READ_ONCE(rq->numa_migrate_on) || > !cpumask_test_cpu(cpu, env->p->cpus_ptr)) > continue; > I meant more like the below but today I wondered why did I not do this in the first place? The answer is because it's wrong and broken in concept. The full loop is needed to calculate approximate NUMA stats at a point in time. For example, the src and dst nr_running is needed by task_numa_find_cpu. The search for an idle CPU or core in update_numa_stats is simply taking advantage of the fact we are scanning anyway to keep track of an idle CPU or core to avoid a second search as per ff7db0bf24db ("sched/numa: Prefer using an idle CPU as a migration target instead of comparing tasks") The patch I had in mind is below but that said, for both your version and my initial suggestion Naked-by: Mel Gorman <mgorman@suse.de> For the record, this is what I was suggesting initially because it's more efficient but it's wrong, don't do it. diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index e4a0b8bd941c..7f1f6a1736a5 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -1800,7 +1800,12 @@ static void update_numa_stats(struct task_numa_env *env, if (ns->idle_cpu == -1) ns->idle_cpu = cpu; + /* If we find an idle core, stop searching. */ idle_core = numa_idle_core(idle_core, cpu); + if (idle_core >= 0) { + ns->idle_cpu = idle_core; + break; + } } } rcu_read_unlock(); @@ -1808,9 +1813,6 @@ static void update_numa_stats(struct task_numa_env *env, ns->weight = cpumask_weight(cpumask_of_node(nid)); ns->node_type = numa_classify(env->imbalance_pct, ns); - - if (idle_core >= 0) - ns->idle_cpu = idle_core; } static void task_numa_assign(struct task_numa_env *env,
On 2022/10/25 Mel Gorman wrote: > On Tue, Oct 25, 2022 at 11:16:29AM +0800, Hao Jia wrote: >>> Remove the change in the first hunk and call break in the second hunk >>> after updating ns->idle_cpu. >>> >> >> Yes, thanks for your review. >> If I understand correctly, some things might look like this. >> >> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c >> index e4a0b8bd941c..dfcb620bfe50 100644 >> --- a/kernel/sched/fair.c >> +++ b/kernel/sched/fair.c >> @@ -1792,7 +1792,7 @@ static void update_numa_stats(struct task_numa_env >> *env, >> ns->nr_running += rq->cfs.h_nr_running; >> ns->compute_capacity += capacity_of(cpu); >> >> - if (find_idle && !rq->nr_running && idle_cpu(cpu)) { >> + if (find_idle && idle_core < 0 && !rq->nr_running && >> idle_cpu(cpu)) { >> if (READ_ONCE(rq->numa_migrate_on) || >> !cpumask_test_cpu(cpu, env->p->cpus_ptr)) >> continue; >> > > I meant more like the below but today I wondered why did I not do this in > the first place? The answer is because it's wrong and broken in concept. > > The full loop is needed to calculate approximate NUMA stats at a > point in time. For example, the src and dst nr_running is needed by > task_numa_find_cpu. The search for an idle CPU or core in update_numa_stats > is simply taking advantage of the fact we are scanning anyway to keep > track of an idle CPU or core to avoid a second search as per ff7db0bf24db > ("sched/numa: Prefer using an idle CPU as a migration target instead of > comparing tasks") > > The patch I had in mind is below but that said, for both your version and > my initial suggestion > > Naked-by: Mel Gorman <mgorman@suse.de> > > For the record, this is what I was suggesting initially because it's more > efficient but it's wrong, don't do it. > Thanks for the detailed explanation, maybe my commit message misled you. Yes, we can't stop the whole loop of scanning the CPU because we have a lot of NUMA information to count. But we can stop looking for the next idle core or idle cpu after finding an idle core. So, please review the previous code. Thanks, Hao > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c > index e4a0b8bd941c..7f1f6a1736a5 100644 > --- a/kernel/sched/fair.c > +++ b/kernel/sched/fair.c > @@ -1800,7 +1800,12 @@ static void update_numa_stats(struct task_numa_env *env, > if (ns->idle_cpu == -1) > ns->idle_cpu = cpu; > > + /* If we find an idle core, stop searching. */ > idle_core = numa_idle_core(idle_core, cpu); > + if (idle_core >= 0) { > + ns->idle_cpu = idle_core; > + break; > + } > } > } > rcu_read_unlock(); > @@ -1808,9 +1813,6 @@ static void update_numa_stats(struct task_numa_env *env, > ns->weight = cpumask_weight(cpumask_of_node(nid)); > > ns->node_type = numa_classify(env->imbalance_pct, ns); > - > - if (idle_core >= 0) > - ns->idle_cpu = idle_core; > } > > static void task_numa_assign(struct task_numa_env *env, >
On Tue, Oct 25, 2022 at 07:10:22PM +0800, Hao Jia wrote: > On 2022/10/25 Mel Gorman wrote: > > On Tue, Oct 25, 2022 at 11:16:29AM +0800, Hao Jia wrote: > > > > Remove the change in the first hunk and call break in the second hunk > > > > after updating ns->idle_cpu. > > > > > > > > > > Yes, thanks for your review. > > > If I understand correctly, some things might look like this. > > > > > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c > > > index e4a0b8bd941c..dfcb620bfe50 100644 > > > --- a/kernel/sched/fair.c > > > +++ b/kernel/sched/fair.c > > > @@ -1792,7 +1792,7 @@ static void update_numa_stats(struct task_numa_env > > > *env, > > > ns->nr_running += rq->cfs.h_nr_running; > > > ns->compute_capacity += capacity_of(cpu); > > > > > > - if (find_idle && !rq->nr_running && idle_cpu(cpu)) { > > > + if (find_idle && idle_core < 0 && !rq->nr_running && > > > idle_cpu(cpu)) { > > > if (READ_ONCE(rq->numa_migrate_on) || > > > !cpumask_test_cpu(cpu, env->p->cpus_ptr)) > > > continue; > > > > > > > I meant more like the below but today I wondered why did I not do this in > > the first place? The answer is because it's wrong and broken in concept. > > > > The full loop is needed to calculate approximate NUMA stats at a > > point in time. For example, the src and dst nr_running is needed by > > task_numa_find_cpu. The search for an idle CPU or core in update_numa_stats > > is simply taking advantage of the fact we are scanning anyway to keep > > track of an idle CPU or core to avoid a second search as per ff7db0bf24db > > ("sched/numa: Prefer using an idle CPU as a migration target instead of > > comparing tasks") > > > > The patch I had in mind is below but that said, for both your version and > > my initial suggestion > > > > Naked-by: Mel Gorman <mgorman@suse.de> > > > > For the record, this is what I was suggesting initially because it's more > > efficient but it's wrong, don't do it. > > > > Thanks for the detailed explanation, maybe my commit message misled you. > Yes, I did end up confusing myself. The title and changelog referred to stopping a search which made me think of terms of "this whole loop can terminate early" which it can't but it *can* stop checking for a new idle core. If an idle core has been found, it follows that an idle CPU has also been found. While numa_idle_core checks this explicitly, your patch avoids an unnecessary cpumask_test_cpu so it has value. > Yes, we can't stop the whole loop of scanning the CPU because we have a lot > of NUMA information to count. > > But we can stop looking for the next idle core or idle cpu after finding an > idle core. > > So, please review the previous code. > You're right and sorry for the noise. Acked-by: Mel Gorman <mgorman@techsingularity.net>
On 2022/10/25 Mel Gorman wrote: > On Tue, Oct 25, 2022 at 07:10:22PM +0800, Hao Jia wrote: >> On 2022/10/25 Mel Gorman wrote: >>> On Tue, Oct 25, 2022 at 11:16:29AM +0800, Hao Jia wrote: >>>>> Remove the change in the first hunk and call break in the second hunk >>>>> after updating ns->idle_cpu. >>>>> >>>> >>>> Yes, thanks for your review. >>>> If I understand correctly, some things might look like this. >>>> >>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c >>>> index e4a0b8bd941c..dfcb620bfe50 100644 >>>> --- a/kernel/sched/fair.c >>>> +++ b/kernel/sched/fair.c >>>> @@ -1792,7 +1792,7 @@ static void update_numa_stats(struct task_numa_env >>>> *env, >>>> ns->nr_running += rq->cfs.h_nr_running; >>>> ns->compute_capacity += capacity_of(cpu); >>>> >>>> - if (find_idle && !rq->nr_running && idle_cpu(cpu)) { >>>> + if (find_idle && idle_core < 0 && !rq->nr_running && >>>> idle_cpu(cpu)) { >>>> if (READ_ONCE(rq->numa_migrate_on) || >>>> !cpumask_test_cpu(cpu, env->p->cpus_ptr)) >>>> continue; >>>> >>> >>> I meant more like the below but today I wondered why did I not do this in >>> the first place? The answer is because it's wrong and broken in concept. >>> >>> The full loop is needed to calculate approximate NUMA stats at a >>> point in time. For example, the src and dst nr_running is needed by >>> task_numa_find_cpu. The search for an idle CPU or core in update_numa_stats >>> is simply taking advantage of the fact we are scanning anyway to keep >>> track of an idle CPU or core to avoid a second search as per ff7db0bf24db >>> ("sched/numa: Prefer using an idle CPU as a migration target instead of >>> comparing tasks") >>> >>> The patch I had in mind is below but that said, for both your version and >>> my initial suggestion >>> >>> Naked-by: Mel Gorman <mgorman@suse.de> >>> >>> For the record, this is what I was suggesting initially because it's more >>> efficient but it's wrong, don't do it. >>> >> >> Thanks for the detailed explanation, maybe my commit message misled you. >> > > Yes, I did end up confusing myself. The title and changelog referred to > stopping a search which made me think of terms of "this whole loop can > terminate early" which it can't but it *can* stop checking for a new idle > core. If an idle core has been found, it follows that an idle CPU has also > been found. While numa_idle_core checks this explicitly, your patch avoids > an unnecessary cpumask_test_cpu so it has value. > Thank you for your review, I will change the commit message and send patch v2. >> Yes, we can't stop the whole loop of scanning the CPU because we have a lot >> of NUMA information to count. >> >> But we can stop looking for the next idle core or idle cpu after finding an >> idle core. >> >> So, please review the previous code. >> > > You're right and sorry for the noise. > > Acked-by: Mel Gorman <mgorman@techsingularity.net> Thanks! > Thanks, Hao
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index e4a0b8bd941c..b7cbec539c77 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -1792,7 +1792,7 @@ static void update_numa_stats(struct task_numa_env *env, ns->nr_running += rq->cfs.h_nr_running; ns->compute_capacity += capacity_of(cpu); - if (find_idle && !rq->nr_running && idle_cpu(cpu)) { + if (find_idle && idle_core < 0 && !rq->nr_running && idle_cpu(cpu)) { if (READ_ONCE(rq->numa_migrate_on) || !cpumask_test_cpu(cpu, env->p->cpus_ptr)) continue; @@ -1801,6 +1801,9 @@ static void update_numa_stats(struct task_numa_env *env, ns->idle_cpu = cpu; idle_core = numa_idle_core(idle_core, cpu); + /* If we find an idle core, stop searching. */ + if (idle_core >= 0) + ns->idle_cpu = idle_core; } } rcu_read_unlock(); @@ -1808,9 +1811,6 @@ static void update_numa_stats(struct task_numa_env *env, ns->weight = cpumask_weight(cpumask_of_node(nid)); ns->node_type = numa_classify(env->imbalance_pct, ns); - - if (idle_core >= 0) - ns->idle_cpu = idle_core; } static void task_numa_assign(struct task_numa_env *env,