Message ID | 20221111040027.621646-4-yury.norov@gmail.com |
---|---|
State | New |
Headers |
From: Yury Norov <yury.norov@gmail.com>
To: linux-kernel@vger.kernel.org, "David S. Miller" <davem@davemloft.net>, Andy Shevchenko <andriy.shevchenko@linux.intel.com>, Barry Song <baohua@kernel.org>, Ben Segall <bsegall@google.com>, Daniel Bristot de Oliveira <bristot@redhat.com>, Dietmar Eggemann <dietmar.eggemann@arm.com>, Gal Pressman <gal@nvidia.com>, Greg Kroah-Hartman <gregkh@linuxfoundation.org>, Heiko Carstens <hca@linux.ibm.com>, Ingo Molnar <mingo@redhat.com>, Jakub Kicinski <kuba@kernel.org>, Jason Gunthorpe <jgg@nvidia.com>, Jesse Brandeburg <jesse.brandeburg@intel.com>, Jonathan Cameron <Jonathan.Cameron@huawei.com>, Juri Lelli <juri.lelli@redhat.com>, Leon Romanovsky <leonro@nvidia.com>, Mel Gorman <mgorman@suse.de>, Peter Zijlstra <peterz@infradead.org>, Rasmus Villemoes <linux@rasmusvillemoes.dk>, Saeed Mahameed <saeedm@nvidia.com>, Steven Rostedt <rostedt@goodmis.org>, Tariq Toukan <tariqt@nvidia.com>, Tariq Toukan <ttoukan.linux@gmail.com>, Tony Luck <tony.luck@intel.com>, Valentin Schneider <vschneid@redhat.com>, Vincent Guittot <vincent.guittot@linaro.org>
Cc: Yury Norov <yury.norov@gmail.com>, linux-crypto@vger.kernel.org, netdev@vger.kernel.org, linux-rdma@vger.kernel.org
Subject: [PATCH 3/4] sched: add sched_numa_find_nth_cpu()
Date: Thu, 10 Nov 2022 20:00:26 -0800
Message-Id: <20221111040027.621646-4-yury.norov@gmail.com>
In-Reply-To: <20221111040027.621646-1-yury.norov@gmail.com>
References: <20221111040027.621646-1-yury.norov@gmail.com>
Series | cpumask: improve on cpumask_local_spread() locality |
Commit Message
Yury Norov
Nov. 11, 2022, 4 a.m. UTC
The function finds the Nth set CPU in a given cpumask, starting from a given
node.

Leveraging the fact that each hop in sched_domains_numa_masks includes the
same or a greater number of CPUs than the previous one, we can use binary
search on hops instead of a linear walk, which makes the overall complexity
O(log n) in terms of the number of cpumask_weight() calls.
Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
include/linux/topology.h | 8 ++++++++
kernel/sched/topology.c | 42 ++++++++++++++++++++++++++++++++++++++++
2 files changed, 50 insertions(+)
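The search the commit message describes can be modeled in user space with
plain 64-bit words standing in for cpumasks. The sketch below mirrors the
patch's binary search over nested hop masks; all names (find_nth_cpu,
weight_and, nth_bit) and the 64-CPU limit are illustrative stand-ins for the
kernel's cpumask API, not kernel code:

```c
#include <assert.h>
#include <stdint.h>

/* Popcount of (cpus & mask): the analogue of cpumask_weight_and(). */
static int weight_and(uint64_t cpus, uint64_t mask)
{
	return __builtin_popcountll(cpus & mask);
}

/* Index of the n-th (0-based) set bit of mask, or -1 if there is none. */
static int nth_bit(uint64_t mask, int n)
{
	for (int i = 0; i < 64; i++)
		if ((mask & (1ULL << i)) && n-- == 0)
			return i;
	return -1;	/* analogue of returning >= nr_cpu_ids */
}

/*
 * Find the cpu-th set bit of @cpus, walking hops[0..nr_hops-1], which must
 * be nested: hops[i] is a subset of hops[i + 1].  Each probe narrows the
 * range until cpu falls between the weights of hops mid-1 and mid.
 */
static int find_nth_cpu(uint64_t cpus, int cpu, const uint64_t *hops, int nr_hops)
{
	int first = 0, last = nr_hops - 1, mid = 0, w = 0;

	if (nr_hops <= 0)
		return -1;

	while (last >= first) {
		mid = (last + first) / 2;

		if (weight_and(cpus, hops[mid]) <= cpu) {
			first = mid + 1;	/* too few CPUs up to this hop */
			continue;
		}

		w = (mid == 0) ? 0 : weight_and(cpus, hops[mid - 1]);
		if (w <= cpu)
			break;			/* cpu lands in hop mid */

		last = mid - 1;
	}

	/* Nth bit inside hops[mid], skipping bits already in hops[mid - 1]. */
	return (mid == 0) ?
		nth_bit(cpus & hops[mid], cpu - w) :
		nth_bit(cpus & hops[mid] & ~hops[mid - 1], cpu - w);
}
```

Only the weights of O(log n) hop masks are computed, which is the point of
the patch: each probe costs one weight_and() call instead of walking every
hop in order.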
Comments
On Thu, Nov 10, 2022 at 08:00:26PM -0800, Yury Norov wrote:
> The function finds Nth set CPU in a given cpumask starting from a given
> node.
>
> Leveraging the fact that each hop in sched_domains_numa_masks includes the
> same or greater number of CPUs than the previous one, we can use binary
> search on hops instead of linear walk, which makes the overall complexity
> of O(log n) in terms of number of cpumask_weight() calls.
>
> Signed-off-by: Yury Norov <yury.norov@gmail.com>
> ---
>  include/linux/topology.h |  8 ++++++++
>  kernel/sched/topology.c  | 42 ++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 50 insertions(+)
>
> diff --git a/include/linux/topology.h b/include/linux/topology.h
> index 4564faafd0e1..63048ac3207c 100644
> --- a/include/linux/topology.h
> +++ b/include/linux/topology.h
> @@ -245,5 +245,13 @@ static inline const struct cpumask *cpu_cpu_mask(int cpu)
>  	return cpumask_of_node(cpu_to_node(cpu));
>  }
>
> +#ifdef CONFIG_NUMA
> +int sched_numa_find_nth_cpu(const struct cpumask *cpus, int cpu, int node);
> +#else
> +int sched_numa_find_nth_cpu(const struct cpumask *cpus, int cpu, int node)

Ah, this should be static of course.

> +{
> +	return cpumask_nth(cpu, cpus);
> +}
> +#endif /* CONFIG_NUMA */
>
>  #endif /* _LINUX_TOPOLOGY_H */
> diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
> index 8739c2a5a54e..c8f56287de46 100644
> --- a/kernel/sched/topology.c
> +++ b/kernel/sched/topology.c
> @@ -2067,6 +2067,48 @@ int sched_numa_find_closest(const struct cpumask *cpus, int cpu)
>  	return found;
>  }
>
> +/*
> + * sched_numa_find_nth_cpu() - given the NUMA topology, find the Nth next cpu
> + * closest to @cpu from @cpumask.
> + * cpumask: cpumask to find a cpu from
> + * cpu: Nth cpu to find
> + *
> + * returns: cpu, or >= nr_cpu_ids when nothing found.
> + */
> +int sched_numa_find_nth_cpu(const struct cpumask *cpus, int cpu, int node)
> +{
> +	unsigned int first = 0, mid, last = sched_domains_numa_levels;
> +	struct cpumask ***masks;
> +	int w, ret = nr_cpu_ids;
> +
> +	rcu_read_lock();
> +	masks = rcu_dereference(sched_domains_numa_masks);
> +	if (!masks)
> +		goto out;
> +
> +	while (last >= first) {
> +		mid = (last + first) / 2;
> +
> +		if (cpumask_weight_and(cpus, masks[mid][node]) <= cpu) {
> +			first = mid + 1;
> +			continue;
> +		}
> +
> +		w = (mid == 0) ? 0 : cpumask_weight_and(cpus, masks[mid - 1][node]);
> +		if (w <= cpu)
> +			break;
> +
> +		last = mid - 1;
> +	}
> +
> +	ret = (mid == 0) ?
> +		cpumask_nth_and(cpu - w, cpus, masks[mid][node]) :
> +		cpumask_nth_and_andnot(cpu - w, cpus, masks[mid][node], masks[mid - 1][node]);
> +out:
> +	rcu_read_unlock();
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(sched_numa_find_nth_cpu);
>  #endif /* CONFIG_NUMA */
>
>  static int __sdt_alloc(const struct cpumask *cpu_map)
> --
> 2.34.1
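The "should be static" remark above is about the !CONFIG_NUMA fallback being
defined in a header: without static (in practice, static inline), every
translation unit that includes topology.h would emit its own external
definition of the symbol and the link would fail with multiple-definition
errors. A user-space sketch of the corrected shape, with 64-bit words as
stand-in cpumasks (mask_nth and find_nth_cpu_stub are illustrative names,
not the kernel's):

```c
#include <assert.h>
#include <stdint.h>

/* User-space stand-in for cpumask_nth(): index of the Nth set bit. */
static int mask_nth(int n, uint64_t mask)
{
	for (int i = 0; i < 64; i++)
		if ((mask & (1ULL << i)) && n-- == 0)
			return i;
	return -1;	/* analogue of returning >= nr_cpu_ids */
}

/*
 * Shape of the corrected fallback: "static inline" makes it safe to
 * define in a header, because each translation unit gets a local copy
 * rather than one clashing external definition per #include site.
 */
static inline int find_nth_cpu_stub(uint64_t cpus, int cpu, int node)
{
	(void)node;	/* no NUMA topology: the node hint is ignored */
	return mask_nth(cpu, cpus);
}
```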
On Thu, Nov 10, 2022 at 08:00:26PM -0800, Yury Norov wrote:
> The function finds Nth set CPU in a given cpumask starting from a given
> node.
>
> Leveraging the fact that each hop in sched_domains_numa_masks includes the
> same or greater number of CPUs than the previous one, we can use binary
> search on hops instead of linear walk, which makes the overall complexity
> of O(log n) in terms of number of cpumask_weight() calls.

...

> +int sched_numa_find_nth_cpu(const struct cpumask *cpus, int cpu, int node)
> +{
> +	unsigned int first = 0, mid, last = sched_domains_numa_levels;
> +	struct cpumask ***masks;

*** ?
Hmm... Do we really need such deep indirection?

> +	int w, ret = nr_cpu_ids;
> +
> +	rcu_read_lock();
> +	masks = rcu_dereference(sched_domains_numa_masks);
> +	if (!masks)
> +		goto out;
> +
> +	while (last >= first) {
> +		mid = (last + first) / 2;
> +
> +		if (cpumask_weight_and(cpus, masks[mid][node]) <= cpu) {
> +			first = mid + 1;
> +			continue;
> +		}
> +
> +		w = (mid == 0) ? 0 : cpumask_weight_and(cpus, masks[mid - 1][node]);

See below.

> +		if (w <= cpu)
> +			break;
> +
> +		last = mid - 1;
> +	}

We have lib/bsearch.h. I haven't really looked deeply into the above, but my
gut feeling is that it might be useful here. Can you check that?

> +	ret = (mid == 0) ?
> +		cpumask_nth_and(cpu - w, cpus, masks[mid][node]) :
> +		cpumask_nth_and_andnot(cpu - w, cpus, masks[mid][node], masks[mid - 1][node]);

You can also shorten this by inverting the conditional:

	ret = mid ? ...not 0... : ...for 0...;

> +out:

out_unlock: ?

> +	rcu_read_unlock();
> +	return ret;
> +}
On Fri, Nov 11, 2022 at 01:42:29PM +0200, Andy Shevchenko wrote:
> On Thu, Nov 10, 2022 at 08:00:26PM -0800, Yury Norov wrote:
> > The function finds Nth set CPU in a given cpumask starting from a given
> > node.

...

> > +	struct cpumask ***masks;
>
> *** ?
> Hmm... Do we really need such deep indirection?

It's a 2d array of pointers, so - yes.

...

> We have lib/bsearch.h. I haven't really looked deeply into the above, but my
> gut feeling is that it might be useful here. Can you check that?

Yes we do. I tried it, and it didn't work because the node arrays are
allocated dynamically, and the distance between different pairs of hops
for a given node is not a constant, which is a requirement for bsearch.

However, the distance between hop pointers in the 1st-level array should
be constant, and we can try feeding bsearch with it. I'll experiment with
bsearch some more.

> > +	ret = (mid == 0) ?
> > +		cpumask_nth_and(cpu - w, cpus, masks[mid][node]) :
> > +		cpumask_nth_and_andnot(cpu - w, cpus, masks[mid][node], masks[mid - 1][node]);
>
> You can also shorten this by inverting the conditional:
>
> 	ret = mid ? ...not 0... : ...for 0...;

Yep, why not.

> > +out:
>
> out_unlock: ?

Do you think it's better?

> > +	rcu_read_unlock();
> > +	return ret;
> > +}

--
With Best Regards,
Andy Shevchenko
On Fri, Nov 11, 2022 at 09:07:15AM -0800, Yury Norov wrote:
> On Fri, Nov 11, 2022 at 01:42:29PM +0200, Andy Shevchenko wrote:
> > On Thu, Nov 10, 2022 at 08:00:26PM -0800, Yury Norov wrote:

...

> > > +out:
> >
> > out_unlock: ?
>
> Do you think it's better?

Yes. It shows what will happen at the goto. When one reads "goto out;",
it reads like a plain "return ret;". But "goto out_unlock;" immediately
pictures "unlock; return ret;".

P.S. That's basically the way we name labels.

> > > +	rcu_read_unlock();
> > > +	return ret;
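The naming convention discussed above can be sketched in a few lines of
user-space C. The fake lock, the table, and every name here are illustrative
(a counter stands in for rcu_read_lock()/rcu_read_unlock()); only the shape
of the error path matters:

```c
#include <assert.h>

static int lock_depth;	/* stand-in for RCU read-side nesting */
static const int table[4] = { 10, 20, 30, 40 };

static void fake_lock(void)   { lock_depth++; }
static void fake_unlock(void) { lock_depth--; }

/* Return table[idx] under the "lock", or -1 for a bad index. */
static int lookup(int idx)
{
	int ret = -1;

	fake_lock();
	if (idx < 0 || idx >= 4)
		goto out_unlock;	/* label name says: unlock, then return */

	ret = table[idx];
out_unlock:
	fake_unlock();
	return ret;
}
```

Both the success path and the error path fall through the label, so the
unlock can never be skipped, and the label name documents exactly what the
goto will do.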
On Fri, Nov 11, 2022 at 09:07:17AM -0800, Yury Norov wrote:
> On Fri, Nov 11, 2022 at 01:42:29PM +0200, Andy Shevchenko wrote:
> > On Thu, Nov 10, 2022 at 08:00:26PM -0800, Yury Norov wrote:

...

> > We have lib/bsearch.h. I haven't really looked deeply into the above, but my
> > gut feeling is that it might be useful here. Can you check that?
>
> Yes we do. I tried it, and it didn't work because the node arrays are
> allocated dynamically, and the distance between different pairs of hops
> for a given node is not a constant, which is a requirement for bsearch.
>
> However, the distance between hop pointers in the 1st-level array should
> be constant, and we can try feeding bsearch with it. I'll experiment with
> bsearch some more.

OK, I tried bsearch on the array of hops, and it works. But it requires
adding some black pointer magic. I'll send v2 based on bsearch soon.

Thanks,
Yury
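The v2 code is not part of this thread, but the "black pointer magic" can be
guessed at: bsearch only finds exact matches, so the comparator has to turn
an exact-match search into an interval lookup by peeking at the neighbouring
element. The sketch below uses user-space bsearch(3), which has the same
contract as the kernel's lib/bsearch, over an illustrative array of
cumulative per-hop weights; hop_cmp, find_hop, and the data are assumptions,
not the code that was sent as v2:

```c
#include <assert.h>
#include <stdlib.h>

/*
 * cumulative[i] = number of interesting CPUs within hops 0..i.
 * Non-decreasing, so for a key k the level covering it is the first
 * one whose cumulative weight exceeds k.
 */
static const int cumulative[] = { 0, 4, 8 };

static int hop_cmp(const void *key, const void *elem)
{
	int k = *(const int *)key;
	const int *w = elem;

	if (k >= *w)
		return 1;	/* key is at or past this level: go right */
	/* k < *w: it is a match only if the previous level doesn't cover k */
	if (w != cumulative && k < w[-1])
		return -1;	/* previous level already covers k: go left */
	return 0;		/* w[-1] <= k < *w: this is the level */
}

/* Returns the hop level covering the k-th CPU, or -1 if out of range. */
static int find_hop(int k)
{
	const int *res = bsearch(&k, cumulative, 3, sizeof(int), hop_cmp);

	return res ? (int)(res - cumulative) : -1;
}
```

The pointer arithmetic on w[-1] is the interval trick: the comparator's
"zero region" is the single half-open interval [w[-1], w[0]), so bsearch's
exact-match semantics land on exactly one level.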
diff --git a/include/linux/topology.h b/include/linux/topology.h
index 4564faafd0e1..63048ac3207c 100644
--- a/include/linux/topology.h
+++ b/include/linux/topology.h
@@ -245,5 +245,13 @@ static inline const struct cpumask *cpu_cpu_mask(int cpu)
 	return cpumask_of_node(cpu_to_node(cpu));
 }
 
+#ifdef CONFIG_NUMA
+int sched_numa_find_nth_cpu(const struct cpumask *cpus, int cpu, int node);
+#else
+int sched_numa_find_nth_cpu(const struct cpumask *cpus, int cpu, int node)
+{
+	return cpumask_nth(cpu, cpus);
+}
+#endif /* CONFIG_NUMA */
 
 #endif /* _LINUX_TOPOLOGY_H */
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index 8739c2a5a54e..c8f56287de46 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -2067,6 +2067,48 @@ int sched_numa_find_closest(const struct cpumask *cpus, int cpu)
 	return found;
 }
 
+/*
+ * sched_numa_find_nth_cpu() - given the NUMA topology, find the Nth next cpu
+ * closest to @cpu from @cpumask.
+ * cpumask: cpumask to find a cpu from
+ * cpu: Nth cpu to find
+ *
+ * returns: cpu, or >= nr_cpu_ids when nothing found.
+ */
+int sched_numa_find_nth_cpu(const struct cpumask *cpus, int cpu, int node)
+{
+	unsigned int first = 0, mid, last = sched_domains_numa_levels;
+	struct cpumask ***masks;
+	int w, ret = nr_cpu_ids;
+
+	rcu_read_lock();
+	masks = rcu_dereference(sched_domains_numa_masks);
+	if (!masks)
+		goto out;
+
+	while (last >= first) {
+		mid = (last + first) / 2;
+
+		if (cpumask_weight_and(cpus, masks[mid][node]) <= cpu) {
+			first = mid + 1;
+			continue;
+		}
+
+		w = (mid == 0) ? 0 : cpumask_weight_and(cpus, masks[mid - 1][node]);
+		if (w <= cpu)
+			break;
+
+		last = mid - 1;
+	}
+
+	ret = (mid == 0) ?
+		cpumask_nth_and(cpu - w, cpus, masks[mid][node]) :
+		cpumask_nth_and_andnot(cpu - w, cpus, masks[mid][node], masks[mid - 1][node]);
+out:
+	rcu_read_unlock();
+	return ret;
+}
+EXPORT_SYMBOL_GPL(sched_numa_find_nth_cpu);
 #endif /* CONFIG_NUMA */
 
 static int __sdt_alloc(const struct cpumask *cpu_map)