Message ID | 20221128132100.30253-3-ricardo.neri-calderon@linux.intel.com |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:adf:f944:0:0:0:0:0 with SMTP id q4csp5657011wrr; Mon, 28 Nov 2022 05:14:44 -0800 (PST) X-Google-Smtp-Source: AA0mqf7+A5beBwDQlBQuOKk1NWALD2lDFglkJUmgJ+ARR3JolQC3I7I9XHzK3VL97XuHcxLBn8j+ X-Received: by 2002:a17:902:aa06:b0:182:f36b:3221 with SMTP id be6-20020a170902aa0600b00182f36b3221mr32056949plb.36.1669641283952; Mon, 28 Nov 2022 05:14:43 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1669641283; cv=none; d=google.com; s=arc-20160816; b=eN8xLUgGYZ3UffyLtZ4Qi6gXRu7v57XYzBuRwtIVa4IQz5Gh4doYF/BB/4Jn+RkXuc GkY/rDWCD4U8gChqirHFWIlDh51s+fPTT9p4bgSwq3Fi5FCf77kxqygvxBvCFVxYKR5h 42eVopZJsiatzkAJBkumPN2G4baZawdyRjYauFwO/I4hY1aUF/SyU0W5pAzjz5nmA4ul jHtHRjN+BA4Tmy3GKFDRRB2NkJq7G8ZwWDeH4GNYuMxrr4eMyMOqwKOSuYtNauWi5GJp 6u5qm7ECcLrpX2o0ujOj8N4mSAMVXNwMOwnQXVc+lzOXEG83MfdLXltY/K490ELbNctu qwsQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:references:in-reply-to:message-id:date:subject :cc:to:from:dkim-signature; bh=skNG8/ZfQF/Vc9Rc1zfjS6CgcgfonYU1VL6OCDUwwf8=; b=tUj7/xPiu2uM9P3yrViEgaJO+tVDuHQEmWBWTJsNVsTDntHS0Vb6ij8btYcC+X+Ef6 dIHOUc9sjP5n9yaGrFzogP+CBm4gOPwdL9cfusXwfJ0nY3+pwNb4tU4ezy25CysOPf8O iExa8Z9UzSN5z4/utn3dGFrIvSildky/9PAOfuieu68umAmMMywF2I2M/mnBIjfstReu n+40iqfI9Oo77XBs12Z8jFTsqLNagCydbo/7aSu5f0vLyqTuWx+6pMznHyGiw61M1yfj zNNeB6bxDKynkxndEUkw3Dxbn0TwrJBDkxdBbigsJUO/C2MRTC9Ayf+qXGjU21XCkMDK i1cA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=MB5KWN8H; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id q9-20020a17090aa00900b0020d2286c30csi11200310pjp.134.2022.11.28.05.14.24; Mon, 28 Nov 2022 05:14:43 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=MB5KWN8H; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231650AbiK1NN5 (ORCPT <rfc822;gah0developer@gmail.com> + 99 others); Mon, 28 Nov 2022 08:13:57 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:39378 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231453AbiK1NNg (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Mon, 28 Nov 2022 08:13:36 -0500 Received: from mga05.intel.com (mga05.intel.com [192.55.52.43]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id E22F71CB36; Mon, 28 Nov 2022 05:13:34 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1669641214; x=1701177214; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=PygCb/lFlxpA6HfJbBXFOfXsvHEBrsUtKwJwsV3g7Oo=; b=MB5KWN8Hs73tFgcXJ1rUQbxcugySDvxKDxIqy2ada6vR8zTy5Ce63b0s WvuhiLAgTbZYjYeSIu3yji1SNy2hVhWrUY7txId5iRlkKu520H7PEJ44b HuM1Ry7dltAUyRt+RmyqhcviLeTFMZkDq8Rr81v8/DLANjSyrrw2lKzmX gvqoowsX56aTuzSc3e/LdbOCpZJiC+YrwPMg38U9KBnmBBxfzSsnb5D12 XJB4EbUEOHYZ6cZxiD8s8nIOyKHanDeddF4yPGYLv3guBQWtaRS3gToPM yreGGCV/0xpbRcXIhcx1K6ENWrkkLjlv9r2yts7xF/vDc4synfkRJ4zFh Q==; X-IronPort-AV: E=McAfee;i="6500,9779,10544"; a="401117061" X-IronPort-AV: E=Sophos;i="5.96,200,1665471600"; d="scan'208";a="401117061" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by fmsmga105.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 28 Nov 2022 05:13:31 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6500,9779,10544"; a="749381327" X-IronPort-AV: E=Sophos;i="5.96,200,1665471600"; d="scan'208";a="749381327" Received: from ranerica-svr.sc.intel.com ([172.25.110.23]) by fmsmga002.fm.intel.com with ESMTP; 28 Nov 2022 05:13:31 -0800 From: Ricardo Neri <ricardo.neri-calderon@linux.intel.com> To: "Peter Zijlstra (Intel)" <peterz@infradead.org>, Juri Lelli <juri.lelli@redhat.com>, Vincent Guittot <vincent.guittot@linaro.org> Cc: Ricardo Neri <ricardo.neri@intel.com>, "Ravi V. Shankar" <ravi.v.shankar@intel.com>, Ben Segall <bsegall@google.com>, Daniel Bristot de Oliveira <bristot@redhat.com>, Dietmar Eggemann <dietmar.eggemann@arm.com>, Len Brown <len.brown@intel.com>, Mel Gorman <mgorman@suse.de>, "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>, Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>, Steven Rostedt <rostedt@goodmis.org>, Tim Chen <tim.c.chen@linux.intel.com>, Valentin Schneider <vschneid@redhat.com>, x86@kernel.org, "Joel Fernandes (Google)" <joel@joelfernandes.org>, linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org, Ricardo Neri <ricardo.neri-calderon@linux.intel.com>, "Tim C . Chen" <tim.c.chen@intel.com> Subject: [PATCH v2 02/22] sched: Add interfaces for IPC classes Date: Mon, 28 Nov 2022 05:20:40 -0800 Message-Id: <20221128132100.30253-3-ricardo.neri-calderon@linux.intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20221128132100.30253-1-ricardo.neri-calderon@linux.intel.com> References: <20221128132100.30253-1-ricardo.neri-calderon@linux.intel.com> X-Spam-Status: No, score=-4.3 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_EF,RCVD_IN_DNSWL_MED,SPF_HELO_NONE, SPF_NONE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1750745778990059925?= X-GMAIL-MSGID: =?utf-8?q?1750745778990059925?= |
Series |
sched: Introduce IPC classes for load balance
|
|
Commit Message
Ricardo Neri
Nov. 28, 2022, 1:20 p.m. UTC
Add the interfaces that architectures shall implement to convey the data
to support IPC classes.
arch_update_ipcc() updates the IPC classification of the current task as
given by hardware.
arch_get_ipcc_score() provides a performance score for a given IPC class
when placed on a specific CPU. Higher scores indicate higher performance.
The number of classes and the score of each class of task are determined
by hardware.
Cc: Ben Segall <bsegall@google.com>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Joel Fernandes (Google) <joel@joelfernandes.org>
Cc: Len Brown <len.brown@intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tim C. Chen <tim.c.chen@intel.com>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: x86@kernel.org
Cc: linux-pm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
Changes since v1:
* Shortened the names of the IPCC interfaces (PeterZ):
sched_task_classes_enabled >> sched_ipcc_enabled
arch_has_task_classes >> arch_has_ipc_classes
arch_update_task_class >> arch_update_ipcc
arch_get_task_class_score >> arch_get_ipcc_score
* Removed smt_siblings_idle argument from arch_update_ipcc(). (PeterZ)
---
kernel/sched/sched.h | 60 +++++++++++++++++++++++++++++++++++++++++
kernel/sched/topology.c | 8 ++++++
2 files changed, 68 insertions(+)
Comments
Hi, On Monday 28 Nov 2022 at 05:20:40 (-0800), Ricardo Neri wrote: [..] > +#ifndef arch_has_ipc_classes > +/** > + * arch_has_ipc_classes() - Check whether hardware supports IPC classes of tasks > + * > + * Returns: true of IPC classes of tasks are supported. > + */ > +static __always_inline > +bool arch_has_ipc_classes(void) > +{ > + return false; > +} > +#endif > + > +#ifndef arch_update_ipcc > +/** > + * arch_update_ipcc() - Update the IPC class of the current task > + * @curr: The current task > + * > + * Request that the IPC classification of @curr is updated. > + * > + * Returns: none > + */ > +static __always_inline > +void arch_update_ipcc(struct task_struct *curr) > +{ > +} > +#endif > + > +#ifndef arch_get_ipcc_score > +/** > + * arch_get_ipcc_score() - Get the IPC score of a class of task > + * @ipcc: The IPC class > + * @cpu: A CPU number > + * > + * Returns the performance score of an IPC class when running on @cpu. > + * Error when either @class or @cpu are invalid. > + */ > +static __always_inline > +int arch_get_ipcc_score(unsigned short ipcc, int cpu) > +{ > + return 1; > +} > +#endif The interface looks mostly alright but this arch_get_ipcc_score() leaves unclear what are the characteristics of the returned value. Does it have a meaning as an absolute value or is it a value on an abstract scale? If it should be interpreted as instructions per cycle, if I wanted to have a proper comparison between the ability of two CPUs to handle this class of tasks then I would need to take into consideration the maximum frequency of each CPU. If it's a performance value on an abstract scale (more likely), similar cu capacity, then it might be good to better define this abstract scale. That would help with the default implementation where possibly the best choice for a return value would be the maximum value on the scale, suggesting equal/maximum performance for different CPUs handling the class of tasks. I suppose you avoided returning 0 for the default implementation as you intend that to mean the inability of the CPU to handle that class of tasks? It would be good to document this. > +#else /* CONFIG_IPC_CLASSES */ > + > +#define arch_get_ipcc_score(ipcc, cpu) (-EINVAL) > +#define arch_update_ipcc(curr) > + > +static inline bool sched_ipcc_enabled(void) { return false; } > + > +#endif /* CONFIG_IPC_CLASSES */ > + > #ifndef arch_scale_freq_capacity > /** > * arch_scale_freq_capacity - get the frequency scale factor of a given CPU. > diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c > index 8154ef590b9f..eb1654b64df7 100644 > --- a/kernel/sched/topology.c > +++ b/kernel/sched/topology.c > @@ -669,6 +669,9 @@ DEFINE_PER_CPU(struct sched_domain __rcu *, sd_numa); > DEFINE_PER_CPU(struct sched_domain __rcu *, sd_asym_packing); > DEFINE_PER_CPU(struct sched_domain __rcu *, sd_asym_cpucapacity); > DEFINE_STATIC_KEY_FALSE(sched_asym_cpucapacity); > +#ifdef CONFIG_IPC_CLASSES > +DEFINE_STATIC_KEY_FALSE(sched_ipcc); > +#endif > > static void update_top_cache_domain(int cpu) > { > @@ -2388,6 +2391,11 @@ build_sched_domains(const struct cpumask *cpu_map, struct sched_domain_attr *att > if (has_asym) > static_branch_inc_cpuslocked(&sched_asym_cpucapacity); > > +#ifdef CONFIG_IPC_CLASSES > + if (arch_has_ipc_classes()) > + static_branch_enable_cpuslocked(&sched_ipcc); > +#endif Wouldn't this be better placed directly in sched_init_smp()? It's not gated by and it does not need any sched domains information. Hope it helps, Ionela. > + > if (rq && sched_debug_verbose) { > pr_info("root domain span: %*pbl (max cpu_capacity = %lu)\n", > cpumask_pr_args(cpu_map), rq->rd->max_cpu_capacity); > -- > 2.25.1 > >
On Thu, Dec 08, 2022 at 08:48:46AM +0000, Ionela Voinescu wrote: > Hi, > > On Monday 28 Nov 2022 at 05:20:40 (-0800), Ricardo Neri wrote: > [..] > > +#ifndef arch_has_ipc_classes > > +/** > > + * arch_has_ipc_classes() - Check whether hardware supports IPC classes of tasks > > + * > > + * Returns: true of IPC classes of tasks are supported. > > + */ > > +static __always_inline > > +bool arch_has_ipc_classes(void) > > +{ > > + return false; > > +} > > +#endif > > + > > +#ifndef arch_update_ipcc > > +/** > > + * arch_update_ipcc() - Update the IPC class of the current task > > + * @curr: The current task > > + * > > + * Request that the IPC classification of @curr is updated. > > + * > > + * Returns: none > > + */ > > +static __always_inline > > +void arch_update_ipcc(struct task_struct *curr) > > +{ > > +} > > +#endif > > + > > +#ifndef arch_get_ipcc_score > > +/** > > + * arch_get_ipcc_score() - Get the IPC score of a class of task > > + * @ipcc: The IPC class > > + * @cpu: A CPU number > > + * > > + * Returns the performance score of an IPC class when running on @cpu. > > + * Error when either @class or @cpu are invalid. > > + */ > > +static __always_inline > > +int arch_get_ipcc_score(unsigned short ipcc, int cpu) > > +{ > > + return 1; > > +} > > +#endif Thank you very much for your feedback Ionela! > > The interface looks mostly alright but this arch_get_ipcc_score() leaves > unclear what are the characteristics of the returned value. Fair point. I mean for the return value to be defined by architectures; but yes, architectures need to know how to implement this function. > > Does it have a meaning as an absolute value or is it a value on an > abstract scale? If it should be interpreted as instructions per cycle, > if I wanted to have a proper comparison between the ability of two CPUs > to handle this class of tasks then I would need to take into consideration > the maximum frequency of each CPU. Do you mean when calling arch_get_ipcc_score()? If yes, then I agree, IPC class may not be the only factor, but the criteria to use the return value is up to the caller. In asym_packing it is assumed that higher-priority CPUs are preferred. When balancing load, IPC class scores are used to select between otherwise identical runqueues. This should also be the case for migrate_misfit: we know already that the tasks being considered do not fit on their current CPU. We would need to think what to do with other type of balancing, if at all. That said, arch_get_ipcc_score() should only return a metric of the instructions-per-*cycle*, independent of frequency, no? > If it's a performance value on an > abstract scale (more likely), similar cu capacity, then it might be good > to better define this abstract scale. That would help with the default > implementation where possibly the best choice for a return value would > be the maximum value on the scale, suggesting equal/maximum performance > for different CPUs handling the class of tasks. I guess something like: #define SCHED_IPCC_DEFAULT_SCALE 1024 ? I think I am fine with this value being the default. I also think that it is up to architectures to whether scale all IPC class scores from the best-performing class on the best-performing CPU. Doing so would introduce overhead, especially if hardware updates the IPC class scores multiple times during runtime. > > I suppose you avoided returning 0 for the default implementation as you > intend that to mean the inability of the CPU to handle that class of > tasks? It would be good to document this. I meant this to be minimum possible IPC class score for any CPU: any CPU should be able handle any IPC class. If not implemented, all CPUs handle all IPC classes equally. > > > +#else /* CONFIG_IPC_CLASSES */ > > + > > +#define arch_get_ipcc_score(ipcc, cpu) (-EINVAL) > > +#define arch_update_ipcc(curr) > > + > > +static inline bool sched_ipcc_enabled(void) { return false; } > > + > > +#endif /* CONFIG_IPC_CLASSES */ > > + > > #ifndef arch_scale_freq_capacity > > /** > > * arch_scale_freq_capacity - get the frequency scale factor of a given CPU. > > diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c > > index 8154ef590b9f..eb1654b64df7 100644 > > --- a/kernel/sched/topology.c > > +++ b/kernel/sched/topology.c > > @@ -669,6 +669,9 @@ DEFINE_PER_CPU(struct sched_domain __rcu *, sd_numa); > > DEFINE_PER_CPU(struct sched_domain __rcu *, sd_asym_packing); > > DEFINE_PER_CPU(struct sched_domain __rcu *, sd_asym_cpucapacity); > > DEFINE_STATIC_KEY_FALSE(sched_asym_cpucapacity); > > +#ifdef CONFIG_IPC_CLASSES > > +DEFINE_STATIC_KEY_FALSE(sched_ipcc); > > +#endif > > > > static void update_top_cache_domain(int cpu) > > { > > @@ -2388,6 +2391,11 @@ build_sched_domains(const struct cpumask *cpu_map, struct sched_domain_attr *att > > if (has_asym) > > static_branch_inc_cpuslocked(&sched_asym_cpucapacity); > > > > +#ifdef CONFIG_IPC_CLASSES > > + if (arch_has_ipc_classes()) > > + static_branch_enable_cpuslocked(&sched_ipcc); > > +#endif > > Wouldn't this be better placed directly in sched_init_smp()? > It's not gated by and it does not need any sched domains information. Very true. I will take your suggestion. > > Hope it helps, It does help significantly. Thanks again for your feedback. Thanks and BR, Ricardo
Hi Richardo, I have some generic comment for the design of those interfaces. On 11/28/22 13:20, Ricardo Neri wrote: > Add the interfaces that architectures shall implement to convey the data > to support IPC classes. > > arch_update_ipcc() updates the IPC classification of the current task as > given by hardware. > > arch_get_ipcc_score() provides a performance score for a given IPC class > when placed on a specific CPU. Higher scores indicate higher performance. > > The number of classes and the score of each class of task are determined > by hardware. > > Cc: Ben Segall <bsegall@google.com> > Cc: Daniel Bristot de Oliveira <bristot@redhat.com> > Cc: Dietmar Eggemann <dietmar.eggemann@arm.com> > Cc: Joel Fernandes (Google) <joel@joelfernandes.org> > Cc: Len Brown <len.brown@intel.com> > Cc: Mel Gorman <mgorman@suse.de> > Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> > Cc: Steven Rostedt <rostedt@goodmis.org> > Cc: Tim C. Chen <tim.c.chen@intel.com> > Cc: Valentin Schneider <vschneid@redhat.com> > Cc: x86@kernel.org > Cc: linux-pm@vger.kernel.org > Cc: linux-kernel@vger.kernel.org > Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com> > --- > Changes since v1: > * Shortened the names of the IPCC interfaces (PeterZ): > sched_task_classes_enabled >> sched_ipcc_enabled > arch_has_task_classes >> arch_has_ipc_classes > arch_update_task_class >> arch_update_ipcc > arch_get_task_class_score >> arch_get_ipcc_score > * Removed smt_siblings_idle argument from arch_update_ipcc(). (PeterZ) > --- > kernel/sched/sched.h | 60 +++++++++++++++++++++++++++++++++++++++++ > kernel/sched/topology.c | 8 ++++++ > 2 files changed, 68 insertions(+) > > diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h > index b1d338a740e5..75e22baa2622 100644 > --- a/kernel/sched/sched.h > +++ b/kernel/sched/sched.h > @@ -2531,6 +2531,66 @@ void arch_scale_freq_tick(void) > } > #endif > > +#ifdef CONFIG_IPC_CLASSES > +DECLARE_STATIC_KEY_FALSE(sched_ipcc); > + > +static inline bool sched_ipcc_enabled(void) > +{ > + return static_branch_unlikely(&sched_ipcc); > +} > + > +#ifndef arch_has_ipc_classes > +/** > + * arch_has_ipc_classes() - Check whether hardware supports IPC classes of tasks > + * > + * Returns: true of IPC classes of tasks are supported. > + */ > +static __always_inline > +bool arch_has_ipc_classes(void) > +{ > + return false; > +} > +#endif > + > +#ifndef arch_update_ipcc > +/** > + * arch_update_ipcc() - Update the IPC class of the current task > + * @curr: The current task > + * > + * Request that the IPC classification of @curr is updated. > + * > + * Returns: none > + */ > +static __always_inline > +void arch_update_ipcc(struct task_struct *curr) > +{ > +} > +#endif > + > +#ifndef arch_get_ipcc_score > +/** > + * arch_get_ipcc_score() - Get the IPC score of a class of task > + * @ipcc: The IPC class > + * @cpu: A CPU number > + * > + * Returns the performance score of an IPC class when running on @cpu. > + * Error when either @class or @cpu are invalid. > + */ > +static __always_inline > +int arch_get_ipcc_score(unsigned short ipcc, int cpu) > +{ > + return 1; > +} > +#endif Those interfaces are quite simple, probably work really OK with your HW/FW. If any other architecture is going to re-use them in future, we might face some issue. Let me explain why. These kernel functions are start to be used very early in boot. Your HW/FW is probably instantly ready to work from the very beginning during boot. What is some other HW needs some preparation code, like setup communication channel to FW or enable needed clocks/events/etc. What I would like to see is a similar mechanism to the one in schedutil. Schedutil governor has to wait till cpufreq initialize the cpu freq driver and policy objects (which sometimes takes ~2-3 sec). After that cpufreq fwk starts the governor which populates this hook [1]. It's based on RCU mechanism with function pointer that can be then called from the task scheduler when everything is ready to work. If we (Arm) is going to use your proposed interfaces, we might need different mechanisms because the platform likely would be ready after our SCMI FW channels and cpufreq are setup. Would it be possible to address such need now or I would have to change that interface code later? Regards, Lukasz [1] https://elixir.bootlin.com/linux/latest/source/kernel/sched/cpufreq.c#L29
Hi, On Tuesday 13 Dec 2022 at 16:31:28 (-0800), Ricardo Neri wrote: > On Thu, Dec 08, 2022 at 08:48:46AM +0000, Ionela Voinescu wrote: > > Hi, > > > > On Monday 28 Nov 2022 at 05:20:40 (-0800), Ricardo Neri wrote: > > [..] > > > +#ifndef arch_has_ipc_classes > > > +/** > > > + * arch_has_ipc_classes() - Check whether hardware supports IPC classes of tasks > > > + * > > > + * Returns: true of IPC classes of tasks are supported. > > > + */ > > > +static __always_inline > > > +bool arch_has_ipc_classes(void) > > > +{ > > > + return false; > > > +} > > > +#endif > > > + > > > +#ifndef arch_update_ipcc > > > +/** > > > + * arch_update_ipcc() - Update the IPC class of the current task > > > + * @curr: The current task > > > + * > > > + * Request that the IPC classification of @curr is updated. > > > + * > > > + * Returns: none > > > + */ > > > +static __always_inline > > > +void arch_update_ipcc(struct task_struct *curr) > > > +{ > > > +} > > > +#endif > > > + > > > +#ifndef arch_get_ipcc_score > > > +/** > > > + * arch_get_ipcc_score() - Get the IPC score of a class of task > > > + * @ipcc: The IPC class > > > + * @cpu: A CPU number > > > + * > > > + * Returns the performance score of an IPC class when running on @cpu. > > > + * Error when either @class or @cpu are invalid. > > > + */ > > > +static __always_inline > > > +int arch_get_ipcc_score(unsigned short ipcc, int cpu) > > > +{ > > > + return 1; > > > +} > > > +#endif > > Thank you very much for your feedback Ionela! > > > > > The interface looks mostly alright but this arch_get_ipcc_score() leaves > > unclear what are the characteristics of the returned value. > > Fair point. I mean for the return value to be defined by architectures; > but yes, architectures need to know how to implement this function. > > > > > Does it have a meaning as an absolute value or is it a value on an > > abstract scale? If it should be interpreted as instructions per cycle, > > if I wanted to have a proper comparison between the ability of two CPUs > > to handle this class of tasks then I would need to take into consideration > > the maximum frequency of each CPU. > > Do you mean when calling arch_get_ipcc_score()? If yes, then I agree, IPC > class may not be the only factor, but the criteria to use the return value > is up to the caller. > Yes, but if different architectures give different meanings to this score (scale, relative difference between two values, etc) while the policies are common (uses of arch_get_ipcc_score() in common scheduler paths) then the outcome can be vastly different. If the "criteria to use the returned value is up to the caller", then the caller of arch_get_ipcc_score() should always be architecture specific code, which currently is not (see 09/22). > In asym_packing it is assumed that higher-priority CPUs are preferred. > When balancing load, IPC class scores are used to select between otherwise > identical runqueues. This should also be the case for migrate_misfit: we > know already that the tasks being considered do not fit on their current > CPU. > > We would need to think what to do with other type of balancing, if at all. > > That said, arch_get_ipcc_score() should only return a metric of the > instructions-per-*cycle*, independent of frequency, no? > Yes, performance on an abstract scale is preferred here. We would not want to have to scale the score by frequency :). It was just an example showing that the description of arch_get_ipcc_score() should be clarified. Another possible clarification: is it expected that the scores scale linearly with performance (does double the score mean double the performance?). > > If it's a performance value on an > > abstract scale (more likely), similar cu capacity, then it might be good > > to better define this abstract scale. That would help with the default > > implementation where possibly the best choice for a return value would > > be the maximum value on the scale, suggesting equal/maximum performance > > for different CPUs handling the class of tasks. > > I guess something like: > > #define SCHED_IPCC_DEFAULT_SCALE 1024 > > ? > > I think I am fine with this value being the default. I also think that it > is up to architectures to whether scale all IPC class scores from the > best-performing class on the best-performing CPU. Doing so would introduce > overhead, especially if hardware updates the IPC class scores multiple > times during runtime. > Yes, it's a very good point. Initially I thought that one would need to rescale the values anyway for them to make sense relative to each other, but I now realise that would not be needed. Therefore, you are right, to avoid this extra work it's best to leave the range of possible score values up to the implementer and not force something like [0 - 1024]. But again, this raises the point that if one architecture decides to return its scores on a scale [0 - 1024] and possibly use these scores to scale utilization/alter capacity for example, this cannot be generic policy as not all architectures are guaranteed to use this scale for its scores. So leaving the score unrestricted makes it more difficult to have generic policies across architectures that use them. > > > > I suppose you avoided returning 0 for the default implementation as you > > intend that to mean the inability of the CPU to handle that class of > > tasks? It would be good to document this. > > I meant this to be minimum possible IPC class score for any CPU: any > CPU should be able handle any IPC class. If not implemented, all CPUs > handle all IPC classes equally. > Ah, I see. In this case you might as well return 0 in the default implementation of arch_get_ipcc_score(). I know it does not matter much what gets returned there, but returning a meaningless "1" is strange to me :). Thanks, Ionela.
On Wed, Dec 14, 2022 at 07:36:44AM +0000, Lukasz Luba wrote: > Hi Richardo, > > I have some generic comment for the design of those interfaces. > > On 11/28/22 13:20, Ricardo Neri wrote: > > Add the interfaces that architectures shall implement to convey the data > > to support IPC classes. > > > > arch_update_ipcc() updates the IPC classification of the current task as > > given by hardware. > > > > arch_get_ipcc_score() provides a performance score for a given IPC class > > when placed on a specific CPU. Higher scores indicate higher performance. > > > > The number of classes and the score of each class of task are determined > > by hardware. > > > > Cc: Ben Segall <bsegall@google.com> > > Cc: Daniel Bristot de Oliveira <bristot@redhat.com> > > Cc: Dietmar Eggemann <dietmar.eggemann@arm.com> > > Cc: Joel Fernandes (Google) <joel@joelfernandes.org> > > Cc: Len Brown <len.brown@intel.com> > > Cc: Mel Gorman <mgorman@suse.de> > > Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > > Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> > > Cc: Steven Rostedt <rostedt@goodmis.org> > > Cc: Tim C. Chen <tim.c.chen@intel.com> > > Cc: Valentin Schneider <vschneid@redhat.com> > > Cc: x86@kernel.org > > Cc: linux-pm@vger.kernel.org > > Cc: linux-kernel@vger.kernel.org > > Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com> > > --- > > Changes since v1: > > * Shortened the names of the IPCC interfaces (PeterZ): > > sched_task_classes_enabled >> sched_ipcc_enabled > > arch_has_task_classes >> arch_has_ipc_classes > > arch_update_task_class >> arch_update_ipcc > > arch_get_task_class_score >> arch_get_ipcc_score > > * Removed smt_siblings_idle argument from arch_update_ipcc(). (PeterZ) > > --- > > kernel/sched/sched.h | 60 +++++++++++++++++++++++++++++++++++++++++ > > kernel/sched/topology.c | 8 ++++++ > > 2 files changed, 68 insertions(+) > > > > diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h > > index b1d338a740e5..75e22baa2622 100644 > > --- a/kernel/sched/sched.h > > +++ b/kernel/sched/sched.h > > @@ -2531,6 +2531,66 @@ void arch_scale_freq_tick(void) > > } > > #endif > > +#ifdef CONFIG_IPC_CLASSES > > +DECLARE_STATIC_KEY_FALSE(sched_ipcc); > > + > > +static inline bool sched_ipcc_enabled(void) > > +{ > > + return static_branch_unlikely(&sched_ipcc); > > +} > > + > > +#ifndef arch_has_ipc_classes > > +/** > > + * arch_has_ipc_classes() - Check whether hardware supports IPC classes of tasks > > + * > > + * Returns: true of IPC classes of tasks are supported. > > + */ > > +static __always_inline > > +bool arch_has_ipc_classes(void) > > +{ > > + return false; > > +} > > +#endif > > + > > +#ifndef arch_update_ipcc > > +/** > > + * arch_update_ipcc() - Update the IPC class of the current task > > + * @curr: The current task > > + * > > + * Request that the IPC classification of @curr is updated. > > + * > > + * Returns: none > > + */ > > +static __always_inline > > +void arch_update_ipcc(struct task_struct *curr) > > +{ > > +} > > +#endif > > + > > +#ifndef arch_get_ipcc_score > > +/** > > + * arch_get_ipcc_score() - Get the IPC score of a class of task > > + * @ipcc: The IPC class > > + * @cpu: A CPU number > > + * > > + * Returns the performance score of an IPC class when running on @cpu. > > + * Error when either @class or @cpu are invalid. > > + */ > > +static __always_inline > > +int arch_get_ipcc_score(unsigned short ipcc, int cpu) > > +{ > > + return 1; > > +} > > +#endif > > Those interfaces are quite simple, probably work really OK with > your HW/FW. If any other architecture is going to re-use them > in future, we might face some issue. Let me explain why. > > These kernel functions are start to be used very early in boot. > Your HW/FW is probably instantly ready to work from the very > beginning during boot. What is some other HW needs some > preparation code, like setup communication channel to FW or enable > needed clocks/events/etc. > > What I would like to see is a similar mechanism to the one in schedutil. > Schedutil governor has to wait till cpufreq initialize the cpu freq > driver and policy objects (which sometimes takes ~2-3 sec). After that > cpufreq fwk starts the governor which populates this hook [1]. > It's based on RCU mechanism with function pointer that can be then > called from the task scheduler when everything is ready to work. > > If we (Arm) is going to use your proposed interfaces, we might need > different mechanisms because the platform likely would be ready after > our SCMI FW channels and cpufreq are setup. > > Would it be possible to address such need now or I would have to > change that interface code later? Thank you very much for your feeback, Lukas! I took a look a cpufreq implementation you refer. I can certainly try to accommodate your requirements. Before jumping into it, I have a few questions. I see that cpufreq_update_util() only does something when the per-CPU pointers cpufreq_update_util_data become non-NULL. I use static key for the same purpose. Is this not usable for you? Indeed, arch_has_ipc_classes() implies that has to return true very early after boot if called, as per Ionela's suggestion from sched_init_smp(). I can convert this interface to an arch_enable_ipc_classes() that drivers or preparation code can call when ready. Would this be acceptable? Do think that a hook per CPU would be needed? If unsure, perhaps this can be left for future work. Thanks and BR, Ricardo > > [1] > https://elixir.bootlin.com/linux/latest/source/kernel/sched/cpufreq.c#L29 >
On Wed, Dec 14, 2022 at 11:15:56PM +0000, Ionela Voinescu wrote: > Hi, > > On Tuesday 13 Dec 2022 at 16:31:28 (-0800), Ricardo Neri wrote: > > On Thu, Dec 08, 2022 at 08:48:46AM +0000, Ionela Voinescu wrote: > > > Hi, > > > > > > On Monday 28 Nov 2022 at 05:20:40 (-0800), Ricardo Neri wrote: > > > [..] > > > > +#ifndef arch_has_ipc_classes > > > > +/** > > > > + * arch_has_ipc_classes() - Check whether hardware supports IPC classes of tasks > > > > + * > > > > + * Returns: true of IPC classes of tasks are supported. > > > > + */ > > > > +static __always_inline > > > > +bool arch_has_ipc_classes(void) > > > > +{ > > > > + return false; > > > > +} > > > > +#endif > > > > + > > > > +#ifndef arch_update_ipcc > > > > +/** > > > > + * arch_update_ipcc() - Update the IPC class of the current task > > > > + * @curr: The current task > > > > + * > > > > + * Request that the IPC classification of @curr is updated. > > > > + * > > > > + * Returns: none > > > > + */ > > > > +static __always_inline > > > > +void arch_update_ipcc(struct task_struct *curr) > > > > +{ > > > > +} > > > > +#endif > > > > + > > > > +#ifndef arch_get_ipcc_score > > > > +/** > > > > + * arch_get_ipcc_score() - Get the IPC score of a class of task > > > > + * @ipcc: The IPC class > > > > + * @cpu: A CPU number > > > > + * > > > > + * Returns the performance score of an IPC class when running on @cpu. > > > > + * Error when either @class or @cpu are invalid. > > > > + */ > > > > +static __always_inline > > > > +int arch_get_ipcc_score(unsigned short ipcc, int cpu) > > > > +{ > > > > + return 1; > > > > +} > > > > +#endif > > > > Thank you very much for your feedback Ionela! > > > > > > > > The interface looks mostly alright but this arch_get_ipcc_score() leaves > > > unclear what are the characteristics of the returned value. > > > > Fair point. I mean for the return value to be defined by architectures; > > but yes, architectures need to know how to implement this function. > > > > > > > > Does it have a meaning as an absolute value or is it a value on an > > > abstract scale? If it should be interpreted as instructions per cycle, > > > if I wanted to have a proper comparison between the ability of two CPUs > > > to handle this class of tasks then I would need to take into consideration > > > the maximum frequency of each CPU. > > > > Do you mean when calling arch_get_ipcc_score()? If yes, then I agree, IPC > > class may not be the only factor, but the criteria to use the return value > > is up to the caller. > > > > Yes, but if different architectures give different meanings to this score > (scale, relative difference between two values, etc) while the policies > are common (uses of arch_get_ipcc_score() in common scheduler paths) > then the outcome can be vastly different. One more reason to leave to the caller the handling of the returned value. > > If the "criteria to use the returned value is up to the caller", then > the caller of arch_get_ipcc_score() should always be architecture > specific code, which currently is not (see 09/22). Agreed. I now get your point. I'll change my patch accordingly. > > > In asym_packing it is assumed that higher-priority CPUs are preferred. > > When balancing load, IPC class scores are used to select between otherwise > > identical runqueues. This should also be the case for migrate_misfit: we > > know already that the tasks being considered do not fit on their current > > CPU. > > > > We would need to think what to do with other type of balancing, if at all. > > > > That said, arch_get_ipcc_score() should only return a metric of the > > instructions-per-*cycle*, independent of frequency, no? > > > > Yes, performance on an abstract scale is preferred here. We would not > want to have to scale the score by frequency :). It was just an example > showing that the description of arch_get_ipcc_score() should be clarified. > Another possible clarification: is it expected that the scores scale > linearly with performance (does double the score mean double the > performance?). Indeed this seems sensible. > > > > If it's a performance value on an > > > abstract scale (more likely), similar cu capacity, then it might be good > > > to better define this abstract scale. That would help with the default > > > implementation where possibly the best choice for a return value would > > > be the maximum value on the scale, suggesting equal/maximum performance > > > for different CPUs handling the class of tasks. > > > > I guess something like: > > > > #define SCHED_IPCC_DEFAULT_SCALE 1024 > > > > ? > > > > I think I am fine with this value being the default. I also think that it > > is up to architectures to whether scale all IPC class scores from the > > best-performing class on the best-performing CPU. Doing so would introduce > > overhead, especially if hardware updates the IPC class scores multiple > > times during runtime. > > > > Yes, it's a very good point. Initially I thought that one would need to > rescale the values anyway for them to make sense relative to each other, > but I now realise that would not be needed. > > Therefore, you are right, to avoid this extra work it's best to leave > the range of possible score values up to the implementer and not force > something like [0 - 1024]. > > But again, this raises the point that if one architecture decides to > return its scores on a scale [0 - 1024] and possibly use these scores to > scale utilization/alter capacity for example, this cannot be generic > policy as not all architectures are guaranteed to use this scale for its > scores. Very good point. > > So leaving the score unrestricted makes it more difficult to have > generic policies across architectures that use them. > In asym_packing we select the CPU of higher priority, regardless of how big the priority delta is. IPC classes extend the same mechanism. (We do have a throughput calculation, but it does not require IPC class to be scaled). So yes, IPC classes need to be scaled when combined with another metric. Another addition to the documentation of the interface? :) > > > > > > > I suppose you avoided returning 0 for the default implementation as you > > > intend that to mean the inability of the CPU to handle that class of > > > tasks? It would be good to document this. > > > > I meant this to be minimum possible IPC class score for any CPU: any > > CPU should be able handle any IPC class. If not implemented, all CPUs > > handle all IPC classes equally. > > > > Ah, I see. In this case you might as well return 0 in the default > implementation of arch_get_ipcc_score(). I know it does not matter much > what gets returned there, but returning a meaningless "1" is strange to > me :). Yes, the value does not really matter to my use case, as long as it the same for all all CPUs. I can use 1024 as other scheduler metrics. Thanks and BR, Ricardo
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h index b1d338a740e5..75e22baa2622 100644 --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -2531,6 +2531,66 @@ void arch_scale_freq_tick(void) } #endif +#ifdef CONFIG_IPC_CLASSES +DECLARE_STATIC_KEY_FALSE(sched_ipcc); + +static inline bool sched_ipcc_enabled(void) +{ + return static_branch_unlikely(&sched_ipcc); +} + +#ifndef arch_has_ipc_classes +/** + * arch_has_ipc_classes() - Check whether hardware supports IPC classes of tasks + * + * Returns: true of IPC classes of tasks are supported. + */ +static __always_inline +bool arch_has_ipc_classes(void) +{ + return false; +} +#endif + +#ifndef arch_update_ipcc +/** + * arch_update_ipcc() - Update the IPC class of the current task + * @curr: The current task + * + * Request that the IPC classification of @curr is updated. + * + * Returns: none + */ +static __always_inline +void arch_update_ipcc(struct task_struct *curr) +{ +} +#endif + +#ifndef arch_get_ipcc_score +/** + * arch_get_ipcc_score() - Get the IPC score of a class of task + * @ipcc: The IPC class + * @cpu: A CPU number + * + * Returns the performance score of an IPC class when running on @cpu. + * Error when either @class or @cpu are invalid. + */ +static __always_inline +int arch_get_ipcc_score(unsigned short ipcc, int cpu) +{ + return 1; +} +#endif +#else /* CONFIG_IPC_CLASSES */ + +#define arch_get_ipcc_score(ipcc, cpu) (-EINVAL) +#define arch_update_ipcc(curr) + +static inline bool sched_ipcc_enabled(void) { return false; } + +#endif /* CONFIG_IPC_CLASSES */ + #ifndef arch_scale_freq_capacity /** * arch_scale_freq_capacity - get the frequency scale factor of a given CPU. diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c index 8154ef590b9f..eb1654b64df7 100644 --- a/kernel/sched/topology.c +++ b/kernel/sched/topology.c @@ -669,6 +669,9 @@ DEFINE_PER_CPU(struct sched_domain __rcu *, sd_numa); DEFINE_PER_CPU(struct sched_domain __rcu *, sd_asym_packing); DEFINE_PER_CPU(struct sched_domain __rcu *, sd_asym_cpucapacity); DEFINE_STATIC_KEY_FALSE(sched_asym_cpucapacity); +#ifdef CONFIG_IPC_CLASSES +DEFINE_STATIC_KEY_FALSE(sched_ipcc); +#endif static void update_top_cache_domain(int cpu) { @@ -2388,6 +2391,11 @@ build_sched_domains(const struct cpumask *cpu_map, struct sched_domain_attr *att if (has_asym) static_branch_inc_cpuslocked(&sched_asym_cpucapacity); +#ifdef CONFIG_IPC_CLASSES + if (arch_has_ipc_classes()) + static_branch_enable_cpuslocked(&sched_ipcc); +#endif + if (rq && sched_debug_verbose) { pr_info("root domain span: %*pbl (max cpu_capacity = %lu)\n", cpumask_pr_args(cpu_map), rq->rd->max_cpu_capacity);