From patchwork Fri Sep 29 15:52:08 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Shrikanth Hegde X-Patchwork-Id: 146746 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a05:6359:6f87:b0:13f:353d:d1ed with SMTP id tl7csp3525996rwb; Fri, 29 Sep 2023 12:54:56 -0700 (PDT) X-Google-Smtp-Source: AGHT+IGsJzCLkiR5BO1eTlcQnA+gcEUAY9tODPLOAqSLnKvMRDuji4Wg7hLoPRwINauq4kk5z3wB X-Received: by 2002:a17:902:ecce:b0:1c6:2acc:62d5 with SMTP id a14-20020a170902ecce00b001c62acc62d5mr5879680plh.22.1696017295943; Fri, 29 Sep 2023 12:54:55 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1696017295; cv=none; d=google.com; s=arc-20160816; b=j2UR/aI+Nl2N70gbtgHPnhVZwnMOu+St6NBHuKLPOAlmVbC5aW9PEn6heTmEJ+We3/ uu4pxcLf15M2/+yUVB4XKEAThBKB9Mq9g1igSpcIT3n8E5/dDBD65jdLbllK7yFN+X+1 RKbWYWCEND6Ib45pmtcLT6OUaZdjP2RkKFDjxS0vLzpV45Q3f5sKcOexrJp+P3fykwaE Izz0y1beyJ6XtjsGJFfBCJuI4C5oun4BEGffWc3e5nupleKQj/slN68in/v4o3fMI752 txJ0n7HnyTx1jmSWw1xUDH8BKNf8FgukqEIQUkwhjhhgNLX6vMQ1jg4N11a4QRHcUZes xSSA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=tou9JDepNKER24jEFjelxgOmoauG7gMvSM4jJkk+5/w=; fh=de++Nbu25FzV42Ppy1SBv7xh82NPCe6fQOFiVvQQnTs=; b=ZRhfTWA1ARtOa1MSVGMBjcwU+7zb6EGobtcguIUr4nr9vaRsFa8cv0zHZve772As30 RYs/EMLOTDp0n9OzJwZG4PJFfMr0S8xTosF4NpkCXENDTIFZ/3vkMcuWlzprjRV2T6Ww p3MQ2FK6CC9OpK8sTx1P1QizEMQ7ltP8ezlSYJshtMwK2mTyzAQkpCDZpo6XCq8hd9st A+a74Ixtqa+SmE3MkX9NvtqZpAXPmKplfNPcj99pXW3Nt3qrTxwuwSJZ9aL76p9BWOLB QDYUTt+KsrfUO4uAWDFbP+DadkLRz6ijo4/mWPdIg1HqHJck8gqHSLm8nsE9BpK14BuX HkUA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@ibm.com header.s=pp1 header.b=QEHJCd2u; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.32 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=NONE dis=NONE) header.from=ibm.com Received: from agentk.vger.email (agentk.vger.email. [23.128.96.32]) by mx.google.com with ESMTPS id z1-20020a170903018100b001c611f285aasi16322152plg.541.2023.09.29.12.54.55 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 29 Sep 2023 12:54:55 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.32 as permitted sender) client-ip=23.128.96.32; Authentication-Results: mx.google.com; dkim=pass header.i=@ibm.com header.s=pp1 header.b=QEHJCd2u; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.32 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=NONE dis=NONE) header.from=ibm.com Received: from out1.vger.email (depot.vger.email [IPv6:2620:137:e000::3:0]) by agentk.vger.email (Postfix) with ESMTP id 4993482AAC64; Fri, 29 Sep 2023 08:54:30 -0700 (PDT) X-Virus-Status: Clean X-Virus-Scanned: clamav-milter 0.103.10 at agentk.vger.email Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233765AbjI2PyS (ORCPT + 20 others); Fri, 29 Sep 2023 11:54:18 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:37326 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233760AbjI2PyR (ORCPT ); Fri, 29 Sep 2023 11:54:17 -0400 Received: from mx0a-001b2d01.pphosted.com (mx0a-001b2d01.pphosted.com [148.163.156.1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 39438195; Fri, 29 Sep 2023 08:54:13 -0700 (PDT) Received: from pps.filterd (m0360083.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 38TFluf5024690; Fri, 29 Sep 2023 15:53:50 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ibm.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding; s=pp1; bh=tou9JDepNKER24jEFjelxgOmoauG7gMvSM4jJkk+5/w=; b=QEHJCd2ueO4yPpcjOwuTSt6nSfP0M3xw7bq5MlK5+fGSBliV9Pt6YC/Wz5mrPm0+lJJV xvMY+reTKgENU51W11fgPOK8yHawSrc8hS6Y6eSWE8lUHoq1IL8w2dQpjTSOs9Q2PFh9 Hd8lnVIO/O78KfEhRGgjrrxxL4elPWSZFMzE9d3a64FovuhSUtvFyIMUWs7Xl/EACJmA VRGlDsq5iLSkZQ569fF5v+viFYyGhuJwQ6ds2oxptDJePlJKhfTAwYAAMyA22ZkhTwFO OhXy+xoi87CMDWS3vQd0gh7x/d+UDeh4zGvOF1BFkNE9VvlBADkppcEFxtiQkiYH+PRE 5g== Received: from pps.reinject (localhost [127.0.0.1]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3te09utfn7-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Fri, 29 Sep 2023 15:53:50 +0000 Received: from m0360083.ppops.net (m0360083.ppops.net [127.0.0.1]) by pps.reinject (8.17.1.5/8.17.1.5) with ESMTP id 38TFXcWJ020640; Fri, 29 Sep 2023 15:53:49 GMT Received: from ppma11.dal12v.mail.ibm.com (db.9e.1632.ip4.static.sl-reverse.com [50.22.158.219]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3te09utfma-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Fri, 29 Sep 2023 15:53:49 +0000 Received: from pps.filterd (ppma11.dal12v.mail.ibm.com [127.0.0.1]) by ppma11.dal12v.mail.ibm.com (8.17.1.19/8.17.1.19) with ESMTP id 38TDRd93030392; Fri, 29 Sep 2023 15:53:47 GMT Received: from smtprelay01.fra02v.mail.ibm.com ([9.218.2.227]) by ppma11.dal12v.mail.ibm.com (PPS) with ESMTPS id 3tad22d7p9-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Fri, 29 Sep 2023 15:53:47 +0000 Received: from smtpav01.fra02v.mail.ibm.com (smtpav01.fra02v.mail.ibm.com [10.20.54.100]) by smtprelay01.fra02v.mail.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 38TFrj5l13238910 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Fri, 29 Sep 2023 15:53:45 GMT Received: from smtpav01.fra02v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id A5B0020043; Fri, 29 Sep 2023 15:53:45 +0000 (GMT) Received: from smtpav01.fra02v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id EE0A820040; Fri, 29 Sep 2023 15:53:40 +0000 (GMT) Received: from li-c1fdab4c-355a-11b2-a85c-ef242fe9efb4.ibm.com.com (unknown [9.171.6.110]) by smtpav01.fra02v.mail.ibm.com (Postfix) with ESMTP; Fri, 29 Sep 2023 15:53:40 +0000 (GMT) From: Shrikanth Hegde To: mingo@redhat.com, peterz@infradead.org, vincent.guittot@linaro.org, vschneid@redhat.com Cc: sshegde@linux.vnet.ibm.com, dietmar.eggemann@arm.com, linux-kernel@vger.kernel.org, ionela.voinescu@arm.com, qperret@google.com, srikar@linux.vnet.ibm.com, mgorman@techsingularity.net, mingo@kernel.org, pierre.gondois@arm.com, yu.c.chen@intel.com, tim.c.chen@linux.intel.com, pauld@redhat.com, lukasz.luba@arm.com, linux-doc@vger.kernel.org, bsegall@google.com, linux-eng@arm.com Subject: [PATCH v5 1/2] sched/topology: Remove EM_MAX_COMPLEXITY limit Date: Fri, 29 Sep 2023 21:22:08 +0530 Message-Id: <20230929155209.667764-2-sshegde@linux.vnet.ibm.com> X-Mailer: git-send-email 2.39.3 In-Reply-To: <20230929155209.667764-1-sshegde@linux.vnet.ibm.com> References: <20230929155209.667764-1-sshegde@linux.vnet.ibm.com> MIME-Version: 1.0 X-TM-AS-GCONF: 00 X-Proofpoint-GUID: auAb-659wjVaCut74BsmO6kKcG2LAp6T X-Proofpoint-ORIG-GUID: DSMsGcphK1c4wunnMXHSMWIiO7Wjo2ya X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.267,Aquarius:18.0.980,Hydra:6.0.619,FMLib:17.11.176.26 definitions=2023-09-29_13,2023-09-28_03,2023-05-22_02 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 malwarescore=0 suspectscore=0 clxscore=1015 phishscore=0 impostorscore=0 priorityscore=1501 mlxscore=0 spamscore=0 mlxlogscore=999 bulkscore=0 lowpriorityscore=0 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2309180000 definitions=main-2309290134 X-Spam-Status: No, score=-0.8 required=5.0 tests=DKIM_SIGNED,DKIM_VALID, HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,SPF_HELO_NONE, SPF_PASS autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on agentk.vger.email Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.6.4 (agentk.vger.email [0.0.0.0]); Fri, 29 Sep 2023 08:54:30 -0700 (PDT) X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1778403032384920106 X-GMAIL-MSGID: 1778403032384920106 From: Pierre Gondois The Energy Aware Scheduler (EAS) estimates the energy consumption of placing a task on different CPUs. The goal is to minimize this energy consumption. Estimating the energy of different task placements is increasingly complex with the size of the platform. To avoid having a slow wake-up path, EAS is only enabled if this complexity is low enough. The current complexity limit was set in: commit b68a4c0dba3b1 ("sched/topology: Disable EAS on inappropriate platforms"). base on the first implementation of EAS, which was re-computing the power of the whole platform for each task placement scenario, cf: commit 390031e4c309 ("sched/fair: Introduce an energy estimation helper function"). but the complexity of EAS was reduced in: commit eb92692b2544d ("sched/fair: Speed-up energy-aware wake-ups") and find_energy_efficient_cpu() (feec) algorithm was updated in: commit 3e8c6c9aac42 ("sched/fair: Remove task_util from effective utilization in feec()") find_energy_efficient_cpu() (feec) is now doing: feec() \_ for_each_pd(pd) [0] // get max_spare_cap_cpu and compute_prev_delta \_ for_each_cpu(pd) [1] \_ eenv_pd_busy_time(pd) [2] \_ for_each_cpu(pd) // compute_energy(pd) without the task \_ eenv_pd_max_util(pd, -1) [3.0] \_ for_each_cpu(pd) \_ em_cpu_energy(pd, -1) \_ for_each_ps(pd) // compute_energy(pd) with the task on prev_cpu \_ eenv_pd_max_util(pd, prev_cpu) [3.1] \_ for_each_cpu(pd) \_ em_cpu_energy(pd, prev_cpu) \_ for_each_ps(pd) // compute_energy(pd) with the task on max_spare_cap_cpu \_ eenv_pd_max_util(pd, max_spare_cap_cpu) [3.2] \_ for_each_cpu(pd) \_ em_cpu_energy(pd, max_spare_cap_cpu) \_ for_each_ps(pd) [3.1] happens only once since prev_cpu is unique. With the same definitions for nr_pd, nr_cpus and nr_ps, the complexity is of: nr_pd * (2 * [nr_cpus in pd] + 2 * ([nr_cpus in pd] + [nr_ps in pd])) + ([nr_cpus in pd] + [nr_ps in pd]) [0] * ( [1] + [2] + [3.0] + [3.2] ) + [3.1] = nr_pd * (4 * [nr_cpus in pd] + 2 * [nr_ps in pd]) + [nr_cpus in prev pd] + nr_ps The complexity limit was set to 2048 in: commit b68a4c0dba3b1 ("sched/topology: Disable EAS on inappropriate platforms") to make "EAS usable up to 16 CPUs with per-CPU DVFS and less than 8 performance states each". For the same platform, the complexity would actually be of: 16 * (4 + 2 * 7) + 1 + 7 = 296 Since the EAS complexity was greatly reduced, bigger platforms can handle EAS. For instance, a platform with 112 CPUs with 7 performance states each would not reach it: 112 * (4 + 2 * 7) + 1 + 7 = 2024 To reflect this improvement, remove the EAS complexity check. Note that a limit on the number of CPUs still holds against EM_MAX_NUM_CPUS to avoid overflows during the energy estimation. Signed-off-by: Pierre Gondois Reviewed-by: Lukasz Luba Reviewed-by: Dietmar Eggemann --- Documentation/scheduler/sched-energy.rst | 29 ++---------------- kernel/sched/topology.c | 39 ++---------------------- 2 files changed, 6 insertions(+), 62 deletions(-) -- 2.39.3 diff --git a/Documentation/scheduler/sched-energy.rst b/Documentation/scheduler/sched-energy.rst index fc853c8cc346..70e2921ef725 100644 --- a/Documentation/scheduler/sched-energy.rst +++ b/Documentation/scheduler/sched-energy.rst @@ -359,32 +359,9 @@ in milli-Watts or in an 'abstract scale'. 6.3 - Energy Model complexity ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -The task wake-up path is very latency-sensitive. When the EM of a platform is -too complex (too many CPUs, too many performance domains, too many performance -states, ...), the cost of using it in the wake-up path can become prohibitive. -The energy-aware wake-up algorithm has a complexity of: - - C = Nd * (Nc + Ns) - -with: Nd the number of performance domains; Nc the number of CPUs; and Ns the -total number of OPPs (ex: for two perf. domains with 4 OPPs each, Ns = 8). - -A complexity check is performed at the root domain level, when scheduling -domains are built. EAS will not start on a root domain if its C happens to be -higher than the completely arbitrary EM_MAX_COMPLEXITY threshold (2048 at the -time of writing). - -If you really want to use EAS but the complexity of your platform's Energy -Model is too high to be used with a single root domain, you're left with only -two possible options: - - 1. split your system into separate, smaller, root domains using exclusive - cpusets and enable EAS locally on each of them. This option has the - benefit to work out of the box but the drawback of preventing load - balance between root domains, which can result in an unbalanced system - overall; - 2. submit patches to reduce the complexity of the EAS wake-up algorithm, - hence enabling it to cope with larger EMs in reasonable time. +EAS does not impose any complexity limit on the number of PDs/OPPs/CPUs but +restricts the number of CPUs to EM_MAX_NUM_CPUS to prevent overflows during +the energy estimation. 6.4 - Schedutil governor diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c index a7b50bba7829..e0b9920e7e3e 100644 --- a/kernel/sched/topology.c +++ b/kernel/sched/topology.c @@ -348,32 +348,13 @@ static void sched_energy_set(bool has_eas) * 1. an Energy Model (EM) is available; * 2. the SD_ASYM_CPUCAPACITY flag is set in the sched_domain hierarchy. * 3. no SMT is detected. - * 4. the EM complexity is low enough to keep scheduling overheads low; - * 5. schedutil is driving the frequency of all CPUs of the rd; - * 6. frequency invariance support is present; - * - * The complexity of the Energy Model is defined as: - * - * C = nr_pd * (nr_cpus + nr_ps) - * - * with parameters defined as: - * - nr_pd: the number of performance domains - * - nr_cpus: the number of CPUs - * - nr_ps: the sum of the number of performance states of all performance - * domains (for example, on a system with 2 performance domains, - * with 10 performance states each, nr_ps = 2 * 10 = 20). - * - * It is generally not a good idea to use such a model in the wake-up path on - * very complex platforms because of the associated scheduling overheads. The - * arbitrary constraint below prevents that. It makes EAS usable up to 16 CPUs - * with per-CPU DVFS and less than 8 performance states each, for example. + * 4. schedutil is driving the frequency of all CPUs of the rd; + * 5. frequency invariance support is present; */ -#define EM_MAX_COMPLEXITY 2048 - extern struct cpufreq_governor schedutil_gov; static bool build_perf_domains(const struct cpumask *cpu_map) { - int i, nr_pd = 0, nr_ps = 0, nr_cpus = cpumask_weight(cpu_map); + int i; struct perf_domain *pd = NULL, *tmp; int cpu = cpumask_first(cpu_map); struct root_domain *rd = cpu_rq(cpu)->rd; @@ -431,20 +412,6 @@ static bool build_perf_domains(const struct cpumask *cpu_map) goto free; tmp->next = pd; pd = tmp; - - /* - * Count performance domains and performance states for the - * complexity check. - */ - nr_pd++; - nr_ps += em_pd_nr_perf_states(pd->em_pd); - } - - /* Bail out if the Energy Model complexity is too high. */ - if (nr_pd * (nr_ps + nr_cpus) > EM_MAX_COMPLEXITY) { - WARN(1, "rd %*pbl: Failed to start EAS, EM complexity is too high\n", - cpumask_pr_args(cpu_map)); - goto free; } perf_domain_debug(cpu_map, pd); From patchwork Fri Sep 29 15:52:09 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Shrikanth Hegde X-Patchwork-Id: 146824 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a05:612c:2a8e:b0:403:3b70:6f57 with SMTP id in14csp114725vqb; Fri, 29 Sep 2023 17:08:23 -0700 (PDT) X-Google-Smtp-Source: AGHT+IGP3GljVdYn9uDo/6EwnHEI6iQ8tpG1Lv0BRT991DNv3BXbm+2SBRVMtIKD6IBC12kX7Qio X-Received: by 2002:a17:902:efc4:b0:1c7:365a:566d with SMTP id ja4-20020a170902efc400b001c7365a566dmr4339122plb.66.1696032503306; Fri, 29 Sep 2023 17:08:23 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1696032503; cv=none; d=google.com; s=arc-20160816; b=Fsv0ALJG3OJ5uTJeUJqwAGJDoiOo5X5jst2h1YXAUrbB4ezqsP9/x5walC2G11nFsl faywdA19SZ7dOR9maR0W30MRoM+nX5/EQOUI0rfbE2XBDnPchJmtOrx0+jkTyFbc25sW TOO+nePBU9hrHgJa6PI3p2+PfqoD5p44G3ZXxmJjJwVKWBg7k0zFXU82vIEhur4vVcFh +CLHOxvnEmnXlEhjPb8W6Ft9SERac/bwUILwchx84xAX/K+h4pupWNcmUH0ggTJZupPv TpNDAt7jNIA3PaDgob23lunHDxXYhylZ7n6fVszl6w7uf9eSnNI4icZL4CzSfXjTzUel +BUw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=/UoWLaCgVeHqwKpdLyNPiJRQU74xtqknmO9AMUSuOrg=; fh=de++Nbu25FzV42Ppy1SBv7xh82NPCe6fQOFiVvQQnTs=; b=QrDDIAbm8bvaKUqU49zroSQdEfQ1QrRmLCS+W5n/4z+0tFBCjmLIyU8OyT2JHB7H8w X2jutoo1Xl9qWhVemFHN+/X5Dbimu5LhwECPXKfKHeSSqLyIo7/6Sb4OBwUvJVc7bcIt AzLS/V5thzQa+TJZWrAdw8kawgUqwws2bqFBUz9xTGO0rTDyWy1R5hYAkOg5a5WfECFe +Qv1QEiuXSU51S0wJZlUk/VAFEUefoxn0UpPmgBBBBRK1/WQFZChNwbPh9ZiDXUkM0qO WHnfOQ3i7KHCwdXbGJ8pm2wKplt9ZcVjOc24gka5TeYa1k/EAYT/D3J/mwqFJbTd6rJI dJyg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@ibm.com header.s=pp1 header.b=P6ZMHsy9; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.34 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=NONE dis=NONE) header.from=ibm.com Received: from howler.vger.email (howler.vger.email. [23.128.96.34]) by mx.google.com with ESMTPS id l11-20020a170903120b00b001b5589848absi22949359plh.234.2023.09.29.17.08.22 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 29 Sep 2023 17:08:23 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.34 as permitted sender) client-ip=23.128.96.34; Authentication-Results: mx.google.com; dkim=pass header.i=@ibm.com header.s=pp1 header.b=P6ZMHsy9; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.34 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=NONE dis=NONE) header.from=ibm.com Received: from out1.vger.email (depot.vger.email [IPv6:2620:137:e000::3:0]) by howler.vger.email (Postfix) with ESMTP id 1F9018023926; Fri, 29 Sep 2023 08:54:26 -0700 (PDT) X-Virus-Status: Clean X-Virus-Scanned: clamav-milter 0.103.10 at howler.vger.email Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233779AbjI2PyU (ORCPT + 20 others); Fri, 29 Sep 2023 11:54:20 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:37328 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233755AbjI2PyS (ORCPT ); Fri, 29 Sep 2023 11:54:18 -0400 Received: from mx0b-001b2d01.pphosted.com (mx0b-001b2d01.pphosted.com [148.163.158.5]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id EF1BE199; Fri, 29 Sep 2023 08:54:14 -0700 (PDT) Received: from pps.filterd (m0356516.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 38TFf7SU010269; Fri, 29 Sep 2023 15:53:57 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ibm.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding; s=pp1; bh=/UoWLaCgVeHqwKpdLyNPiJRQU74xtqknmO9AMUSuOrg=; b=P6ZMHsy9YRtEulJUsNAA3TCG1oyEzTJNP1OXRgqZuAhd/aw6H1Sb7vwbn77u57u/J+l5 1BZVeQA+zLAo4Mu6JE5mCwt9mAGJf9C3TG+1BeCz1erD3RUlwo5Rq8u7p8aWgKHv/Avy sMOHwOMIXQbXMAp/dQ+lGXvYGBBNf2qIdRLKs1hbUaRelpLbYXx1RbelvUPiDsxRzeYN drd5w8CgoPcgWn2aplqwmKFHy3qH3f4DidhY8jD+EskDSM2JrNDFvOV1xZ4FuHTHrMW0 Y7ej4ZGlYGfHI8l4sMV8SKGf1tQvGU28pDlU7GCC2/2r3PdKdL1dUagf/JTHfnw3LWnz zQ== Received: from pps.reinject (localhost [127.0.0.1]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3te082tpr4-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Fri, 29 Sep 2023 15:53:56 +0000 Received: from m0356516.ppops.net (m0356516.ppops.net [127.0.0.1]) by pps.reinject (8.17.1.5/8.17.1.5) with ESMTP id 38TFiirf022252; Fri, 29 Sep 2023 15:53:56 GMT Received: from ppma13.dal12v.mail.ibm.com (dd.9e.1632.ip4.static.sl-reverse.com [50.22.158.221]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3te082tpqq-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Fri, 29 Sep 2023 15:53:55 +0000 Received: from pps.filterd (ppma13.dal12v.mail.ibm.com [127.0.0.1]) by ppma13.dal12v.mail.ibm.com (8.17.1.19/8.17.1.19) with ESMTP id 38TDiMPQ030719; Fri, 29 Sep 2023 15:53:55 GMT Received: from smtprelay02.fra02v.mail.ibm.com ([9.218.2.226]) by ppma13.dal12v.mail.ibm.com (PPS) with ESMTPS id 3tacjkna0k-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Fri, 29 Sep 2023 15:53:54 +0000 Received: from smtpav01.fra02v.mail.ibm.com (smtpav01.fra02v.mail.ibm.com [10.20.54.100]) by smtprelay02.fra02v.mail.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 38TFrrWU24314540 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Fri, 29 Sep 2023 15:53:53 GMT Received: from smtpav01.fra02v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 1AB4A20043; Fri, 29 Sep 2023 15:53:53 +0000 (GMT) Received: from smtpav01.fra02v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 73CC120040; Fri, 29 Sep 2023 15:53:48 +0000 (GMT) Received: from li-c1fdab4c-355a-11b2-a85c-ef242fe9efb4.ibm.com.com (unknown [9.171.6.110]) by smtpav01.fra02v.mail.ibm.com (Postfix) with ESMTP; Fri, 29 Sep 2023 15:53:48 +0000 (GMT) From: Shrikanth Hegde To: mingo@redhat.com, peterz@infradead.org, vincent.guittot@linaro.org, vschneid@redhat.com Cc: sshegde@linux.vnet.ibm.com, dietmar.eggemann@arm.com, linux-kernel@vger.kernel.org, ionela.voinescu@arm.com, qperret@google.com, srikar@linux.vnet.ibm.com, mgorman@techsingularity.net, mingo@kernel.org, pierre.gondois@arm.com, yu.c.chen@intel.com, tim.c.chen@linux.intel.com, pauld@redhat.com, lukasz.luba@arm.com, linux-doc@vger.kernel.org, bsegall@google.com, linux-eng@arm.com Subject: [PATCH v5 2/2] sched/topology: change behaviour of sysctl sched_energy_aware based on the platform Date: Fri, 29 Sep 2023 21:22:09 +0530 Message-Id: <20230929155209.667764-3-sshegde@linux.vnet.ibm.com> X-Mailer: git-send-email 2.39.3 In-Reply-To: <20230929155209.667764-1-sshegde@linux.vnet.ibm.com> References: <20230929155209.667764-1-sshegde@linux.vnet.ibm.com> MIME-Version: 1.0 X-TM-AS-GCONF: 00 X-Proofpoint-ORIG-GUID: y4pPGvhj2U6kSmSDwVR2cjkh7J6VQTiq X-Proofpoint-GUID: jCEk-5lameQVH8GeB76N09QjkvEbBix- X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.267,Aquarius:18.0.980,Hydra:6.0.619,FMLib:17.11.176.26 definitions=2023-09-29_13,2023-09-28_03,2023-05-22_02 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 malwarescore=0 suspectscore=0 mlxscore=0 mlxlogscore=999 priorityscore=1501 lowpriorityscore=0 spamscore=0 bulkscore=0 phishscore=0 adultscore=0 impostorscore=0 clxscore=1015 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2309180000 definitions=main-2309290134 X-Spam-Status: No, score=-2.0 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_EF,RCVD_IN_DNSWL_BLOCKED,RCVD_IN_MSPIKE_H4, RCVD_IN_MSPIKE_WL,SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.6.4 (howler.vger.email [0.0.0.0]); Fri, 29 Sep 2023 08:54:26 -0700 (PDT) X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1778418978040790862 X-GMAIL-MSGID: 1778418978040790862 sysctl sched_energy_aware is available for the admin to disable/enable energy aware scheduling(EAS). EAS is enabled only if few conditions are met by the platform. They are, asymmetric CPU capacity, no SMT, schedutil CPUfreq governor, frequency invariant load tracking etc. A platform may boot without EAS capability, but could gain such capability at runtime For example, changing/registering the CPUfreq governor to schedutil. At present, though platform doesn't support EAS, this sysctl returns 1 and it ends up calling build_perf_domains on write to 1 and NOP when writing to 0. That is confusing and un-necessary. Desired behavior would be to, have this sysctl to enable/disable the EAS on supported platform. On Non supported platform write to the sysctl would return not supported error and read of the sysctl would return empty. So sched_energy_aware returns empty - EAS is not possible at this moment This will include EAS capable platforms which have at least one EAS condition false during startup, e.g. using a Performance CPUfreq governor sched_energy_aware returns 0 - EAS is supported but disabled by admin. sched_energy_aware returns 1 - EAS is supported and enabled. User can find out the reason why EAS is not possible by checking info messages. sched_is_eas_possible returns true if the platform can do EAS at this moment. Depends on [PATCH v5 1/2] sched/topology: Remove EM_MAX_COMPLEXITY limit to be applied first. Signed-off-by: Shrikanth Hegde Tested-by: Pierre Gondois --- Documentation/admin-guide/sysctl/kernel.rst | 3 +- kernel/sched/topology.c | 112 +++++++++++++------- 2 files changed, 76 insertions(+), 39 deletions(-) -- 2.39.3 diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst index cf33de56da27..d89ac2bd8dc4 100644 --- a/Documentation/admin-guide/sysctl/kernel.rst +++ b/Documentation/admin-guide/sysctl/kernel.rst @@ -1182,7 +1182,8 @@ automatically on platforms where it can run (that is, platforms with asymmetric CPU topologies and having an Energy Model available). If your platform happens to meet the requirements for EAS but you do not want to use it, change -this value to 0. +this value to 0. On Non-EAS platforms, write operation fails and +read doesn't return anything. task_delayacct =============== diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c index e0b9920e7e3e..a654d0186ac0 100644 --- a/kernel/sched/topology.c +++ b/kernel/sched/topology.c @@ -212,6 +212,70 @@ static unsigned int sysctl_sched_energy_aware = 1; static DEFINE_MUTEX(sched_energy_mutex); static bool sched_energy_update; +extern struct cpufreq_governor schedutil_gov; +static bool sched_is_eas_possible(const struct cpumask *cpu_mask) +{ + bool any_asym_capacity = false; + struct cpufreq_policy *policy; + struct cpufreq_governor *gov; + int i; + + /* EAS is enabled for asymmetric CPU capacity topologies. */ + for_each_cpu(i, cpu_mask) { + if (per_cpu(sd_asym_cpucapacity, i)) { + any_asym_capacity = true; + break; + } + } + if (!any_asym_capacity) { + if (sched_debug()) { + pr_info("rd %*pbl: Checking EAS, CPUs do not have asymmetric capacities\n", + cpumask_pr_args(cpu_mask)); + } + return false; + } + + /* EAS definitely does *not* handle SMT */ + if (sched_smt_active()) { + if (sched_debug()) { + pr_info("rd %*pbl: Checking EAS, SMT is not supported\n", + cpumask_pr_args(cpu_mask)); + } + return false; + } + + if (!arch_scale_freq_invariant()) { + if (sched_debug()) { + pr_info("rd %*pbl: Checking EAS: frequency-invariant load tracking not yet supported", + cpumask_pr_args(cpu_mask)); + } + return false; + } + + /* Do not attempt EAS if schedutil is not being used. */ + for_each_cpu(i, cpu_mask) { + policy = cpufreq_cpu_get(i); + if (!policy) { + if (sched_debug()) { + pr_info("rd %*pbl: Checking EAS, cpufreq policy not set for CPU: %d", + cpumask_pr_args(cpu_mask), i); + } + return false; + } + gov = policy->governor; + cpufreq_cpu_put(policy); + if (gov != &schedutil_gov) { + if (sched_debug()) { + pr_info("rd %*pbl: Checking EAS, schedutil is mandatory\n", + cpumask_pr_args(cpu_mask)); + } + return false; + } + } + + return true; +} + void rebuild_sched_domains_energy(void) { mutex_lock(&sched_energy_mutex); @@ -231,6 +295,15 @@ static int sched_energy_aware_handler(struct ctl_table *table, int write, return -EPERM; ret = proc_dointvec_minmax(table, write, buffer, lenp, ppos); + if (!sched_is_eas_possible(cpu_active_mask)) { + if (write) { + return -EOPNOTSUPP; + } else { + *lenp = 0; + return 0; + } + } + if (!ret && write) { state = static_branch_unlikely(&sched_energy_present); if (state != sysctl_sched_energy_aware) @@ -351,61 +424,24 @@ static void sched_energy_set(bool has_eas) * 4. schedutil is driving the frequency of all CPUs of the rd; * 5. frequency invariance support is present; */ -extern struct cpufreq_governor schedutil_gov; static bool build_perf_domains(const struct cpumask *cpu_map) { int i; struct perf_domain *pd = NULL, *tmp; int cpu = cpumask_first(cpu_map); struct root_domain *rd = cpu_rq(cpu)->rd; - struct cpufreq_policy *policy; - struct cpufreq_governor *gov; if (!sysctl_sched_energy_aware) goto free; - /* EAS is enabled for asymmetric CPU capacity topologies. */ - if (!per_cpu(sd_asym_cpucapacity, cpu)) { - if (sched_debug()) { - pr_info("rd %*pbl: CPUs do not have asymmetric capacities\n", - cpumask_pr_args(cpu_map)); - } - goto free; - } - - /* EAS definitely does *not* handle SMT */ - if (sched_smt_active()) { - pr_warn("rd %*pbl: Disabling EAS, SMT is not supported\n", - cpumask_pr_args(cpu_map)); - goto free; - } - - if (!arch_scale_freq_invariant()) { - if (sched_debug()) { - pr_warn("rd %*pbl: Disabling EAS: frequency-invariant load tracking not yet supported", - cpumask_pr_args(cpu_map)); - } + if (!sched_is_eas_possible(cpu_map)) goto free; - } for_each_cpu(i, cpu_map) { /* Skip already covered CPUs. */ if (find_pd(pd, i)) continue; - /* Do not attempt EAS if schedutil is not being used. */ - policy = cpufreq_cpu_get(i); - if (!policy) - goto free; - gov = policy->governor; - cpufreq_cpu_put(policy); - if (gov != &schedutil_gov) { - if (rd->pd) - pr_warn("rd %*pbl: Disabling EAS, schedutil is mandatory\n", - cpumask_pr_args(cpu_map)); - goto free; - } - /* Create the new pd and add it to the local list. */ tmp = pd_init(i); if (!tmp)