From patchwork Thu Feb 1 13:11:57 2024 X-Patchwork-Submitter: Hongyan Xia X-Patchwork-Id: 195322
From: Hongyan Xia To: Ingo Molnar , Peter Zijlstra , Vincent Guittot , Dietmar Eggemann , Juri Lelli , Steven Rostedt , Ben Segall , Mel Gorman , Daniel Bristot de Oliveira , Valentin Schneider Cc: Qais Yousef , Morten Rasmussen , Lukasz Luba , Christian Loehle , linux-kernel@vger.kernel.org, David Dai , Saravana Kannan , Hongyan Xia Subject: [RFC PATCH v2 1/7] Revert "sched/uclamp: Set max_spare_cap_cpu even if max_spare_cap is 0" Date: Thu, 1 Feb 2024 13:11:57 +0000
From: Hongyan Xia That commit creates further problems because a spare capacity of 0 can either be a real indication that the CPU is maxed out, or mean that the CPU is throttled by UCLAMP_MAX, but we end up giving all such CPUs a chance, which can result in bogus energy calculations. It also tends to schedule tasks on the same CPU and requires extra load balancing patches. Sum aggregation solves these problems and this patch is not needed. This reverts commit 6b00a40147653c8ea748e8f4396510f252763364. Signed-off-by: Hongyan Xia --- kernel/sched/fair.c | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index b803030c3a03..d5cc87db4845 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -7978,10 +7978,11 @@ static int find_energy_efficient_cpu(struct task_struct *p, int prev_cpu) for (; pd; pd = pd->next) { unsigned long util_min = p_util_min, util_max = p_util_max; unsigned long cpu_cap, cpu_thermal_cap, util; - long prev_spare_cap = -1, max_spare_cap = -1; + unsigned long cur_delta, max_spare_cap = 0; unsigned long rq_util_min, rq_util_max; - unsigned long cur_delta, base_energy; + unsigned long prev_spare_cap = 0; int max_spare_cap_cpu = -1; + unsigned long base_energy; int fits, max_fits = -1; cpumask_and(cpus, perf_domain_span(pd), cpu_online_mask); @@ -8044,7 +8045,7 @@ static int find_energy_efficient_cpu(struct task_struct *p, int prev_cpu) prev_spare_cap = cpu_cap; prev_fits = fits; } else if ((fits > max_fits) || - ((fits == max_fits) && ((long)cpu_cap > max_spare_cap))) { + ((fits == max_fits) && (cpu_cap > max_spare_cap))) { /* * Find the CPU with the maximum spare capacity * among the remaining CPUs in the performance @@ -8056,7 +8057,7 @@ static int find_energy_efficient_cpu(struct task_struct *p, int prev_cpu) } } - if (max_spare_cap_cpu < 0 && prev_spare_cap < 0) + if (max_spare_cap_cpu < 0 && prev_spare_cap == 0) continue; eenv_pd_busy_time(&eenv, cpus, p); @@ -8064,7 +8065,7 @@ static int find_energy_efficient_cpu(struct task_struct *p, int prev_cpu) base_energy = compute_energy(&eenv, pd, cpus, p, -1); /* Evaluate the energy impact of using prev_cpu.
*/ - if (prev_spare_cap > -1) { + if (prev_spare_cap > 0) { prev_delta = compute_energy(&eenv, pd, cpus, p, prev_cpu); /* CPU utilization has changed */ From patchwork Thu Feb 1 13:11:58 2024 X-Patchwork-Submitter: Hongyan Xia X-Patchwork-Id: 195323
From: Hongyan Xia To: Ingo Molnar , Peter Zijlstra , Vincent Guittot , Dietmar Eggemann , Juri Lelli , Steven Rostedt , Ben Segall , Mel Gorman , Daniel Bristot de Oliveira , Valentin Schneider Cc: Qais Yousef , Morten Rasmussen , Lukasz Luba , Christian Loehle , linux-kernel@vger.kernel.org, David Dai , Saravana Kannan Subject: [RFC PATCH v2 2/7] sched/uclamp: Track uclamped util_avg in sched_avg Date: Thu, 1 Feb 2024 13:11:58 +0000 Message-Id: <92b6ffbffa4dd9ac5d27809bb14528183a54c3a3.1706792708.git.hongyan.xia2@arm.com>
X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1789702362390586943 X-GMAIL-MSGID: 1789702362390586943 Track a uclamped version of util_avg in sched_avg, which clamps util_avg within [uclamp[UCLAMP_MIN], uclamp[UCLAMP_MAX]] every time util_avg is updated. At the root CFS rq level, just like util_est, rq->cfs.avg.util_avg_uclamp must always be the sum of all util_avg_uclamp of CFS tasks on this rq. So, each time the util_avg_uclamp of a task gets updated, we also track the delta and update the root cfs_rq. When a CFS task gets enqueued or dequeued, the rq->cfs.avg.util_avg_uclamp also needs to add or subtract the util_avg_uclamp of this task. Signed-off-by: Hongyan Xia --- include/linux/sched.h | 3 +++ kernel/sched/fair.c | 21 +++++++++++++++++++ kernel/sched/pelt.c | 48 +++++++++++++++++++++++++++++++++++-------- kernel/sched/pelt.h | 5 +++-- kernel/sched/sched.h | 27 ++++++++++++++++++++++++ 5 files changed, 94 insertions(+), 10 deletions(-) diff --git a/include/linux/sched.h b/include/linux/sched.h index 03bfe9ab2951..f28eeff169ff 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -470,6 +470,9 @@ struct sched_avg { unsigned long runnable_avg; unsigned long util_avg; unsigned int util_est; +#ifdef CONFIG_UCLAMP_TASK + unsigned int util_avg_uclamp; +#endif } ____cacheline_aligned; /* diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index d5cc87db4845..4f535c96463b 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -1089,6 +1089,9 @@ void post_init_entity_util_avg(struct task_struct *p) } sa->runnable_avg = sa->util_avg; +#ifdef CONFIG_UCLAMP_TASK + sa->util_avg_uclamp = sa->util_avg; +#endif } #else /* !CONFIG_SMP */ @@ -6763,6 +6766,12 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags) /* At this point se is NULL and we are at root level*/ add_nr_running(rq, 1); +#ifdef CONFIG_UCLAMP_TASK + util_uclamp_enqueue(&rq->cfs.avg, p); + update_util_uclamp(0, 0, 0, &rq->cfs.avg, p); + /* TODO: Better skip the frequency update in the for loop above. */ + cpufreq_update_util(rq, 0); +#endif /* * Since new tasks are assigned an initial util_avg equal to @@ -6854,6 +6863,9 @@ static void dequeue_task_fair(struct rq *rq, struct task_struct *p, int flags) /* At this point se is NULL and we are at root level*/ sub_nr_running(rq, 1); +#ifdef CONFIG_UCLAMP_TASK + util_uclamp_dequeue(&rq->cfs.avg, p); +#endif /* balance early to pull high priority tasks */ if (unlikely(!was_sched_idle && sched_idle_rq(rq))) @@ -6862,6 +6874,15 @@ static void dequeue_task_fair(struct rq *rq, struct task_struct *p, int flags) dequeue_throttle: util_est_update(&rq->cfs, p, task_sleep); hrtick_update(rq); + +#ifdef CONFIG_UCLAMP_TASK + if (rq->cfs.h_nr_running == 0) { + WARN_ONCE(rq->cfs.avg.util_avg_uclamp, + "0 tasks on CFS of CPU %d, but util_avg_uclamp is %u\n", + rq->cpu, rq->cfs.avg.util_avg_uclamp); + WRITE_ONCE(rq->cfs.avg.util_avg_uclamp, 0); + } +#endif } #ifdef CONFIG_SMP diff --git a/kernel/sched/pelt.c b/kernel/sched/pelt.c index 63b6cf898220..eca45a863f9f 100644 --- a/kernel/sched/pelt.c +++ b/kernel/sched/pelt.c @@ -266,6 +266,39 @@ ___update_load_avg(struct sched_avg *sa, unsigned long load) WRITE_ONCE(sa->util_avg, sa->util_sum / divider); } +#ifdef CONFIG_UCLAMP_TASK +/* avg must belong to the queue this se is on. 
*/ +void update_util_uclamp(struct sched_avg *avg, struct task_struct *p) +{ + unsigned int util, uclamp_min, uclamp_max; + int delta; + + if (!p->se.on_rq) + return; + + if (!avg) + return; + + util = READ_ONCE(p->se.avg.util_avg); + uclamp_min = uclamp_eff_value(p, UCLAMP_MIN); + uclamp_max = uclamp_eff_value(p, UCLAMP_MAX); + util = clamp(util, uclamp_min, uclamp_max); + + delta = util - READ_ONCE(p->se.avg.util_avg_uclamp); + if (delta == 0) + return; + + WRITE_ONCE(p->se.avg.util_avg_uclamp, util); + util = READ_ONCE(avg->util_avg_uclamp); + util += delta; + WRITE_ONCE(avg->util_avg_uclamp, util); +} +#else /* !CONFIG_UCLAMP_TASK */ +void update_util_uclamp(struct sched_avg *avg, struct task_struct *p) +{ +} +#endif + /* * sched_entity: * @@ -292,29 +325,28 @@ ___update_load_avg(struct sched_avg *sa, unsigned long load) * load_avg = \Sum se->avg.load_avg */ -int __update_load_avg_blocked_se(u64 now, struct sched_entity *se) +void __update_load_avg_blocked_se(u64 now, struct sched_entity *se) { if (___update_load_sum(now, &se->avg, 0, 0, 0)) { ___update_load_avg(&se->avg, se_weight(se)); + if (entity_is_task(se)) + update_util_uclamp(NULL, task_of(se)); trace_pelt_se_tp(se); - return 1; } - - return 0; } -int __update_load_avg_se(u64 now, struct cfs_rq *cfs_rq, struct sched_entity *se) +void __update_load_avg_se(u64 now, struct cfs_rq *cfs_rq, struct sched_entity *se) { if (___update_load_sum(now, &se->avg, !!se->on_rq, se_runnable(se), cfs_rq->curr == se)) { ___update_load_avg(&se->avg, se_weight(se)); cfs_se_util_change(&se->avg); + if (entity_is_task(se)) + update_util_uclamp(&rq_of(cfs_rq)->cfs.avg, + task_of(se)); trace_pelt_se_tp(se); - return 1; } - - return 0; } int __update_load_avg_cfs_rq(u64 now, struct cfs_rq *cfs_rq) diff --git a/kernel/sched/pelt.h b/kernel/sched/pelt.h index 9e1083465fbc..6862f79e0fcd 100644 --- a/kernel/sched/pelt.h +++ b/kernel/sched/pelt.h @@ -1,8 +1,9 @@ #ifdef CONFIG_SMP #include "sched-pelt.h" -int __update_load_avg_blocked_se(u64 now, struct sched_entity *se); -int __update_load_avg_se(u64 now, struct cfs_rq *cfs_rq, struct sched_entity *se); +void update_util_uclamp(struct sched_avg *avg, struct task_struct *p); +void __update_load_avg_blocked_se(u64 now, struct sched_entity *se); +void __update_load_avg_se(u64 now, struct cfs_rq *cfs_rq, struct sched_entity *se); int __update_load_avg_cfs_rq(u64 now, struct cfs_rq *cfs_rq); int update_rt_rq_load_avg(u64 now, struct rq *rq, int running); int update_dl_rq_load_avg(u64 now, struct rq *rq, int running); diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h index e58a54bda77d..35036246824b 100644 --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -3085,6 +3085,33 @@ static inline bool uclamp_is_used(void) { return static_branch_likely(&sched_uclamp_used); } + +static inline void util_uclamp_enqueue(struct sched_avg *avg, + struct task_struct *p) +{ + unsigned int avg_val = READ_ONCE(avg->util_avg_uclamp); + unsigned int p_val = READ_ONCE(p->se.avg.util_avg_uclamp); + + WRITE_ONCE(avg->util_avg_uclamp, avg_val + p_val); +} + +static inline void util_uclamp_dequeue(struct sched_avg *avg, + struct task_struct *p) +{ + unsigned int avg_val = READ_ONCE(avg->util_avg_uclamp); + unsigned int p_val = READ_ONCE(p->se.avg.util_avg_uclamp), new_val; + + if (avg_val > p_val) + new_val = avg_val - p_val; + else { + WARN_ONCE(avg_val < p_val, + "avg_val underflow. 
avg_val %u is even less than p_val %u before subtraction\n", + avg_val, p_val); + new_val = 0; + } + + WRITE_ONCE(avg->util_avg_uclamp, new_val); +} #else /* CONFIG_UCLAMP_TASK */ static inline unsigned long uclamp_eff_value(struct task_struct *p, enum uclamp_id clamp_id)
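As a worked illustration of the changelog above (the numbers are made up): a task with util_avg = 300 and an effective UCLAMP_MAX of 100 keeps

    util_avg_uclamp = clamp(300, 0, 100) = 100

so it contributes 100, not 300, to rq->cfs.avg.util_avg_uclamp when enqueued. If its util_avg later grows to 350, the clamped value is still 100, the delta is 0 and the root sum is unchanged; on dequeue the same 100 is subtracted again.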
From patchwork Thu Feb 1 13:11:59 2024 X-Patchwork-Submitter: Hongyan Xia X-Patchwork-Id: 195324 From: Hongyan Xia To: Ingo Molnar , Peter Zijlstra , Vincent Guittot , Dietmar Eggemann , Juri Lelli , Steven Rostedt , Ben Segall , Mel Gorman , Daniel Bristot de Oliveira , Valentin Schneider Cc: Qais Yousef , Morten Rasmussen , Lukasz Luba , Christian Loehle , linux-kernel@vger.kernel.org, David Dai , Saravana Kannan Subject: [RFC PATCH v2 3/7] sched/uclamp: Introduce root_cfs_util_uclamp for rq Date: Thu, 1 Feb 2024 13:11:59 +0000 Message-Id: <68fbd0c0bb7e2ef7a80e7359512672a235a963b1.1706792708.git.hongyan.xia2@arm.com>
References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1789702423775776338 X-GMAIL-MSGID: 1789702423775776338 The problem with rq->cfs.avg.util_avg_uclamp is that it only tracks the sum of contributions of CFS tasks that are on the rq. However, CFS tasks that belong to a CPU which were just dequeued from the rq->cfs still have decaying contributions to the rq utilization due to PELT. Introduce root_cfs_util_uclamp to capture the total utilization of CFS tasks both on and off this rq. Theoretically, keeping track of the sum of all tasks on a CPU (either on or off the rq) requires us to periodically sample the decaying PELT utilization of all off-rq tasks and then sum them up, which introduces substantial extra code and overhead. However, we can avoid the overhead, shown in this example: Let's assume 3 tasks, A, B and C. A is still on rq->cfs but B and C have just been dequeued. The cfs.avg.util_avg_uclamp has dropped from A + B + C to just A but the instantaneous utilization only just starts to decay and is now still A + B + C. Let's denote root_cfs_util_uclamp_old as the instantaneous total utilization right before B and C are dequeued. After p periods, with y being the decay factor, the new root_cfs_util_uclamp becomes: root_cfs_util_uclamp = A + B * y^p + C * y^p = A + (A + B + C - A) * y^p = cfs.avg.util_avg_uclamp + (root_cfs_util_uclamp_old - cfs.avg.util_avg_uclamp) * y^p = cfs.avg.util_avg_uclamp + diff * y^p So, whenever we want to calculate the new root_cfs_util_uclamp (including both on- and off-rq CFS tasks of a CPU), we could just decay the diff between root_cfs_util_uclamp and cfs.avg.util_avg_uclamp, and add the decayed diff to cfs.avg.util_avg_uclamp to obtain the new root_cfs_util_uclamp, without bothering to periodically sample off-rq CFS tasks and sum them up. This significantly reduces the overhead needed to maintain this signal, and makes sure we now also include the decaying contributions of CFS tasks that are dequeued. NOTE: In no way do we change how PELT and util_avg work. The original PELT signal is kept as-is and is used when needed. The new signals, util_avg_uclamp and root_cfs_util_uclamp are additional hints to the scheduler and are not meant to replace the original PELT signals. Signed-off-by: Hongyan Xia --- kernel/sched/fair.c | 7 +++ kernel/sched/pelt.c | 106 +++++++++++++++++++++++++++++++++++++++---- kernel/sched/pelt.h | 3 +- kernel/sched/sched.h | 16 +++++++ 4 files changed, 123 insertions(+), 9 deletions(-) diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 4f535c96463b..36357cfaf48d 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -6710,6 +6710,7 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags) struct sched_entity *se = &p->se; int idle_h_nr_running = task_has_idle_policy(p); int task_new = !(flags & ENQUEUE_WAKEUP); + bool __maybe_unused migrated = p->se.avg.last_update_time == 0; /* * The code below (indirectly) updates schedutil which looks at @@ -6769,6 +6770,10 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags) #ifdef CONFIG_UCLAMP_TASK util_uclamp_enqueue(&rq->cfs.avg, p); update_util_uclamp(0, 0, 0, &rq->cfs.avg, p); + if (migrated) + rq->root_cfs_util_uclamp += p->se.avg.util_avg_uclamp; + rq->root_cfs_util_uclamp = max(rq->root_cfs_util_uclamp, + rq->cfs.avg.util_avg_uclamp); /* TODO: Better skip the frequency update in the for loop above. 
*/ cpufreq_update_util(rq, 0); #endif @@ -8252,6 +8257,7 @@ static void migrate_task_rq_fair(struct task_struct *p, int new_cpu) migrate_se_pelt_lag(se); } + remove_root_cfs_util_uclamp(p); /* Tell new CPU we are migrated */ se->avg.last_update_time = 0; @@ -8261,6 +8267,7 @@ static void migrate_task_rq_fair(struct task_struct *p, int new_cpu) static void task_dead_fair(struct task_struct *p) { remove_entity_load_avg(&p->se); + remove_root_cfs_util_uclamp(p); } static int diff --git a/kernel/sched/pelt.c b/kernel/sched/pelt.c index eca45a863f9f..9ba208ac26db 100644 --- a/kernel/sched/pelt.c +++ b/kernel/sched/pelt.c @@ -267,14 +267,78 @@ ___update_load_avg(struct sched_avg *sa, unsigned long load) } #ifdef CONFIG_UCLAMP_TASK +static int ___update_util_uclamp_towards(u64 now, + u64 last_update_time, + u32 period_contrib, + unsigned int *old, + unsigned int new_val) +{ + unsigned int old_val = READ_ONCE(*old); + u64 delta, periods; + + if (old_val <= new_val) { + WRITE_ONCE(*old, new_val); + return old_val < new_val; + } + + if (!last_update_time) + return 0; + delta = now - last_update_time; + if ((s64)delta < 0) + return 0; + delta >>= 10; + if (!delta) + return 0; + + delta += period_contrib; + periods = delta / 1024; + if (periods) { + u64 diff = old_val - new_val; + + /* + * Let's assume 3 tasks, A, B and C. A is still on rq but B and + * C have just been dequeued. The cfs.avg.util_avg_uclamp has + * become A but root_cfs_util_uclamp just starts to decay and is + * now still A + B + C. + * + * After p periods with y being the decay factor, the new + * root_cfs_util_uclamp should become + * + * A + B * y^p + C * y^p == A + (A + B + C - A) * y^p + * == cfs.avg.util_avg_uclamp + + * (root_cfs_util_uclamp_at_the_start - cfs.avg.util_avg_uclamp) * y^p + * == cfs.avg.util_avg_uclamp + diff * y^p + * + * So, instead of summing up each individual decayed values, we + * could just decay the diff and not bother with the summation + * at all. This is why we decay the diff here. + */ + diff = decay_load(diff, periods); + WRITE_ONCE(*old, new_val + diff); + return old_val != *old; + } + + return 0; +} + /* avg must belong to the queue this se is on. 
*/ -void update_util_uclamp(struct sched_avg *avg, struct task_struct *p) +void update_util_uclamp(u64 now, + u64 last_update_time, + u32 period_contrib, + struct sched_avg *avg, + struct task_struct *p) { unsigned int util, uclamp_min, uclamp_max; int delta; - if (!p->se.on_rq) + if (!p->se.on_rq) { + ___update_util_uclamp_towards(now, + last_update_time, + period_contrib, + &p->se.avg.util_avg_uclamp, + 0); return; + } if (!avg) return; @@ -294,7 +358,11 @@ void update_util_uclamp(struct sched_avg *avg, struct task_struct *p) WRITE_ONCE(avg->util_avg_uclamp, util); } #else /* !CONFIG_UCLAMP_TASK */ -void update_util_uclamp(struct sched_avg *avg, struct task_struct *p) +void update_util_uclamp(u64 now, + u64 last_update_time, + u32 period_contrib, + struct sched_avg *avg, + struct task_struct *p) { } #endif @@ -327,23 +395,32 @@ void update_util_uclamp(struct sched_avg *avg, struct task_struct *p) void __update_load_avg_blocked_se(u64 now, struct sched_entity *se) { + u64 last_update_time = se->avg.last_update_time; + u32 period_contrib = se->avg.period_contrib; + if (___update_load_sum(now, &se->avg, 0, 0, 0)) { ___update_load_avg(&se->avg, se_weight(se)); if (entity_is_task(se)) - update_util_uclamp(NULL, task_of(se)); + update_util_uclamp(now, last_update_time, + period_contrib, NULL, task_of(se)); trace_pelt_se_tp(se); } } void __update_load_avg_se(u64 now, struct cfs_rq *cfs_rq, struct sched_entity *se) { + u64 last_update_time = se->avg.last_update_time; + u32 period_contrib = se->avg.period_contrib; + if (___update_load_sum(now, &se->avg, !!se->on_rq, se_runnable(se), cfs_rq->curr == se)) { ___update_load_avg(&se->avg, se_weight(se)); cfs_se_util_change(&se->avg); if (entity_is_task(se)) - update_util_uclamp(&rq_of(cfs_rq)->cfs.avg, + update_util_uclamp(now, last_update_time, + period_contrib, + &rq_of(cfs_rq)->cfs.avg, task_of(se)); trace_pelt_se_tp(se); } @@ -351,17 +428,30 @@ void __update_load_avg_se(u64 now, struct cfs_rq *cfs_rq, struct sched_entity *s int __update_load_avg_cfs_rq(u64 now, struct cfs_rq *cfs_rq) { + u64 __maybe_unused last_update_time = cfs_rq->avg.last_update_time; + u32 __maybe_unused period_contrib = cfs_rq->avg.period_contrib; + int ret = 0; + if (___update_load_sum(now, &cfs_rq->avg, scale_load_down(cfs_rq->load.weight), cfs_rq->h_nr_running, cfs_rq->curr != NULL)) { ___update_load_avg(&cfs_rq->avg, 1); - trace_pelt_cfs_tp(cfs_rq); - return 1; + ret = 1; } - return 0; +#ifdef CONFIG_UCLAMP_TASK + if (&rq_of(cfs_rq)->cfs == cfs_rq) + ret = ___update_util_uclamp_towards(now, + last_update_time, period_contrib, + &rq_of(cfs_rq)->root_cfs_util_uclamp, + READ_ONCE(cfs_rq->avg.util_avg_uclamp)); +#endif + if (ret) + trace_pelt_cfs_tp(cfs_rq); + + return ret; } /* diff --git a/kernel/sched/pelt.h b/kernel/sched/pelt.h index 6862f79e0fcd..a2852d5e862d 100644 --- a/kernel/sched/pelt.h +++ b/kernel/sched/pelt.h @@ -1,7 +1,8 @@ #ifdef CONFIG_SMP #include "sched-pelt.h" -void update_util_uclamp(struct sched_avg *avg, struct task_struct *p); +void update_util_uclamp(u64 now, u64 last_update_time, u32 period_contrib, + struct sched_avg *avg, struct task_struct *p); void __update_load_avg_blocked_se(u64 now, struct sched_entity *se); void __update_load_avg_se(u64 now, struct cfs_rq *cfs_rq, struct sched_entity *se); int __update_load_avg_cfs_rq(u64 now, struct cfs_rq *cfs_rq); diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h index 35036246824b..ce80b87b549b 100644 --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -998,6 +998,7 @@ struct rq { /* 
Utilization clamp values based on CPU's RUNNABLE tasks */ struct uclamp_rq uclamp[UCLAMP_CNT] ____cacheline_aligned; unsigned int uclamp_flags; + unsigned int root_cfs_util_uclamp; #define UCLAMP_FLAG_IDLE 0x01 #endif @@ -3112,6 +3113,17 @@ static inline void util_uclamp_dequeue(struct sched_avg *avg, WRITE_ONCE(avg->util_avg_uclamp, new_val); } + +static inline void remove_root_cfs_util_uclamp(struct task_struct *p) +{ + struct rq *rq = task_rq(p); + unsigned int root_util = READ_ONCE(rq->root_cfs_util_uclamp); + unsigned int p_util = READ_ONCE(p->se.avg.util_avg_uclamp), new_util; + + new_util = (root_util > p_util) ? root_util - p_util : 0; + new_util = max(new_util, READ_ONCE(rq->cfs.avg.util_avg_uclamp)); + WRITE_ONCE(rq->root_cfs_util_uclamp, new_util); +} #else /* CONFIG_UCLAMP_TASK */ static inline unsigned long uclamp_eff_value(struct task_struct *p, enum uclamp_id clamp_id) @@ -3147,6 +3159,10 @@ static inline bool uclamp_rq_is_idle(struct rq *rq) { return false; } + +static inline void remove_root_cfs_util_uclamp(struct task_struct *p) +{ +} #endif /* CONFIG_UCLAMP_TASK */ #ifdef CONFIG_HAVE_SCHED_AVG_IRQ From patchwork Thu Feb 1 13:12:00 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Hongyan Xia X-Patchwork-Id: 195336 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a05:7301:2719:b0:106:209c:c626 with SMTP id hl25csp151883dyb; Thu, 1 Feb 2024 05:32:36 -0800 (PST) X-Google-Smtp-Source: AGHT+IFy9OXPQueE8AytrIZdVkwWIDVDGFw32i8ucmLCuip8dWGzY6xll31mjLsnv2J0gPXNwSU0 X-Received: by 2002:a05:6a20:b9e:b0:19e:3096:ea24 with SMTP id i30-20020a056a200b9e00b0019e3096ea24mr2046992pzh.37.1706794356632; Thu, 01 Feb 2024 05:32:36 -0800 (PST) ARC-Seal: i=2; a=rsa-sha256; t=1706794356; cv=pass; d=google.com; s=arc-20160816; b=lsH06gWMZQotPW5kklEF/R4R927pyPeKWhAcDCL7jpFpsdhSvereNzQyWQSELjGMI0 REsGuwgnKW7Uc0KrveG0LK06r5luPvCa6w+OsSSzLNJZejlR6HbkkS6ZCYvFnQuWqxhW ezk3DIv09g8hTUjQN0tmpELfocVXSwOIpzbwUmaULx46eU6OcYRiEPG+bpaerBR20v8V HCf0xjYteSMQI/mstK1ELHKFB/4ZKdvi2d2lOdYGOB1bO1Qya9O/fmO9jZH2p0p294Pu 0yGCC/BsxwyafuazMC4PrMnZtgAhRgdCw234t9ecpWJKC0Gx2qDUq5cB8KC+FE0jFVFe hDXA== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=content-transfer-encoding:mime-version:list-unsubscribe :list-subscribe:list-id:precedence:references:in-reply-to:message-id :date:subject:cc:to:from; bh=38m/ZFJNQLbGVRSi3ypoOkyHbt0v54wJqsAqS0PeRCM=; fh=FKzwMjdQ91cKKmZMXnHQajCIYMh42RMNc+tEFf1B/HI=; b=zTgdimZN9GjDiw4GjYoQoULeYYGp3zflo+Zrt+FHJfezSqLbqIbdE5rniITEtu8edw fk4TqWZrjHs/Mz9eKJexLYz8TDG1R3aulLHmY4u28XJ+mtIdVr7PSvJHbgVBeFkLcCJD Kyo9WuvsDPdnkf6OaG62pAumbojnHjf48DAIYP+9X182aMrUlNpk+CFgLl3KUDBp/Ljh HRKTSabFKONYOKy2ILosaRaxl9GbHu0wtj2tEkQz8H2QRcJDUL/tZp5UG+MxndHv1xS4 REA+1Z6mKvOm9jI4fujvqMUlSTkVIMEvEd5dwrTASOLZCIJxFOVABz8DPXbzEfs6vm4p yLJg==; dara=google.com ARC-Authentication-Results: i=2; mx.google.com; arc=pass (i=1 spf=pass spfdomain=arm.com dmarc=pass fromdomain=arm.com); spf=pass (google.com: domain of linux-kernel+bounces-48206-ouuuleilei=gmail.com@vger.kernel.org designates 2604:1380:40f1:3f00::1 as permitted sender) smtp.mailfrom="linux-kernel+bounces-48206-ouuuleilei=gmail.com@vger.kernel.org"; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=arm.com X-Forwarded-Encrypted: i=1; AJvYcCVUWuBnkMsF21aC2j9h5t0cvT8gptfNQhHzEIm4LXDOb/f/Vbvrq431zw1eXPfXJ1XTNB7kN2DDDqL4x8q0qmXApN2h0w== Received: from sy.mirrors.kernel.org (sy.mirrors.kernel.org. 
From: Hongyan Xia To: Ingo Molnar , Peter Zijlstra , Vincent Guittot , Dietmar Eggemann , Juri Lelli , Steven Rostedt , Ben Segall , Mel Gorman , Daniel Bristot de Oliveira , Valentin Schneider , "Rafael J.
Wysocki" , Viresh Kumar Cc: Qais Yousef , Morten Rasmussen , Lukasz Luba , Christian Loehle , linux-kernel@vger.kernel.org, David Dai , Saravana Kannan , linux-pm@vger.kernel.org Subject: [RFC PATCH v2 4/7] sched/fair: Use CFS util_avg_uclamp for utilization and frequency Date: Thu, 1 Feb 2024 13:12:00 +0000 Message-Id: <4f755ae12895bbc74a74bac56bf2ef0f30413a32.1706792708.git.hongyan.xia2@arm.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1789703599331617775 X-GMAIL-MSGID: 1789703599331617775 Switch to the new util_avg_uclamp for task and runqueue utilization. Since task_util_est() calls task_util() which now uses util_avg_uclamp, this means util_est is now also a clamped value. Now that we have the sum aggregated CFS util value, we do not need to consult uclamp buckets to know how the frequency should be clamped. We simply look at the aggregated top level root_cfs_util_uclamp to know what frequency to choose. TODO: Sum aggregation for RT tasks. I have already implemented RT sum aggregation, which is only 49 lines of code, but I don't want RT to distract this series which is mainly CFS-focused. RT will be sent in a separate mini series. Signed-off-by: Hongyan Xia --- kernel/sched/core.c | 17 ++++---------- kernel/sched/cpufreq_schedutil.c | 10 ++------ kernel/sched/fair.c | 39 ++++++++++++++++---------------- kernel/sched/sched.h | 21 +++++++++++++---- 4 files changed, 42 insertions(+), 45 deletions(-) diff --git a/kernel/sched/core.c b/kernel/sched/core.c index db4be4921e7f..0bedc05c883f 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -7465,6 +7465,9 @@ int sched_core_idle_cpu(int cpu) * The DL bandwidth number otoh is not a measured metric but a value computed * based on the task model parameters and gives the minimal utilization * required to meet deadlines. + * + * The util_cfs parameter has already taken uclamp into account (unless uclamp + * support is not compiled in). */ unsigned long effective_cpu_util(int cpu, unsigned long util_cfs, unsigned long *min, @@ -7490,13 +7493,7 @@ unsigned long effective_cpu_util(int cpu, unsigned long util_cfs, } if (min) { - /* - * The minimum utilization returns the highest level between: - * - the computed DL bandwidth needed with the IRQ pressure which - * steals time to the deadline task. - * - The minimum performance requirement for CFS and/or RT. - */ - *min = max(irq + cpu_bw_dl(rq), uclamp_rq_get(rq, UCLAMP_MIN)); + *min = irq + cpu_bw_dl(rq); /* * When an RT task is runnable and uclamp is not used, we must @@ -7515,12 +7512,8 @@ unsigned long effective_cpu_util(int cpu, unsigned long util_cfs, util = util_cfs + cpu_util_rt(rq); util += cpu_util_dl(rq); - /* - * The maximum hint is a soft bandwidth requirement, which can be lower - * than the actual utilization because of uclamp_max requirements. - */ if (max) - *max = min(scale, uclamp_rq_get(rq, UCLAMP_MAX)); + *max = scale; if (util >= scale) return scale; diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c index 95c3c097083e..48a4e4a685d0 100644 --- a/kernel/sched/cpufreq_schedutil.c +++ b/kernel/sched/cpufreq_schedutil.c @@ -381,11 +381,8 @@ static void sugov_update_single_freq(struct update_util_data *hook, u64 time, /* * Do not reduce the frequency if the CPU has not been idle * recently, as the reduction is likely to be premature then. 
- * - * Except when the rq is capped by uclamp_max. */ - if (!uclamp_rq_is_capped(cpu_rq(sg_cpu->cpu)) && - sugov_cpu_is_busy(sg_cpu) && next_f < sg_policy->next_freq && + if (sugov_cpu_is_busy(sg_cpu) && next_f < sg_policy->next_freq && !sg_policy->need_freq_update) { next_f = sg_policy->next_freq; @@ -435,11 +432,8 @@ static void sugov_update_single_perf(struct update_util_data *hook, u64 time, /* * Do not reduce the target performance level if the CPU has not been * idle recently, as the reduction is likely to be premature then. - * - * Except when the rq is capped by uclamp_max. */ - if (!uclamp_rq_is_capped(cpu_rq(sg_cpu->cpu)) && - sugov_cpu_is_busy(sg_cpu) && sg_cpu->util < prev_util) + if (sugov_cpu_is_busy(sg_cpu) && sg_cpu->util < prev_util) sg_cpu->util = prev_util; cpufreq_driver_adjust_perf(sg_cpu->cpu, sg_cpu->bw_min, diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 36357cfaf48d..b92739e1c52f 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -4821,10 +4821,17 @@ static inline unsigned long cfs_rq_load_avg(struct cfs_rq *cfs_rq) static int newidle_balance(struct rq *this_rq, struct rq_flags *rf); +#ifdef CONFIG_UCLAMP_TASK +static inline unsigned long task_util(struct task_struct *p) +{ + return READ_ONCE(p->se.avg.util_avg_uclamp); +} +#else static inline unsigned long task_util(struct task_struct *p) { return READ_ONCE(p->se.avg.util_avg); } +#endif static inline unsigned long task_runnable(struct task_struct *p) { @@ -4932,8 +4939,13 @@ static inline void util_est_update(struct cfs_rq *cfs_rq, * To avoid underestimate of task utilization, skip updates of EWMA if * we cannot grant that thread got all CPU time it wanted. */ - if ((dequeued + UTIL_EST_MARGIN) < task_runnable(p)) + if ((READ_ONCE(p->se.avg.util_avg) + UTIL_EST_MARGIN) < + task_runnable(p)) { + ewma = clamp(ewma, + uclamp_eff_value(p, UCLAMP_MIN), + uclamp_eff_value(p, UCLAMP_MAX)); goto done; + } /* @@ -7685,11 +7697,13 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target) static unsigned long cpu_util(int cpu, struct task_struct *p, int dst_cpu, int boost) { - struct cfs_rq *cfs_rq = &cpu_rq(cpu)->cfs; - unsigned long util = READ_ONCE(cfs_rq->avg.util_avg); + struct rq *rq = cpu_rq(cpu); + struct cfs_rq *cfs_rq = &rq->cfs; + unsigned long util = root_cfs_util(rq); + bool capped = uclamp_rq_is_capped(rq); unsigned long runnable; - if (boost) { + if (boost && !capped) { runnable = READ_ONCE(cfs_rq->avg.runnable_avg); util = max(util, runnable); } @@ -7867,7 +7881,6 @@ eenv_pd_max_util(struct energy_env *eenv, struct cpumask *pd_cpus, int cpu; for_each_cpu(cpu, pd_cpus) { - struct task_struct *tsk = (cpu == dst_cpu) ? p : NULL; unsigned long util = cpu_util(cpu, p, dst_cpu, 1); unsigned long eff_util, min, max; @@ -7880,20 +7893,6 @@ eenv_pd_max_util(struct energy_env *eenv, struct cpumask *pd_cpus, */ eff_util = effective_cpu_util(cpu, util, &min, &max); - /* Task's uclamp can modify min and max value */ - if (tsk && uclamp_is_used()) { - min = max(min, uclamp_eff_value(p, UCLAMP_MIN)); - - /* - * If there is no active max uclamp constraint, - * directly use task's one, otherwise keep max. 
- */ - if (uclamp_rq_is_idle(cpu_rq(cpu))) - max = uclamp_eff_value(p, UCLAMP_MAX); - else - max = max(max, uclamp_eff_value(p, UCLAMP_MAX)); - } - eff_util = sugov_effective_cpu_perf(cpu, eff_util, min, max); max_util = max(max_util, eff_util); } @@ -7996,7 +7995,7 @@ static int find_energy_efficient_cpu(struct task_struct *p, int prev_cpu) target = prev_cpu; sync_entity_load_avg(&p->se); - if (!task_util_est(p) && p_util_min == 0) + if (!task_util_est(p)) goto unlock; eenv_task_busy_time(&eenv, p, prev_cpu); diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h index ce80b87b549b..3ee28822f48f 100644 --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -3062,16 +3062,17 @@ static inline bool uclamp_rq_is_idle(struct rq *rq) /* Is the rq being capped/throttled by uclamp_max? */ static inline bool uclamp_rq_is_capped(struct rq *rq) { - unsigned long rq_util; - unsigned long max_util; + unsigned long rq_uclamp_util, rq_real_util; if (!static_branch_likely(&sched_uclamp_used)) return false; - rq_util = cpu_util_cfs(cpu_of(rq)) + cpu_util_rt(rq); - max_util = READ_ONCE(rq->uclamp[UCLAMP_MAX].value); + rq_uclamp_util = cpu_util_cfs(cpu_of(rq)) + cpu_util_rt(rq); + rq_real_util = READ_ONCE(rq->cfs.avg.util_avg) + + READ_ONCE(rq->avg_rt.util_avg); - return max_util != SCHED_CAPACITY_SCALE && rq_util >= max_util; + return rq_uclamp_util < SCHED_CAPACITY_SCALE && + rq_real_util > rq_uclamp_util; } /* @@ -3087,6 +3088,11 @@ static inline bool uclamp_is_used(void) return static_branch_likely(&sched_uclamp_used); } +static inline unsigned long root_cfs_util(struct rq *rq) +{ + return READ_ONCE(rq->root_cfs_util_uclamp); +} + static inline void util_uclamp_enqueue(struct sched_avg *avg, struct task_struct *p) { @@ -3160,6 +3166,11 @@ static inline bool uclamp_rq_is_idle(struct rq *rq) return false; } +static inline unsigned long root_cfs_util(struct rq *rq) +{ + return READ_ONCE(rq->cfs.avg.util_avg); +} + static inline void remove_root_cfs_util_uclamp(struct task_struct *p) { } From patchwork Thu Feb 1 13:12:01 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Hongyan Xia X-Patchwork-Id: 195326 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a05:7301:2719:b0:106:209c:c626 with SMTP id hl25csp140032dyb; Thu, 1 Feb 2024 05:14:33 -0800 (PST) X-Google-Smtp-Source: AGHT+IG3+e1ubROKLgiROLIrDk0UJ8AFLB0JM/eJAwxpL6SLcDvQ2A8qufoo6zdIEYXK+3w0p3q7 X-Received: by 2002:a92:b12:0:b0:363:9ce7:42ed with SMTP id b18-20020a920b12000000b003639ce742edmr3772217ilf.22.1706793273225; Thu, 01 Feb 2024 05:14:33 -0800 (PST) ARC-Seal: i=2; a=rsa-sha256; t=1706793273; cv=pass; d=google.com; s=arc-20160816; b=DosdLXHxjQXNtl34/Lhe2q0hmfzSssvM6hESwnFouRTcadR7bqhPuqSMm0KsfhNLTF nSv0oWrOvTMdgZeLBvCzN0Zt/l5v9T6MHSMDyxgEarrisk24P+aPgAEvCyMCqC53xk93 onUP0XhdHsOpPxsJLtmKktzQPScjWTZy9YS82KfZj1E9nenshwmekwkajxXUPn5Ffbnw aXxOxQJS2BFLbYEwdHnNFU4DQ/CianKPcub+fazL/p00iPf1AGO4nl55yHH/xXJoGSTA p52DDGAs+7/nwODo/4tY5fCsLizGlpfG0iqiSZWkUUNkB6BugN9KZ0UZVlEAgT1HR1RU K3OA== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=content-transfer-encoding:mime-version:list-unsubscribe :list-subscribe:list-id:precedence:references:in-reply-to:message-id :date:subject:cc:to:from; bh=WPYlwH7mMXA9c3EErzXkY4mWc6mFV1r84JFotcSmp3g=; fh=+4HbHJwsCDaSME1+KU4I5gQK7TIVVa7yH/48JtF1aqo=; b=gckoLeVogFBtCKgiLWZM7QzB7BITi8iwJYQEixivfHSl4pLZj7SJWJ7gJtQIOmiqJk yTeZeYsrRgS4KtjuKwn0T2CKcxY1TpSS6O14myCxinWb7HwzjH468b1zYFJ1uaiCAwqR 
ESMTP id 88A6F1764; Thu, 1 Feb 2024 05:13:10 -0800 (PST) Received: from e130256.cambridge.arm.com (usa-sjc-imap-foss1.foss.arm.com [10.121.207.14]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id A19333F762; Thu, 1 Feb 2024 05:12:25 -0800 (PST) From: Hongyan Xia To: Ingo Molnar , Peter Zijlstra , Vincent Guittot , Dietmar Eggemann , Juri Lelli , Steven Rostedt , Ben Segall , Mel Gorman , Daniel Bristot de Oliveira , Valentin Schneider Cc: Qais Yousef , Morten Rasmussen , Lukasz Luba , Christian Loehle , linux-kernel@vger.kernel.org, David Dai , Saravana Kannan Subject: [RFC PATCH v2 5/7] sched/fair: Massively simplify util_fits_cpu() Date: Thu, 1 Feb 2024 13:12:01 +0000 Message-Id: X-Mailer: git-send-email 2.34.1 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1789702463319669958 X-GMAIL-MSGID: 1789702463319669958 Currently, there's no way to distinguish the difference between 1) a CPU that is actually maxed out at its highest frequency, or 2) one that is throttled because of UCLAMP_MAX, since both present util_avg values of 1024. This is problematic because when we try to pick a CPU for a task to run, we would like to give 2) a chance, or at least prefer 2) to 1). Current upstream now gives all 0 spare capacity CPUs a chance to consider queuing more tasks because there's a chance that 0 spare capacity is due to UCLAMP_MAX. However, this creates further problems because energy calculations are now bogus when spare capacity is already 0, and tasks tend to pile up on one CPU. Fix by using util_avg_uclamp for util_fits_cpu(). This way, case 1) will still keep its utilization at 1024 whereas 2) shows spare capacities if the sum of util_avg_uclamp values is still under the CPU capacity. Under sum aggregation, checking whether a task fits a CPU becomes much simpler. We simply do fits_capacity() and there does not need to be all kinds of code checking all corner cases for uclamp. This means util_fits_cpu() returns to true and false instead of tri-state, simplifying a huge amount of code. [1]: https://lore.kernel.org/all/20230205224318.2035646-2-qyousef@layalina.io/ Signed-off-by: Hongyan Xia --- kernel/sched/fair.c | 253 ++++---------------------------------------- 1 file changed, 23 insertions(+), 230 deletions(-) diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index b92739e1c52f..49997f1f58fb 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -4974,135 +4974,19 @@ static inline void util_est_update(struct cfs_rq *cfs_rq, trace_sched_util_est_se_tp(&p->se); } -static inline int util_fits_cpu(unsigned long util, - unsigned long uclamp_min, - unsigned long uclamp_max, - int cpu) +/* util must be the uclamp'ed value (i.e. from util_avg_uclamp). */ +static inline int util_fits_cpu(unsigned long util, int cpu) { - unsigned long capacity_orig, capacity_orig_thermal; unsigned long capacity = capacity_of(cpu); - bool fits, uclamp_max_fits; - /* - * Check if the real util fits without any uclamp boost/cap applied. - */ - fits = fits_capacity(util, capacity); - - if (!uclamp_is_used()) - return fits; - - /* - * We must use arch_scale_cpu_capacity() for comparing against uclamp_min and - * uclamp_max. We only care about capacity pressure (by using - * capacity_of()) for comparing against the real util. 
- * - * If a task is boosted to 1024 for example, we don't want a tiny - * pressure to skew the check whether it fits a CPU or not. - * - * Similarly if a task is capped to arch_scale_cpu_capacity(little_cpu), it - * should fit a little cpu even if there's some pressure. - * - * Only exception is for thermal pressure since it has a direct impact - * on available OPP of the system. - * - * We honour it for uclamp_min only as a drop in performance level - * could result in not getting the requested minimum performance level. - * - * For uclamp_max, we can tolerate a drop in performance level as the - * goal is to cap the task. So it's okay if it's getting less. - */ - capacity_orig = arch_scale_cpu_capacity(cpu); - capacity_orig_thermal = capacity_orig - arch_scale_thermal_pressure(cpu); - - /* - * We want to force a task to fit a cpu as implied by uclamp_max. - * But we do have some corner cases to cater for.. - * - * - * C=z - * | ___ - * | C=y | | - * |_ _ _ _ _ _ _ _ _ ___ _ _ _ | _ | _ _ _ _ _ uclamp_max - * | C=x | | | | - * | ___ | | | | - * | | | | | | | (util somewhere in this region) - * | | | | | | | - * | | | | | | | - * +---------------------------------------- - * cpu0 cpu1 cpu2 - * - * In the above example if a task is capped to a specific performance - * point, y, then when: - * - * * util = 80% of x then it does not fit on cpu0 and should migrate - * to cpu1 - * * util = 80% of y then it is forced to fit on cpu1 to honour - * uclamp_max request. - * - * which is what we're enforcing here. A task always fits if - * uclamp_max <= capacity_orig. But when uclamp_max > capacity_orig, - * the normal upmigration rules should withhold still. - * - * Only exception is when we are on max capacity, then we need to be - * careful not to block overutilized state. This is so because: - * - * 1. There's no concept of capping at max_capacity! We can't go - * beyond this performance level anyway. - * 2. The system is being saturated when we're operating near - * max capacity, it doesn't make sense to block overutilized. - */ - uclamp_max_fits = (capacity_orig == SCHED_CAPACITY_SCALE) && (uclamp_max == SCHED_CAPACITY_SCALE); - uclamp_max_fits = !uclamp_max_fits && (uclamp_max <= capacity_orig); - fits = fits || uclamp_max_fits; - - /* - * - * C=z - * | ___ (region a, capped, util >= uclamp_max) - * | C=y | | - * |_ _ _ _ _ _ _ _ _ ___ _ _ _ | _ | _ _ _ _ _ uclamp_max - * | C=x | | | | - * | ___ | | | | (region b, uclamp_min <= util <= uclamp_max) - * |_ _ _|_ _|_ _ _ _| _ | _ _ _| _ | _ _ _ _ _ uclamp_min - * | | | | | | | - * | | | | | | | (region c, boosted, util < uclamp_min) - * +---------------------------------------- - * cpu0 cpu1 cpu2 - * - * a) If util > uclamp_max, then we're capped, we don't care about - * actual fitness value here. We only care if uclamp_max fits - * capacity without taking margin/pressure into account. - * See comment above. - * - * b) If uclamp_min <= util <= uclamp_max, then the normal - * fits_capacity() rules apply. Except we need to ensure that we - * enforce we remain within uclamp_max, see comment above. - * - * c) If util < uclamp_min, then we are boosted. Same as (b) but we - * need to take into account the boosted value fits the CPU without - * taking margin/pressure into account. - * - * Cases (a) and (b) are handled in the 'fits' variable already. We - * just need to consider an extra check for case (c) after ensuring we - * handle the case uclamp_min > uclamp_max. 
- */ - uclamp_min = min(uclamp_min, uclamp_max); - if (fits && (util < uclamp_min) && (uclamp_min > capacity_orig_thermal)) - return -1; - - return fits; + return fits_capacity(util, capacity); } static inline int task_fits_cpu(struct task_struct *p, int cpu) { - unsigned long uclamp_min = uclamp_eff_value(p, UCLAMP_MIN); - unsigned long uclamp_max = uclamp_eff_value(p, UCLAMP_MAX); unsigned long util = task_util_est(p); - /* - * Return true only if the cpu fully fits the task requirements, which - * include the utilization but also the performance hints. - */ - return (util_fits_cpu(util, uclamp_min, uclamp_max, cpu) > 0); + + return util_fits_cpu(util, cpu); } static inline void update_misfit_status(struct task_struct *p, struct rq *rq) @@ -6678,11 +6562,8 @@ static inline void hrtick_update(struct rq *rq) #ifdef CONFIG_SMP static inline bool cpu_overutilized(int cpu) { - unsigned long rq_util_min = uclamp_rq_get(cpu_rq(cpu), UCLAMP_MIN); - unsigned long rq_util_max = uclamp_rq_get(cpu_rq(cpu), UCLAMP_MAX); - /* Return true only if the utilization doesn't fit CPU's capacity */ - return !util_fits_cpu(cpu_util_cfs(cpu), rq_util_min, rq_util_max, cpu); + return !util_fits_cpu(cpu_util_cfs(cpu), cpu); } static inline void update_overutilized_status(struct rq *rq) @@ -7463,8 +7344,7 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, bool static int select_idle_capacity(struct task_struct *p, struct sched_domain *sd, int target) { - unsigned long task_util, util_min, util_max, best_cap = 0; - int fits, best_fits = 0; + unsigned long task_util, best_cap = 0; int cpu, best_cpu = -1; struct cpumask *cpus; @@ -7472,8 +7352,6 @@ select_idle_capacity(struct task_struct *p, struct sched_domain *sd, int target) cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr); task_util = task_util_est(p); - util_min = uclamp_eff_value(p, UCLAMP_MIN); - util_max = uclamp_eff_value(p, UCLAMP_MAX); for_each_cpu_wrap(cpu, cpus, target) { unsigned long cpu_cap = capacity_of(cpu); @@ -7481,44 +7359,22 @@ select_idle_capacity(struct task_struct *p, struct sched_domain *sd, int target) if (!available_idle_cpu(cpu) && !sched_idle_cpu(cpu)) continue; - fits = util_fits_cpu(task_util, util_min, util_max, cpu); - - /* This CPU fits with all requirements */ - if (fits > 0) + if (util_fits_cpu(task_util, cpu)) return cpu; - /* - * Only the min performance hint (i.e. uclamp_min) doesn't fit. - * Look for the CPU with best capacity. - */ - else if (fits < 0) - cpu_cap = arch_scale_cpu_capacity(cpu) - thermal_load_avg(cpu_rq(cpu)); - /* - * First, select CPU which fits better (-1 being better than 0). - * Then, select the one with best capacity at same level. - */ - if ((fits < best_fits) || - ((fits == best_fits) && (cpu_cap > best_cap))) { + if (cpu_cap > best_cap) { best_cap = cpu_cap; best_cpu = cpu; - best_fits = fits; } } return best_cpu; } -static inline bool asym_fits_cpu(unsigned long util, - unsigned long util_min, - unsigned long util_max, - int cpu) +static inline bool asym_fits_cpu(unsigned long util, int cpu) { if (sched_asym_cpucap_active()) - /* - * Return true only if the cpu fully fits the task requirements - * which include the utilization and the performance hints. 
- */ - return (util_fits_cpu(util, util_min, util_max, cpu) > 0); + return util_fits_cpu(util, cpu); return true; } @@ -7530,7 +7386,7 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target) { bool has_idle_core = false; struct sched_domain *sd; - unsigned long task_util, util_min, util_max; + unsigned long task_util; int i, recent_used_cpu, prev_aff = -1; /* @@ -7540,8 +7396,6 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target) if (sched_asym_cpucap_active()) { sync_entity_load_avg(&p->se); task_util = task_util_est(p); - util_min = uclamp_eff_value(p, UCLAMP_MIN); - util_max = uclamp_eff_value(p, UCLAMP_MAX); } /* @@ -7550,7 +7404,7 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target) lockdep_assert_irqs_disabled(); if ((available_idle_cpu(target) || sched_idle_cpu(target)) && - asym_fits_cpu(task_util, util_min, util_max, target)) + asym_fits_cpu(task_util, target)) return target; /* @@ -7558,7 +7412,7 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target) */ if (prev != target && cpus_share_cache(prev, target) && (available_idle_cpu(prev) || sched_idle_cpu(prev)) && - asym_fits_cpu(task_util, util_min, util_max, prev)) { + asym_fits_cpu(task_util, prev)) { if (!static_branch_unlikely(&sched_cluster_active) || cpus_share_resources(prev, target)) @@ -7579,7 +7433,7 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target) in_task() && prev == smp_processor_id() && this_rq()->nr_running <= 1 && - asym_fits_cpu(task_util, util_min, util_max, prev)) { + asym_fits_cpu(task_util, prev)) { return prev; } @@ -7591,7 +7445,7 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target) cpus_share_cache(recent_used_cpu, target) && (available_idle_cpu(recent_used_cpu) || sched_idle_cpu(recent_used_cpu)) && cpumask_test_cpu(recent_used_cpu, p->cpus_ptr) && - asym_fits_cpu(task_util, util_min, util_max, recent_used_cpu)) { + asym_fits_cpu(task_util, recent_used_cpu)) { if (!static_branch_unlikely(&sched_cluster_active) || cpus_share_resources(recent_used_cpu, target)) @@ -7966,13 +7820,8 @@ static int find_energy_efficient_cpu(struct task_struct *p, int prev_cpu) { struct cpumask *cpus = this_cpu_cpumask_var_ptr(select_rq_mask); unsigned long prev_delta = ULONG_MAX, best_delta = ULONG_MAX; - unsigned long p_util_min = uclamp_is_used() ? uclamp_eff_value(p, UCLAMP_MIN) : 0; - unsigned long p_util_max = uclamp_is_used() ? 
uclamp_eff_value(p, UCLAMP_MAX) : 1024; struct root_domain *rd = this_rq()->rd; int cpu, best_energy_cpu, target = -1; - int prev_fits = -1, best_fits = -1; - unsigned long best_thermal_cap = 0; - unsigned long prev_thermal_cap = 0; struct sched_domain *sd; struct perf_domain *pd; struct energy_env eenv; @@ -8001,14 +7850,11 @@ static int find_energy_efficient_cpu(struct task_struct *p, int prev_cpu) eenv_task_busy_time(&eenv, p, prev_cpu); for (; pd; pd = pd->next) { - unsigned long util_min = p_util_min, util_max = p_util_max; unsigned long cpu_cap, cpu_thermal_cap, util; unsigned long cur_delta, max_spare_cap = 0; - unsigned long rq_util_min, rq_util_max; unsigned long prev_spare_cap = 0; int max_spare_cap_cpu = -1; unsigned long base_energy; - int fits, max_fits = -1; cpumask_and(cpus, perf_domain_span(pd), cpu_online_mask); @@ -8024,8 +7870,6 @@ static int find_energy_efficient_cpu(struct task_struct *p, int prev_cpu) eenv.pd_cap = 0; for_each_cpu(cpu, cpus) { - struct rq *rq = cpu_rq(cpu); - eenv.pd_cap += cpu_thermal_cap; if (!cpumask_test_cpu(cpu, sched_domain_span(sd))) @@ -8036,31 +7880,7 @@ static int find_energy_efficient_cpu(struct task_struct *p, int prev_cpu) util = cpu_util(cpu, p, cpu, 0); cpu_cap = capacity_of(cpu); - - /* - * Skip CPUs that cannot satisfy the capacity request. - * IOW, placing the task there would make the CPU - * overutilized. Take uclamp into account to see how - * much capacity we can get out of the CPU; this is - * aligned with sched_cpu_util(). - */ - if (uclamp_is_used() && !uclamp_rq_is_idle(rq)) { - /* - * Open code uclamp_rq_util_with() except for - * the clamp() part. Ie: apply max aggregation - * only. util_fits_cpu() logic requires to - * operate on non clamped util but must use the - * max-aggregated uclamp_{min, max}. - */ - rq_util_min = uclamp_rq_get(rq, UCLAMP_MIN); - rq_util_max = uclamp_rq_get(rq, UCLAMP_MAX); - - util_min = max(rq_util_min, p_util_min); - util_max = max(rq_util_max, p_util_max); - } - - fits = util_fits_cpu(util, util_min, util_max, cpu); - if (!fits) + if (!util_fits_cpu(util, cpu)) continue; lsub_positive(&cpu_cap, util); @@ -8068,9 +7888,7 @@ static int find_energy_efficient_cpu(struct task_struct *p, int prev_cpu) if (cpu == prev_cpu) { /* Always use prev_cpu as a candidate. */ prev_spare_cap = cpu_cap; - prev_fits = fits; - } else if ((fits > max_fits) || - ((fits == max_fits) && (cpu_cap > max_spare_cap))) { + } else if (cpu_cap > max_spare_cap) { /* * Find the CPU with the maximum spare capacity * among the remaining CPUs in the performance @@ -8078,7 +7896,6 @@ static int find_energy_efficient_cpu(struct task_struct *p, int prev_cpu) */ max_spare_cap = cpu_cap; max_spare_cap_cpu = cpu; - max_fits = fits; } } @@ -8097,50 +7914,26 @@ static int find_energy_efficient_cpu(struct task_struct *p, int prev_cpu) if (prev_delta < base_energy) goto unlock; prev_delta -= base_energy; - prev_thermal_cap = cpu_thermal_cap; best_delta = min(best_delta, prev_delta); } /* Evaluate the energy impact of using max_spare_cap_cpu. */ if (max_spare_cap_cpu >= 0 && max_spare_cap > prev_spare_cap) { - /* Current best energy cpu fits better */ - if (max_fits < best_fits) - continue; - - /* - * Both don't fit performance hint (i.e. uclamp_min) - * but best energy cpu has better capacity. 
- */ - if ((max_fits < 0) && - (cpu_thermal_cap <= best_thermal_cap)) - continue; - cur_delta = compute_energy(&eenv, pd, cpus, p, max_spare_cap_cpu); /* CPU utilization has changed */ if (cur_delta < base_energy) goto unlock; cur_delta -= base_energy; - - /* - * Both fit for the task but best energy cpu has lower - * energy impact. - */ - if ((max_fits > 0) && (best_fits > 0) && - (cur_delta >= best_delta)) - continue; - - best_delta = cur_delta; - best_energy_cpu = max_spare_cap_cpu; - best_fits = max_fits; - best_thermal_cap = cpu_thermal_cap; + if (cur_delta < best_delta) { + best_delta = cur_delta; + best_energy_cpu = max_spare_cap_cpu; + } } } rcu_read_unlock(); - if ((best_fits > prev_fits) || - ((best_fits > 0) && (best_delta < prev_delta)) || - ((best_fits < 0) && (best_thermal_cap > prev_thermal_cap))) + if (best_delta < prev_delta) target = best_energy_cpu; return target;
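As a rough illustration of why the tri-state return value in the patch above can collapse to a plain boolean: with sum aggregation the utilization handed to util_fits_cpu() is already uclamp'ed, so a CPU throttled by UCLAMP_MAX naturally reports spare room and a bare capacity comparison is enough. The sketch below is stand-alone C with invented numbers; the local fits_capacity() and util_fits_cpu() are stand-ins, and the 1280/1024 headroom margin is assumed to match the kernel's fits_capacity() macro.

#include <stdbool.h>
#include <stdio.h>

/* Assumed to mirror the kernel's fits_capacity() ~20% headroom margin. */
static bool fits_capacity(unsigned long util, unsigned long max)
{
	return util * 1280 < max * 1024;
}

/* The whole simplified check: util must already be the uclamp'ed sum. */
static bool util_fits_cpu(unsigned long util, unsigned long capacity)
{
	return fits_capacity(util, capacity);
}

int main(void)
{
	unsigned long task_util = 200;

	/* CPU whose tasks are capped by UCLAMP_MAX: clamped sum is only 500. */
	printf("capped CPU fits:    %d\n", util_fits_cpu(500 + task_util, 1024));
	/* Genuinely saturated CPU: clamped sum already at 1024. */
	printf("saturated CPU fits: %d\n", util_fits_cpu(1024 + task_util, 1024));
	return 0;
}

The capped CPU keeps accepting tasks through the normal fits check, so no special-casing of uclamp_min/uclamp_max corner cases is needed in the callers.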
From patchwork Thu Feb 1 13:12:02 2024
X-Patchwork-Submitter: Hongyan Xia
X-Patchwork-Id: 195327
From: Hongyan Xia
To: Ingo Molnar, Peter Zijlstra, Vincent Guittot, Dietmar Eggemann, Juri Lelli, Steven Rostedt, Ben Segall, Mel Gorman, Daniel Bristot de Oliveira, Valentin Schneider
Cc: Qais Yousef, Morten Rasmussen, Lukasz Luba, Christian Loehle, linux-kernel@vger.kernel.org, David Dai, Saravana Kannan
Subject: [RFC PATCH v2 6/7] sched/uclamp: Remove all uclamp bucket logic
Date: Thu, 1 Feb 2024 13:12:02 +0000
Message-Id: <61ef1a11325838e8b50e76a1b6c6d93bd5f2982c.1706792708.git.hongyan.xia2@arm.com>
Also rewrite uclamp_update_active() so that the effective uclamp values are updated every time task group properties change, system defaults change, or a request is issued from userspace. This also significantly reduces uclamp overhead because we no longer need to compute effective uclamp values and manipulate buckets every time a task is enqueued or dequeued (in uclamp_rq_{inc/dec}()). TODO: Rewrite documentation to match the new logic. Signed-off-by: Hongyan Xia --- Changed in v2: - Remove stale comments about 'uclamp buckets'. --- include/linux/sched.h | 4 - init/Kconfig | 32 ----- kernel/sched/core.c | 300 +++--------------------------------------- kernel/sched/fair.c | 4 - kernel/sched/rt.c | 4 - kernel/sched/sched.h | 85 ------------ 6 files changed, 19 insertions(+), 410 deletions(-) diff --git a/include/linux/sched.h b/include/linux/sched.h index f28eeff169ff..291b6781b221 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -678,9 +678,6 @@ struct sched_dl_entity { }; #ifdef CONFIG_UCLAMP_TASK -/* Number of utilization clamp buckets (shorter alias) */ -#define UCLAMP_BUCKETS CONFIG_UCLAMP_BUCKETS_COUNT - /* * Utilization clamp for a scheduling entity * @value: clamp value "assigned" to a se @@ -706,7 +703,6 @@ struct sched_dl_entity { */ struct uclamp_se { unsigned int value : bits_per(SCHED_CAPACITY_SCALE); - unsigned int bucket_id : bits_per(UCLAMP_BUCKETS); unsigned int active : 1; unsigned int user_defined : 1; }; diff --git a/init/Kconfig b/init/Kconfig index 9ffb103fc927..1c8e11dcda17 100644 --- a/init/Kconfig +++ b/init/Kconfig @@ -808,38 +808,6 @@ config UCLAMP_TASK enforce or grant any specific bandwidth for tasks. If in doubt, say N. - -config UCLAMP_BUCKETS_COUNT - int "Number of supported utilization clamp buckets" - range 5 20 - default 5 - depends on UCLAMP_TASK - help - Defines the number of clamp buckets to use. The range of each bucket - will be SCHED_CAPACITY_SCALE/UCLAMP_BUCKETS_COUNT. The higher the - number of clamp buckets the finer their granularity and the higher - the precision of clamping aggregation and tracking at run-time. - - For example, with the minimum configuration value we will have 5 - clamp buckets tracking 20% utilization each. A 25% boosted tasks will - be refcounted in the [20..39]% bucket and will set the bucket clamp - effective value to 25%. - If a second 30% boosted task should be co-scheduled on the same CPU, - that task will be refcounted in the same bucket of the first task and - it will boost the bucket clamp effective value to 30%. - The clamp effective value of a bucket is reset to its nominal value - (20% in the example above) when there are no more tasks refcounted in - that bucket. - - An additional boost/capping margin can be added to some tasks. In the - example above the 25% task will be boosted to 30% until it exits the - CPU. If that should be considered not acceptable on certain systems, - it's always possible to reduce the margin by increasing the number of - clamp buckets to trade off used memory for run-time tracking - precision. - - If in doubt, use the default value.
- endmenu # diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 0bedc05c883f..a3b36adc4dcc 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -1408,17 +1408,9 @@ static struct uclamp_se uclamp_default[UCLAMP_CNT]; */ DEFINE_STATIC_KEY_FALSE(sched_uclamp_used); -/* Integer rounded range for each bucket */ -#define UCLAMP_BUCKET_DELTA DIV_ROUND_CLOSEST(SCHED_CAPACITY_SCALE, UCLAMP_BUCKETS) - #define for_each_clamp_id(clamp_id) \ for ((clamp_id) = 0; (clamp_id) < UCLAMP_CNT; (clamp_id)++) -static inline unsigned int uclamp_bucket_id(unsigned int clamp_value) -{ - return min_t(unsigned int, clamp_value / UCLAMP_BUCKET_DELTA, UCLAMP_BUCKETS - 1); -} - static inline unsigned int uclamp_none(enum uclamp_id clamp_id) { if (clamp_id == UCLAMP_MIN) @@ -1430,58 +1422,9 @@ static inline void uclamp_se_set(struct uclamp_se *uc_se, unsigned int value, bool user_defined) { uc_se->value = value; - uc_se->bucket_id = uclamp_bucket_id(value); uc_se->user_defined = user_defined; } -static inline unsigned int -uclamp_idle_value(struct rq *rq, enum uclamp_id clamp_id, - unsigned int clamp_value) -{ - /* - * Avoid blocked utilization pushing up the frequency when we go - * idle (which drops the max-clamp) by retaining the last known - * max-clamp. - */ - if (clamp_id == UCLAMP_MAX) { - rq->uclamp_flags |= UCLAMP_FLAG_IDLE; - return clamp_value; - } - - return uclamp_none(UCLAMP_MIN); -} - -static inline void uclamp_idle_reset(struct rq *rq, enum uclamp_id clamp_id, - unsigned int clamp_value) -{ - /* Reset max-clamp retention only on idle exit */ - if (!(rq->uclamp_flags & UCLAMP_FLAG_IDLE)) - return; - - uclamp_rq_set(rq, clamp_id, clamp_value); -} - -static inline -unsigned int uclamp_rq_max_value(struct rq *rq, enum uclamp_id clamp_id, - unsigned int clamp_value) -{ - struct uclamp_bucket *bucket = rq->uclamp[clamp_id].bucket; - int bucket_id = UCLAMP_BUCKETS - 1; - - /* - * Since both min and max clamps are max aggregated, find the - * top most bucket with tasks in. - */ - for ( ; bucket_id >= 0; bucket_id--) { - if (!bucket[bucket_id].tasks) - continue; - return bucket[bucket_id].value; - } - - /* No tasks -- default clamp values */ - return uclamp_idle_value(rq, clamp_id, clamp_value); -} - static void __uclamp_update_util_min_rt_default(struct task_struct *p) { unsigned int default_util_min; @@ -1537,8 +1480,7 @@ uclamp_tg_restrict(struct task_struct *p, enum uclamp_id clamp_id) } /* - * The effective clamp bucket index of a task depends on, by increasing - * priority: + * The effective uclamp value of a task depends on, by increasing priority: * - the task specific clamp value, when explicitly requested from userspace * - the task group effective clamp value, for tasks not either in the root * group or in an autogroup @@ -1559,196 +1501,24 @@ uclamp_eff_get(struct task_struct *p, enum uclamp_id clamp_id) unsigned long uclamp_eff_value(struct task_struct *p, enum uclamp_id clamp_id) { - struct uclamp_se uc_eff; - - /* Task currently refcounted: use back-annotated (effective) value */ - if (p->uclamp[clamp_id].active) - return (unsigned long)p->uclamp[clamp_id].value; - - uc_eff = uclamp_eff_get(p, clamp_id); - - return (unsigned long)uc_eff.value; -} - -/* - * When a task is enqueued on a rq, the clamp bucket currently defined by the - * task's uclamp::bucket_id is refcounted on that rq. This also immediately - * updates the rq's clamp value if required. 
- * - * Tasks can have a task-specific value requested from user-space, track - * within each bucket the maximum value for tasks refcounted in it. - * This "local max aggregation" allows to track the exact "requested" value - * for each bucket when all its RUNNABLE tasks require the same clamp. - */ -static inline void uclamp_rq_inc_id(struct rq *rq, struct task_struct *p, - enum uclamp_id clamp_id) -{ - struct uclamp_rq *uc_rq = &rq->uclamp[clamp_id]; - struct uclamp_se *uc_se = &p->uclamp[clamp_id]; - struct uclamp_bucket *bucket; - - lockdep_assert_rq_held(rq); + if (!uclamp_is_used() || !p->uclamp[clamp_id].active) + return uclamp_none(clamp_id); - /* Update task effective clamp */ - p->uclamp[clamp_id] = uclamp_eff_get(p, clamp_id); - - bucket = &uc_rq->bucket[uc_se->bucket_id]; - bucket->tasks++; - uc_se->active = true; - - uclamp_idle_reset(rq, clamp_id, uc_se->value); - - /* - * Local max aggregation: rq buckets always track the max - * "requested" clamp value of its RUNNABLE tasks. - */ - if (bucket->tasks == 1 || uc_se->value > bucket->value) - bucket->value = uc_se->value; - - if (uc_se->value > uclamp_rq_get(rq, clamp_id)) - uclamp_rq_set(rq, clamp_id, uc_se->value); + return p->uclamp[clamp_id].value; } -/* - * When a task is dequeued from a rq, the clamp bucket refcounted by the task - * is released. If this is the last task reference counting the rq's max - * active clamp value, then the rq's clamp value is updated. - * - * Both refcounted tasks and rq's cached clamp values are expected to be - * always valid. If it's detected they are not, as defensive programming, - * enforce the expected state and warn. - */ -static inline void uclamp_rq_dec_id(struct rq *rq, struct task_struct *p, - enum uclamp_id clamp_id) -{ - struct uclamp_rq *uc_rq = &rq->uclamp[clamp_id]; - struct uclamp_se *uc_se = &p->uclamp[clamp_id]; - struct uclamp_bucket *bucket; - unsigned int bkt_clamp; - unsigned int rq_clamp; - - lockdep_assert_rq_held(rq); - - /* - * If sched_uclamp_used was enabled after task @p was enqueued, - * we could end up with unbalanced call to uclamp_rq_dec_id(). - * - * In this case the uc_se->active flag should be false since no uclamp - * accounting was performed at enqueue time and we can just return - * here. - * - * Need to be careful of the following enqueue/dequeue ordering - * problem too - * - * enqueue(taskA) - * // sched_uclamp_used gets enabled - * enqueue(taskB) - * dequeue(taskA) - * // Must not decrement bucket->tasks here - * dequeue(taskB) - * - * where we could end up with stale data in uc_se and - * bucket[uc_se->bucket_id]. - * - * The following check here eliminates the possibility of such race. - */ - if (unlikely(!uc_se->active)) - return; - - bucket = &uc_rq->bucket[uc_se->bucket_id]; - - SCHED_WARN_ON(!bucket->tasks); - if (likely(bucket->tasks)) - bucket->tasks--; - - uc_se->active = false; - - /* - * Keep "local max aggregation" simple and accept to (possibly) - * overboost some RUNNABLE tasks in the same bucket. - * The rq clamp bucket value is reset to its base value whenever - * there are no more RUNNABLE tasks refcounting it. - */ - if (likely(bucket->tasks)) - return; - - rq_clamp = uclamp_rq_get(rq, clamp_id); - /* - * Defensive programming: this should never happen. If it happens, - * e.g. due to future modification, warn and fixup the expected value. 
- */ - SCHED_WARN_ON(bucket->value > rq_clamp); - if (bucket->value >= rq_clamp) { - bkt_clamp = uclamp_rq_max_value(rq, clamp_id, uc_se->value); - uclamp_rq_set(rq, clamp_id, bkt_clamp); - } -} - -static inline void uclamp_rq_inc(struct rq *rq, struct task_struct *p) -{ - enum uclamp_id clamp_id; - - /* - * Avoid any overhead until uclamp is actually used by the userspace. - * - * The condition is constructed such that a NOP is generated when - * sched_uclamp_used is disabled. - */ - if (!static_branch_unlikely(&sched_uclamp_used)) - return; - - if (unlikely(!p->sched_class->uclamp_enabled)) - return; - - for_each_clamp_id(clamp_id) - uclamp_rq_inc_id(rq, p, clamp_id); - - /* Reset clamp idle holding when there is one RUNNABLE task */ - if (rq->uclamp_flags & UCLAMP_FLAG_IDLE) - rq->uclamp_flags &= ~UCLAMP_FLAG_IDLE; -} - -static inline void uclamp_rq_dec(struct rq *rq, struct task_struct *p) +static inline void +uclamp_update_active_nolock(struct task_struct *p) { enum uclamp_id clamp_id; - /* - * Avoid any overhead until uclamp is actually used by the userspace. - * - * The condition is constructed such that a NOP is generated when - * sched_uclamp_used is disabled. - */ - if (!static_branch_unlikely(&sched_uclamp_used)) - return; - - if (unlikely(!p->sched_class->uclamp_enabled)) - return; - for_each_clamp_id(clamp_id) - uclamp_rq_dec_id(rq, p, clamp_id); -} - -static inline void uclamp_rq_reinc_id(struct rq *rq, struct task_struct *p, - enum uclamp_id clamp_id) -{ - if (!p->uclamp[clamp_id].active) - return; - - uclamp_rq_dec_id(rq, p, clamp_id); - uclamp_rq_inc_id(rq, p, clamp_id); - - /* - * Make sure to clear the idle flag if we've transiently reached 0 - * active tasks on rq. - */ - if (clamp_id == UCLAMP_MAX && (rq->uclamp_flags & UCLAMP_FLAG_IDLE)) - rq->uclamp_flags &= ~UCLAMP_FLAG_IDLE; + p->uclamp[clamp_id] = uclamp_eff_get(p, clamp_id); } static inline void uclamp_update_active(struct task_struct *p) { - enum uclamp_id clamp_id; struct rq_flags rf; struct rq *rq; @@ -1762,14 +1532,7 @@ uclamp_update_active(struct task_struct *p) */ rq = task_rq_lock(p, &rf); - /* - * Setting the clamp bucket is serialized by task_rq_lock(). - * If the task is not yet RUNNABLE and its task_struct is not - * affecting a valid clamp bucket, the next time it's enqueued, - * it will already see the updated clamp bucket value. - */ - for_each_clamp_id(clamp_id) - uclamp_rq_reinc_id(rq, p, clamp_id); + uclamp_update_active_nolock(p); task_rq_unlock(rq, p, &rf); } @@ -1998,26 +1761,22 @@ static void __setscheduler_uclamp(struct task_struct *p, uclamp_se_set(&p->uclamp_req[UCLAMP_MAX], attr->sched_util_max, true); } + + uclamp_update_active_nolock(p); } static void uclamp_fork(struct task_struct *p) { enum uclamp_id clamp_id; - /* - * We don't need to hold task_rq_lock() when updating p->uclamp_* here - * as the task is still at its early fork stages. 
- */ - for_each_clamp_id(clamp_id) - p->uclamp[clamp_id].active = false; - - if (likely(!p->sched_reset_on_fork)) - return; - - for_each_clamp_id(clamp_id) { - uclamp_se_set(&p->uclamp_req[clamp_id], - uclamp_none(clamp_id), false); + if (unlikely(p->sched_reset_on_fork)) { + for_each_clamp_id(clamp_id) { + uclamp_se_set(&p->uclamp_req[clamp_id], + uclamp_none(clamp_id), false); + } } + + uclamp_update_active(p); } static void uclamp_post_fork(struct task_struct *p) @@ -2025,28 +1784,10 @@ static void uclamp_post_fork(struct task_struct *p) uclamp_update_util_min_rt_default(p); } -static void __init init_uclamp_rq(struct rq *rq) -{ - enum uclamp_id clamp_id; - struct uclamp_rq *uc_rq = rq->uclamp; - - for_each_clamp_id(clamp_id) { - uc_rq[clamp_id] = (struct uclamp_rq) { - .value = uclamp_none(clamp_id) - }; - } - - rq->uclamp_flags = UCLAMP_FLAG_IDLE; -} - static void __init init_uclamp(void) { struct uclamp_se uc_max = {}; enum uclamp_id clamp_id; - int cpu; - - for_each_possible_cpu(cpu) - init_uclamp_rq(cpu_rq(cpu)); for_each_clamp_id(clamp_id) { uclamp_se_set(&init_task.uclamp_req[clamp_id], @@ -2065,8 +1806,7 @@ static void __init init_uclamp(void) } #else /* CONFIG_UCLAMP_TASK */ -static inline void uclamp_rq_inc(struct rq *rq, struct task_struct *p) { } -static inline void uclamp_rq_dec(struct rq *rq, struct task_struct *p) { } +static inline void uclamp_update_active_nolock(struct task_struct *p) { } static inline int uclamp_validate(struct task_struct *p, const struct sched_attr *attr) { @@ -2113,7 +1853,6 @@ static inline void enqueue_task(struct rq *rq, struct task_struct *p, int flags) psi_enqueue(p, (flags & ENQUEUE_WAKEUP) && !(flags & ENQUEUE_MIGRATED)); } - uclamp_rq_inc(rq, p); p->sched_class->enqueue_task(rq, p, flags); if (sched_core_enabled(rq)) @@ -2133,7 +1872,6 @@ static inline void dequeue_task(struct rq *rq, struct task_struct *p, int flags) psi_dequeue(p, flags & DEQUEUE_SLEEP); } - uclamp_rq_dec(rq, p); p->sched_class->dequeue_task(rq, p, flags); } @@ -10480,6 +10218,7 @@ void sched_move_task(struct task_struct *tsk) put_prev_task(rq, tsk); sched_change_group(tsk, group); + uclamp_update_active_nolock(tsk); if (queued) enqueue_task(rq, tsk, queue_flags); @@ -10612,7 +10351,6 @@ static void cpu_util_update_eff(struct cgroup_subsys_state *css) if (eff[clamp_id] == uc_se[clamp_id].value) continue; uc_se[clamp_id].value = eff[clamp_id]; - uc_se[clamp_id].bucket_id = uclamp_bucket_id(eff[clamp_id]); clamps |= (0x1 << clamp_id); } if (!clamps) { diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 49997f1f58fb..ac1dd5739ec6 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -12996,10 +12996,6 @@ DEFINE_SCHED_CLASS(fair) = { #ifdef CONFIG_SCHED_CORE .task_is_throttled = task_is_throttled_fair, #endif - -#ifdef CONFIG_UCLAMP_TASK - .uclamp_enabled = 1, -#endif }; #ifdef CONFIG_SCHED_DEBUG diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c index 3261b067b67e..86733bed0e3c 100644 --- a/kernel/sched/rt.c +++ b/kernel/sched/rt.c @@ -2681,10 +2681,6 @@ DEFINE_SCHED_CLASS(rt) = { #ifdef CONFIG_SCHED_CORE .task_is_throttled = task_is_throttled_rt, #endif - -#ifdef CONFIG_UCLAMP_TASK - .uclamp_enabled = 1, -#endif }; #ifdef CONFIG_RT_GROUP_SCHED diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h index 3ee28822f48f..81578410984c 100644 --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -913,46 +913,6 @@ extern void rto_push_irq_work_func(struct irq_work *work); #endif /* CONFIG_SMP */ #ifdef CONFIG_UCLAMP_TASK -/* - * struct uclamp_bucket - 
Utilization clamp bucket - * @value: utilization clamp value for tasks on this clamp bucket - * @tasks: number of RUNNABLE tasks on this clamp bucket - * - * Keep track of how many tasks are RUNNABLE for a given utilization - * clamp value. - */ -struct uclamp_bucket { - unsigned long value : bits_per(SCHED_CAPACITY_SCALE); - unsigned long tasks : BITS_PER_LONG - bits_per(SCHED_CAPACITY_SCALE); -}; - -/* - * struct uclamp_rq - rq's utilization clamp - * @value: currently active clamp values for a rq - * @bucket: utilization clamp buckets affecting a rq - * - * Keep track of RUNNABLE tasks on a rq to aggregate their clamp values. - * A clamp value is affecting a rq when there is at least one task RUNNABLE - * (or actually running) with that value. - * - * There are up to UCLAMP_CNT possible different clamp values, currently there - * are only two: minimum utilization and maximum utilization. - * - * All utilization clamping values are MAX aggregated, since: - * - for util_min: we want to run the CPU at least at the max of the minimum - * utilization required by its currently RUNNABLE tasks. - * - for util_max: we want to allow the CPU to run up to the max of the - * maximum utilization allowed by its currently RUNNABLE tasks. - * - * Since on each system we expect only a limited number of different - * utilization clamp values (UCLAMP_BUCKETS), use a simple array to track - * the metrics required to compute all the per-rq utilization clamp values. - */ -struct uclamp_rq { - unsigned int value; - struct uclamp_bucket bucket[UCLAMP_BUCKETS]; -}; - DECLARE_STATIC_KEY_FALSE(sched_uclamp_used); #endif /* CONFIG_UCLAMP_TASK */ @@ -995,11 +955,7 @@ struct rq { u64 nr_switches; #ifdef CONFIG_UCLAMP_TASK - /* Utilization clamp values based on CPU's RUNNABLE tasks */ - struct uclamp_rq uclamp[UCLAMP_CNT] ____cacheline_aligned; - unsigned int uclamp_flags; unsigned int root_cfs_util_uclamp; -#define UCLAMP_FLAG_IDLE 0x01 #endif struct cfs_rq cfs; @@ -2247,11 +2203,6 @@ struct affinity_context { extern s64 update_curr_common(struct rq *rq); struct sched_class { - -#ifdef CONFIG_UCLAMP_TASK - int uclamp_enabled; -#endif - void (*enqueue_task) (struct rq *rq, struct task_struct *p, int flags); void (*dequeue_task) (struct rq *rq, struct task_struct *p, int flags); void (*yield_task) (struct rq *rq); @@ -3042,23 +2993,6 @@ static inline unsigned long cpu_util_rt(struct rq *rq) #ifdef CONFIG_UCLAMP_TASK unsigned long uclamp_eff_value(struct task_struct *p, enum uclamp_id clamp_id); -static inline unsigned long uclamp_rq_get(struct rq *rq, - enum uclamp_id clamp_id) -{ - return READ_ONCE(rq->uclamp[clamp_id].value); -} - -static inline void uclamp_rq_set(struct rq *rq, enum uclamp_id clamp_id, - unsigned int value) -{ - WRITE_ONCE(rq->uclamp[clamp_id].value, value); -} - -static inline bool uclamp_rq_is_idle(struct rq *rq) -{ - return rq->uclamp_flags & UCLAMP_FLAG_IDLE; -} - /* Is the rq being capped/throttled by uclamp_max? 
*/ static inline bool uclamp_rq_is_capped(struct rq *rq) { @@ -3147,25 +3081,6 @@ static inline bool uclamp_is_used(void) return false; } -static inline unsigned long uclamp_rq_get(struct rq *rq, - enum uclamp_id clamp_id) -{ - if (clamp_id == UCLAMP_MIN) - return 0; - - return SCHED_CAPACITY_SCALE; -} - -static inline void uclamp_rq_set(struct rq *rq, enum uclamp_id clamp_id, - unsigned int value) -{ -} - -static inline bool uclamp_rq_is_idle(struct rq *rq) -{ - return false; -} - static inline unsigned long root_cfs_util(struct rq *rq) { return READ_ONCE(rq->cfs.avg.util_avg);
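For readers who want the net effect of the bucket removal above in one place: per-rq buckets and enqueue/dequeue refcounting are gone, and the effective values are simply recomputed and cached in p->uclamp[] whenever the task's own request, its task group, or the system defaults change. Below is a loose, stand-alone sketch of that flow; struct task, tg_limit, sys_default, eff_get() and update_active() are hypothetical simplified stand-ins, not the kernel's uclamp_tg_restrict()/uclamp_eff_get() or uclamp_update_active().

#include <stdio.h>

enum uclamp_id { UCLAMP_MIN, UCLAMP_MAX, UCLAMP_CNT };

/* Hypothetical, heavily simplified stand-ins for the kernel structures. */
struct uclamp_se { unsigned int value; };

struct task {
	struct uclamp_se uclamp_req[UCLAMP_CNT]; /* what the task asked for */
	struct uclamp_se uclamp[UCLAMP_CNT];     /* cached effective values */
};

static unsigned int tg_limit[UCLAMP_CNT]    = { 1024, 1024 }; /* task group limits */
static unsigned int sys_default[UCLAMP_CNT] = { 1024, 1024 }; /* system defaults */

/* Loose sketch of resolving a request against task group and system limits. */
static unsigned int eff_get(const struct task *p, int id)
{
	unsigned int v = p->uclamp_req[id].value;

	if (v > tg_limit[id])
		v = tg_limit[id];
	if (v > sys_default[id])
		v = sys_default[id];
	return v;
}

/* The whole update path: recompute and cache, no buckets, no refcounts. */
static void update_active(struct task *p)
{
	for (int id = UCLAMP_MIN; id < UCLAMP_CNT; id++)
		p->uclamp[id].value = eff_get(p, id);
}

int main(void)
{
	struct task p = { .uclamp_req = { { 300 }, { 800 } } };

	update_active(&p);
	printf("effective UCLAMP_MAX = %u\n", p.uclamp[UCLAMP_MAX].value);

	tg_limit[UCLAMP_MAX] = 512;	/* cgroup tightens its cap */
	update_active(&p);		/* cache refreshed on the update path */
	printf("effective UCLAMP_MAX = %u\n", p.uclamp[UCLAMP_MAX].value);
	return 0;
}

Enqueue and dequeue no longer touch uclamp state at all; aggregation on the rq side is left to the summed util_avg_uclamp signals introduced earlier in the series.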
From patchwork Thu Feb 1 13:12:03 2024
X-Patchwork-Submitter: Hongyan Xia
X-Patchwork-Id: 195325
From: Hongyan Xia
To: Ingo Molnar, Peter Zijlstra, Vincent Guittot, Dietmar Eggemann, Juri Lelli, Steven Rostedt, Ben Segall, Mel Gorman, Daniel Bristot de Oliveira, Valentin Schneider
Cc: Qais Yousef, Morten Rasmussen, Lukasz Luba, Christian Loehle, linux-kernel@vger.kernel.org, David Dai, Saravana Kannan
Subject: [RFC PATCH v2 7/7] sched/uclamp: Simplify uclamp_eff_value()
Date: Thu, 1 Feb 2024 13:12:03 +0000
Message-Id: <215a6377e1aef10460d1aa870fb06774680925c5.1706792708.git.hongyan.xia2@arm.com>
X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1789702444618848997 X-GMAIL-MSGID: 1789702444618848997 The commit sched: Remove all uclamp bucket logic removes uclamp_rq_{inc/dec}() functions, so now p->uclamp contains the correct values all the time after a uclamp_update_active() call, and there's no need to toggle the boolean `active` after an update. As a result, this function is fairly simple now and can live as a static inline function. Signed-off-by: Hongyan Xia --- kernel/sched/core.c | 13 ++++--------- kernel/sched/sched.h | 14 ++++++++++++-- 2 files changed, 16 insertions(+), 11 deletions(-) diff --git a/kernel/sched/core.c b/kernel/sched/core.c index a3b36adc4dcc..f5f5f056525c 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -1499,21 +1499,15 @@ uclamp_eff_get(struct task_struct *p, enum uclamp_id clamp_id) return uc_req; } -unsigned long uclamp_eff_value(struct task_struct *p, enum uclamp_id clamp_id) -{ - if (!uclamp_is_used() || !p->uclamp[clamp_id].active) - return uclamp_none(clamp_id); - - return p->uclamp[clamp_id].value; -} - static inline void uclamp_update_active_nolock(struct task_struct *p) { enum uclamp_id clamp_id; - for_each_clamp_id(clamp_id) + for_each_clamp_id(clamp_id) { p->uclamp[clamp_id] = uclamp_eff_get(p, clamp_id); + p->uclamp[clamp_id].active = 1; + } } static inline void @@ -1773,6 +1767,7 @@ static void uclamp_fork(struct task_struct *p) for_each_clamp_id(clamp_id) { uclamp_se_set(&p->uclamp_req[clamp_id], uclamp_none(clamp_id), false); + p->uclamp[clamp_id].active = 0; } } diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h index 81578410984c..2caefc3344bb 100644 --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -2991,8 +2991,6 @@ static inline unsigned long cpu_util_rt(struct rq *rq) #endif #ifdef CONFIG_UCLAMP_TASK -unsigned long uclamp_eff_value(struct task_struct *p, enum uclamp_id clamp_id); - /* Is the rq being capped/throttled by uclamp_max? */ static inline bool uclamp_rq_is_capped(struct rq *rq) { @@ -3022,6 +3020,18 @@ static inline bool uclamp_is_used(void) return static_branch_likely(&sched_uclamp_used); } +static inline unsigned long uclamp_eff_value(struct task_struct *p, + enum uclamp_id clamp_id) +{ + if (uclamp_is_used() && p->uclamp[clamp_id].active) + return p->uclamp[clamp_id].value; + + if (clamp_id == UCLAMP_MIN) + return 0; + + return SCHED_CAPACITY_SCALE; +} + static inline unsigned long root_cfs_util(struct rq *rq) { return READ_ONCE(rq->root_cfs_util_uclamp);