From patchwork Sat Nov 4 10:59:18 2023
X-Patchwork-Submitter: Daniel Bristot de Oliveira
X-Patchwork-Id: 161572
From: Daniel Bristot de Oliveira
To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot
Cc: Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
    Daniel Bristot de Oliveira, Valentin Schneider,
    linux-kernel@vger.kernel.org, Luca Abeni, Tommaso Cucinotta,
    Thomas Gleixner, Joel Fernandes, Vineeth Pillai, Shuah Khan,
    bristot@kernel.org, Phil Auld
Subject: [PATCH v5 1/7] sched: Unify runtime accounting across classes
Date: Sat, 4 Nov 2023 11:59:18 +0100
Message-Id: <54d148a144f26d9559698c4dd82d8859038a7380.1699095159.git.bristot@kernel.org>

From: Peter Zijlstra

All classes use sched_entity::exec_start to track runtime and have
copies of the exact same code
around to compute runtime. Collapse all that.

Reviewed-by: Phil Auld
Reviewed-by: Valentin Schneider
Reviewed-by: Steven Rostedt (Google)
Signed-off-by: Peter Zijlstra (Intel)
Signed-off-by: Daniel Bristot de Oliveira
---
 include/linux/sched.h    |  2 +-
 kernel/sched/deadline.c  | 15 +++--------
 kernel/sched/fair.c      | 57 ++++++++++++++++++++++++++++++----------
 kernel/sched/rt.c        | 15 +++--------
 kernel/sched/sched.h     | 12 ++-------
 kernel/sched/stop_task.c | 13 +--------
 6 files changed, 53 insertions(+), 61 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 12ec109ce8c9..31eee8b03dcd 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -523,7 +523,7 @@ struct sched_statistics {
 	u64				block_max;
 	s64				sum_block_runtime;
 
-	u64				exec_max;
+	s64				exec_max;
 	u64				slice_max;
 
 	u64				nr_migrations_cold;
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index b28114478b82..de79719c63c0 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -1275,9 +1275,8 @@ static void update_curr_dl(struct rq *rq)
 {
 	struct task_struct *curr = rq->curr;
 	struct sched_dl_entity *dl_se = &curr->dl;
-	u64 delta_exec, scaled_delta_exec;
+	s64 delta_exec, scaled_delta_exec;
 	int cpu = cpu_of(rq);
-	u64 now;
 
 	if (!dl_task(curr) || !on_dl_rq(dl_se))
 		return;
@@ -1290,21 +1289,13 @@ static void update_curr_dl(struct rq *rq)
 	 * natural solution, but the full ramifications of this
 	 * approach need further study.
 	 */
-	now = rq_clock_task(rq);
-	delta_exec = now - curr->se.exec_start;
-	if (unlikely((s64)delta_exec <= 0)) {
+	delta_exec = update_curr_common(rq);
+	if (unlikely(delta_exec <= 0)) {
 		if (unlikely(dl_se->dl_yielded))
 			goto throttle;
 		return;
 	}
 
-	schedstat_set(curr->stats.exec_max,
-		      max(curr->stats.exec_max, delta_exec));
-
-	trace_sched_stat_runtime(curr, delta_exec, 0);
-
-	update_current_exec_runtime(curr, now, delta_exec);
-
 	if (dl_entity_is_special(dl_se))
 		return;
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 8767988242ee..2613704a2d2d 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1129,23 +1129,17 @@ static void update_tg_load_avg(struct cfs_rq *cfs_rq)
 }
 #endif /* CONFIG_SMP */
 
-/*
- * Update the current task's runtime statistics.
- */
-static void update_curr(struct cfs_rq *cfs_rq)
+static s64 update_curr_se(struct rq *rq, struct sched_entity *curr)
 {
-	struct sched_entity *curr = cfs_rq->curr;
-	u64 now = rq_clock_task(rq_of(cfs_rq));
-	u64 delta_exec;
-
-	if (unlikely(!curr))
-		return;
+	u64 now = rq_clock_task(rq);
+	s64 delta_exec;
 
 	delta_exec = now - curr->exec_start;
-	if (unlikely((s64)delta_exec <= 0))
-		return;
+	if (unlikely(delta_exec <= 0))
+		return delta_exec;
 
 	curr->exec_start = now;
+	curr->sum_exec_runtime += delta_exec;
 
 	if (schedstat_enabled()) {
 		struct sched_statistics *stats;
@@ -1155,8 +1149,43 @@
 			      max(delta_exec, stats->exec_max));
 	}
 
-	curr->sum_exec_runtime += delta_exec;
-	schedstat_add(cfs_rq->exec_clock, delta_exec);
+	return delta_exec;
+}
+
+/*
+ * Used by other classes to account runtime.
+ */
+s64 update_curr_common(struct rq *rq)
+{
+	struct task_struct *curr = rq->curr;
+	s64 delta_exec;
+
+	delta_exec = update_curr_se(rq, &curr->se);
+	if (unlikely(delta_exec <= 0))
+		return delta_exec;
+
+	trace_sched_stat_runtime(curr, delta_exec, 0);
+
+	account_group_exec_runtime(curr, delta_exec);
+	cgroup_account_cputime(curr, delta_exec);
+
+	return delta_exec;
+}
+
+/*
+ * Update the current task's runtime statistics.
+ */
+static void update_curr(struct cfs_rq *cfs_rq)
+{
+	struct sched_entity *curr = cfs_rq->curr;
+	s64 delta_exec;
+
+	if (unlikely(!curr))
+		return;
+
+	delta_exec = update_curr_se(rq_of(cfs_rq), curr);
+	if (unlikely(delta_exec <= 0))
+		return;
 
 	curr->vruntime += calc_delta_fair(delta_exec, curr);
 	update_deadline(cfs_rq, curr);
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 6aaf0a3d6081..3261b067b67e 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -1002,24 +1002,15 @@ static void update_curr_rt(struct rq *rq)
 {
 	struct task_struct *curr = rq->curr;
 	struct sched_rt_entity *rt_se = &curr->rt;
-	u64 delta_exec;
-	u64 now;
+	s64 delta_exec;
 
 	if (curr->sched_class != &rt_sched_class)
 		return;
 
-	now = rq_clock_task(rq);
-	delta_exec = now - curr->se.exec_start;
-	if (unlikely((s64)delta_exec <= 0))
+	delta_exec = update_curr_common(rq);
+	if (unlikely(delta_exec <= 0))
 		return;
 
-	schedstat_set(curr->stats.exec_max,
-		      max(curr->stats.exec_max, delta_exec));
-
-	trace_sched_stat_runtime(curr, delta_exec, 0);
-
-	update_current_exec_runtime(curr, now, delta_exec);
-
 	if (!rt_bandwidth_enabled())
 		return;
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 2e5a95486a42..3e0e4fc8734b 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -2212,6 +2212,8 @@ struct affinity_context {
 	unsigned int flags;
 };
 
+extern s64 update_curr_common(struct rq *rq);
+
 struct sched_class {
 
 #ifdef CONFIG_UCLAMP_TASK
@@ -3261,16 +3263,6 @@ extern int sched_dynamic_mode(const char *str);
 extern void sched_dynamic_update(int mode);
 #endif
 
-static inline void update_current_exec_runtime(struct task_struct *curr,
-					       u64 now, u64 delta_exec)
-{
-	curr->se.sum_exec_runtime += delta_exec;
-	account_group_exec_runtime(curr, delta_exec);
-
-	curr->se.exec_start = now;
-	cgroup_account_cputime(curr, delta_exec);
-}
-
 #ifdef CONFIG_SCHED_MM_CID
 
 #define SCHED_MM_CID_PERIOD_NS	(100ULL * 1000000)	/* 100ms */
diff --git a/kernel/sched/stop_task.c b/kernel/sched/stop_task.c
index 6cf7304e6449..b1b8fe61c532 100644
--- a/kernel/sched/stop_task.c
+++ b/kernel/sched/stop_task.c
@@ -70,18 +70,7 @@ static void yield_task_stop(struct rq *rq)
 
 static void put_prev_task_stop(struct rq *rq, struct task_struct *prev)
 {
-	struct task_struct *curr = rq->curr;
-	u64 now, delta_exec;
-
-	now = rq_clock_task(rq);
-	delta_exec = now - curr->se.exec_start;
-	if (unlikely((s64)delta_exec < 0))
-		delta_exec = 0;
-
-	schedstat_set(curr->stats.exec_max,
-		      max(curr->stats.exec_max, delta_exec));
-
-	update_current_exec_runtime(curr, now, delta_exec);
+	update_curr_common(rq);
 }
 
 /*
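
Taken on its own, the shape of the unification is easy to model outside the
kernel. Below is a minimal, hypothetical userspace sketch (struct entity,
fake_clock and update_curr_rt_like() are invented for illustration; only the
control flow mirrors update_curr_se()/update_curr_common() above):

	#include <stdio.h>
	#include <stdint.h>

	typedef int64_t s64;
	typedef uint64_t u64;

	struct entity {
		u64 exec_start;        /* last accounting timestamp */
		u64 sum_exec_runtime;  /* accumulated runtime */
	};

	static u64 fake_clock;         /* stands in for rq_clock_task(rq) */

	/* One shared copy of the delta computation, as in update_curr_se(). */
	static s64 update_curr_se_model(struct entity *curr)
	{
		u64 now = fake_clock;
		s64 delta_exec = now - curr->exec_start; /* signed: may go negative */

		if (delta_exec <= 0)
			return delta_exec;

		curr->exec_start = now;
		curr->sum_exec_runtime += delta_exec;
		return delta_exec;
	}

	/* Each class now just consumes the returned delta. */
	static void update_curr_rt_like(struct entity *e)
	{
		s64 delta_exec = update_curr_se_model(e);

		if (delta_exec <= 0)
			return;
		printf("charged %lld ns\n", (long long)delta_exec);
	}

	int main(void)
	{
		struct entity e = { .exec_start = 100 };

		fake_clock = 350;
		update_curr_rt_like(&e);   /* charges 250 */
		fake_clock = 300;          /* clock anomaly: delta <= 0 ignored */
		update_curr_rt_like(&e);
		printf("total = %llu\n", (unsigned long long)e.sum_exec_runtime);
		return 0;
	}

The switch of exec_max and delta_exec to s64 in the patch serves the same
purpose as the signed return here: a non-positive delta doubles as the
"nothing to account" signal that every class previously open-coded.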

From patchwork Sat Nov 4 10:59:19 2023
X-Patchwork-Submitter: Daniel Bristot de Oliveira
X-Patchwork-Id: 161568
From: Daniel Bristot de Oliveira
Subject: [PATCH v5 2/7] sched/deadline: Collect sched_dl_entity initialization
Date: Sat, 4 Nov 2023 11:59:19 +0100
Message-Id: <51acc695eecf0a1a2f78f9a044e11ffd9b316bcf.1699095159.git.bristot@kernel.org>

From: Peter Zijlstra

Create a single function that initializes a sched_dl_entity.

Reviewed-by: Phil Auld
Reviewed-by: Valentin Schneider
Signed-off-by: Peter Zijlstra (Intel)
Signed-off-by: Daniel Bristot de Oliveira
---
 kernel/sched/core.c     |  5 +----
 kernel/sched/deadline.c | 22 +++++++++++++++-------
 kernel/sched/sched.h    |  5 +----
 3 files changed, 17 insertions(+), 15 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 3d7e2d702699..257369d30303 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4509,10 +4509,7 @@ static void __sched_fork(unsigned long clone_flags, struct task_struct *p)
 	memset(&p->stats, 0, sizeof(p->stats));
 #endif
 
-	RB_CLEAR_NODE(&p->dl.rb_node);
-	init_dl_task_timer(&p->dl);
-	init_dl_inactive_task_timer(&p->dl);
-	__dl_clear_params(p);
+	init_dl_entity(&p->dl);
 
 	INIT_LIST_HEAD(&p->rt.run_list);
 	p->rt.timeout = 0;
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index de79719c63c0..e80bb884262d 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -335,6 +335,8 @@ static void dl_change_utilization(struct task_struct *p, u64 new_bw)
 		__add_rq_bw(new_bw, &rq->dl);
 }
 
+static void __dl_clear_params(struct sched_dl_entity *dl_se);
+
 /*
  * The utilization of a task cannot be immediately removed from
  * the rq active utilization (running_bw) when the task blocks.
@@ -434,7 +436,7 @@ static void task_non_contending(struct task_struct *p)
 			raw_spin_lock(&dl_b->lock);
 			__dl_sub(dl_b, p->dl.dl_bw, dl_bw_cpus(task_cpu(p)));
 			raw_spin_unlock(&dl_b->lock);
-			__dl_clear_params(p);
+			__dl_clear_params(dl_se);
 		}
 
 		return;
@@ -1183,7 +1185,7 @@ static enum hrtimer_restart dl_task_timer(struct hrtimer *timer)
 	return HRTIMER_NORESTART;
 }
 
-void init_dl_task_timer(struct sched_dl_entity *dl_se)
+static void init_dl_task_timer(struct sched_dl_entity *dl_se)
 {
 	struct hrtimer *timer = &dl_se->dl_timer;
 
@@ -1389,7 +1391,7 @@ static enum hrtimer_restart inactive_task_timer(struct hrtimer *timer)
 		raw_spin_lock(&dl_b->lock);
 		__dl_sub(dl_b, p->dl.dl_bw, dl_bw_cpus(task_cpu(p)));
 		raw_spin_unlock(&dl_b->lock);
-		__dl_clear_params(p);
+		__dl_clear_params(dl_se);
 
 		goto unlock;
 	}
@@ -1405,7 +1407,7 @@ static enum hrtimer_restart inactive_task_timer(struct hrtimer *timer)
 	return HRTIMER_NORESTART;
 }
 
-void init_dl_inactive_task_timer(struct sched_dl_entity *dl_se)
+static void init_dl_inactive_task_timer(struct sched_dl_entity *dl_se)
 {
 	struct hrtimer *timer = &dl_se->inactive_timer;
 
@@ -2957,10 +2959,8 @@ bool __checkparam_dl(const struct sched_attr *attr)
 /*
  * This function clears the sched_dl_entity static params.
  */
-void __dl_clear_params(struct task_struct *p)
+static void __dl_clear_params(struct sched_dl_entity *dl_se)
 {
-	struct sched_dl_entity *dl_se = &p->dl;
-
 	dl_se->dl_runtime	= 0;
 	dl_se->dl_deadline	= 0;
 	dl_se->dl_period	= 0;
@@ -2978,6 +2978,14 @@
 #endif
 }
 
+void init_dl_entity(struct sched_dl_entity *dl_se)
+{
+	RB_CLEAR_NODE(&dl_se->rb_node);
+	init_dl_task_timer(dl_se);
+	init_dl_inactive_task_timer(dl_se);
+	__dl_clear_params(dl_se);
+}
+
 bool dl_param_changed(struct task_struct *p, const struct sched_attr *attr)
 {
 	struct sched_dl_entity *dl_se = &p->dl;
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 3e0e4fc8734b..4f5f5a2778a9 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -273,8 +273,6 @@ struct rt_bandwidth {
 	unsigned int		rt_period_active;
 };
 
-void __dl_clear_params(struct task_struct *p);
-
 static inline int dl_bandwidth_enabled(void)
 {
 	return sysctl_sched_rt_runtime >= 0;
@@ -2427,8 +2425,7 @@ extern struct rt_bandwidth def_rt_bandwidth;
 extern void init_rt_bandwidth(struct rt_bandwidth *rt_b, u64 period, u64 runtime);
 extern bool sched_rt_bandwidth_account(struct rt_rq *rt_rq);
 
-extern void init_dl_task_timer(struct sched_dl_entity *dl_se);
-extern void init_dl_inactive_task_timer(struct sched_dl_entity *dl_se);
+extern void init_dl_entity(struct sched_dl_entity *dl_se);
 
 #define BW_SHIFT	20
 #define BW_UNIT		(1 << BW_SHIFT)
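
The patch is a pure consolidation, and a short, hypothetical standalone
sketch shows the pattern (all *_model names are invented; only the call
sequence mirrors init_dl_entity() above):

	#include <stdio.h>

	struct dl_entity_model {
		int rb_node_clear;
		int dl_timer_init;
		int inactive_timer_init;
		long dl_runtime;
	};

	static void init_dl_task_timer_model(struct dl_entity_model *d)
	{
		d->dl_timer_init = 1;
	}

	static void init_dl_inactive_timer_model(struct dl_entity_model *d)
	{
		d->inactive_timer_init = 1;
	}

	static void dl_clear_params_model(struct dl_entity_model *d)
	{
		d->dl_runtime = 0;
	}

	/* Mirrors init_dl_entity(): the helpers can become static because
	 * the init order is now defined in exactly one place. */
	static void init_dl_entity_model(struct dl_entity_model *d)
	{
		d->rb_node_clear = 1;          /* RB_CLEAR_NODE() */
		init_dl_task_timer_model(d);
		init_dl_inactive_timer_model(d);
		dl_clear_params_model(d);
	}

	int main(void)
	{
		struct dl_entity_model d;

		init_dl_entity_model(&d);      /* the one call left in __sched_fork() */
		printf("runtime=%ld timers=%d/%d\n",
		       d.dl_runtime, d.dl_timer_init, d.inactive_timer_init);
		return 0;
	}

Besides saving lines, this lets __dl_clear_params() and the two timer init
helpers become static to deadline.c, shrinking the sched.h surface.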

From patchwork Sat Nov 4 10:59:20 2023
X-Patchwork-Submitter: Daniel Bristot de Oliveira
X-Patchwork-Id: 161574
From: Daniel Bristot de Oliveira
Subject: [PATCH v5 3/7] sched/deadline: Move bandwidth accounting into {en,de}queue_dl_entity
Date: Sat, 4 Nov 2023 11:59:20 +0100

From: Peter Zijlstra

In preparation for introducing a !task sched_dl_entity, move the
bandwidth accounting into {en,de}queue_dl_entity().

Reviewed-by: Phil Auld
Reviewed-by: Valentin Schneider
Signed-off-by: Peter Zijlstra (Intel)
Signed-off-by: Daniel Bristot de Oliveira
---
 kernel/sched/deadline.c | 130 ++++++++++++++++++++++------------
 kernel/sched/sched.h    |   6 ++
 2 files changed, 78 insertions(+), 58 deletions(-)

diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index e80bb884262d..81810f67df7a 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -391,12 +391,12 @@ static void __dl_clear_params(struct sched_dl_entity *dl_se);
  * up, and checks if the task is still in the "ACTIVE non contending"
  * state or not (in the second case, it updates running_bw).
  */
-static void task_non_contending(struct task_struct *p)
+static void task_non_contending(struct sched_dl_entity *dl_se)
 {
-	struct sched_dl_entity *dl_se = &p->dl;
 	struct hrtimer *timer = &dl_se->inactive_timer;
 	struct dl_rq *dl_rq = dl_rq_of_se(dl_se);
 	struct rq *rq = rq_of_dl_rq(dl_rq);
+	struct task_struct *p = dl_task_of(dl_se);
 	s64 zerolag_time;
 
 	/*
@@ -428,13 +428,14 @@ static void task_non_contending(struct task_struct *p)
 	if ((zerolag_time < 0) || hrtimer_active(&dl_se->inactive_timer)) {
 		if (dl_task(p))
 			sub_running_bw(dl_se, dl_rq);
+
 		if (!dl_task(p) || READ_ONCE(p->__state) == TASK_DEAD) {
 			struct dl_bw *dl_b = dl_bw_of(task_cpu(p));
 
 			if (READ_ONCE(p->__state) == TASK_DEAD)
-				sub_rq_bw(&p->dl, &rq->dl);
+				sub_rq_bw(dl_se, &rq->dl);
 			raw_spin_lock(&dl_b->lock);
-			__dl_sub(dl_b, p->dl.dl_bw, dl_bw_cpus(task_cpu(p)));
+			__dl_sub(dl_b, dl_se->dl_bw, dl_bw_cpus(task_cpu(p)));
 			raw_spin_unlock(&dl_b->lock);
 			__dl_clear_params(dl_se);
 		}
@@ -1602,6 +1602,41 @@ enqueue_dl_entity(struct sched_dl_entity *dl_se, int flags)
 
 	update_stats_enqueue_dl(dl_rq_of_se(dl_se), dl_se, flags);
 
+	/*
+	 * Check if a constrained deadline task was activated
+	 * after the deadline but before the next period.
+	 * If that is the case, the task will be throttled and
+	 * the replenishment timer will be set to the next period.
+	 */
+	if (!dl_se->dl_throttled && !dl_is_implicit(dl_se))
+		dl_check_constrained_dl(dl_se);
+
+	if (flags & (ENQUEUE_RESTORE|ENQUEUE_MIGRATING)) {
+		struct dl_rq *dl_rq = dl_rq_of_se(dl_se);
+
+		add_rq_bw(dl_se, dl_rq);
+		add_running_bw(dl_se, dl_rq);
+	}
+
+	/*
+	 * If p is throttled, we do not enqueue it. In fact, if it exhausted
+	 * its budget it needs a replenishment and, since it now is on
+	 * its rq, the bandwidth timer callback (which clearly has not
+	 * run yet) will take care of this.
+	 * However, the active utilization does not depend on the fact
+	 * that the task is on the runqueue or not (but depends on the
+	 * task's state - in GRUB parlance, "inactive" vs "active contending").
+	 * In other words, even if a task is throttled its utilization must
+	 * be counted in the active utilization; hence, we need to call
+	 * add_running_bw().
+	 */
+	if (dl_se->dl_throttled && !(flags & ENQUEUE_REPLENISH)) {
+		if (flags & ENQUEUE_WAKEUP)
+			task_contending(dl_se, flags);
+
+		return;
+	}
+
 	/*
 	 * If this is a wakeup or a new instance, the scheduling
 	 * parameters of the task might need updating. Otherwise,
@@ -1620,9 +1656,28 @@ enqueue_dl_entity(struct sched_dl_entity *dl_se, int flags)
 	__enqueue_dl_entity(dl_se);
 }
 
-static void dequeue_dl_entity(struct sched_dl_entity *dl_se)
+static void dequeue_dl_entity(struct sched_dl_entity *dl_se, int flags)
 {
 	__dequeue_dl_entity(dl_se);
+
+	if (flags & (DEQUEUE_SAVE|DEQUEUE_MIGRATING)) {
+		struct dl_rq *dl_rq = dl_rq_of_se(dl_se);
+
+		sub_running_bw(dl_se, dl_rq);
+		sub_rq_bw(dl_se, dl_rq);
+	}
+
+	/*
+	 * This check allows to start the inactive timer (or to immediately
+	 * decrease the active utilization, if needed) in two cases:
+	 * when the task blocks and when it is terminating
+	 * (p->state == TASK_DEAD). We can handle the two cases in the same
+	 * way, because from GRUB's point of view the same thing is happening
+	 * (the task moves from "active contending" to "active non contending"
+	 * or "inactive")
+	 */
+	if (flags & DEQUEUE_SLEEP)
+		task_non_contending(dl_se);
 }
 
 static void enqueue_task_dl(struct rq *rq, struct task_struct *p, int flags)
@@ -1667,76 +1722,35 @@ static void enqueue_task_dl(struct rq *rq, struct task_struct *p, int flags)
 		return;
 	}
 
-	/*
-	 * Check if a constrained deadline task was activated
-	 * after the deadline but before the next period.
-	 * If that is the case, the task will be throttled and
-	 * the replenishment timer will be set to the next period.
-	 */
-	if (!p->dl.dl_throttled && !dl_is_implicit(&p->dl))
-		dl_check_constrained_dl(&p->dl);
-
-	if (p->on_rq == TASK_ON_RQ_MIGRATING || flags & ENQUEUE_RESTORE) {
-		add_rq_bw(&p->dl, &rq->dl);
-		add_running_bw(&p->dl, &rq->dl);
-	}
-
-	/*
-	 * If p is throttled, we do not enqueue it. In fact, if it exhausted
-	 * its budget it needs a replenishment and, since it now is on
-	 * its rq, the bandwidth timer callback (which clearly has not
-	 * run yet) will take care of this.
-	 * However, the active utilization does not depend on the fact
-	 * that the task is on the runqueue or not (but depends on the
-	 * task's state - in GRUB parlance, "inactive" vs "active contending").
-	 * In other words, even if a task is throttled its utilization must
-	 * be counted in the active utilization; hence, we need to call
-	 * add_running_bw().
-	 */
-	if (p->dl.dl_throttled && !(flags & ENQUEUE_REPLENISH)) {
-		if (flags & ENQUEUE_WAKEUP)
-			task_contending(&p->dl, flags);
-
-		return;
-	}
-
 	check_schedstat_required();
 	update_stats_wait_start_dl(dl_rq_of_se(&p->dl), &p->dl);
 
+	if (p->on_rq == TASK_ON_RQ_MIGRATING)
+		flags |= ENQUEUE_MIGRATING;
+
 	enqueue_dl_entity(&p->dl, flags);
 
-	if (!task_current(rq, p) && p->nr_cpus_allowed > 1)
+	if (!task_current(rq, p) && !p->dl.dl_throttled && p->nr_cpus_allowed > 1)
 		enqueue_pushable_dl_task(rq, p);
 }
 
 static void __dequeue_task_dl(struct rq *rq, struct task_struct *p, int flags)
 {
 	update_stats_dequeue_dl(&rq->dl, &p->dl, flags);
-	dequeue_dl_entity(&p->dl);
-	dequeue_pushable_dl_task(rq, p);
+	dequeue_dl_entity(&p->dl, flags);
+
+	if (!p->dl.dl_throttled)
+		dequeue_pushable_dl_task(rq, p);
 }
 
 static void dequeue_task_dl(struct rq *rq, struct task_struct *p, int flags)
 {
 	update_curr_dl(rq);
-	__dequeue_task_dl(rq, p, flags);
 
-	if (p->on_rq == TASK_ON_RQ_MIGRATING || flags & DEQUEUE_SAVE) {
-		sub_running_bw(&p->dl, &rq->dl);
-		sub_rq_bw(&p->dl, &rq->dl);
-	}
+	if (p->on_rq == TASK_ON_RQ_MIGRATING)
+		flags |= DEQUEUE_MIGRATING;
 
-	/*
-	 * This check allows to start the inactive timer (or to immediately
-	 * decrease the active utilization, if needed) in two cases:
-	 * when the task blocks and when it is terminating
-	 * (p->state == TASK_DEAD). We can handle the two cases in the same
-	 * way, because from GRUB's point of view the same thing is happening
-	 * (the task moves from "active contending" to "active non contending"
-	 * or "inactive")
-	 */
-	if (flags & DEQUEUE_SLEEP)
-		task_non_contending(p);
+	__dequeue_task_dl(rq, p, flags);
 }
 
 /*
@@ -2551,7 +2565,7 @@ static void switched_from_dl(struct rq *rq, struct task_struct *p)
 	 * will reset the task parameters.
 	 */
 	if (task_on_rq_queued(p) && p->dl.dl_runtime)
-		task_non_contending(p);
+		task_non_contending(&p->dl);
 
 	/*
 	 * In case a task is setscheduled out from SCHED_DEADLINE we need to
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 4f5f5a2778a9..a0cdc540029c 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -2177,6 +2177,10 @@ extern const u32 sched_prio_to_wmult[40];
 * MOVE - paired with SAVE/RESTORE, explicitly does not preserve the location
 *        in the runqueue.
 *
+ * NOCLOCK - skip the update_rq_clock() (avoids double updates)
+ *
+ * MIGRATION - p->on_rq == TASK_ON_RQ_MIGRATING (used for DEADLINE)
+ *
 * ENQUEUE_HEAD      - place at front of runqueue (tail if not specified)
 * ENQUEUE_REPLENISH - CBS (replenish runtime and postpone deadline)
 * ENQUEUE_MIGRATED  - the task was migrated during wakeup
@@ -2187,6 +2191,7 @@ extern const u32 sched_prio_to_wmult[40];
 #define DEQUEUE_SAVE		0x02 /* Matches ENQUEUE_RESTORE */
 #define DEQUEUE_MOVE		0x04 /* Matches ENQUEUE_MOVE */
 #define DEQUEUE_NOCLOCK		0x08 /* Matches ENQUEUE_NOCLOCK */
+#define DEQUEUE_MIGRATING	0x100 /* Matches ENQUEUE_MIGRATING */
 
 #define ENQUEUE_WAKEUP		0x01
 #define ENQUEUE_RESTORE		0x02
@@ -2201,6 +2206,7 @@ extern const u32 sched_prio_to_wmult[40];
 #define ENQUEUE_MIGRATED	0x00
 #endif
 #define ENQUEUE_INITIAL		0x80
+#define ENQUEUE_MIGRATING	0x100
 
 #define RETRY_TASK		((void *)-1UL)
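
To see why the new flag bits are defined to match between the enqueue and
dequeue masks, consider a minimal, hypothetical userspace model (the flag
values are copied from the sched.h hunk above; everything else is invented
for the sketch). Translating p->on_rq == TASK_ON_RQ_MIGRATING into
ENQUEUE_MIGRATING/DEQUEUE_MIGRATING lets {en,de}queue_dl_entity() test one
mask that covers both the save/restore and the migration cases:

	#include <stdio.h>

	#define ENQUEUE_RESTORE    0x02
	#define ENQUEUE_MIGRATING  0x100
	#define DEQUEUE_SAVE       0x02   /* matches ENQUEUE_RESTORE */
	#define DEQUEUE_MIGRATING  0x100  /* matches ENQUEUE_MIGRATING */

	static void enqueue_dl_entity_model(int flags)
	{
		/* One test covers both "restore after a class/param change"
		 * and "arriving on a new CPU": both re-add the bandwidth. */
		if (flags & (ENQUEUE_RESTORE | ENQUEUE_MIGRATING))
			printf("add_rq_bw + add_running_bw (flags=0x%x)\n", flags);
	}

	static void dequeue_dl_entity_model(int flags)
	{
		if (flags & (DEQUEUE_SAVE | DEQUEUE_MIGRATING))
			printf("sub_running_bw + sub_rq_bw (flags=0x%x)\n", flags);
	}

	int main(void)
	{
		int on_rq_migrating = 1;  /* p->on_rq == TASK_ON_RQ_MIGRATING */
		int flags = 0;

		if (on_rq_migrating)
			flags |= DEQUEUE_MIGRATING;
		dequeue_dl_entity_model(flags); /* source CPU drops the bandwidth */

		flags = ENQUEUE_MIGRATING;
		enqueue_dl_entity_model(flags); /* destination CPU re-adds it */
		return 0;
	}

Keeping the bit values identical (0x100 on both sides) is what allows the
entity-level helpers to stay ignorant of whether a flag arrived via the
enqueue or the dequeue path.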

From patchwork Sat Nov 4 10:59:21 2023
X-Patchwork-Submitter: Daniel Bristot de Oliveira
X-Patchwork-Id: 161573
From: Daniel Bristot de Oliveira
Subject: [PATCH v5 4/7] sched/deadline: Introduce deadline servers
Date: Sat, 4 Nov 2023 11:59:21 +0100
Message-Id: <4968601859d920335cf85822eb573a5f179f04b8.1699095159.git.bristot@kernel.org>

From: Peter Zijlstra

Low-priority tasks (e.g., SCHED_OTHER) can suffer starvation if tasks
with higher priority (e.g., SCHED_FIFO) monopolize CPU(s).

RT Throttling was introduced a while ago as a (mostly debug)
countermeasure that can be used to reserve some CPU time for
low-priority tasks (usually background work, e.g. workqueues, timers,
etc.). It however has its own problems (see documentation) and the
undesired effect of unconditionally throttling FIFO tasks even when no
lower-priority activity needs to run (there are mechanisms to fix this
issue as well, but, again, with their own problems).

Introduce deadline servers to service low-priority tasks' needs under
starvation conditions. Deadline servers are built by extending the
SCHED_DEADLINE implementation to allow 2-level scheduling (a
sched_deadline entity becomes a container for lower-priority scheduling
entities).

Signed-off-by: Peter Zijlstra (Intel)
Signed-off-by: Daniel Bristot de Oliveira
---
 include/linux/sched.h   |  22 ++-
 kernel/sched/core.c     |  17 ++
 kernel/sched/deadline.c | 332 +++++++++++++++++++++++++++-------------
 kernel/sched/fair.c     |   4 +
 kernel/sched/sched.h    |  27 ++++
 5 files changed, 294 insertions(+), 108 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 31eee8b03dcd..5ac1f252e136 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -63,11 +63,13 @@ struct robust_list_head;
 struct root_domain;
 struct rq;
 struct sched_attr;
+struct sched_dl_entity;
 struct seq_file;
 struct sighand_struct;
 struct signal_struct;
 struct task_delay_info;
 struct task_group;
+struct task_struct;
 struct user_event_mm;
 
 /*
@@ -607,6 +609,9 @@ struct sched_rt_entity {
 #endif
 } __randomize_layout;
 
+typedef bool (*dl_server_has_tasks_f)(struct sched_dl_entity *);
+typedef struct task_struct *(*dl_server_pick_f)(struct sched_dl_entity *);
+
 struct sched_dl_entity {
 	struct rb_node			rb_node;
 
@@ -654,6 +659,7 @@ struct sched_dl_entity {
 	unsigned int			dl_yielded        : 1;
 	unsigned int			dl_non_contending : 1;
 	unsigned int			dl_overrun	  : 1;
+	unsigned int			dl_server         : 1;
 
 	/*
 	 * Bandwidth enforcement timer. Each -deadline task has its
@@ -668,7 +674,20 @@ struct sched_dl_entity {
 	 * timer is needed to decrease the active utilization at the correct
 	 * time.
 	 */
-	struct hrtimer inactive_timer;
+	struct hrtimer			inactive_timer;
+
+	/*
+	 * Bits for DL-server functionality. Also see the comment near
+	 * dl_server_update().
+	 *
+	 * @rq the runqueue this server is for
+	 *
+	 * @server_has_tasks() returns true if @server_pick return a
+	 * runnable task.
+	 */
+	struct rq			*rq;
+	dl_server_has_tasks_f		server_has_tasks;
+	dl_server_pick_f		server_pick;
 
 #ifdef CONFIG_RT_MUTEXES
 	/*
@@ -795,6 +814,7 @@ struct task_struct {
 	struct sched_entity		se;
 	struct sched_rt_entity		rt;
 	struct sched_dl_entity		dl;
+	struct sched_dl_entity		*server;
 	const struct sched_class	*sched_class;
 
 #ifdef CONFIG_SCHED_CORE
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 257369d30303..a721f6776b12 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3795,6 +3795,8 @@ ttwu_do_activate(struct rq *rq, struct task_struct *p, int wake_flags,
 		rq->idle_stamp = 0;
 	}
 #endif
+
+	p->server = NULL;
 }
 
 /*
@@ -6001,12 +6003,27 @@ __pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
 			p = pick_next_task_idle(rq);
 		}
 
+		/*
+		 * This is the fast path; it cannot be a DL server pick;
+		 * therefore even if @p == @prev, ->server must be NULL.
+		 */
+		if (p->server)
+			p->server = NULL;
+
 		return p;
 	}
 
 restart:
 	put_prev_task_balance(rq, prev, rf);
 
+	/*
+	 * We've updated @prev and no longer need the server link, clear it.
+	 * Must be done before ->pick_next_task() because that can (re)set
+	 * ->server.
+	 */
+	if (prev->server)
+		prev->server = NULL;
+
 	for_each_class(class) {
 		p = class->pick_next_task(rq);
 		if (p)
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index 81810f67df7a..541d547e1019 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -54,8 +54,14 @@ static int __init sched_dl_sysctl_init(void)
 late_initcall(sched_dl_sysctl_init);
 #endif
 
+static bool dl_server(struct sched_dl_entity *dl_se)
+{
+	return dl_se->dl_server;
+}
+
 static inline struct task_struct *dl_task_of(struct sched_dl_entity *dl_se)
 {
+	BUG_ON(dl_server(dl_se));
 	return container_of(dl_se, struct task_struct, dl);
 }
 
@@ -64,12 +70,19 @@ static inline struct rq *rq_of_dl_rq(struct dl_rq *dl_rq)
 	return container_of(dl_rq, struct rq, dl);
 }
 
-static inline struct dl_rq *dl_rq_of_se(struct sched_dl_entity *dl_se)
+static inline struct rq *rq_of_dl_se(struct sched_dl_entity *dl_se)
 {
-	struct task_struct *p = dl_task_of(dl_se);
-	struct rq *rq = task_rq(p);
+	struct rq *rq = dl_se->rq;
+
+	if (!dl_server(dl_se))
+		rq = task_rq(dl_task_of(dl_se));
 
-	return &rq->dl;
+	return rq;
+}
+
+static inline struct dl_rq *dl_rq_of_se(struct sched_dl_entity *dl_se)
+{
+	return &rq_of_dl_se(dl_se)->dl;
 }
 
 static inline int on_dl_rq(struct sched_dl_entity *dl_se)
@@ -394,9 +407,8 @@ static void __dl_clear_params(struct sched_dl_entity *dl_se);
 static void task_non_contending(struct sched_dl_entity *dl_se)
 {
 	struct hrtimer *timer = &dl_se->inactive_timer;
-	struct dl_rq *dl_rq = dl_rq_of_se(dl_se);
-	struct rq *rq = rq_of_dl_rq(dl_rq);
-	struct task_struct *p = dl_task_of(dl_se);
+	struct rq *rq = rq_of_dl_se(dl_se);
+	struct dl_rq *dl_rq = &rq->dl;
 	s64 zerolag_time;
 
 	/*
@@ -426,25 +438,33 @@ static void task_non_contending(struct sched_dl_entity *dl_se)
 	 * utilization now, instead of starting a timer
 	 */
 	if ((zerolag_time < 0) || hrtimer_active(&dl_se->inactive_timer)) {
-		if (dl_task(p))
+		if (dl_server(dl_se)) {
 			sub_running_bw(dl_se, dl_rq);
+		} else {
+			struct task_struct *p = dl_task_of(dl_se);
 
-		if (!dl_task(p) || READ_ONCE(p->__state) == TASK_DEAD) {
-			struct dl_bw *dl_b = dl_bw_of(task_cpu(p));
+			if (dl_task(p))
+				sub_running_bw(dl_se, dl_rq);
 
-			if (READ_ONCE(p->__state) == TASK_DEAD)
-				sub_rq_bw(dl_se, &rq->dl);
-			raw_spin_lock(&dl_b->lock);
-			__dl_sub(dl_b, dl_se->dl_bw, dl_bw_cpus(task_cpu(p)));
-			raw_spin_unlock(&dl_b->lock);
-			__dl_clear_params(dl_se);
+			if (!dl_task(p) || READ_ONCE(p->__state) == TASK_DEAD) {
+				struct dl_bw *dl_b = dl_bw_of(task_cpu(p));
+
+				if (READ_ONCE(p->__state) == TASK_DEAD)
+					sub_rq_bw(dl_se, &rq->dl);
+				raw_spin_lock(&dl_b->lock);
+				__dl_sub(dl_b, dl_se->dl_bw, dl_bw_cpus(task_cpu(p)));
+				raw_spin_unlock(&dl_b->lock);
+				__dl_clear_params(dl_se);
+			}
 		}
 
 		return;
 	}
 
 	dl_se->dl_non_contending = 1;
-	get_task_struct(p);
+	if (!dl_server(dl_se))
+		get_task_struct(dl_task_of(dl_se));
+
 	hrtimer_start(timer, ns_to_ktime(zerolag_time), HRTIMER_MODE_REL_HARD);
 }
 
@@ -471,8 +491,10 @@ static void task_contending(struct sched_dl_entity *dl_se, int flags)
 		 * will not touch the rq's active utilization,
 		 * so we are still safe.
 		 */
-		if (hrtimer_try_to_cancel(&dl_se->inactive_timer) == 1)
-			put_task_struct(dl_task_of(dl_se));
+		if (hrtimer_try_to_cancel(&dl_se->inactive_timer) == 1) {
+			if (!dl_server(dl_se))
+				put_task_struct(dl_task_of(dl_se));
+		}
 	} else {
 		/*
 		 * Since "dl_non_contending" is not set, the
@@ -485,10 +507,8 @@ static void task_contending(struct sched_dl_entity *dl_se, int flags)
 	}
 }
 
-static inline int is_leftmost(struct task_struct *p, struct dl_rq *dl_rq)
+static inline int is_leftmost(struct sched_dl_entity *dl_se, struct dl_rq *dl_rq)
 {
-	struct sched_dl_entity *dl_se = &p->dl;
-
 	return rb_first_cached(&dl_rq->root) == &dl_se->rb_node;
 }
 
@@ -740,8 +760,10 @@ static inline void deadline_queue_pull_task(struct rq *rq)
 }
 #endif /* CONFIG_SMP */
 
+static void
+enqueue_dl_entity(struct sched_dl_entity *dl_se, int flags);
 static void enqueue_task_dl(struct rq *rq, struct task_struct *p, int flags);
-static void __dequeue_task_dl(struct rq *rq, struct task_struct *p, int flags);
+static void dequeue_dl_entity(struct sched_dl_entity *dl_se, int flags);
 static void wakeup_preempt_dl(struct rq *rq, struct task_struct *p, int flags);
 
 static inline void replenish_dl_new_period(struct sched_dl_entity *dl_se,
@@ -989,8 +1011,7 @@ static inline bool dl_is_implicit(struct sched_dl_entity *dl_se)
 */
static void update_dl_entity(struct sched_dl_entity *dl_se)
{
-	struct dl_rq *dl_rq = dl_rq_of_se(dl_se);
-	struct rq *rq = rq_of_dl_rq(dl_rq);
+	struct rq *rq = rq_of_dl_se(dl_se);
 
 	if (dl_time_before(dl_se->deadline, rq_clock(rq)) ||
 	    dl_entity_overflow(dl_se, rq_clock(rq))) {
@@ -1021,11 +1042,11 @@ static inline u64 dl_next_period(struct sched_dl_entity *dl_se)
 * actually started or not (i.e., the replenishment instant is in
 * the future or in the past).
 */
-static int start_dl_timer(struct task_struct *p)
+static int start_dl_timer(struct sched_dl_entity *dl_se)
{
-	struct sched_dl_entity *dl_se = &p->dl;
 	struct hrtimer *timer = &dl_se->dl_timer;
-	struct rq *rq = task_rq(p);
+	struct dl_rq *dl_rq = dl_rq_of_se(dl_se);
+	struct rq *rq = rq_of_dl_rq(dl_rq);
 	ktime_t now, act;
 	s64 delta;
 
@@ -1059,13 +1080,33 @@ static int start_dl_timer(struct task_struct *p)
 	 * and observe our state.
 	 */
 	if (!hrtimer_is_queued(timer)) {
-		get_task_struct(p);
+		if (!dl_server(dl_se))
+			get_task_struct(dl_task_of(dl_se));
 		hrtimer_start(timer, act, HRTIMER_MODE_ABS_HARD);
 	}
 
 	return 1;
}
 
+static void __push_dl_task(struct rq *rq, struct rq_flags *rf)
+{
+#ifdef CONFIG_SMP
+	/*
+	 * Queueing this task back might have overloaded rq, check if we need
+	 * to kick someone away.
+	 */
+	if (has_pushable_dl_tasks(rq)) {
+		/*
+		 * Nothing relies on rq->lock after this, so its safe to drop
+		 * rq->lock.
+		 */
+		rq_unpin_lock(rq, rf);
+		push_dl_task(rq);
+		rq_repin_lock(rq, rf);
+	}
+#endif
+}
+
 /*
 * This is the bandwidth enforcement timer callback. If here, we know
 * a task is not on its dl_rq, since the fact that the timer was running
@@ -1084,10 +1125,34 @@ static enum hrtimer_restart dl_task_timer(struct hrtimer *timer)
 	struct sched_dl_entity *dl_se = container_of(timer,
						     struct sched_dl_entity,
						     dl_timer);
-	struct task_struct *p = dl_task_of(dl_se);
+	struct task_struct *p;
 	struct rq_flags rf;
 	struct rq *rq;
 
+	if (dl_server(dl_se)) {
+		struct rq *rq = rq_of_dl_se(dl_se);
+		struct rq_flags rf;
+
+		rq_lock(rq, &rf);
+		if (dl_se->dl_throttled) {
+			sched_clock_tick();
+			update_rq_clock(rq);
+
+			if (dl_se->server_has_tasks(dl_se)) {
+				enqueue_dl_entity(dl_se, ENQUEUE_REPLENISH);
+				resched_curr(rq);
+				__push_dl_task(rq, &rf);
+			} else {
+				replenish_dl_entity(dl_se);
+			}
+
+		}
+		rq_unlock(rq, &rf);
+
+		return HRTIMER_NORESTART;
+	}
+
+	p = dl_task_of(dl_se);
 	rq = task_rq_lock(p, &rf);
 
 	/*
@@ -1158,21 +1223,7 @@ static enum hrtimer_restart dl_task_timer(struct hrtimer *timer)
 	else
 		resched_curr(rq);
 
-#ifdef CONFIG_SMP
-	/*
-	 * Queueing this task back might have overloaded rq, check if we need
-	 * to kick someone away.
-	 */
-	if (has_pushable_dl_tasks(rq)) {
-		/*
-		 * Nothing relies on rq->lock after this, so its safe to drop
-		 * rq->lock.
-		 */
-		rq_unpin_lock(rq, &rf);
-		push_dl_task(rq);
-		rq_repin_lock(rq, &rf);
-	}
-#endif
+	__push_dl_task(rq, &rf);
 
 unlock:
 	task_rq_unlock(rq, p, &rf);
@@ -1214,12 +1265,11 @@ static void init_dl_task_timer(struct sched_dl_entity *dl_se)
 */
static inline void dl_check_constrained_dl(struct sched_dl_entity *dl_se)
{
-	struct task_struct *p = dl_task_of(dl_se);
-	struct rq *rq = rq_of_dl_rq(dl_rq_of_se(dl_se));
+	struct rq *rq = rq_of_dl_se(dl_se);
 
 	if (dl_time_before(dl_se->deadline, rq_clock(rq)) &&
 	    dl_time_before(rq_clock(rq), dl_next_period(dl_se))) {
-		if (unlikely(is_dl_boosted(dl_se) || !start_dl_timer(p)))
+		if (unlikely(is_dl_boosted(dl_se) || !start_dl_timer(dl_se)))
 			return;
 		dl_se->dl_throttled = 1;
 		if (dl_se->runtime > 0)
@@ -1270,29 +1320,13 @@ static u64 grub_reclaim(u64 delta, struct rq *rq, struct sched_dl_entity *dl_se)
 	return (delta * u_act) >> BW_SHIFT;
}
 
-/*
- * Update the current task's runtime statistics (provided it is still
- * a -deadline task and has not been removed from the dl_rq).
- */
-static void update_curr_dl(struct rq *rq)
+static inline void
+update_stats_dequeue_dl(struct dl_rq *dl_rq, struct sched_dl_entity *dl_se,
+			int flags);
+static void update_curr_dl_se(struct rq *rq, struct sched_dl_entity *dl_se, s64 delta_exec)
 {
-	struct task_struct *curr = rq->curr;
-	struct sched_dl_entity *dl_se = &curr->dl;
-	s64 delta_exec, scaled_delta_exec;
-	int cpu = cpu_of(rq);
-
-	if (!dl_task(curr) || !on_dl_rq(dl_se))
-		return;
+	s64 scaled_delta_exec;
 
-	/*
-	 * Consumed budget is computed considering the time as
-	 * observed by schedulable tasks (excluding time spent
-	 * in hardirq context, etc.). Deadlines are instead
-	 * computed using hard walltime. This seems to be the more
-	 * natural solution, but the full ramifications of this
-	 * approach need further study.
-	 */
-	delta_exec = update_curr_common(rq);
 	if (unlikely(delta_exec <= 0)) {
 		if (unlikely(dl_se->dl_yielded))
 			goto throttle;
 		return;
 	}
 
@@ -1310,10 +1344,9 @@
 	 * according to current frequency and CPU maximum capacity.
 	 */
 	if (unlikely(dl_se->flags & SCHED_FLAG_RECLAIM)) {
-		scaled_delta_exec = grub_reclaim(delta_exec,
-						 rq,
-						 &curr->dl);
+		scaled_delta_exec = grub_reclaim(delta_exec, rq, dl_se);
 	} else {
+		int cpu = cpu_of(rq);
 		unsigned long scale_freq = arch_scale_freq_capacity(cpu);
 		unsigned long scale_cpu = arch_scale_cpu_capacity(cpu);
 
@@ -1332,11 +1365,20 @@
 	    (dl_se->flags & SCHED_FLAG_DL_OVERRUN))
 		dl_se->dl_overrun = 1;
 
-	__dequeue_task_dl(rq, curr, 0);
-	if (unlikely(is_dl_boosted(dl_se) || !start_dl_timer(curr)))
-		enqueue_task_dl(rq, curr, ENQUEUE_REPLENISH);
+	dequeue_dl_entity(dl_se, 0);
+	if (!dl_server(dl_se)) {
+		update_stats_dequeue_dl(&rq->dl, dl_se, 0);
+		dequeue_pushable_dl_task(rq, dl_task_of(dl_se));
+	}
 
-	if (!is_leftmost(curr, &rq->dl))
+	if (unlikely(is_dl_boosted(dl_se) || !start_dl_timer(dl_se))) {
+		if (dl_server(dl_se))
+			enqueue_dl_entity(dl_se, ENQUEUE_REPLENISH);
+		else
+			enqueue_task_dl(rq, dl_task_of(dl_se), ENQUEUE_REPLENISH);
+	}
+
+	if (!is_leftmost(dl_se, &rq->dl))
 		resched_curr(rq);
}
 
@@ -1366,20 +1408,82 @@
 	}
}
 
+void dl_server_update(struct sched_dl_entity *dl_se, s64 delta_exec)
+{
+	update_curr_dl_se(dl_se->rq, dl_se, delta_exec);
+}
+
+void dl_server_start(struct sched_dl_entity *dl_se)
+{
+	if (!dl_server(dl_se)) {
+		dl_se->dl_server = 1;
+		setup_new_dl_entity(dl_se);
+	}
+	enqueue_dl_entity(dl_se, ENQUEUE_WAKEUP);
+}
+
+void dl_server_stop(struct sched_dl_entity *dl_se)
+{
+	dequeue_dl_entity(dl_se, DEQUEUE_SLEEP);
+}
+
+void dl_server_init(struct sched_dl_entity *dl_se, struct rq *rq,
+		    dl_server_has_tasks_f has_tasks,
+		    dl_server_pick_f pick)
+{
+	dl_se->rq = rq;
+	dl_se->server_has_tasks = has_tasks;
+	dl_se->server_pick = pick;
+}
+
+/*
+ * Update the current task's runtime statistics (provided it is still
+ * a -deadline task and has not been removed from the dl_rq).
+ */
+static void update_curr_dl(struct rq *rq)
+{
+	struct task_struct *curr = rq->curr;
+	struct sched_dl_entity *dl_se = &curr->dl;
+	s64 delta_exec;
+
+	if (!dl_task(curr) || !on_dl_rq(dl_se))
+		return;
+
+	/*
+	 * Consumed budget is computed considering the time as
+	 * observed by schedulable tasks (excluding time spent
+	 * in hardirq context, etc.). Deadlines are instead
+	 * computed using hard walltime. This seems to be the more
+	 * natural solution, but the full ramifications of this
+	 * approach need further study.
+	 */
+	delta_exec = update_curr_common(rq);
+	update_curr_dl_se(rq, dl_se, delta_exec);
+}
+
 static enum hrtimer_restart inactive_task_timer(struct hrtimer *timer)
 {
 	struct sched_dl_entity *dl_se = container_of(timer,
						     struct sched_dl_entity,
						     inactive_timer);
-	struct task_struct *p = dl_task_of(dl_se);
+	struct task_struct *p = NULL;
 	struct rq_flags rf;
 	struct rq *rq;
 
-	rq = task_rq_lock(p, &rf);
+	if (!dl_server(dl_se)) {
+		p = dl_task_of(dl_se);
+		rq = task_rq_lock(p, &rf);
+	} else {
+		rq = dl_se->rq;
+		rq_lock(rq, &rf);
+	}
 
 	sched_clock_tick();
 	update_rq_clock(rq);
 
+	if (dl_server(dl_se))
+		goto no_task;
+
 	if (!dl_task(p) || READ_ONCE(p->__state) == TASK_DEAD) {
 		struct dl_bw *dl_b = dl_bw_of(task_cpu(p));
 
@@ -1396,14 +1500,21 @@ static enum hrtimer_restart inactive_task_timer(struct hrtimer *timer)
 		goto unlock;
 	}
 
+
+no_task:
 	if (dl_se->dl_non_contending == 0)
 		goto unlock;
 
 	sub_running_bw(dl_se, &rq->dl);
 	dl_se->dl_non_contending = 0;
 unlock:
-	task_rq_unlock(rq, p, &rf);
-	put_task_struct(p);
+
+	if (!dl_server(dl_se)) {
+		task_rq_unlock(rq, p, &rf);
+		put_task_struct(p);
+	} else {
+		rq_unlock(rq, &rf);
+	}
 
 	return HRTIMER_NORESTART;
 }
@@ -1466,10 +1577,8 @@ static inline void dec_dl_deadline(struct dl_rq *dl_rq, u64 deadline) {}
 static inline
 void inc_dl_tasks(struct sched_dl_entity *dl_se, struct dl_rq *dl_rq)
 {
-	int prio = dl_task_of(dl_se)->prio;
 	u64 deadline = dl_se->deadline;
 
-	WARN_ON(!dl_prio(prio));
 	dl_rq->dl_nr_running++;
 	add_nr_running(rq_of_dl_rq(dl_rq), 1);
 
@@ -1479,9 +1588,6 @@
 static inline
 void dec_dl_tasks(struct sched_dl_entity *dl_se, struct dl_rq *dl_rq)
 {
-	int prio = dl_task_of(dl_se)->prio;
-
-	WARN_ON(!dl_prio(prio));
 	WARN_ON(!dl_rq->dl_nr_running);
 	dl_rq->dl_nr_running--;
 	sub_nr_running(rq_of_dl_rq(dl_rq), 1);
@@ -1648,8 +1754,7 @@ enqueue_dl_entity(struct sched_dl_entity *dl_se, int flags)
 	} else if (flags & ENQUEUE_REPLENISH) {
 		replenish_dl_entity(dl_se);
 	} else if ((flags & ENQUEUE_RESTORE) &&
-		   dl_time_before(dl_se->deadline,
-				  rq_clock(rq_of_dl_rq(dl_rq_of_se(dl_se))))) {
+		   dl_time_before(dl_se->deadline, rq_clock(rq_of_dl_se(dl_se)))) {
 		setup_new_dl_entity(dl_se);
 	}
 
@@ -1730,19 +1835,13 @@ static void enqueue_task_dl(struct rq *rq, struct task_struct *p, int flags)
 	enqueue_dl_entity(&p->dl, flags);
 
+	if (dl_server(&p->dl))
+		return;
+
 	if (!task_current(rq, p) && !p->dl.dl_throttled && p->nr_cpus_allowed > 1)
 		enqueue_pushable_dl_task(rq, p);
 }
 
-static void __dequeue_task_dl(struct rq *rq, struct task_struct *p, int flags)
-{
-	update_stats_dequeue_dl(&rq->dl, &p->dl, flags);
-	dequeue_dl_entity(&p->dl, flags);
-
-	if (!p->dl.dl_throttled)
-		dequeue_pushable_dl_task(rq, p);
-}
-
 static void dequeue_task_dl(struct rq *rq, struct task_struct *p, int flags)
 {
 	update_curr_dl(rq);
@@ -1750,7 +1849,9 @@ static void dequeue_task_dl(struct rq *rq, struct task_struct *p, int flags)
 	if (p->on_rq == TASK_ON_RQ_MIGRATING)
 		flags |= DEQUEUE_MIGRATING;
 
-	__dequeue_task_dl(rq, p, flags);
+	dequeue_dl_entity(&p->dl, flags);
+	if (!p->dl.dl_throttled && !dl_server(&p->dl))
+		dequeue_pushable_dl_task(rq, p);
 }
 
 /*
@@ -1940,12 +2041,12 @@ static void wakeup_preempt_dl(struct rq *rq, struct task_struct *p,
 }
 
 #ifdef CONFIG_SCHED_HRTICK
-static void start_hrtick_dl(struct rq *rq, struct task_struct *p)
+static void start_hrtick_dl(struct rq *rq, struct sched_dl_entity *dl_se)
 {
-	hrtick_start(rq, p->dl.runtime);
+	hrtick_start(rq, dl_se->runtime);
 }
 #else /* !CONFIG_SCHED_HRTICK */
-static void start_hrtick_dl(struct rq *rq, struct task_struct *p)
+static void start_hrtick_dl(struct rq *rq, struct sched_dl_entity *dl_se)
 {
 }
 #endif
@@ -1965,9 +2066,6 @@ static void set_next_task_dl(struct rq *rq, struct task_struct *p, bool first)
 	if (!first)
 		return;
 
-	if (hrtick_enabled_dl(rq))
-		start_hrtick_dl(rq, p);
-
 	if (rq->curr->sched_class != &dl_sched_class)
 		update_dl_rq_load_avg(rq_clock_pelt(rq), rq, 0);
 
@@ -1990,12 +2088,25 @@ static struct task_struct *pick_task_dl(struct rq *rq)
 	struct dl_rq *dl_rq = &rq->dl;
 	struct task_struct *p;
 
+again:
 	if (!sched_dl_runnable(rq))
 		return NULL;
 
 	dl_se = pick_next_dl_entity(dl_rq);
 	WARN_ON_ONCE(!dl_se);
-	p = dl_task_of(dl_se);
+
+	if (dl_server(dl_se)) {
+		p = dl_se->server_pick(dl_se);
+		if (!p) {
+			WARN_ON_ONCE(1);
+			dl_se->dl_yielded = 1;
+			update_curr_dl_se(rq, dl_se, 0);
+			goto again;
+		}
+		p->server = dl_se;
+	} else {
+		p = dl_task_of(dl_se);
+	}
 
 	return p;
 }
@@ -2005,9 +2116,15 @@ static struct task_struct *pick_next_task_dl(struct rq *rq)
 	struct task_struct *p;
 
 	p = pick_task_dl(rq);
-	if (p)
+	if (!p)
+		return p;
+
+	if (!p->server)
 		set_next_task_dl(rq, p, true);
 
+	if (hrtick_enabled(rq))
+		start_hrtick_dl(rq, &p->dl);
+
 	return p;
 }
 
@@ -2045,8 +2162,8 @@ static void task_tick_dl(struct rq *rq, struct task_struct *p, int queued)
 	 * be set and schedule() will start a new hrtick for the next task.
 	 */
 	if (hrtick_enabled_dl(rq) && queued && p->dl.runtime > 0 &&
-	    is_leftmost(p, &rq->dl))
-		start_hrtick_dl(rq, p);
+	    is_leftmost(&p->dl, &rq->dl))
+		start_hrtick_dl(rq, &p->dl);
 }
 
 static void task_fork_dl(struct task_struct *p)
@@ -2986,6 +3103,7 @@ static void __dl_clear_params(struct sched_dl_entity *dl_se)
 	dl_se->dl_yielded		= 0;
 	dl_se->dl_non_contending	= 0;
 	dl_se->dl_overrun		= 0;
+	dl_se->dl_server		= 0;
 
 #ifdef CONFIG_RT_MUTEXES
 	dl_se->pi_se			= dl_se;
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 2613704a2d2d..bc3a4bc6c438 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1168,6 +1168,8 @@ s64 update_curr_common(struct rq *rq)
 	account_group_exec_runtime(curr, delta_exec);
 	cgroup_account_cputime(curr, delta_exec);
+	if (curr->server)
+		dl_server_update(curr->server, delta_exec);
 
 	return delta_exec;
 }
@@ -1197,6 +1199,8 @@ static void update_curr(struct cfs_rq *cfs_rq)
 		trace_sched_stat_runtime(curtask, delta_exec, curr->vruntime);
 		cgroup_account_cputime(curtask, delta_exec);
 		account_group_exec_runtime(curtask, delta_exec);
+		if (curtask->server)
+			dl_server_update(curtask->server, delta_exec);
 	}
 
 	account_cfs_rq_runtime(cfs_rq, delta_exec);
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index a0cdc540029c..24a2bc7c453b 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -313,6 +313,33 @@ extern bool dl_param_changed(struct task_struct *p, const struct sched_attr *att
 extern int  dl_cpuset_cpumask_can_shrink(const struct cpumask *cur, const struct cpumask *trial);
 extern int  dl_bw_check_overflow(int cpu);
 
+/*
+ * SCHED_DEADLINE supports servers (nested scheduling) with the following
+ * interface:
+ *
+ *   dl_se::rq -- runqueue we belong to.
+ *
+ *   dl_se::server_has_tasks() -- used on bandwidth enforcement; we 'stop' the
+ *                                server when it runs out of tasks to run.
+ *
+ *   dl_se::server_pick() -- nested pick_next_task(); we yield the period if
+ *                           this returns NULL.
+ *
+ *   dl_server_update() -- called from update_curr_common(), propagates runtime
+ *                         to the server.
+ *
+ *   dl_server_start()
+ *   dl_server_stop()  -- start/stop the server when it has (no) tasks.
+ *
+ *   dl_server_init() -- initializes the server.
+ */
+extern void dl_server_update(struct sched_dl_entity *dl_se, s64 delta_exec);
+extern void dl_server_start(struct sched_dl_entity *dl_se);
+extern void dl_server_stop(struct sched_dl_entity *dl_se);
+extern void dl_server_init(struct sched_dl_entity *dl_se, struct rq *rq,
+		    dl_server_has_tasks_f has_tasks,
+		    dl_server_pick_f pick);
+
 #ifdef CONFIG_CGROUP_SCHED
 
 struct cfs_rq;
+ * + * dl_server_init() -- initializes the server. */ +extern void dl_server_update(struct sched_dl_entity *dl_se, s64 delta_exec); +extern void dl_server_start(struct sched_dl_entity *dl_se); +extern void dl_server_stop(struct sched_dl_entity *dl_se); +extern void dl_server_init(struct sched_dl_entity *dl_se, struct rq *rq, + dl_server_has_tasks_f has_tasks, + dl_server_pick_f pick); + #ifdef CONFIG_CGROUP_SCHED struct cfs_rq;
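For orientation, here is a minimal sketch of how a scheduling class is expected to use the interface above. It is not part of the series; the my_server_* names are hypothetical, while dl_server_init(), pick_next_task_fair(), and the callback signatures follow the declarations and the fair-server code elsewhere in this series.

static bool my_server_has_tasks(struct sched_dl_entity *dl_se)
{
	/* Tells bandwidth enforcement whether runnable work remains. */
	return !!dl_se->rq->cfs.nr_running;
}

static struct task_struct *my_server_pick(struct sched_dl_entity *dl_se)
{
	/* Nested pick_next_task(); returning NULL yields the period. */
	return pick_next_task_fair(dl_se->rq, NULL, NULL);
}

static void my_server_setup(struct rq *rq, struct sched_dl_entity *dl_se)
{
	dl_server_init(dl_se, rq, my_server_has_tasks, my_server_pick);
	/*
	 * The class then calls dl_server_start()/dl_server_stop() when it
	 * gains its first / loses its last task, and the runtime consumed
	 * by served tasks flows back in via dl_server_update() from
	 * update_curr_common().
	 */
}

Patch 5/7 below instantiates exactly this pattern for the fair class via fair_server_has_tasks() and fair_server_pick().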
From patchwork Sat Nov 4 10:59:22 2023 X-Patchwork-Submitter: Daniel Bristot de Oliveira X-Patchwork-Id: 161567 From: Daniel Bristot de Oliveira To: Ingo Molnar , Peter Zijlstra , Juri Lelli , Vincent Guittot Cc: Dietmar Eggemann , Steven Rostedt , Ben Segall , Mel Gorman , Daniel Bristot de Oliveira , Valentin Schneider , linux-kernel@vger.kernel.org, Luca Abeni , Tommaso Cucinotta , Thomas Gleixner , Joel Fernandes , Vineeth Pillai , Shuah Khan , bristot@kernel.org, Phil Auld Subject: [PATCH v5 5/7] sched/fair: Add trivial fair server Date: Sat, 4 Nov 2023 11:59:22 +0100 Message-Id: <4e0d14eb6e0ec33055197ac7ddb57ef7ab3894a5.1699095159.git.bristot@kernel.org> In-Reply-To: References: From: Peter Zijlstra Use deadline servers to service fair tasks.
This patch adds a fair_server deadline entity which acts as a container for fair entities and can be used to fix starvation when higher priority (wrt fair) tasks are monopolizing CPU(s). [ dl_server does not account for rt ] Signed-off-by: Peter Zijlstra (Intel) Signed-off-by: Daniel Bristot de Oliveira --- kernel/sched/core.c | 1 + kernel/sched/deadline.c | 7 +++++++ kernel/sched/fair.c | 29 +++++++++++++++++++++++++++++ kernel/sched/sched.h | 4 ++++ 4 files changed, 41 insertions(+) diff --git a/kernel/sched/core.c b/kernel/sched/core.c index a721f6776b12..939266d29681 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -10019,6 +10019,7 @@ void __init sched_init(void) #endif /* CONFIG_SMP */ hrtick_rq_init(rq); atomic_set(&rq->nr_iowait, 0); + fair_server_init(rq); #ifdef CONFIG_SCHED_CORE rq->core = rq; diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c index 541d547e1019..1d7b96ca9011 100644 --- a/kernel/sched/deadline.c +++ b/kernel/sched/deadline.c @@ -1382,6 +1382,13 @@ static void update_curr_dl_se(struct rq *rq, struct sched_dl_entity *dl_se, s64 resched_curr(rq); } + /* + * The fair server (sole dl_server) does not account for real-time + * workload because it is running fair work. + */ + if (dl_server(dl_se)) + return; + /* * Because -- for now -- we share the rt bandwidth, we need to * account our runtime there too, otherwise actual rt tasks diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index bc3a4bc6c438..b15f7f376a67 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -6600,6 +6600,9 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags) */ util_est_enqueue(&rq->cfs, p); + if (!rq->cfs.h_nr_running) + dl_server_start(&rq->fair_server); + /* * If in_iowait is set, the code below may not trigger any cpufreq * utilization updates, so do it here explicitly with the IOWAIT flag @@ -6744,6 +6747,9 @@ static void dequeue_task_fair(struct rq *rq, struct task_struct *p, int flags) rq->next_balance = jiffies; dequeue_throttle: + if (!rq->cfs.h_nr_running) + dl_server_stop(&rq->fair_server); + util_est_update(&rq->cfs, p, task_sleep); hrtick_update(rq); } @@ -8396,6 +8402,29 @@ static struct task_struct *__pick_next_task_fair(struct rq *rq) return pick_next_task_fair(rq, NULL, NULL); } +static bool fair_server_has_tasks(struct sched_dl_entity *dl_se) +{ + return !!dl_se->rq->cfs.nr_running; +} + +static struct task_struct *fair_server_pick(struct sched_dl_entity *dl_se) +{ + return pick_next_task_fair(dl_se->rq, NULL, NULL); +} + +void fair_server_init(struct rq *rq) +{ + struct sched_dl_entity *dl_se = &rq->fair_server; + + init_dl_entity(dl_se); + + dl_se->dl_runtime = 50 * NSEC_PER_MSEC; + dl_se->dl_deadline = 1000 * NSEC_PER_MSEC; + dl_se->dl_period = 1000 * NSEC_PER_MSEC; + + dl_server_init(dl_se, rq, fair_server_has_tasks, fair_server_pick); +} + /* * Account for a descheduled task: */ diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h index 24a2bc7c453b..ec0e288c8e06 100644 --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -340,6 +340,8 @@ extern void dl_server_init(struct sched_dl_entity *dl_se, struct rq *rq, dl_server_has_tasks_f has_tasks, dl_server_pick_f pick); +extern void fair_server_init(struct rq *); + #ifdef CONFIG_CGROUP_SCHED struct cfs_rq; @@ -1005,6 +1007,8 @@ struct rq { struct rt_rq rt; struct dl_rq dl; + struct sched_dl_entity fair_server; + #ifdef CONFIG_FAIR_GROUP_SCHED /* list of leaf cfs_rq on this CPU: */ struct list_head leaf_cfs_rq_list;
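As a rough user-space illustration (not part of the series, and assuming a kernel with this patch applied), one can pin a SCHED_FIFO hog and a SCHED_OTHER spinner to the same CPU and watch the fair task keep receiving roughly the 50ms-per-1s (5%) reservation set up by fair_server_init() above, instead of starving:

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	cpu_set_t set;
	struct sched_param sp = { .sched_priority = 50 };
	pid_t pid;

	/* Confine both processes to CPU 0 (inherited across fork()). */
	CPU_ZERO(&set);
	CPU_SET(0, &set);
	sched_setaffinity(0, sizeof(set), &set);

	pid = fork();
	alarm(30);			/* bound the demo in both processes */
	if (pid == 0)			/* child: plain SCHED_OTHER spinner */
		for (;;)
			;

	if (sched_setscheduler(0, SCHED_FIFO, &sp))	/* parent: RT hog */
		perror("sched_setscheduler");
	for (;;)
		;	/* observe the child's CPU share with top/pidstat */
}

Without the series, the same setup leaves the fair task to the real-time throttling mechanism that the next patch discusses.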
From patchwork Sat Nov 4 10:59:23 2023 X-Patchwork-Submitter: Daniel Bristot de Oliveira X-Patchwork-Id: 161569 From: Daniel Bristot de Oliveira To: Ingo Molnar , Peter Zijlstra , Juri Lelli , Vincent Guittot Cc: Dietmar Eggemann , Steven Rostedt , Ben Segall , Mel Gorman , Daniel Bristot de Oliveira , Valentin Schneider , linux-kernel@vger.kernel.org, Luca Abeni , Tommaso Cucinotta , Thomas Gleixner , Joel Fernandes , Vineeth Pillai , Shuah Khan , bristot@kernel.org, Phil Auld Subject: [PATCH v5 6/7] sched/deadline: Deferrable dl server Date: Sat, 4 Nov 2023 11:59:23 +0100 Message-Id: In-Reply-To: References: Among the motivations for the DL servers is the real-time throttling mechanism. This mechanism works by throttling the rt_rq after running for a long period without leaving space for fair tasks.
The base dl server avoids this problem by boosting fair tasks instead of throttling the rt_rq. The point is that it boosts without waiting for potential starvation, causing some non-intuitive cases. For example, an IRQ dispatches two tasks on an idle system, a fair and an RT. The DL server will be activated, running the fair task before the RT one. This problem can be avoided by deferring the dl server activation. By setting the zerolax option, the dl_server will dispatch a SCHED_DEADLINE reservation with replenished runtime, but throttled. The dl_timer will be set for (period - runtime) ns from the start time, thus boosting the fair rq at its 0-laxity time with respect to the rt_rq. If the fair scheduler has the opportunity to run while waiting for the zerolax time, the dl server runtime will be consumed. If the runtime is completely consumed before the zerolax time, the server will be replenished while still in a throttled state. Then, the dl_timer will be reset to the new zerolax time. If the fair server reaches the zerolax time without consuming its runtime, the server will be boosted, following CBS rules (thus without breaking SCHED_DEADLINE). Signed-off-by: Daniel Bristot de Oliveira --- include/linux/sched.h | 2 + kernel/sched/deadline.c | 100 +++++++++++++++++++++++++++++++++++++++- kernel/sched/fair.c | 3 ++ 3 files changed, 103 insertions(+), 2 deletions(-) diff --git a/include/linux/sched.h b/include/linux/sched.h index 5ac1f252e136..56e53e6fd5a0 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -660,6 +660,8 @@ struct sched_dl_entity { unsigned int dl_non_contending : 1; unsigned int dl_overrun : 1; unsigned int dl_server : 1; + unsigned int dl_zerolax : 1; + unsigned int dl_zerolax_armed : 1; /* * Bandwidth enforcement timer. Each -deadline task has its diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c index 1d7b96ca9011..69ee1fbd60e4 100644 --- a/kernel/sched/deadline.c +++ b/kernel/sched/deadline.c @@ -772,6 +772,14 @@ static inline void replenish_dl_new_period(struct sched_dl_entity *dl_se, /* for non-boosted task, pi_of(dl_se) == dl_se */ dl_se->deadline = rq_clock(rq) + pi_of(dl_se)->dl_deadline; dl_se->runtime = pi_of(dl_se)->dl_runtime; + + /* + * If it is a zerolax reservation, throttle it. + */ + if (dl_se->dl_zerolax) { + dl_se->dl_throttled = 1; + dl_se->dl_zerolax_armed = 1; + } } /* @@ -828,6 +836,7 @@ static inline void setup_new_dl_entity(struct sched_dl_entity *dl_se) * could happen are, typically, a entity voluntarily trying to overcome its * runtime, or it just underestimated it during sched_setattr(). */ +static int start_dl_timer(struct sched_dl_entity *dl_se); static void replenish_dl_entity(struct sched_dl_entity *dl_se) { struct dl_rq *dl_rq = dl_rq_of_se(dl_se); @@ -874,6 +883,28 @@ static void replenish_dl_entity(struct sched_dl_entity *dl_se) dl_se->dl_yielded = 0; if (dl_se->dl_throttled) dl_se->dl_throttled = 0; + + /* + * If this is the replenishment of a zerolax reservation, + * clear the flag and return. + */ + if (dl_se->dl_zerolax_armed) { + dl_se->dl_zerolax_armed = 0; + return; + } + + /* + * At this point, if the zerolax server is not armed, and the deadline + * is in the future, throttle the server and arm the zerolax timer.
+ */ + if (dl_se->dl_zerolax && + dl_time_before(dl_se->deadline - dl_se->runtime, rq_clock(rq))) { + if (!is_dl_boosted(dl_se)) { + dl_se->dl_zerolax_armed = 1; + dl_se->dl_throttled = 1; + start_dl_timer(dl_se); + } + } } /* @@ -1024,6 +1055,13 @@ static void update_dl_entity(struct sched_dl_entity *dl_se) } replenish_dl_new_period(dl_se, rq); + } else if (dl_server(dl_se) && dl_se->dl_zerolax) { + /* + * The server can still use its previous deadline, so throttle + * and arm the zero-laxity timer. + */ + dl_se->dl_zerolax_armed = 1; + dl_se->dl_throttled = 1; } } @@ -1056,8 +1094,20 @@ static int start_dl_timer(struct sched_dl_entity *dl_se) * We want the timer to fire at the deadline, but considering * that it is actually coming from rq->clock and not from * hrtimer's time base reading. + * + * The zerolax reservation will have its timer set to the + * deadline - runtime. At that point, the CBS rule will decide + * if the current deadline can be used, or if a replenishment + * is required to avoid adding too much pressure on the system + * (current u > U). */ - act = ns_to_ktime(dl_next_period(dl_se)); + if (dl_se->dl_zerolax_armed) { + WARN_ON_ONCE(!dl_se->dl_throttled); + act = ns_to_ktime(dl_se->deadline - dl_se->runtime); + } else { + act = ns_to_ktime(dl_next_period(dl_se)); + } + now = hrtimer_cb_get_time(timer); delta = ktime_to_ns(now) - rq_clock(rq); act = ktime_add_ns(act, delta); @@ -1333,6 +1383,9 @@ static void update_curr_dl_se(struct rq *rq, struct sched_dl_entity *dl_se, s64 return; } + if (dl_server(dl_se) && dl_se->dl_throttled && !dl_se->dl_zerolax) + return; + if (dl_entity_is_special(dl_se)) return; @@ -1356,6 +1409,39 @@ static void update_curr_dl_se(struct rq *rq, struct sched_dl_entity *dl_se, s64 dl_se->runtime -= scaled_delta_exec; + /* + * The fair server can consume its runtime while throttled (not queued). + * + * If the server consumes its entire runtime in this state, the server + * is not required for the current period. Thus, reset the server by + * starting a new period, pushing the activation to the zero-lax time. + */ + if (dl_se->dl_zerolax && dl_se->dl_throttled && dl_runtime_exceeded(dl_se)) { + s64 runtime_diff = dl_se->runtime + dl_se->dl_runtime; + + /* + * If this is a regular throttling case, let it run negative until + * the dl_runtime - runtime > 0. The reason is that the next + * replenishment will result in a positive runtime one period ahead. + * + * Otherwise, the deadline will be pushed more than one period, not + * providing runtime/period anymore. + * + * If the dl_runtime - runtime < 0, then the server was able to get + * the runtime/period before the replenishment. So it is safe + * to start a new deferred period. + */ + if (!dl_se->dl_zerolax_armed && runtime_diff > 0) + return; + + hrtimer_try_to_cancel(&dl_se->dl_timer); + + replenish_dl_new_period(dl_se, dl_se->rq); + start_dl_timer(dl_se); + + return; + } + throttle: if (dl_runtime_exceeded(dl_se) || dl_se->dl_yielded) { dl_se->dl_throttled = 1; @@ -1432,6 +1518,9 @@ void dl_server_start(struct sched_dl_entity *dl_se) void dl_server_stop(struct sched_dl_entity *dl_se) { dequeue_dl_entity(dl_se, DEQUEUE_SLEEP); + hrtimer_try_to_cancel(&dl_se->dl_timer); + dl_se->dl_zerolax_armed = 0; + dl_se->dl_throttled = 0; } void dl_server_init(struct sched_dl_entity *dl_se, struct rq *rq, @@ -1743,7 +1832,7 @@ enqueue_dl_entity(struct sched_dl_entity *dl_se, int flags) * be counted in the active utilization; hence, we need to call * add_running_bw().
*/ - if (dl_se->dl_throttled && !(flags & ENQUEUE_REPLENISH)) { + if (!dl_se->dl_zerolax && dl_se->dl_throttled && !(flags & ENQUEUE_REPLENISH)) { if (flags & ENQUEUE_WAKEUP) task_contending(dl_se, flags); @@ -1765,6 +1854,13 @@ enqueue_dl_entity(struct sched_dl_entity *dl_se, int flags) setup_new_dl_entity(dl_se); } + /* + * If we are still throttled, eg. we got replenished but are a + * zero-laxity task and still got to wait, don't enqueue. + */ + if (dl_se->dl_throttled && start_dl_timer(dl_se)) + return; + __enqueue_dl_entity(dl_se); } diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index b15f7f376a67..399237cd9f59 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -1201,6 +1201,8 @@ static void update_curr(struct cfs_rq *cfs_rq) account_group_exec_runtime(curtask, delta_exec); if (curtask->server) dl_server_update(curtask->server, delta_exec); + else + dl_server_update(&rq_of(cfs_rq)->fair_server, delta_exec); } account_cfs_rq_runtime(cfs_rq, delta_exec); @@ -8421,6 +8423,7 @@ void fair_server_init(struct rq *rq) dl_se->dl_runtime = 50 * NSEC_PER_MSEC; dl_se->dl_deadline = 1000 * NSEC_PER_MSEC; dl_se->dl_period = 1000 * NSEC_PER_MSEC; + dl_se->dl_zerolax = 1; dl_server_init(dl_se, rq, fair_server_has_tasks, fair_server_pick); }
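To make the deferral arithmetic concrete, here is a small stand-alone sketch (illustrative only, not part of the series) of the timer placement computed in start_dl_timer() for an armed zerolax reservation, using the fair server defaults above (50ms runtime, 1s period):

/* Stand-alone illustration of the deferred (zero-laxity) timer placement
 * from start_dl_timer(): act = deadline - runtime. */
#include <stdio.h>

#define NSEC_PER_MSEC 1000000ULL

int main(void)
{
	unsigned long long runtime  = 50ULL   * NSEC_PER_MSEC; /* dl_runtime */
	unsigned long long period   = 1000ULL * NSEC_PER_MSEC; /* dl_period  */
	unsigned long long now      = 0;                       /* rq_clock() at replenish */
	unsigned long long deadline = now + period;            /* replenish_dl_new_period() */

	/* Latest instant the server can start and still fit its runtime: */
	unsigned long long zerolax = deadline - runtime;

	printf("deadline at %llu ms, 0-laxity timer at %llu ms\n",
	       deadline / NSEC_PER_MSEC, zerolax / NSEC_PER_MSEC);
	return 0;
}

In other words, with the defaults the server stays throttled through the first 950ms of each period and is boosted only at the zero-laxity instant, unless the fair tasks already consumed the reserved runtime on their own earlier in the period.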
From patchwork Sat Nov 4 10:59:24 2023 X-Patchwork-Submitter: Daniel Bristot de Oliveira X-Patchwork-Id: 161571 From: Daniel Bristot de Oliveira To: Ingo Molnar , Peter Zijlstra , Juri Lelli , Vincent Guittot Cc: Dietmar Eggemann , Steven Rostedt , Ben Segall , Mel Gorman , Daniel Bristot de Oliveira , Valentin Schneider , linux-kernel@vger.kernel.org, Luca Abeni , Tommaso Cucinotta , Thomas Gleixner , Joel Fernandes , Vineeth Pillai , Shuah Khan , bristot@kernel.org, Phil Auld Subject: [PATCH v5 7/7] sched/fair: Fair server interface Date: Sat, 4 Nov 2023 11:59:24 +0100 Message-Id: <26adad2378c8b15533e4f6216c2863341e587f57.1699095159.git.bristot@kernel.org> In-Reply-To: References: Add an interface for fair server setup on debugfs. Each rq has three files under /sys/kernel/debug/sched/rq/CPU{ID}: - fair_server_runtime: set runtime in ns - fair_server_period: set period in ns - fair_server_defer: on/off for the defer mechanism (driven, for example, as in the sketch below)
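A minimal user-space helper (hypothetical, assuming debugfs is mounted at /sys/kernel/debug and root privileges; the cpu0 directory name follows debugfs_fair_server_init() in the patch below) could set CPU 0 to 100ms/1s with deferred activation:

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static int write_str(const char *path, const char *val)
{
	int fd = open(path, O_WRONLY);

	if (fd < 0)
		return -1;
	ssize_t n = write(fd, val, strlen(val));
	close(fd);
	return n < 0 ? -1 : 0;
}

int main(void)
{
	const char *base = "/sys/kernel/debug/sched/rq/cpu0";
	char path[128];

	snprintf(path, sizeof(path), "%s/fair_server_runtime", base);
	write_str(path, "100000000");	/* 100 ms, in ns */
	snprintf(path, sizeof(path), "%s/fair_server_period", base);
	write_str(path, "1000000000");	/* 1 s, in ns */
	snprintf(path, sizeof(path), "%s/fair_server_defer", base);
	write_str(path, "1");		/* enable zero-laxity deferral */
	return 0;
}

Writes that would overcommit deadline bandwidth are rejected with -EBUSY by dl_server_apply_params() below.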
Signed-off-by: Daniel Bristot de Oliveira --- kernel/sched/deadline.c | 89 +++++++++++++++--- kernel/sched/debug.c | 202 ++++++++++++++++++++++++++++++++++++++++ kernel/sched/fair.c | 6 -- kernel/sched/sched.h | 2 + 4 files changed, 279 insertions(+), 20 deletions(-) diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c index 69ee1fbd60e4..1092ca8892e0 100644 --- a/kernel/sched/deadline.c +++ b/kernel/sched/deadline.c @@ -321,19 +321,12 @@ void sub_running_bw(struct sched_dl_entity *dl_se, struct dl_rq *dl_rq) __sub_running_bw(dl_se->dl_bw, dl_rq); } -static void dl_change_utilization(struct task_struct *p, u64 new_bw) +static void dl_rq_change_utilization(struct rq *rq, struct sched_dl_entity *dl_se, u64 new_bw) { - struct rq *rq; - - WARN_ON_ONCE(p->dl.flags & SCHED_FLAG_SUGOV); - - if (task_on_rq_queued(p)) - return; + if (dl_se->dl_non_contending) { + sub_running_bw(dl_se, &rq->dl); + dl_se->dl_non_contending = 0; - rq = task_rq(p); - if (p->dl.dl_non_contending) { - sub_running_bw(&p->dl, &rq->dl); - p->dl.dl_non_contending = 0; /* * If the timer handler is currently running and the * timer cannot be canceled, inactive_task_timer() @@ -341,13 +334,25 @@ static void dl_change_utilization(struct task_struct *p, u64 new_bw) * will not touch the rq's active utilization, * so we are still safe. */ - if (hrtimer_try_to_cancel(&p->dl.inactive_timer) == 1) - put_task_struct(p); + if (hrtimer_try_to_cancel(&dl_se->inactive_timer) == 1) { + if (!dl_server(dl_se)) + put_task_struct(dl_task_of(dl_se)); + } } - __sub_rq_bw(p->dl.dl_bw, &rq->dl); + __sub_rq_bw(dl_se->dl_bw, &rq->dl); __add_rq_bw(new_bw, &rq->dl); } +static void dl_change_utilization(struct task_struct *p, u64 new_bw) +{ + WARN_ON_ONCE(p->dl.flags & SCHED_FLAG_SUGOV); + + if (task_on_rq_queued(p)) + return; + + dl_rq_change_utilization(task_rq(p), &p->dl, new_bw); +} + static void __dl_clear_params(struct sched_dl_entity *dl_se); /* @@ -1508,10 +1513,22 @@ void dl_server_update(struct sched_dl_entity *dl_se, s64 delta_exec) void dl_server_start(struct sched_dl_entity *dl_se) { + /* + * XXX: the apply does not work well at the init phase for the + * fair server because things are not yet set. We need to improve + * this before making it generic. + */ if (!dl_server(dl_se)) { + u64 runtime = 50 * NSEC_PER_MSEC; + u64 period = 1000 * NSEC_PER_MSEC; + + dl_server_apply_params(dl_se, runtime, period, 1); + + dl_se->dl_zerolax = 1; dl_se->dl_server = 1; setup_new_dl_entity(dl_se); } + enqueue_dl_entity(dl_se, ENQUEUE_WAKEUP); } @@ -1532,6 +1549,50 @@ void dl_server_init(struct sched_dl_entity *dl_se, struct rq *rq, dl_se->server_pick = pick; } +int dl_server_apply_params(struct sched_dl_entity *dl_se, u64 runtime, u64 period, bool init) +{ + u64 old_bw = init ?
0 : to_ratio(dl_se->dl_period, dl_se->dl_runtime); + u64 new_bw = to_ratio(period, runtime); + struct rq *rq = dl_se->rq; + int cpu = cpu_of(rq); + struct dl_bw *dl_b; + unsigned long cap; + int retval = 0; + int cpus; + + dl_b = dl_bw_of(cpu); + raw_spin_lock(&dl_b->lock); + cpus = dl_bw_cpus(cpu); + cap = dl_bw_capacity(cpu); + + if (__dl_overflow(dl_b, cap, old_bw, new_bw)) { + retval = -EBUSY; + goto out; + } + + if (init) { + __add_rq_bw(new_bw, &rq->dl); + __dl_add(dl_b, new_bw, cpus); + } else { + __dl_sub(dl_b, dl_se->dl_bw, cpus); + __dl_add(dl_b, new_bw, cpus); + + dl_rq_change_utilization(rq, dl_se, new_bw); + } + + rq->fair_server.dl_runtime = runtime; + rq->fair_server.dl_deadline = period; + rq->fair_server.dl_period = period; + + dl_se->dl_bw = to_ratio(dl_se->dl_period, dl_se->dl_runtime); + dl_se->dl_density = to_ratio(dl_se->dl_deadline, dl_se->dl_runtime); + +out: + raw_spin_unlock(&dl_b->lock); + + return retval; +} + /* * Update the current task's runtime statistics (provided it is still * a -deadline task and has not been removed from the dl_rq). diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c index 4580a450700e..bd7ad6b8d3de 100644 --- a/kernel/sched/debug.c +++ b/kernel/sched/debug.c @@ -333,8 +333,208 @@ static const struct file_operations sched_debug_fops = { .release = seq_release, }; +enum dl_param { + DL_RUNTIME = 0, + DL_PERIOD, + DL_ZEROLAX +}; + +static unsigned long fair_server_period_max = (1 << 22) * NSEC_PER_USEC; /* ~4 seconds */ +static unsigned long fair_server_period_min = (100) * NSEC_PER_USEC; /* 100 us */ + +static ssize_t sched_fair_server_write(struct file *filp, const char __user *ubuf, + size_t cnt, loff_t *ppos, enum dl_param param) +{ + long cpu = (long) ((struct seq_file *) filp->private_data)->private; + u64 runtime, period, zerolax; + struct rq *rq = cpu_rq(cpu); + size_t err; + int retval; + u64 value; + + err = kstrtoull_from_user(ubuf, cnt, 10, &value); + if (err) + return err; + + scoped_guard (rq_lock_irqsave, rq) { + + runtime = rq->fair_server.dl_runtime; + period = rq->fair_server.dl_period; + zerolax = rq->fair_server.dl_zerolax; + + switch (param) { + case DL_RUNTIME: + if (runtime == value) + goto out; + runtime = value; + break; + case DL_PERIOD: + if (value == period) + goto out; + period = value; + break; + case DL_ZEROLAX: + if (zerolax == value) + goto out; + zerolax = value; + break; + } + + if (runtime > period + || period > fair_server_period_max + || period < fair_server_period_min + || zerolax > 1) { + cnt = -EINVAL; + goto out; + } + + if (rq->cfs.h_nr_running) { + update_rq_clock(rq); + dl_server_stop(&rq->fair_server); + } + + /* + * The zerolax does not change utilization, so just + * setting it is enough. 
+ */ + if (rq->fair_server.dl_zerolax != zerolax) { + rq->fair_server.dl_zerolax = zerolax; + } else { + retval = dl_server_apply_params(&rq->fair_server, runtime, period, 0); + if (retval) + cnt = retval; + } + + if (rq->cfs.h_nr_running) + dl_server_start(&rq->fair_server); + } + +out: + *ppos += cnt; + return cnt; +} + +static size_t sched_fair_server_show(struct seq_file *m, void *v, enum dl_param param) +{ + unsigned long cpu = (unsigned long) m->private; + struct rq *rq = cpu_rq(cpu); + u64 value; + + switch (param) { + case DL_RUNTIME: + value = rq->fair_server.dl_runtime; + break; + case DL_PERIOD: + value = rq->fair_server.dl_period; + break; + case DL_ZEROLAX: + value = rq->fair_server.dl_zerolax; + } + + seq_printf(m, "%llu\n", value); + return 0; + +} + +static ssize_t +sched_fair_server_runtime_write(struct file *filp, const char __user *ubuf, + size_t cnt, loff_t *ppos) +{ + return sched_fair_server_write(filp, ubuf, cnt, ppos, DL_RUNTIME); +} + +static int sched_fair_server_runtime_show(struct seq_file *m, void *v) +{ + return sched_fair_server_show(m, v, DL_RUNTIME); +} + +static int sched_fair_server_runtime_open(struct inode *inode, struct file *filp) +{ + return single_open(filp, sched_fair_server_runtime_show, inode->i_private); +} + +static const struct file_operations fair_server_runtime_fops = { + .open = sched_fair_server_runtime_open, + .write = sched_fair_server_runtime_write, + .read = seq_read, + .llseek = seq_lseek, + .release = single_release, +}; + +static ssize_t +sched_fair_server_period_write(struct file *filp, const char __user *ubuf, + size_t cnt, loff_t *ppos) +{ + return sched_fair_server_write(filp, ubuf, cnt, ppos, DL_PERIOD); +} + +static int sched_fair_server_period_show(struct seq_file *m, void *v) +{ + return sched_fair_server_show(m, v, DL_PERIOD); +} + +static int sched_fair_server_period_open(struct inode *inode, struct file *filp) +{ + return single_open(filp, sched_fair_server_period_show, inode->i_private); +} + +static const struct file_operations fair_server_period_fops = { + .open = sched_fair_server_period_open, + .write = sched_fair_server_period_write, + .read = seq_read, + .llseek = seq_lseek, + .release = single_release, +}; + +static ssize_t +sched_fair_server_defer_write(struct file *filp, const char __user *ubuf, + size_t cnt, loff_t *ppos) +{ + return sched_fair_server_write(filp, ubuf, cnt, ppos, DL_ZEROLAX); +} + +static int sched_fair_server_defer_show(struct seq_file *m, void *v) +{ + return sched_fair_server_show(m, v, DL_ZEROLAX); +} + +static int sched_fair_server_defer_open(struct inode *inode, struct file *filp) +{ + return single_open(filp, sched_fair_server_defer_show, inode->i_private); +} + +static const struct file_operations fair_server_defer_fops = { + .open = sched_fair_server_defer_open, + .write = sched_fair_server_defer_write, + .read = seq_read, + .llseek = seq_lseek, + .release = single_release, +}; + static struct dentry *debugfs_sched; +static void debugfs_fair_server_init(void) +{ + long cpu; + struct dentry *rq_dentry; + + rq_dentry = debugfs_create_dir("rq", debugfs_sched); + if (!rq_dentry) + return; + + for_each_possible_cpu(cpu) { + struct dentry *d_cpu; + char buf[32]; + + snprintf(buf, sizeof(buf), "cpu%ld", cpu); + d_cpu = debugfs_create_dir(buf, rq_dentry); + + debugfs_create_file("fair_server_runtime", 0644, d_cpu, (void *) cpu, &fair_server_runtime_fops); + debugfs_create_file("fair_server_period", 0644, d_cpu, (void *) cpu, &fair_server_period_fops); + debugfs_create_file("fair_server_defer", 
0644, d_cpu, (void *) cpu, &fair_server_defer_fops); + } +} + static __init int sched_init_debug(void) { struct dentry __maybe_unused *numa; @@ -374,6 +574,8 @@ static __init int sched_init_debug(void) debugfs_create_file("debug", 0444, debugfs_sched, NULL, &sched_debug_fops); + debugfs_fair_server_init(); + return 0; } late_initcall(sched_init_debug); diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 399237cd9f59..5434c52f470d 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -8419,12 +8419,6 @@ void fair_server_init(struct rq *rq) struct sched_dl_entity *dl_se = &rq->fair_server; init_dl_entity(dl_se); - - dl_se->dl_runtime = 50 * NSEC_PER_MSEC; - dl_se->dl_deadline = 1000 * NSEC_PER_MSEC; - dl_se->dl_period = 1000 * NSEC_PER_MSEC; - dl_se->dl_zerolax = 1; - dl_server_init(dl_se, rq, fair_server_has_tasks, fair_server_pick); } diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h index ec0e288c8e06..312b31df5860 100644 --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -341,6 +341,8 @@ extern void dl_server_init(struct sched_dl_entity *dl_se, struct rq *rq, dl_server_pick_f pick); extern void fair_server_init(struct rq *); +extern int dl_server_apply_params(struct sched_dl_entity *dl_se, + u64 runtime, u64 period, bool init); #ifdef CONFIG_CGROUP_SCHED