From patchwork Mon Mar 6 13:25:22 2023
X-Patchwork-Submitter: Peter Zijlstra
X-Patchwork-Id: 64772
Message-ID: <20230306141502.269897544@infradead.org>
User-Agent: quilt/0.66
Date: Mon, 06 Mar 2023 14:25:22 +0100
From: Peter Zijlstra
To: mingo@kernel.org, vincent.guittot@linaro.org
Cc: linux-kernel@vger.kernel.org, peterz@infradead.org, juri.lelli@redhat.com,
    dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com,
    mgorman@suse.de, bristot@redhat.com, corbet@lwn.net, qyousef@layalina.io,
    chris.hyser@oracle.com, patrick.bellasi@matbug.net, pjt@google.com,
    pavel@ucw.cz, qperret@google.com, tim.c.chen@linux.intel.com,
    joshdon@google.com, timj@gnu.org, kprateek.nayak@amd.com,
    yu.c.chen@intel.com, youssefesmat@chromium.org, joel@joelfernandes.org,
    Parth Shah
Subject: [PATCH 01/10] sched: Introduce latency-nice as a per-task attribute
References: <20230306132521.968182689@infradead.org>

From: Parth Shah

Latency-nice indicates the latency requirements of a task relative to the
other tasks in the system. The attribute takes values in the inclusive range
[-20, 19], in line with regular task nice values: latency_nice = -20 marks a
task that needs the lowest latency, as compared to a task with
latency_nice = +19. For now, latency_nice only plumbs the latency requirement
in from userspace and may affect only the CFS scheduling class.

Additionally, add debug output for the newly added latency_nice attribute.

[rebase, move defines in sched/prio.h]
Signed-off-by: Parth Shah
Signed-off-by: Vincent Guittot
Signed-off-by: Peter Zijlstra (Intel)
Tested-by: K Prateek Nayak
Link: https://lkml.kernel.org/r/20230224093454.956298-3-vincent.guittot@linaro.org
---
 include/linux/sched.h      |  1 +
 include/linux/sched/prio.h | 18 ++++++++++++++++++
 kernel/sched/debug.c       |  1 +
 3 files changed, 20 insertions(+)

Index: linux-2.6/include/linux/sched.h
===================================================================
--- linux-2.6.orig/include/linux/sched.h
+++ linux-2.6/include/linux/sched.h
@@ -784,6 +784,7 @@ struct task_struct {
 	int				static_prio;
 	int				normal_prio;
 	unsigned int			rt_priority;
+	int				latency_nice;
 
 	struct sched_entity		se;
 	struct sched_rt_entity		rt;
Index: linux-2.6/include/linux/sched/prio.h
===================================================================
--- linux-2.6.orig/include/linux/sched/prio.h
+++ linux-2.6/include/linux/sched/prio.h
@@ -42,4 +42,22 @@ static inline long rlimit_to_nice(long p
 	return (MAX_NICE - prio + 1);
 }
 
+/*
+ * Latency nice is meant to provide scheduler hints about the relative
+ * latency requirements of a task with respect to other tasks.
+ * Thus a task with latency_nice == 19 can be hinted as the task with no
+ * latency requirements, in contrast to the task with latency_nice == -20
+ * which should be given priority in terms of lower latency.
+ */
+#define MAX_LATENCY_NICE	19
+#define MIN_LATENCY_NICE	-20
+
+#define LATENCY_NICE_WIDTH	\
+	(MAX_LATENCY_NICE - MIN_LATENCY_NICE + 1)
+
+/*
+ * Default tasks should be treated as a task with latency_nice = 0.
+ */
+#define DEFAULT_LATENCY_NICE	0
+
 #endif /* _LINUX_SCHED_PRIO_H */
Index: linux-2.6/kernel/sched/debug.c
===================================================================
--- linux-2.6.orig/kernel/sched/debug.c
+++ linux-2.6/kernel/sched/debug.c
@@ -1043,6 +1043,7 @@ void proc_sched_show_task(struct task_st
 #endif
 	P(policy);
 	P(prio);
+	P(latency_nice);
 	if (task_has_dl_policy(p)) {
 		P(dl.runtime);
 		P(dl.deadline);

From patchwork Mon Mar 6 13:25:23 2023
X-Patchwork-Submitter: Peter Zijlstra
X-Patchwork-Id: 64776
Message-ID: <20230306141502.329973596@infradead.org>
User-Agent: quilt/0.66
Date: Mon, 06 Mar 2023 14:25:23 +0100
From: Peter Zijlstra
To: mingo@kernel.org, vincent.guittot@linaro.org
Cc: linux-kernel@vger.kernel.org, peterz@infradead.org, juri.lelli@redhat.com,
    dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com,
    mgorman@suse.de, bristot@redhat.com, corbet@lwn.net, qyousef@layalina.io,
    chris.hyser@oracle.com, patrick.bellasi@matbug.net, pjt@google.com,
    pavel@ucw.cz, qperret@google.com, tim.c.chen@linux.intel.com,
    joshdon@google.com, timj@gnu.org, kprateek.nayak@amd.com,
    yu.c.chen@intel.com, youssefesmat@chromium.org, joel@joelfernandes.org,
    Parth Shah
Subject: [PATCH 02/10] sched/core: Propagate parent tasks latency
 requirements to the child task
References: <20230306132521.968182689@infradead.org>

From: Parth Shah

Clone the parent task's latency_nice attribute to the forked child task.
Reset latency_nice to its default value when the child task has
sched_reset_on_fork set. Also, initialize init_task.latency_nice with
DEFAULT_LATENCY_NICE.

[rebase]
Signed-off-by: Parth Shah
Signed-off-by: Vincent Guittot
Signed-off-by: Peter Zijlstra (Intel)
Tested-by: K Prateek Nayak
Link: https://lkml.kernel.org/r/20230224093454.956298-4-vincent.guittot@linaro.org
---
 init/init_task.c    | 1 +
 kernel/sched/core.c | 1 +
 2 files changed, 2 insertions(+)

Index: linux-2.6/init/init_task.c
===================================================================
--- linux-2.6.orig/init/init_task.c
+++ linux-2.6/init/init_task.c
@@ -78,6 +78,7 @@ struct task_struct init_task
 	.prio		= MAX_PRIO - 20,
 	.static_prio	= MAX_PRIO - 20,
 	.normal_prio	= MAX_PRIO - 20,
+	.latency_nice	= DEFAULT_LATENCY_NICE,
 	.policy		= SCHED_NORMAL,
 	.cpus_ptr	= &init_task.cpus_mask,
 	.user_cpus_ptr	= NULL,
Index: linux-2.6/kernel/sched/core.c
===================================================================
--- linux-2.6.orig/kernel/sched/core.c
+++ linux-2.6/kernel/sched/core.c
@@ -4684,6 +4684,7 @@ int sched_fork(unsigned long clone_flags
 		p->prio = p->normal_prio = p->static_prio;
 		set_load_weight(p, false);
 
+		p->latency_nice = DEFAULT_LATENCY_NICE;
 		/*
 		 * We don't need the reset flag anymore after the fork. It has
 		 * fulfilled its duty:

From patchwork Mon Mar 6 13:25:24 2023
X-Patchwork-Submitter: Peter Zijlstra
X-Patchwork-Id: 64775
Message-ID: <20230306141502.389860970@infradead.org>
User-Agent: quilt/0.66
Date: Mon, 06 Mar 2023 14:25:24 +0100
From: Peter Zijlstra
To: mingo@kernel.org, vincent.guittot@linaro.org
Cc: linux-kernel@vger.kernel.org, peterz@infradead.org, juri.lelli@redhat.com,
    dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com,
    mgorman@suse.de, bristot@redhat.com, corbet@lwn.net, qyousef@layalina.io,
    chris.hyser@oracle.com, patrick.bellasi@matbug.net, pjt@google.com,
    pavel@ucw.cz, qperret@google.com, tim.c.chen@linux.intel.com,
    joshdon@google.com, timj@gnu.org, kprateek.nayak@amd.com,
    yu.c.chen@intel.com, youssefesmat@chromium.org, joel@joelfernandes.org,
    Parth Shah
Subject: [PATCH 03/10] sched: Allow sched_{get,set}attr to change
 latency_nice of the task
References: <20230306132521.968182689@infradead.org>

From: Parth Shah

Introduce the latency_nice attribute to sched_attr and provide a mechanism
to change the value via the sched_setattr()/sched_getattr() syscalls. Also
add a new flag, SCHED_FLAG_LATENCY_NICE, to signal a change of a task's
latency_nice on a sched_setattr() syscall.

[rebase and add a dedicated __setscheduler_latency]
Signed-off-by: Parth Shah
Signed-off-by: Vincent Guittot
Signed-off-by: Peter Zijlstra (Intel)
Tested-by: K Prateek Nayak
Link: https://lkml.kernel.org/r/20230224093454.956298-5-vincent.guittot@linaro.org
---
 include/uapi/linux/sched.h       |  4 +++-
 include/uapi/linux/sched/types.h | 19 +++++++++++++++++++
 kernel/sched/core.c              | 24 ++++++++++++++++++++++++
 tools/include/uapi/linux/sched.h |  4 +++-
 4 files changed, 49 insertions(+), 2 deletions(-)

Index: linux-2.6/include/uapi/linux/sched.h
===================================================================
--- linux-2.6.orig/include/uapi/linux/sched.h
+++ linux-2.6/include/uapi/linux/sched.h
@@ -132,6 +132,7 @@ struct clone_args {
 #define SCHED_FLAG_KEEP_PARAMS		0x10
 #define SCHED_FLAG_UTIL_CLAMP_MIN	0x20
 #define SCHED_FLAG_UTIL_CLAMP_MAX	0x40
+#define SCHED_FLAG_LATENCY_NICE		0x80
 
 #define SCHED_FLAG_KEEP_ALL	(SCHED_FLAG_KEEP_POLICY | \
 				 SCHED_FLAG_KEEP_PARAMS)
@@ -143,6 +144,7 @@ struct clone_args {
 			 SCHED_FLAG_RECLAIM		| \
 			 SCHED_FLAG_DL_OVERRUN		| \
 			 SCHED_FLAG_KEEP_ALL		| \
-			 SCHED_FLAG_UTIL_CLAMP)
+			 SCHED_FLAG_UTIL_CLAMP		| \
+			 SCHED_FLAG_LATENCY_NICE)
 
 #endif /* _UAPI_LINUX_SCHED_H */
Index: linux-2.6/include/uapi/linux/sched/types.h
===================================================================
--- linux-2.6.orig/include/uapi/linux/sched/types.h
+++ linux-2.6/include/uapi/linux/sched/types.h
@@ -10,6 +10,7 @@ struct sched_param {
 
 #define SCHED_ATTR_SIZE_VER0	48	/* sizeof first published struct */
 #define SCHED_ATTR_SIZE_VER1	56	/* add: util_{min,max} */
+#define SCHED_ATTR_SIZE_VER2	60	/* add: latency_nice */
 
 /*
  * Extended scheduling parameters data structure.
@@ -98,6 +99,22 @@ struct sched_param {
  * scheduled on a CPU with no more capacity than the specified value.
  *
  * A task utilization boundary can be reset by setting the attribute to -1.
+ *
+ * Latency Tolerance Attributes
+ * ===========================
+ *
+ * A subset of sched_attr attributes allows to specify the relative latency
+ * requirements of a task with respect to the other tasks running/queued in
+ * the system.
+ *
+ * @ sched_latency_nice	task's latency_nice value
+ *
+ * The latency_nice of a task can have any value in a range of
+ * [MIN_LATENCY_NICE..MAX_LATENCY_NICE].
+ *
+ * A task with latency_nice with the value of LATENCY_NICE_MIN can be
+ * taken for a task requiring a lower latency as opposed to the task with
+ * higher latency_nice.
  */
 struct sched_attr {
 	__u32 size;
@@ -120,6 +137,8 @@ struct sched_attr {
 	__u32 sched_util_min;
 	__u32 sched_util_max;
 
+	/* latency requirement hints */
+	__s32 sched_latency_nice;
 };
 
 #endif /* _UAPI_LINUX_SCHED_TYPES_H */
Index: linux-2.6/kernel/sched/core.c
===================================================================
--- linux-2.6.orig/kernel/sched/core.c
+++ linux-2.6/kernel/sched/core.c
@@ -7451,6 +7451,13 @@ static void __setscheduler_params(struct
 	p->rt_priority = attr->sched_priority;
 	p->normal_prio = normal_prio(p);
 	set_load_weight(p, true);
+}
+
+static void __setscheduler_latency(struct task_struct *p,
+				   const struct sched_attr *attr)
+{
+	if (attr->sched_flags & SCHED_FLAG_LATENCY_NICE)
+		p->latency_nice = attr->sched_latency_nice;
 }
 
 /*
@@ -7593,6 +7601,13 @@ recheck:
 			return retval;
 	}
 
+	if (attr->sched_flags & SCHED_FLAG_LATENCY_NICE) {
+		if (attr->sched_latency_nice > MAX_LATENCY_NICE)
+			return -EINVAL;
+		if (attr->sched_latency_nice < MIN_LATENCY_NICE)
+			return -EINVAL;
+	}
+
 	if (pi)
 		cpuset_read_lock();
@@ -7627,6 +7642,9 @@ recheck:
 			goto change;
 		if (attr->sched_flags & SCHED_FLAG_UTIL_CLAMP)
 			goto change;
+		if (attr->sched_flags & SCHED_FLAG_LATENCY_NICE &&
+		    attr->sched_latency_nice != p->latency_nice)
+			goto change;
 
 		p->sched_reset_on_fork = reset_on_fork;
 		retval = 0;
@@ -7715,6 +7733,7 @@ change:
 		__setscheduler_params(p, attr);
 		__setscheduler_prio(p, newprio);
 	}
+	__setscheduler_latency(p, attr);
 	__setscheduler_uclamp(p, attr);
 
 	if (queued) {
@@ -7925,6 +7944,9 @@ static int sched_copy_attr(struct sched_
 		    size < SCHED_ATTR_SIZE_VER1)
 			return -EINVAL;
 
+	if ((attr->sched_flags & SCHED_FLAG_LATENCY_NICE) &&
+	    size < SCHED_ATTR_SIZE_VER2)
+		return -EINVAL;
 	/*
 	 * XXX: Do we want to be lenient like existing syscalls; or do we want
 	 * to be strict and return an error on out-of-bounds values?
@@ -8162,6 +8184,8 @@ SYSCALL_DEFINE4(sched_getattr, pid_t, pi
 	get_params(p, &kattr);
 	kattr.sched_flags &= SCHED_FLAG_ALL;
 
+	kattr.sched_latency_nice = p->latency_nice;
+
 #ifdef CONFIG_UCLAMP_TASK
 	/*
 	 * This could race with another potential updater, but this is fine
Index: linux-2.6/tools/include/uapi/linux/sched.h
===================================================================
--- linux-2.6.orig/tools/include/uapi/linux/sched.h
+++ linux-2.6/tools/include/uapi/linux/sched.h
@@ -132,6 +132,7 @@ struct clone_args {
 #define SCHED_FLAG_KEEP_PARAMS		0x10
 #define SCHED_FLAG_UTIL_CLAMP_MIN	0x20
 #define SCHED_FLAG_UTIL_CLAMP_MAX	0x40
+#define SCHED_FLAG_LATENCY_NICE		0x80
 
 #define SCHED_FLAG_KEEP_ALL	(SCHED_FLAG_KEEP_POLICY | \
 				 SCHED_FLAG_KEEP_PARAMS)
@@ -143,6 +144,7 @@ struct clone_args {
 			 SCHED_FLAG_RECLAIM		| \
 			 SCHED_FLAG_DL_OVERRUN		| \
 			 SCHED_FLAG_KEEP_ALL		| \
-			 SCHED_FLAG_UTIL_CLAMP)
+			 SCHED_FLAG_UTIL_CLAMP		| \
+			 SCHED_FLAG_LATENCY_NICE)
 
 #endif /* _UAPI_LINUX_SCHED_H */

From patchwork Mon Mar 6 13:25:25 2023
X-Patchwork-Submitter: Peter Zijlstra
X-Patchwork-Id: 64740
wl31-20020a056a216d9f00b000cda334a531mr14527826pzb.62.1678114524134; Mon, 06 Mar 2023 06:55:24 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1678114524; cv=none; d=google.com; s=arc-20160816; b=f7SVxyi5VYahhJDioBmXr/iLZ3Fdy1c8g2AWZMWkCMdfpRVYit/6L0dGYeVuIdBvnH 0PfUUaT2ewIj5H6A9o2QE3imhlf3eJoLeUYOM18zSUnpGDZcXZt2r2xXLXH4pagC10Cj mFzmql1eBrSmpcE+hOvfMn4x5TLZ4QvymIOp+SDlRMI7plGJu1sQzMEl5rsnyHH9UpsE ENOw2y/apXf7mvCwsnt7HAhMOOJ6Yphou0317Yh5Q2Dy0Csbq9jIL5JBBgCHEsTdoKOv F96DvlSB7BY4+2U1e/5JdLxO8wLxgHdKW82NOi0FAS0Tx2VFa/xuddi2AGbi1NUBgD7R lAew== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:mime-version:references:subject:cc:to:from:date :user-agent:message-id:dkim-signature; bh=XIEqWJFCXWuOHUykrcllyv7yeS4tw9zdWylKMEFcjy4=; b=NnyYSkZxTPXnw482YkLedF8LslB8eQJx9zC1jsjYWBDfa36cs4F6722/3Csl8/1GO6 ehapvYkd6IR7B+r8VZ/6L83IcFaVcscfP4fYmidgIhgzEXOhNQmT5v69QUvlWeraRZwh BdxzXsyexTlIGUIsIhYPNSLpKQX7tB5XYgRRJn2AxjqwDQISp9XQ+7scDQv6J5hKwrbn +AQ6QgX/GZEPk43WTajdJEzmzvB4e2pY4cOCi20X8Zk1fQhHnXpJDygbrFUaa7GkQVCd Rtde3HbTAnwzWWUNaOZOkYcnKynsYAXPACg8xrKrxhojj0AYxZXZoILM4kb4YeF3Yq1Z 91Kg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@infradead.org header.s=casper.20170209 header.b=ouW+UUfK; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: from out1.vger.email (out1.vger.email. 
[2620:137:e000::1:20]) by mx.google.com with ESMTP id t14-20020a63460e000000b00502fdd173efsi9116820pga.774.2023.03.06.06.55.10; Mon, 06 Mar 2023 06:55:24 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@infradead.org header.s=casper.20170209 header.b=ouW+UUfK; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230378AbjCFOnZ (ORCPT + 99 others); Mon, 6 Mar 2023 09:43:25 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:40772 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229835AbjCFOnT (ORCPT ); Mon, 6 Mar 2023 09:43:19 -0500 Received: from casper.infradead.org (casper.infradead.org [IPv6:2001:8b0:10b:1236::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 0E3951724 for ; Mon, 6 Mar 2023 06:42:52 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=casper.20170209; h=Content-Type:MIME-Version:References: Subject:Cc:To:From:Date:Message-ID:Sender:Reply-To:Content-Transfer-Encoding: Content-ID:Content-Description:In-Reply-To; bh=XIEqWJFCXWuOHUykrcllyv7yeS4tw9zdWylKMEFcjy4=; b=ouW+UUfKkXxl2Nh4mUyaux/y0S yV2Dz/4FAuV2JGWuoKy4ldrohsjXYOmSFr8+a1U6lDwIAdAnHaVCDPg5V6yBSFagKm5JzXchdxJK5 +fGofD4H39r+6LStukk5ow9wRv8Cu6WEEJkuWMqZlAXChzvkQE7bJg02/g7w4DS8XcWsP1N15zBct QeN0vev/rbzBb3kbSjsKkRqTDGF7d2I0S2LsBGH4tRja1mxE8SBd5bt4pLl1rT4hpOGzqLlBaY/sf oVj1voFKpHzH/z+hF/rwYmRc6jgNVFwZdxqDovv+clAD6Qm8fvkkoso/nB0r8m1uXAUbvgTR6YkWr 7mpeII3A==; Received: from j130084.upc-j.chello.nl ([24.132.130.84] helo=noisy.programming.kicks-ass.net) by casper.infradead.org with esmtpsa (Exim 4.94.2 #2 (Red Hat Linux)) id 
1pZBe5-005P2L-J3; Mon, 06 Mar 2023 14:16:58 +0000 Received: from hirez.programming.kicks-ass.net (hirez.programming.kicks-ass.net [192.168.1.225]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits)) (Client did not present a certificate) by noisy.programming.kicks-ass.net (Postfix) with ESMTPS id 188D0300C1E; Mon, 6 Mar 2023 15:16:55 +0100 (CET) Received: by hirez.programming.kicks-ass.net (Postfix, from userid 0) id 0097520110FF9; Mon, 6 Mar 2023 15:16:54 +0100 (CET) Message-ID: <20230306141502.449738212@infradead.org> User-Agent: quilt/0.66 Date: Mon, 06 Mar 2023 14:25:25 +0100 From: Peter Zijlstra To: mingo@kernel.org, vincent.guittot@linaro.org Cc: linux-kernel@vger.kernel.org, peterz@infradead.org, juri.lelli@redhat.com, dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de, bristot@redhat.com, corbet@lwn.net, qyousef@layalina.io, chris.hyser@oracle.com, patrick.bellasi@matbug.net, pjt@google.com, pavel@ucw.cz, qperret@google.com, tim.c.chen@linux.intel.com, joshdon@google.com, timj@gnu.org, kprateek.nayak@amd.com, yu.c.chen@intel.com, youssefesmat@chromium.org, joel@joelfernandes.org Subject: [PATCH 04/10] sched/fair: Add latency_offset References: <20230306132521.968182689@infradead.org> MIME-Version: 1.0 X-Spam-Status: No, score=-4.4 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED,SPF_HELO_NONE, SPF_NONE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1759630614859698950?= X-GMAIL-MSGID: =?utf-8?q?1759630614859698950?= From: Vincent Guittot XXX fold back into previous patches Murdered-by: Peter Zijlstra (Intel) Signed-off-by: Vincent Guittot Signed-off-by: Peter Zijlstra (Intel) 
Tested-by: K Prateek Nayak
---
 include/linux/sched.h      |    4 +++-
 include/linux/sched/prio.h |    9 +++++++++
 init/init_task.c           |    2 +-
 kernel/sched/core.c        |   21 ++++++++++++++++-----
 kernel/sched/debug.c       |    2 +-
 kernel/sched/fair.c        |    8 ++++++++
 kernel/sched/sched.h       |    2 ++
 7 files changed, 40 insertions(+), 8 deletions(-)

--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -568,6 +568,8 @@ struct sched_entity {
 	/* cached value of my_q->h_nr_running */
 	unsigned long		runnable_weight;
 #endif
+	/* preemption offset in ns */
+	long			latency_offset;
 
 #ifdef CONFIG_SMP
 	/*
@@ -784,7 +786,7 @@ struct task_struct {
 	int				static_prio;
 	int				normal_prio;
 	unsigned int			rt_priority;
-	int				latency_nice;
+	int				latency_prio;
 
 	struct sched_entity		se;
 	struct sched_rt_entity		rt;
--- a/include/linux/sched/prio.h
+++ b/include/linux/sched/prio.h
@@ -59,5 +59,14 @@ static inline long rlimit_to_nice(long p
  * Default tasks should be treated as a task with latency_nice = 0.
  */
 #define DEFAULT_LATENCY_NICE	0
+#define DEFAULT_LATENCY_PRIO	(DEFAULT_LATENCY_NICE + LATENCY_NICE_WIDTH/2)
+
+/*
+ * Convert user-nice values [ -20 ... 0 ... 19 ]
+ * to static latency [ 0..39 ],
+ * and back.
+ */
+#define NICE_TO_LATENCY(nice)	((nice) + DEFAULT_LATENCY_PRIO)
+#define LATENCY_TO_NICE(prio)	((prio) - DEFAULT_LATENCY_PRIO)
 
 #endif /* _LINUX_SCHED_PRIO_H */
--- a/init/init_task.c
+++ b/init/init_task.c
@@ -78,7 +78,7 @@ struct task_struct init_task
 	.prio		= MAX_PRIO - 20,
 	.static_prio	= MAX_PRIO - 20,
 	.normal_prio	= MAX_PRIO - 20,
-	.latency_nice	= DEFAULT_LATENCY_NICE,
+	.latency_prio	= DEFAULT_LATENCY_PRIO,
 	.policy		= SCHED_NORMAL,
 	.cpus_ptr	= &init_task.cpus_mask,
 	.user_cpus_ptr	= NULL,
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1285,6 +1285,11 @@ static void set_load_weight(struct task_
 	}
 }
 
+static void set_latency_offset(struct task_struct *p)
+{
+	p->se.latency_offset = calc_latency_offset(p->latency_prio);
+}
+
 #ifdef CONFIG_UCLAMP_TASK
 /*
  * Serializes updates of utilization clamp values
@@ -4433,6 +4438,8 @@ static void __sched_fork(unsigned long c
 	p->se.vruntime			= 0;
 	INIT_LIST_HEAD(&p->se.group_node);
 
+	set_latency_offset(p);
+
 #ifdef CONFIG_FAIR_GROUP_SCHED
 	p->se.cfs_rq			= NULL;
 #endif
@@ -4684,7 +4691,9 @@ int sched_fork(unsigned long clone_flags
 	p->prio = p->normal_prio = p->static_prio;
 	set_load_weight(p, false);
 
-	p->latency_nice = DEFAULT_LATENCY_NICE;
+	p->latency_prio = NICE_TO_LATENCY(0);
+	set_latency_offset(p);
+
 	/*
 	 * We don't need the reset flag anymore after the fork. It has
 	 * fulfilled its duty:
@@ -7456,8 +7465,10 @@ static void __setscheduler_params(struct
 static void __setscheduler_latency(struct task_struct *p,
 				   const struct sched_attr *attr)
 {
-	if (attr->sched_flags & SCHED_FLAG_LATENCY_NICE)
-		p->latency_nice = attr->sched_latency_nice;
+	if (attr->sched_flags & SCHED_FLAG_LATENCY_NICE) {
+		p->latency_prio = NICE_TO_LATENCY(attr->sched_latency_nice);
+		set_latency_offset(p);
+	}
 }
 
 /*
@@ -7642,7 +7653,7 @@ static int __sched_setscheduler(struct t
 		if (attr->sched_flags & SCHED_FLAG_UTIL_CLAMP)
 			goto change;
 		if (attr->sched_flags & SCHED_FLAG_LATENCY_NICE &&
-		    attr->sched_latency_nice != p->latency_nice)
+		    attr->sched_latency_nice != LATENCY_TO_NICE(p->latency_prio))
 			goto change;
 
 		p->sched_reset_on_fork = reset_on_fork;
@@ -8183,7 +8194,7 @@ SYSCALL_DEFINE4(sched_getattr, pid_t, pi
 	get_params(p, &kattr);
 	kattr.sched_flags &= SCHED_FLAG_ALL;
 
-	kattr.sched_latency_nice = p->latency_nice;
+	kattr.sched_latency_nice = LATENCY_TO_NICE(p->latency_prio);
 
 #ifdef CONFIG_UCLAMP_TASK
 	/*
--- a/kernel/sched/debug.c
+++ b/kernel/sched/debug.c
@@ -1043,7 +1043,7 @@ void proc_sched_show_task(struct task_st
 #endif
 	P(policy);
 	P(prio);
-	P(latency_nice);
+	P(latency_prio);
 	if (task_has_dl_policy(p)) {
 		P(dl.runtime);
 		P(dl.deadline);
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -703,6 +703,14 @@ int sched_update_scaling(void)
 }
 #endif
 
+long calc_latency_offset(int prio)
+{
+	u32 weight = sched_prio_to_weight[prio];
+	u64 base = sysctl_sched_min_granularity;
+
+	return div_u64(base << SCHED_FIXEDPOINT_SHIFT, weight);
+}
+
 /*
  * delta /= w
  */
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -2475,6 +2475,8 @@ extern unsigned int sysctl_numa_balancin
 extern unsigned int sysctl_numa_balancing_hot_threshold;
 #endif
 
+extern long calc_latency_offset(int prio);
+
 #ifdef CONFIG_SCHED_HRTICK
 
 /*
From patchwork Mon Mar 6 13:25:26 2023
X-Patchwork-Submitter: Peter Zijlstra
X-Patchwork-Id: 64737
Message-ID: <20230306141502.510052938@infradead.org>
Date: Mon, 06 Mar 2023 14:25:26 +0100
From: Peter Zijlstra
To: mingo@kernel.org, vincent.guittot@linaro.org
Subject: [PATCH 05/10] sched/fair: Add sched group latency support
References: <20230306132521.968182689@infradead.org>

From: Vincent Guittot

A task can set its latency priority with sched_setattr(), which is then
used to set the latency offset of its sched_entity, but sched group
entities still have the default latency offset value.

Add a latency.nice field to the cpu cgroup controller to set the latency
priority of the group, similarly to sched_setattr(). The latency
priority is then used to set the offset of the sched_entities of the
group.

Signed-off-by: Vincent Guittot
Signed-off-by: Peter Zijlstra (Intel)
Tested-by: K Prateek Nayak
Link: https://lkml.kernel.org/r/20230224093454.956298-7-vincent.guittot@linaro.org
---
 Documentation/admin-guide/cgroup-v2.rst |   10 ++++++++++
 kernel/sched/core.c                     |   30 ++++++++++++++++++++++++++++++
 kernel/sched/fair.c                     |   32 ++++++++++++++++++++++++++++++++
 kernel/sched/sched.h                    |    4 ++++
 4 files changed, 76 insertions(+)

--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -1121,6 +1121,16 @@ All time durations are in microseconds.
 	values similar to the sched_setattr(2). This maximum utilization
 	value is used to clamp the task specific maximum utilization clamp.
 
+  cpu.latency.nice
+	A read-write single value file which exists on non-root
+	cgroups.  The default is "0".
+
+	The nice value is in the range [-20, 19].
+
+	This interface file allows reading and setting latency using the
+	same values used by sched_setattr(2). The latency_nice of a group is
+	used to limit the impact of the latency_nice of a task outside the
+	group.
 
 Memory

--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -11068,6 +11068,25 @@ static int cpu_idle_write_s64(struct cgr
 {
 	return sched_group_set_idle(css_tg(css), idle);
 }
+
+static s64 cpu_latency_nice_read_s64(struct cgroup_subsys_state *css,
+				     struct cftype *cft)
+{
+	return LATENCY_TO_NICE(css_tg(css)->latency_prio);
+}
+
+static int cpu_latency_nice_write_s64(struct cgroup_subsys_state *css,
+				      struct cftype *cft, s64 nice)
+{
+	int prio;
+
+	if (nice < MIN_LATENCY_NICE || nice > MAX_LATENCY_NICE)
+		return -ERANGE;
+
+	prio = NICE_TO_LATENCY(nice);
+
+	return sched_group_set_latency(css_tg(css), prio);
+}
 #endif
 
 static struct cftype cpu_legacy_files[] = {
@@ -11082,6 +11101,11 @@ static struct cftype cpu_legacy_files[]
 		.read_s64 = cpu_idle_read_s64,
 		.write_s64 = cpu_idle_write_s64,
 	},
+	{
+		.name = "latency.nice",
+		.read_s64 = cpu_latency_nice_read_s64,
+		.write_s64 = cpu_latency_nice_write_s64,
+	},
 #endif
 #ifdef CONFIG_CFS_BANDWIDTH
 	{
@@ -11299,6 +11323,12 @@ static struct cftype cpu_files[] = {
 		.read_s64 = cpu_idle_read_s64,
 		.write_s64 = cpu_idle_write_s64,
 	},
+	{
+		.name = "latency.nice",
+		.flags = CFTYPE_NOT_ON_ROOT,
+		.read_s64 = cpu_latency_nice_read_s64,
+		.write_s64 = cpu_latency_nice_write_s64,
+	},
 #endif
 #ifdef CONFIG_CFS_BANDWIDTH
 	{
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -12264,6 +12264,7 @@ int alloc_fair_sched_group(struct task_g
 		goto err;
 
 	tg->shares = NICE_0_LOAD;
+	tg->latency_prio = DEFAULT_LATENCY_PRIO;
 
 	init_cfs_bandwidth(tg_cfs_bandwidth(tg));
 
@@ -12362,6 +12363,9 @@ void init_tg_cfs_entry(struct task_group
 	}
 
 	se->my_q = cfs_rq;
+
+	se->latency_offset = calc_latency_offset(tg->latency_prio);
+
 	/* guarantee group entities always have weight */
 	update_load_set(&se->load, NICE_0_LOAD);
 	se->parent = parent;
@@ -12490,6 +12494,34 @@ int sched_group_set_idle(struct task_gro
 
 	mutex_unlock(&shares_mutex);
 	return 0;
+}
+
+int sched_group_set_latency(struct task_group *tg, int prio)
+{
+	long latency_offset;
+	int i;
+
+	if (tg == &root_task_group)
+		return -EINVAL;
+
+	mutex_lock(&shares_mutex);
+
+	if (tg->latency_prio == prio) {
+		mutex_unlock(&shares_mutex);
+		return 0;
+	}
+
+	tg->latency_prio = prio;
+	latency_offset = calc_latency_offset(prio);
+
+	for_each_possible_cpu(i) {
+		struct sched_entity *se = tg->se[i];
+
+		WRITE_ONCE(se->latency_offset, latency_offset);
+	}
+
+	mutex_unlock(&shares_mutex);
+	return 0;
 }
 
 #else /* CONFIG_FAIR_GROUP_SCHED */
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -378,6 +378,8 @@ struct task_group {
 
 	/* A positive value indicates that this is a SCHED_IDLE group. */
 	int			idle;
+	/* latency priority of the group. */
+	int			latency_prio;
 
 #ifdef CONFIG_SMP
 	/*
@@ -488,6 +490,8 @@ extern int sched_group_set_shares(struct
 
 extern int sched_group_set_idle(struct task_group *tg, long idle);
 
+extern int sched_group_set_latency(struct task_group *tg, int prio);
+
 #ifdef CONFIG_SMP
 extern void set_task_rq_fair(struct sched_entity *se,
 			     struct cfs_rq *prev, struct cfs_rq *next);
From patchwork Mon Mar 6 13:25:27 2023
X-Patchwork-Submitter: Peter Zijlstra
X-Patchwork-Id: 64730
Message-ID: <20230306141502.569748782@infradead.org>
Date: Mon, 06 Mar 2023 14:25:27 +0100
From: Peter Zijlstra
To: mingo@kernel.org, vincent.guittot@linaro.org
Subject: [PATCH 06/10] sched/fair: Add avg_vruntime
References: <20230306132521.968182689@infradead.org>

In order to move to an eligibility based scheduling policy, we need a
better approximation of the ideal scheduler.
Specifically, for a virtual time weighted fair queueing based scheduler,
the ideal scheduler will be the weighted average of the individual
virtual runtimes (math in the comment).

As such, compute the weighted average to approximate the ideal
scheduler -- note that the approximation is in the individual task
behaviour, which isn't strictly conformant. Specifically, consider
adding a task with a vruntime left of center: in this case the average
will move backwards in time -- something the ideal scheduler would of
course never do.

[ Note: try and replace cfs_rq::avg_load with cfs_rq->load.weight,
  the conditions are slightly different but should be possible ]

Signed-off-by: Peter Zijlstra (Intel)
---
 kernel/sched/debug.c |   32 +++++++--------
 kernel/sched/fair.c  |  105 +++++++++++++++++++++++++++++++++++++++++++++++++--
 kernel/sched/sched.h |    5 ++
 3 files changed, 122 insertions(+), 20 deletions(-)

--- a/kernel/sched/debug.c
+++ b/kernel/sched/debug.c
@@ -580,10 +580,9 @@ static void print_rq(struct seq_file *m,
 
 void print_cfs_rq(struct seq_file *m, int cpu, struct cfs_rq *cfs_rq)
 {
-	s64 MIN_vruntime = -1, min_vruntime, max_vruntime = -1,
-		spread, rq0_min_vruntime, spread0;
+	s64 left_vruntime = -1, min_vruntime, right_vruntime = -1, spread;
+	struct sched_entity *last, *first;
 	struct rq *rq = cpu_rq(cpu);
-	struct sched_entity *last;
 	unsigned long flags;
 
 #ifdef CONFIG_FAIR_GROUP_SCHED
@@ -597,26 +596,25 @@ void print_cfs_rq(struct seq_file *m, in
 			SPLIT_NS(cfs_rq->exec_clock));
 
 	raw_spin_rq_lock_irqsave(rq, flags);
-	if (rb_first_cached(&cfs_rq->tasks_timeline))
-		MIN_vruntime = (__pick_first_entity(cfs_rq))->vruntime;
+	first = __pick_first_entity(cfs_rq);
+	if (first)
+		left_vruntime = first->vruntime;
 	last = __pick_last_entity(cfs_rq);
 	if (last)
-		max_vruntime = last->vruntime;
+		right_vruntime = last->vruntime;
 	min_vruntime = cfs_rq->min_vruntime;
-	rq0_min_vruntime = cpu_rq(0)->cfs.min_vruntime;
 	raw_spin_rq_unlock_irqrestore(rq, flags);
-	SEQ_printf(m, "  .%-30s: %Ld.%06ld\n", "MIN_vruntime",
-			SPLIT_NS(MIN_vruntime));
+
+	SEQ_printf(m, "  .%-30s: %Ld.%06ld\n", "left_vruntime",
+			SPLIT_NS(left_vruntime));
 	SEQ_printf(m, "  .%-30s: %Ld.%06ld\n", "min_vruntime",
 			SPLIT_NS(min_vruntime));
-	SEQ_printf(m, "  .%-30s: %Ld.%06ld\n", "max_vruntime",
-			SPLIT_NS(max_vruntime));
-	spread = max_vruntime - MIN_vruntime;
-	SEQ_printf(m, "  .%-30s: %Ld.%06ld\n", "spread",
-			SPLIT_NS(spread));
-	spread0 = min_vruntime - rq0_min_vruntime;
-	SEQ_printf(m, "  .%-30s: %Ld.%06ld\n", "spread0",
-			SPLIT_NS(spread0));
+	SEQ_printf(m, "  .%-30s: %Ld.%06ld\n", "avg_vruntime",
+			SPLIT_NS(avg_vruntime(cfs_rq)));
+	SEQ_printf(m, "  .%-30s: %Ld.%06ld\n", "right_vruntime",
+			SPLIT_NS(right_vruntime));
+	spread = right_vruntime - left_vruntime;
+	SEQ_printf(m, "  .%-30s: %Ld.%06ld\n", "spread", SPLIT_NS(spread));
 	SEQ_printf(m, "  .%-30s: %d\n", "nr_spread_over",
 			cfs_rq->nr_spread_over);
 	SEQ_printf(m, "  .%-30s: %d\n", "nr_running", cfs_rq->nr_running);
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -601,9 +601,102 @@ static inline bool entity_before(const s
 	return (s64)(a->vruntime - b->vruntime) < 0;
 }
 
+static inline s64 entity_key(struct cfs_rq *cfs_rq, struct sched_entity *se)
+{
+	return (s64)(se->vruntime - cfs_rq->min_vruntime);
+}
+
 #define __node_2_se(node) \
 	rb_entry((node), struct sched_entity, run_node)
 
+/*
+ * Compute virtual time from the per-task service numbers:
+ *
+ * Fair schedulers conserve lag: \Sum lag_i = 0
+ *
+ * lag_i = S - s_i = w_i * (V - v_i)
+ *
+ * \Sum lag_i = 0 -> \Sum w_i * (V - v_i) = V * \Sum w_i - \Sum w_i * v_i = 0
+ *
+ * From which we solve V:
+ *
+ *     \Sum v_i * w_i
+ * V = --------------
+ *        \Sum w_i
+ *
+ * However, since v_i is u64, and the multiplication could easily overflow,
+ * transform it into a relative form that uses smaller quantities:
+ *
+ * Substitute: v_i == (v_i - v) + v
+ *
+ *     \Sum ((v_i - v) + v) * w_i   \Sum (v_i - v) * w_i
+ * V = -------------------------- = -------------------- + v
+ *              \Sum w_i                 \Sum w_i
+ *
+ * min_vruntime = v
+ * avg_vruntime = \Sum (v_i - v) * w_i
+ * cfs_rq->load = \Sum w_i
+ *
+ * Since min_vruntime is a monotonic increasing variable that closely tracks
+ * the per-task service, these deltas: (v_i - v), will be in the order of the
+ * maximal (virtual) lag induced in the system due to quantisation.
+ */
+static void
+avg_vruntime_add(struct cfs_rq *cfs_rq, struct sched_entity *se)
+{
+	s64 key = entity_key(cfs_rq, se);
+	cfs_rq->avg_vruntime += key * se->load.weight;
+	cfs_rq->avg_load += se->load.weight;
+}
+
+static void
+avg_vruntime_sub(struct cfs_rq *cfs_rq, struct sched_entity *se)
+{
+	s64 key = entity_key(cfs_rq, se);
+	cfs_rq->avg_vruntime -= key * se->load.weight;
+	cfs_rq->avg_load -= se->load.weight;
+}
+
+static inline
+void avg_vruntime_update(struct cfs_rq *cfs_rq, s64 delta)
+{
+	/*
+	 * v' = v + d ==> avg_vruntime' = avg_vruntime - d*avg_load
+	 */
+	cfs_rq->avg_vruntime -= cfs_rq->avg_load * delta;
+}
+
+u64 avg_vruntime(struct cfs_rq *cfs_rq)
+{
+	struct sched_entity *curr = cfs_rq->curr;
+	s64 lag = cfs_rq->avg_vruntime;
+	long load = cfs_rq->avg_load;
+
+	if (curr && curr->on_rq) {
+		lag += entity_key(cfs_rq, curr) * curr->load.weight;
+		load += curr->load.weight;
+	}
+
+	if (load)
+		lag = div_s64(lag, load);
+
+	return cfs_rq->min_vruntime + lag;
+}
+
+static u64 __update_min_vruntime(struct cfs_rq *cfs_rq, u64 vruntime)
+{
+	u64 min_vruntime = cfs_rq->min_vruntime;
+	/*
+	 * open coded max_vruntime() to allow updating avg_vruntime
+	 */
+	s64 delta = (s64)(vruntime - min_vruntime);
+	if (delta > 0) {
+		avg_vruntime_update(cfs_rq, delta);
+		min_vruntime = vruntime;
+	}
+	return min_vruntime;
+}
+
 static void update_min_vruntime(struct cfs_rq *cfs_rq)
 {
 	struct sched_entity *curr = cfs_rq->curr;
@@ -629,7 +722,7 @@ static void update_min_vruntime(struct c
 
 	/* ensure we never gain time by being placed backwards. */
 	u64_u32_store(cfs_rq->min_vruntime,
-		      max_vruntime(cfs_rq->min_vruntime, vruntime));
+		      __update_min_vruntime(cfs_rq, vruntime));
 }
 
 static inline bool __entity_less(struct rb_node *a, const struct rb_node *b)
@@ -642,12 +735,14 @@ static inline bool __entity_less(struct
  */
 static void __enqueue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se)
 {
+	avg_vruntime_add(cfs_rq, se);
 	rb_add_cached(&se->run_node, &cfs_rq->tasks_timeline, __entity_less);
 }
 
 static void __dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se)
 {
 	rb_erase_cached(&se->run_node, &cfs_rq->tasks_timeline);
+	avg_vruntime_sub(cfs_rq, se);
 }
 
 struct sched_entity *__pick_first_entity(struct cfs_rq *cfs_rq)
@@ -3330,6 +3425,8 @@ static void reweight_entity(struct cfs_r
 		/* commit outstanding execution time */
 		if (cfs_rq->curr == se)
 			update_curr(cfs_rq);
+		else
+			avg_vruntime_sub(cfs_rq, se);
 		update_load_sub(&cfs_rq->load, se->load.weight);
 	}
 	dequeue_load_avg(cfs_rq, se);
@@ -3345,9 +3442,11 @@ static void reweight_entity(struct cfs_r
 #endif
 
 	enqueue_load_avg(cfs_rq, se);
-	if (se->on_rq)
+	if (se->on_rq) {
 		update_load_add(&cfs_rq->load, se->load.weight);
-
+		if (cfs_rq->curr != se)
+			avg_vruntime_add(cfs_rq, se);
+	}
 }
 
 void reweight_task(struct task_struct *p, int prio)
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -558,6 +558,9 @@ struct cfs_rq {
 	unsigned int		idle_nr_running;   /* SCHED_IDLE */
 	unsigned int		idle_h_nr_running; /* SCHED_IDLE */
 
+	s64			avg_vruntime;
+	u64			avg_load;
+
 	u64			exec_clock;
 	u64			min_vruntime;
 #ifdef CONFIG_SCHED_CORE
@@ -3312,4 +3315,6 @@ static inline void switch_mm_cid(struct
 static inline void switch_mm_cid(struct task_struct *prev, struct task_struct *next) { }
 #endif
 
+extern u64 avg_vruntime(struct cfs_rq *cfs_rq);
+
 #endif /* _KERNEL_SCHED_SCHED_H */
Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:5915:0:0:0:0:0 with SMTP id v21csp1890850wrd; Mon, 6 Mar 2023 07:05:25 -0800 (PST) X-Google-Smtp-Source: AK7set9fDin27KjXJEc4AcyHKG9Tgx+KhfSrdlLJI3Cx+lYOGe1FDyllj3xwMSqaW7wLByLpxQKl X-Received: by 2002:a17:90a:1d1:b0:237:5834:294b with SMTP id 17-20020a17090a01d100b002375834294bmr12055315pjd.41.1678115124726; Mon, 06 Mar 2023 07:05:24 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1678115124; cv=none; d=google.com; s=arc-20160816; b=akbir6VgGZ4MVMYrOqw+pOteO6SN4Uvpnr0tFiVM7sNmZFhdT2cbc5gdzZPW63EAy3 ht/pBiVWzIfxyGGMUYt3515qMCnjhKCEp2VssP3jMghQvxQ5rQ8vGn1dq5K8oKe9eJnn 7lawoWpFTgHy3ZMzl3TCQG9eH3i63FS0Nfu78kTHFceVlKgTW6P0o0sUyLxnAHL/6hSG jceqd9s0KMdKcm8B4ZXEi8LzavfzBF8O4plSaMUgnXxVJlb7CQnNs8wGkA9FiVuo6ZLo RlaH8XP8oDyXp4BWLkX1aqNYcKU7NBYwCUMf4voDfi9NcJXM4WKIgudSkC2UDhtss8qD 6l8g== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:mime-version:references:subject:cc:to:from:date :user-agent:message-id:dkim-signature; bh=y1hHq06IVFLgCmtHDhph2FprsRfmZL4g/B58zFOJZbs=; b=j0gzo8pJ1SWg5kNduS/3UvdDbOEnRYDMwvbgYWi2Ljpz7HfK11gf4crOg3HkRqHkeB 7LhZx1UJDAX5hBi7pVkKKyBIeJw2mN9cZL2ON0d6DWw6YzUhkwkAv/f6G2oLLxAPw1AR ex9s5qJY3YQLW56ER7ClxJ5/SsmIoekftR/gdmIBFv2NHvVeKMIWao5XjrCx/4D0gq/o EGeXFO82UCmhc3nGkFQP2ExPIyQTP7aSO/7Gr5hPSrzY30oxcDU8/M4w21RJgr3EKnDq M+i57xND1EMOEAClJYaUrw/mYn79RcJTeRvTft6j3iXorqG1pEnNNEQCYDfrkhiZnste ZmeQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@infradead.org header.s=desiato.20200630 header.b=PAl8trZm; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: from out1.vger.email (out1.vger.email. 
[2620:137:e000::1:20]) by mx.google.com with ESMTP id lw4-20020a17090b180400b002298836bbecsi13239480pjb.171.2023.03.06.07.05.12; Mon, 06 Mar 2023 07:05:24 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@infradead.org header.s=desiato.20200630 header.b=PAl8trZm; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230106AbjCFOUg (ORCPT + 99 others); Mon, 6 Mar 2023 09:20:36 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55946 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230100AbjCFOTh (ORCPT ); Mon, 6 Mar 2023 09:19:37 -0500 Received: from desiato.infradead.org (desiato.infradead.org [IPv6:2001:8b0:10b:1:d65d:64ff:fe57:4e05]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 7A04132537 for ; Mon, 6 Mar 2023 06:18:54 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=desiato.20200630; h=Content-Type:MIME-Version:References: Subject:Cc:To:From:Date:Message-ID:Sender:Reply-To:Content-Transfer-Encoding: Content-ID:Content-Description:In-Reply-To; bh=y1hHq06IVFLgCmtHDhph2FprsRfmZL4g/B58zFOJZbs=; b=PAl8trZmAjkcEhb0ETt1i9sGlQ 8neeuXssu8wTHg8BFAGzIqeFQTOdGaMRxeR0N0EX6JIuI0Anc4rACpaH7P0Byr6RoOtOqDLTin5wi Z7nsRNktQx+VKfpXM7yENuDtWan9NbHnr7BaVDcZC+DuLN+HTDoz1RwSnSUWkBLOJYY0cSn1NxlGc ZROjYnYUgV/qig+cNIpsLm2MZR6fx1RNmloHl/H3GmHXwx97oeLU9IrY5m5r6WOB9347w/VQLgq62 kMR4vdZ7sz5ZN4B3sfcfLWKinWFPIC+Z/MhfBDbt7Tdx+Ypl8C0bNkiPcIid6PZwcJoIGobHAz+SX 2UoC0weA==; Received: from j130084.upc-j.chello.nl ([24.132.130.84] helo=noisy.programming.kicks-ass.net) by desiato.infradead.org with esmtpsa (Exim 4.96 #2 (Red Hat 
Message-ID: <20230306141502.630872931@infradead.org>
Date: Mon, 06 Mar 2023 14:25:28 +0100
From: Peter Zijlstra
To: mingo@kernel.org, vincent.guittot@linaro.org
Cc: linux-kernel@vger.kernel.org, peterz@infradead.org, juri.lelli@redhat.com, dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de, bristot@redhat.com, corbet@lwn.net, qyousef@layalina.io, chris.hyser@oracle.com, patrick.bellasi@matbug.net, pjt@google.com, pavel@ucw.cz, qperret@google.com, tim.c.chen@linux.intel.com, joshdon@google.com, timj@gnu.org, kprateek.nayak@amd.com, yu.c.chen@intel.com, youssefesmat@chromium.org, joel@joelfernandes.org
Subject: [PATCH 07/10] sched/fair: Remove START_DEBIT
References: <20230306132521.968182689@infradead.org>

With the introduction of avg_vruntime() there is no need to use worse approximations.
Take the 0-lag point as starting point for inserting new tasks.

Signed-off-by: Peter Zijlstra (Intel)
---
 kernel/sched/fair.c     | 21 +--------------------
 kernel/sched/features.h |  6 ------
 2 files changed, 1 insertion(+), 26 deletions(-)

--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -882,16 +882,6 @@ static u64 sched_slice(struct cfs_rq *cf
 	return slice;
 }
 
-/*
- * We calculate the vruntime slice of a to-be-inserted task.
- *
- * vs = s/w
- */
-static u64 sched_vslice(struct cfs_rq *cfs_rq, struct sched_entity *se)
-{
-	return calc_delta_fair(sched_slice(cfs_rq, se), se);
-}
-
 #include "pelt.h"
 
 #ifdef CONFIG_SMP
@@ -4758,18 +4748,9 @@ static void check_spread(struct cfs_rq *
 static void
 place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int initial)
 {
-	u64 vruntime = cfs_rq->min_vruntime;
+	u64 vruntime = avg_vruntime(cfs_rq);
 	u64 sleep_time;
 
-	/*
-	 * The 'current' period is already promised to the current tasks,
-	 * however the extra weight of the new task will slow them down a
-	 * little, place the new task so that it fits in the slot that
-	 * stays open at the end.
-	 */
-	if (initial && sched_feat(START_DEBIT))
-		vruntime += sched_vslice(cfs_rq, se);
-
 	/* sleeps up to a single latency don't count. */
 	if (!initial) {
 		unsigned long thresh;
--- a/kernel/sched/features.h
+++ b/kernel/sched/features.h
@@ -7,12 +7,6 @@
 SCHED_FEAT(GENTLE_FAIR_SLEEPERS, true)
 
 /*
- * Place new tasks ahead so that they do not starve already running
- * tasks
- */
-SCHED_FEAT(START_DEBIT, true)
-
-/*
  * Prefer to schedule the task we woke last (assuming it failed
  * wakeup-preemption), since its likely going to consume data we
  * touched, increases cache locality.
From patchwork Mon Mar 6 13:25:29 2023
Message-ID: <20230306141502.691294694@infradead.org>
Date: Mon, 06 Mar 2023 14:25:29 +0100
From: Peter Zijlstra
To: mingo@kernel.org, vincent.guittot@linaro.org
Subject: [PATCH 08/10] sched/fair: Add lag based placement
References: <20230306132521.968182689@infradead.org>

With the introduction of avg_vruntime, it is possible to approximate lag (the entire purpose of introducing it, in fact).
Use this to do lag based placement over sleep+wake. Specifically, the FAIR_SLEEPERS thing places things too far to the left and messes up the deadline aspect of EEVDF.

Signed-off-by: Peter Zijlstra (Intel)
---
 include/linux/sched.h   |  1 
 kernel/sched/core.c     |  1 
 kernel/sched/fair.c     | 63 +++++++++++++++++++++++++++---------------------
 kernel/sched/features.h |  8 ++++++
 4 files changed, 46 insertions(+), 27 deletions(-)

--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -555,6 +555,7 @@ struct sched_entity {
 	u64			sum_exec_runtime;
 	u64			vruntime;
 	u64			prev_sum_exec_runtime;
+	s64			lag;
 
 	u64			nr_migrations;
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4436,6 +4436,7 @@ static void __sched_fork(unsigned long c
 	p->se.prev_sum_exec_runtime	= 0;
 	p->se.nr_migrations		= 0;
 	p->se.vruntime			= 0;
+	p->se.lag			= 0;
 	INIT_LIST_HEAD(&p->se.group_node);
 
 	set_latency_offset(p);
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4749,39 +4749,45 @@ static void
 place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int initial)
 {
 	u64 vruntime = avg_vruntime(cfs_rq);
-	u64 sleep_time;
 
-	/* sleeps up to a single latency don't count. */
-	if (!initial) {
-		unsigned long thresh;
-
-		if (se_is_idle(se))
-			thresh = sysctl_sched_min_granularity;
-		else
-			thresh = sysctl_sched_latency;
+	if (sched_feat(FAIR_SLEEPERS)) {
+		u64 sleep_time;
+
+		/* sleeps up to a single latency don't count. */
+		if (!initial) {
+			unsigned long thresh;
+
+			if (se_is_idle(se))
+				thresh = sysctl_sched_min_granularity;
+			else
+				thresh = sysctl_sched_latency;
+
+			/*
+			 * Halve their sleep time's effect, to allow
+			 * for a gentler effect of sleepers:
+			 */
+			if (sched_feat(GENTLE_FAIR_SLEEPERS))
+				thresh >>= 1;
+
+			vruntime -= thresh;
+		}
 
 		/*
-		 * Halve their sleep time's effect, to allow
-		 * for a gentler effect of sleepers:
+		 * Pull vruntime of the entity being placed to the base level of
+		 * cfs_rq, to prevent boosting it if placed backwards. If the entity
+		 * slept for a long time, don't even try to compare its vruntime with
+		 * the base as it may be too far off and the comparison may get
+		 * inversed due to s64 overflow.
 		 */
-		if (sched_feat(GENTLE_FAIR_SLEEPERS))
-			thresh >>= 1;
-
-		vruntime -= thresh;
+		sleep_time = rq_clock_task(rq_of(cfs_rq)) - se->exec_start;
+		if ((s64)sleep_time < 60LL * NSEC_PER_SEC)
+			vruntime = max_vruntime(se->vruntime, vruntime);
 	}
 
-	/*
-	 * Pull vruntime of the entity being placed to the base level of
-	 * cfs_rq, to prevent boosting it if placed backwards. If the entity
-	 * slept for a long time, don't even try to compare its vruntime with
-	 * the base as it may be too far off and the comparison may get
-	 * inversed due to s64 overflow.
-	 */
-	sleep_time = rq_clock_task(rq_of(cfs_rq)) - se->exec_start;
-	if ((s64)sleep_time > 60LL * NSEC_PER_SEC)
-		se->vruntime = vruntime;
-	else
-		se->vruntime = max_vruntime(se->vruntime, vruntime);
+	if (sched_feat(PRESERVE_LAG))
+		vruntime -= se->lag;
+
+	se->vruntime = vruntime;
 }
 
 static void check_enqueue_throttle(struct cfs_rq *cfs_rq);
@@ -4949,6 +4955,9 @@ dequeue_entity(struct cfs_rq *cfs_rq, st
 
 	clear_buddies(cfs_rq, se);
 
+	if (sched_feat(PRESERVE_LAG) && (flags & DEQUEUE_SLEEP))
+		se->lag = avg_vruntime(cfs_rq) - se->vruntime;
+
 	if (se != cfs_rq->curr)
 		__dequeue_entity(cfs_rq, se);
 	se->on_rq = 0;
--- a/kernel/sched/features.h
+++ b/kernel/sched/features.h
@@ -1,12 +1,20 @@
 /* SPDX-License-Identifier: GPL-2.0 */
+
 /*
  * Only give sleepers 50% of their service deficit. This allows
  * them to run sooner, but does not allow tons of sleepers to
  * rip the spread apart.
  */
+SCHED_FEAT(FAIR_SLEEPERS, false)
 SCHED_FEAT(GENTLE_FAIR_SLEEPERS, true)
 
 /*
+ * Using the avg_vruntime, do the right thing and preserve lag
+ * across sleep+wake cycles.
+ */
+SCHED_FEAT(PRESERVE_LAG, true)
+
+/*
  * Prefer to schedule the task we woke last (assuming it failed
  * wakeup-preemption), since its likely going to consume data we
  * touched, increases cache locality.

From patchwork Mon Mar 6 13:25:30 2023
Message-ID: <20230306141502.750918974@infradead.org>
Date: Mon, 06 Mar 2023 14:25:30 +0100
From: Peter Zijlstra
To: mingo@kernel.org, vincent.guittot@linaro.org
Subject: [PATCH 09/10] rbtree: Add rb_add_augmented_cached() helper
References: <20230306132521.968182689@infradead.org>
While slightly sub-optimal, updating the augmented data while going down the tree during lookup would be faster -- alas the augment interface does not currently allow for that. Provide a generic helper to add a node to an augmented cached tree.

Signed-off-by: Peter Zijlstra (Intel)
---
 include/linux/rbtree_augmented.h | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

--- a/include/linux/rbtree_augmented.h
+++ b/include/linux/rbtree_augmented.h
@@ -60,6 +60,32 @@ rb_insert_augmented_cached(struct rb_nod
 	rb_insert_augmented(node, &root->rb_root, augment);
 }
 
+static __always_inline struct rb_node *
+rb_add_augmented_cached(struct rb_node *node, struct rb_root_cached *tree,
+			bool (*less)(struct rb_node *, const struct rb_node *),
+			const struct rb_augment_callbacks *augment)
+{
+	struct rb_node **link = &tree->rb_root.rb_node;
+	struct rb_node *parent = NULL;
+	bool leftmost = true;
+
+	while (*link) {
+		parent = *link;
+		if (less(node, parent)) {
+			link = &parent->rb_left;
+		} else {
+			link = &parent->rb_right;
+			leftmost = false;
+		}
+	}
+
+	rb_link_node(node, parent, link);
+	augment->propagate(parent, NULL); /* suboptimal */
+	rb_insert_augmented_cached(node, tree, leftmost, augment);
+
+	return leftmost ? node : NULL;
+}
+
 /*
  * Template for declaring augmented rbtree callbacks (generic case)
  *

From patchwork Mon Mar 6 13:25:31 2023
Message-ID: <20230306141502.810909205@infradead.org>
Date: Mon, 06 Mar 2023 14:25:31 +0100
From: Peter Zijlstra
To: mingo@kernel.org, vincent.guittot@linaro.org
Subject: [PATCH 10/10] sched/fair: Implement an EEVDF like policy
References: <20230306132521.968182689@infradead.org>

Where CFS is currently a WFQ based
scheduler with only a single knob, the weight. The addition of a second, latency oriented parameter, makes something like WF2Q or EEVDF based a much better fit. Specifically, EEVDF does EDF like scheduling in the left half of the tree -- those entities that are owed service. Except because this is a virtual time scheduler, the deadlines are in virtual time as well, which is what allows over-subscription. EEVDF has two parameters: - weight; which is mapped to nice just as before - relative deadline; which is related to slice length and mapped to the new latency nice. Basically, by setting a smaller slice, the deadline will be earlier and the task will be more eligible and ran earlier. Preemption (both tick and wakeup) is driven by testing against a fresh pick. Because the tree is now effectively an interval tree, and the selection is no longer 'leftmost', over-scheduling is less of a problem. Signed-off-by: Peter Zijlstra (Intel) --- include/linux/sched.h | 4 kernel/sched/debug.c | 6 - kernel/sched/fair.c | 265 ++++++++++++++++++++++++++++++++++++++++++------ kernel/sched/features.h | 2 kernel/sched/sched.h | 1 5 files changed, 247 insertions(+), 31 deletions(-) --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -548,6 +548,9 @@ struct sched_entity { /* For load-balancing: */ struct load_weight load; struct rb_node run_node; + u64 deadline; + u64 min_deadline; + struct list_head group_node; unsigned int on_rq; @@ -556,6 +559,7 @@ struct sched_entity { u64 vruntime; u64 prev_sum_exec_runtime; s64 lag; + u64 slice; u64 nr_migrations; --- a/kernel/sched/debug.c +++ b/kernel/sched/debug.c @@ -535,9 +535,13 @@ print_task(struct seq_file *m, struct rq else SEQ_printf(m, " %c", task_state_to_char(p)); - SEQ_printf(m, " %15s %5d %9Ld.%06ld %9Ld %5d ", + SEQ_printf(m, "%15s %5d %9Ld.%06ld %c %9Ld.%06ld %9Ld.%06ld %9Ld.%06ld %9Ld %5d ", p->comm, task_pid_nr(p), SPLIT_NS(p->se.vruntime), + entity_eligible(cfs_rq_of(&p->se), &p->se) ? 
'E' : 'N', + SPLIT_NS(p->se.deadline), + SPLIT_NS(p->se.slice), + SPLIT_NS(p->se.sum_exec_runtime), (long long)(p->nvcsw + p->nivcsw), p->prio); --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -47,6 +47,7 @@ #include #include #include +#include #include @@ -683,6 +684,34 @@ u64 avg_vruntime(struct cfs_rq *cfs_rq) return cfs_rq->min_vruntime + lag; } +/* + * Entity is eligible once it received less service than it ought to have, + * eg. lag >= 0. + * + * lag_i = S - s_i = w_i*(V - w_i) + * + * lag_i >= 0 -> V >= v_i + * + * \Sum (v_i - v)*w_i + * V = ------------------ + v + * \Sum w_i + * + * lag_i >= 0 -> \Sum (v_i - v)*w_i >= (v_i - v)*(\Sum w_i) + */ +int entity_eligible(struct cfs_rq *cfs_rq, struct sched_entity *se) +{ + struct sched_entity *curr = cfs_rq->curr; + s64 avg_vruntime = cfs_rq->avg_vruntime; + long avg_load = cfs_rq->avg_load; + + if (curr && curr->on_rq) { + avg_vruntime += entity_key(cfs_rq, curr) * curr->load.weight; + avg_load += curr->load.weight; + } + + return avg_vruntime >= entity_key(cfs_rq, se) * avg_load; +} + static u64 __update_min_vruntime(struct cfs_rq *cfs_rq, u64 vruntime) { u64 min_vruntime = cfs_rq->min_vruntime; @@ -699,8 +728,8 @@ static u64 __update_min_vruntime(struct static void update_min_vruntime(struct cfs_rq *cfs_rq) { + struct sched_entity *se = __pick_first_entity(cfs_rq); struct sched_entity *curr = cfs_rq->curr; - struct rb_node *leftmost = rb_first_cached(&cfs_rq->tasks_timeline); u64 vruntime = cfs_rq->min_vruntime; @@ -711,9 +740,7 @@ static void update_min_vruntime(struct c curr = NULL; } - if (leftmost) { /* non-empty tree */ - struct sched_entity *se = __node_2_se(leftmost); - + if (se) { if (!curr) vruntime = se->vruntime; else @@ -730,18 +757,50 @@ static inline bool __entity_less(struct return entity_before(__node_2_se(a), __node_2_se(b)); } +#define deadline_gt(field, lse, rse) ({ (s64)((lse)->field - (rse)->field) > 0; }) + +static inline void __update_min_deadline(struct sched_entity *se, struct 
rb_node *node) +{ + if (node) { + struct sched_entity *rse = __node_2_se(node); + if (deadline_gt(min_deadline, se, rse)) + se->min_deadline = rse->min_deadline; + } +} + +/* + * se->min_deadline = min(se->deadline, left->min_deadline, right->min_deadline) + */ +static inline bool min_deadline_update(struct sched_entity *se, bool exit) +{ + u64 old_min_deadline = se->min_deadline; + struct rb_node *node = &se->run_node; + + se->min_deadline = se->deadline; + __update_min_deadline(se, node->rb_right); + __update_min_deadline(se, node->rb_left); + + return se->min_deadline == old_min_deadline; +} + +RB_DECLARE_CALLBACKS(static, min_deadline_cb, struct sched_entity, + run_node, min_deadline, min_deadline_update); + /* * Enqueue an entity into the rb-tree: */ static void __enqueue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se) { avg_vruntime_add(cfs_rq, se); - rb_add_cached(&se->run_node, &cfs_rq->tasks_timeline, __entity_less); + se->min_deadline = se->deadline; + rb_add_augmented_cached(&se->run_node, &cfs_rq->tasks_timeline, + __entity_less, &min_deadline_cb); } static void __dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se) { - rb_erase_cached(&se->run_node, &cfs_rq->tasks_timeline); + rb_erase_augmented_cached(&se->run_node, &cfs_rq->tasks_timeline, + &min_deadline_cb); avg_vruntime_sub(cfs_rq, se); } @@ -765,6 +824,101 @@ static struct sched_entity *__pick_next_ return __node_2_se(next); } +static struct sched_entity *pick_cfs(struct cfs_rq *cfs_rq, struct sched_entity *curr) +{ + struct sched_entity *left = __pick_first_entity(cfs_rq); + + /* + * If curr is set we have to see if its left of the leftmost entity + * still in the tree, provided there was anything in the tree at all. 
+	 */
+	if (!left || (curr && entity_before(curr, left)))
+		left = curr;
+
+	return left;
+}
+
+/*
+ * Earliest Eligible Virtual Deadline First
+ *
+ * In order to provide latency guarantees for different request sizes
+ * EEVDF selects the best runnable task from two criteria:
+ *
+ *  1) the task must be eligible (must be owed service)
+ *
+ *  2) from those tasks that meet 1), we select the one
+ *     with the earliest virtual deadline.
+ *
+ * We can do this in O(log n) time due to an augmented RB-tree. The
+ * tree keeps the entries sorted on service, but also functions as a
+ * heap based on the deadline by keeping:
+ *
+ *  se->min_deadline = min(se->deadline, se->{left,right}->min_deadline)
+ *
+ * Which allows an EDF-like search on (sub)trees.
+ */
+static struct sched_entity *pick_eevdf(struct cfs_rq *cfs_rq)
+{
+	struct rb_node *node = cfs_rq->tasks_timeline.rb_root.rb_node;
+	struct sched_entity *curr = cfs_rq->curr;
+	struct sched_entity *best = NULL;
+
+	if (curr && (!curr->on_rq || !entity_eligible(cfs_rq, curr)))
+		curr = NULL;
+
+	while (node) {
+		struct sched_entity *se = __node_2_se(node);
+
+		/*
+		 * If this entity is not eligible, try the left subtree.
+		 *
+		 * XXX: would it be worth it to do the single division for
+		 *      avg_vruntime() once, instead of the multiplication
+		 *      in entity_eligible() O(log n) times?
+		 */
+		if (!entity_eligible(cfs_rq, se)) {
+			node = node->rb_left;
+			continue;
+		}
+
+		/*
+		 * If this entity has an earlier deadline than the previous
+		 * best, take this one. If it also has the earliest deadline
+		 * of its subtree, we're done.
+		 */
+		if (!best || deadline_gt(deadline, best, se)) {
+			best = se;
+			if (best->deadline == best->min_deadline)
+				break;
+		}
+
+		/*
+		 * If the earliest deadline in this subtree is in the fully
+		 * eligible left half of our space, go there.
+		 */
+		if (node->rb_left &&
+		    __node_2_se(node->rb_left)->min_deadline == se->min_deadline) {
+			node = node->rb_left;
+			continue;
+		}
+
+		node = node->rb_right;
+	}
+
+	if (!best || (curr && deadline_gt(deadline, best, curr)))
+		best = curr;
+
+	if (unlikely(!best)) {
+		struct sched_entity *left = __pick_first_entity(cfs_rq);
+		if (left) {
+			pr_err("EEVDF scheduling fail, picking leftmost\n");
+			return left;
+		}
+	}
+
+	return best;
+}
+
 #ifdef CONFIG_SCHED_DEBUG
 struct sched_entity *__pick_last_entity(struct cfs_rq *cfs_rq)
 {
@@ -882,6 +1036,32 @@ static u64 sched_slice(struct cfs_rq *cf
 	return slice;
 }
 
+static void set_slice(struct cfs_rq *cfs_rq, struct sched_entity *se)
+{
+	if (sched_feat(EEVDF)) {
+		/*
+		 * For EEVDF the virtual time slope is determined by w_i (iow.
+		 * nice) while the request time r_i is determined by
+		 * latency-nice.
+		 */
+		se->slice = se->latency_offset;
+	} else {
+		/*
+		 * When many tasks blow up the sched_period, it is possible
+		 * that sched_slice() reports unusually large results (when
+		 * many tasks are very light for example). Therefore impose a
+		 * maximum.
+		 */
+		se->slice = min_t(u64, sched_slice(cfs_rq, se), sysctl_sched_latency);
+	}
+
+	/*
+	 * vd_i = ve_i + r_i / w_i
+	 */
+	se->deadline = se->vruntime + calc_delta_fair(se->slice, se);
+	se->min_deadline = se->deadline;
+}
+
 #include "pelt.h"
 #ifdef CONFIG_SMP
 
@@ -1014,6 +1194,13 @@ static void update_curr(struct cfs_rq *c
 	schedstat_add(cfs_rq->exec_clock, delta_exec);
 
 	curr->vruntime += calc_delta_fair(delta_exec, curr);
+	/*
+	 * XXX: strictly: vd_i += N*r_i/w_i such that: vd_i > ve_i
+	 * this is probably good enough.
+	 */
+	if ((s64)(curr->vruntime - curr->deadline) > 0)
+		set_slice(cfs_rq, curr);
+
 	update_min_vruntime(cfs_rq);
 
 	if (entity_is_task(curr)) {
@@ -4788,6 +4975,7 @@ place_entity(struct cfs_rq *cfs_rq, stru
 	vruntime -= se->lag;
 
 	se->vruntime = vruntime;
+	set_slice(cfs_rq, se);
 }
 
 static void check_enqueue_throttle(struct cfs_rq *cfs_rq);
@@ -4996,19 +5184,20 @@ dequeue_entity(struct cfs_rq *cfs_rq, st
 static void
 check_preempt_tick(struct cfs_rq *cfs_rq, struct sched_entity *curr)
 {
-	unsigned long ideal_runtime, delta_exec;
+	unsigned long delta_exec;
 	struct sched_entity *se;
 	s64 delta;
 
-	/*
-	 * When many tasks blow up the sched_period; it is possible that
-	 * sched_slice() reports unusually large results (when many tasks are
-	 * very light for example). Therefore impose a maximum.
-	 */
-	ideal_runtime = min_t(u64, sched_slice(cfs_rq, curr), sysctl_sched_latency);
+	if (sched_feat(EEVDF)) {
+		if (pick_eevdf(cfs_rq) != curr)
+			goto preempt;
+
+		return;
+	}
 
 	delta_exec = curr->sum_exec_runtime - curr->prev_sum_exec_runtime;
-	if (delta_exec > ideal_runtime) {
+	if (delta_exec > curr->slice) {
+preempt:
 		resched_curr(rq_of(cfs_rq));
 		/*
 		 * The current task ran long enough, ensure it doesn't get
@@ -5032,7 +5221,7 @@ check_preempt_tick(struct cfs_rq *cfs_rq
 	if (delta < 0)
 		return;
 
-	if (delta > ideal_runtime)
+	if (delta > curr->slice)
 		resched_curr(rq_of(cfs_rq));
 }
 
@@ -5087,17 +5276,20 @@ wakeup_preempt_entity(struct sched_entit
 static struct sched_entity *
 pick_next_entity(struct cfs_rq *cfs_rq, struct sched_entity *curr)
 {
-	struct sched_entity *left = __pick_first_entity(cfs_rq);
-	struct sched_entity *se;
+	struct sched_entity *left, *se;
 
-	/*
-	 * If curr is set we have to see if its left of the leftmost entity
-	 * still in the tree, provided there was anything in the tree at all.
-	 */
-	if (!left || (curr && entity_before(curr, left)))
-		left = curr;
+	if (sched_feat(EEVDF)) {
+		/*
+		 * Enabling NEXT_BUDDY will affect latency but not fairness.
+		 */
+		if (sched_feat(NEXT_BUDDY) &&
+		    cfs_rq->next && entity_eligible(cfs_rq, cfs_rq->next))
+			return cfs_rq->next;
+
+		return pick_eevdf(cfs_rq);
+	}
 
-	se = left; /* ideally we run the leftmost entity */
+	se = left = pick_cfs(cfs_rq, curr);
 
 	/*
 	 * Avoid running the skip buddy, if running something else can
@@ -6192,13 +6384,12 @@ static inline void unthrottle_offline_cf
 static void hrtick_start_fair(struct rq *rq, struct task_struct *p)
 {
 	struct sched_entity *se = &p->se;
-	struct cfs_rq *cfs_rq = cfs_rq_of(se);
 
 	SCHED_WARN_ON(task_rq(p) != rq);
 
 	if (rq->cfs.h_nr_running > 1) {
-		u64 slice = sched_slice(cfs_rq, se);
 		u64 ran = se->sum_exec_runtime - se->prev_sum_exec_runtime;
+		u64 slice = se->slice;
 		s64 delta = slice - ran;
 
 		if (delta < 0) {
@@ -7921,7 +8112,19 @@ static void check_preempt_wakeup(struct
 	if (cse_is_idle != pse_is_idle)
 		return;
 
-	update_curr(cfs_rq_of(se));
+	cfs_rq = cfs_rq_of(se);
+	update_curr(cfs_rq);
+
+	if (sched_feat(EEVDF)) {
+		/*
+		 * XXX pick_eevdf(cfs_rq) != se ?
+		 */
+		if (pick_eevdf(cfs_rq) == pse)
+			goto preempt;
+
+		return;
+	}
+
 	if (wakeup_preempt_entity(se, pse) == 1) {
 		/*
 		 * Bias pick_next to pick the sched entity that is
@@ -8167,7 +8370,7 @@ static void yield_task_fair(struct rq *r
 
 	clear_buddies(cfs_rq, se);
 
-	if (curr->policy != SCHED_BATCH) {
+	if (sched_feat(EEVDF) || curr->policy != SCHED_BATCH) {
 		update_rq_clock(rq);
 		/*
 		 * Update run-time statistics of the 'current'.
@@ -8180,6 +8383,8 @@ static void yield_task_fair(struct rq *r
 		 */
 		rq_clock_skip_update(rq);
 	}
+	if (sched_feat(EEVDF))
+		se->deadline += calc_delta_fair(se->slice, se);
 
 	set_skip_buddy(se);
 }
@@ -11923,8 +12128,8 @@ static void rq_offline_fair(struct rq *r
 static inline bool
 __entity_slice_used(struct sched_entity *se, int min_nr_tasks)
 {
-	u64 slice = sched_slice(cfs_rq_of(se), se);
 	u64 rtime = se->sum_exec_runtime - se->prev_sum_exec_runtime;
+	u64 slice = se->slice;
 
 	return (rtime * min_nr_tasks > slice);
 }
@@ -12639,7 +12844,7 @@ static unsigned int get_rr_interval_fair
 	 * idle runqueue:
 	 */
 	if (rq->cfs.load.weight)
-		rr_interval = NS_TO_JIFFIES(sched_slice(cfs_rq_of(se), se));
+		rr_interval = NS_TO_JIFFIES(se->slice);
 
 	return rr_interval;
 }
--- a/kernel/sched/features.h
+++ b/kernel/sched/features.h
@@ -103,3 +103,5 @@ SCHED_FEAT(LATENCY_WARN, false)
 
 SCHED_FEAT(ALT_PERIOD, true)
 SCHED_FEAT(BASE_SLICE, true)
+
+SCHED_FEAT(EEVDF, true)
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -3316,5 +3316,6 @@ static inline void switch_mm_cid(struct
 #endif
 
 extern u64 avg_vruntime(struct cfs_rq *cfs_rq);
+extern int entity_eligible(struct cfs_rq *cfs_rq, struct sched_entity *se);
 
 #endif /* _KERNEL_SCHED_SCHED_H */
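For reviewers who want to play with the selection rule outside the kernel: the pick implemented above (eligibility via the division-free weighted-average comparison, then earliest virtual deadline among the eligible) can be modelled as a plain O(n) scan. This is an illustrative userspace sketch only, not kernel code; `struct entity`, `eligible()` and `pick()` are made-up names, and the kernel instead folds `cfs_rq->curr` into the running averages and walks the `min_deadline`-augmented rbtree for an O(log n) pick.

```c
#include <stddef.h>

/* Toy model of one runnable entity (illustrative, not the kernel struct). */
struct entity {
	long long     vruntime;  /* service received in virtual time (v_i) */
	long long     deadline;  /* virtual deadline (vd_i = ve_i + r_i/w_i) */
	unsigned long weight;    /* w_i */
};

/*
 * lag_i >= 0  <=>  V >= v_i  <=>  \Sum (v_j - v_i)*w_j >= 0
 *
 * Same trick as entity_eligible(): compare against the weighted average
 * without dividing, by taking v_i itself as the reference point.
 */
static int eligible(const struct entity *es, size_t n, const struct entity *e)
{
	long long sum = 0;
	size_t j;

	for (j = 0; j < n; j++)
		sum += (es[j].vruntime - e->vruntime) * (long long)es[j].weight;

	return sum >= 0;
}

/* O(n) reference pick: earliest virtual deadline among eligible entities. */
static const struct entity *pick(const struct entity *es, size_t n)
{
	const struct entity *best = NULL;
	size_t i;

	for (i = 0; i < n; i++) {
		if (!eligible(es, n, &es[i]))
			continue;
		if (!best || es[i].deadline < best->deadline)
			best = &es[i];
	}
	return best;
}
```

With three equal-weight entities at vruntimes {10, 0, 20}, the average V is 10, so the entity at 20 is ineligible even if it holds the earliest deadline; of the two eligible ones, the earlier deadline wins, which is exactly the two-step criterion described in the pick_eevdf() comment.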