Message ID: 20230531124604.615053451@infradead.org
State: New
Headers:
Message-ID: <20230531124604.615053451@infradead.org>
Date: Wed, 31 May 2023 13:58:54 +0200
From: Peter Zijlstra <peterz@infradead.org>
To: mingo@kernel.org, vincent.guittot@linaro.org
Cc: linux-kernel@vger.kernel.org, peterz@infradead.org, juri.lelli@redhat.com, dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de, bristot@redhat.com, corbet@lwn.net, qyousef@layalina.io, chris.hyser@oracle.com, patrick.bellasi@matbug.net, pjt@google.com, pavel@ucw.cz, qperret@google.com, tim.c.chen@linux.intel.com, joshdon@google.com, timj@gnu.org, kprateek.nayak@amd.com, yu.c.chen@intel.com, youssefesmat@chromium.org, joel@joelfernandes.org, efault@gmx.de, tglx@linutronix.de
Subject: [RFC][PATCH 15/15] sched/eevdf: Use sched_attr::sched_runtime to set request/slice
Series: sched: EEVDF and latency-nice and/or slice-attr
Commit Message
Peter Zijlstra
May 31, 2023, 11:58 a.m. UTC
As an alternative to the latency-nice interface; allow applications to
directly set the request/slice using sched_attr::sched_runtime.

The implementation clamps the value to: 0.1[ms] <= slice <= 100[ms]
which is 1/10 the size of HZ=1000 and 10 times the size of HZ=100.

Applications should strive to use their periodic runtime at a high
confidence interval (95%+) as the target slice. Using a smaller slice
will introduce undue preemptions, while using a larger value will
increase latency.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
kernel/sched/core.c | 24 ++++++++++++++++++------
1 file changed, 18 insertions(+), 6 deletions(-)
Comments
On Wed, 31 May 2023 at 14:47, Peter Zijlstra <peterz@infradead.org> wrote:
>
> As an alternative to the latency-nice interface; allow applications to
> directly set the request/slice using sched_attr::sched_runtime.
>
> The implementation clamps the value to: 0.1[ms] <= slice <= 100[ms]
> which is 1/10 the size of HZ=1000 and 10 times the size of HZ=100.

There were some discussions about the latency interface and setting a
raw time value. The problems with using a raw time value are:

- What does this raw time value mean, and how does it apply to the
  scheduling latency of the task? Typically, what does setting
  sched_runtime to 1ms mean? Regarding latency, users would expect to
  be scheduled in less than 1ms, but this is not what will (always)
  happen with a sched_slice set to 1ms, whereas with the deadline
  scheduler we ensure that the task will run for sched_runtime in the
  sched_period (and before sched_deadline). So this will be confusing.

- More than a runtime, we want to set a scheduling latency hint, which
  would be more aligned with a deadline.

- The user will then complain that he set 1ms but his task is scheduled
  after several (or even dozens of) ms in some cases. Also, you will
  probably end up with everybody setting 0.1ms and expecting 0.1ms
  latency. Latency nice, like nice, gives an opaque weight against
  others, without promising a determinism that we can't respect.

- How do you set that you don't want to preempt others, but still want
  to keep your allocated running time?

> Applications should strive to use their periodic runtime at a high
> confidence interval (95%+) as the target slice. Using a smaller slice
> will introduce undue preemptions, while using a larger value will
> increase latency.
>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  kernel/sched/core.c | 24 ++++++++++++++++++------
>  1 file changed, 18 insertions(+), 6 deletions(-)
>
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -7494,10 +7494,18 @@ static void __setscheduler_params(struct
>
>  	p->policy = policy;
>
> -	if (dl_policy(policy))
> +	if (dl_policy(policy)) {
>  		__setparam_dl(p, attr);
> -	else if (fair_policy(policy))
> +	} else if (fair_policy(policy)) {
>  		p->static_prio = NICE_TO_PRIO(attr->sched_nice);
> +		if (attr->sched_runtime) {
> +			p->se.slice = clamp_t(u64, attr->sched_runtime,
> +					      NSEC_PER_MSEC/10,   /* HZ=1000 * 10 */
> +					      NSEC_PER_MSEC*100); /* HZ=100 / 10 */
> +		} else {
> +			p->se.slice = sysctl_sched_base_slice;
> +		}
> +	}
>
>  	/*
>  	 * __sched_setscheduler() ensures attr->sched_priority == 0 when
> @@ -7689,7 +7697,9 @@ static int __sched_setscheduler(struct t
>  	 * but store a possible modification of reset_on_fork.
>  	 */
>  	if (unlikely(policy == p->policy)) {
> -		if (fair_policy(policy) && attr->sched_nice != task_nice(p))
> +		if (fair_policy(policy) &&
> +		    (attr->sched_nice != task_nice(p) ||
> +		     (attr->sched_runtime && attr->sched_runtime != p->se.slice)))
>  			goto change;
>  		if (rt_policy(policy) && attr->sched_priority != p->rt_priority)
>  			goto change;
> @@ -8017,12 +8027,14 @@ static int sched_copy_attr(struct sched_
>
>  static void get_params(struct task_struct *p, struct sched_attr *attr)
>  {
> -	if (task_has_dl_policy(p))
> +	if (task_has_dl_policy(p)) {
>  		__getparam_dl(p, attr);
> -	else if (task_has_rt_policy(p))
> +	} else if (task_has_rt_policy(p)) {
>  		attr->sched_priority = p->rt_priority;
> -	else
> +	} else {
>  		attr->sched_nice = task_nice(p);
> +		attr->sched_runtime = p->se.slice;
> +	}
>  }
>
>  /**
On Thu, Jun 01, 2023 at 03:55:18PM +0200, Vincent Guittot wrote:
> On Wed, 31 May 2023 at 14:47, Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > As an alternative to the latency-nice interface; allow applications to
> > directly set the request/slice using sched_attr::sched_runtime.
> >
> > The implementation clamps the value to: 0.1[ms] <= slice <= 100[ms]
> > which is 1/10 the size of HZ=1000 and 10 times the size of HZ=100.
>
> There were some discussions about the latency interface and setting a
> raw time value. The problems with using a raw time value are:

So yeah, I'm well aware of that. And I'm not saying this is a better
interface -- just an alternative.

> - What does this raw time value mean, and how does it apply to the
>   scheduling latency of the task? Typically, what does setting
>   sched_runtime to 1ms mean? Regarding latency, users would expect to
>   be scheduled in less than 1ms, but this is not what will (always)
>   happen with a sched_slice set to 1ms, whereas with the deadline
>   scheduler we ensure that the task will run for sched_runtime in the
>   sched_period (and before sched_deadline). So this will be confusing.

Confusing only if you don't know how to look at it; users are confused
in general and that's unfixable, nature will always invent a better
moron. The best we can do is provide enough clues for someone that does
know what he's doing.

So let me start by explaining how such an interface could be used and
how to look at it. (And because we all love steady state things, I too
shall use it.)

Consider 4 equal-weight always-running tasks (A,B,C,D) with a default
slice of 1ms. The perfect schedule for this is a straight-up FIFO
rotation of the 4 tasks, 1ms each, for a total period of 4ms.

  ABCDABCD...
  +---+---+---+---

By keeping the tasks in the same order, we ensure the max latency is the
min latency -- consistency is king.

If for one period you were to, say, flip the first and last tasks in the
order, your max latency takes a hit: the task that was first will now
have to wait 7ms instead of its usual 3ms.

  ABCDDBCA...
  +---+---+---+---

So far so obvious and boring.. Now, if we were to change the slice of
task D to 2ms, what happens is that it can't run the first time around,
because the slice rotations are 1ms and it needs 2ms, so it needs to
save up and bank the first slot, and you get a schedule like:

  ABCABCDDABCABCDD...
  +---+---+---+---+---

And here you can see that the total period becomes 8ms (N*r_max). The
period for the 1ms tasks is still 4ms -- on average, but the period for
the 2ms task is 8ms.

A more complex example would be 3 tasks: A(w=1,r=1), B(w=1,r=1),
C(w=2,r=1) [to keep the 4ms period]:

  CCABCCAB...
  +---+---+---+---

If we change the slice of B to 2, then it becomes:

  CCACCABBCCACCABB...
  +---+---+---+---+---

So the total period is W*r_max (8ms), each task will average to a period
of W*r_i, and each task will get its fair share of w_i/W time over the
total period (W*r_max, per the previous).

> - More than a runtime, we want to set a scheduling latency hint, which
>   would be more aligned with a deadline.

We all want ponies ;-) But seriously, if you have a real deadline, use
SCHED_DEADLINE.

> - The user will then complain that he set 1ms but his task is scheduled
>   after several (or even dozens of) ms in some cases. Also, you will
>   probably end up with everybody setting 0.1ms and expecting 0.1ms
>   latency. Latency nice, like nice, gives an opaque weight against
>   others, without promising a determinism that we can't respect.

Now, notably I used sched_attr::sched_runtime, not _deadline nor
_period. Runtime is how long you expect each job-execution to take (WCET
and all that) in a periodic or sporadic task model.

Given this is a best-effort overcommit scheduling class, we *CANNOT*
guarantee actual latency. The best we can offer is consistency (and this
is where EEVDF is *much* better than CFS).

We cannot, and must not, pretend to provide a real deadline; hence we
should really not use that term in the user interface for this.

From the above examples we can see that if you ask for 1ms slices, you
get 1ms slices spaced (on average) closer together than if you were to
ask for 2ms slices -- even though they end up with the same share of
CPU time. Per the previous argument, the 2ms-slice task has to forgo one
slot in the first period to bank and save up for a 2ms slot in a super
period.

Now, if you're not a CPU-hogging bully and don't use much CPU time at
all (your music player etc.), then by setting the slice length to what
it actually takes to decode the next sample buffer, you can likely get a
smaller average period. Conversely, if you ask for a slice significantly
smaller than your job execution time, you'll see it get split up into
smaller chunks and suffer preemption.

> - How do you set that you don't want to preempt others, but still want
>   to keep your allocated running time?

SCHED_BATCH is what we have for that. That actually works.
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -7494,10 +7494,18 @@ static void __setscheduler_params(struct

 	p->policy = policy;

-	if (dl_policy(policy))
+	if (dl_policy(policy)) {
 		__setparam_dl(p, attr);
-	else if (fair_policy(policy))
+	} else if (fair_policy(policy)) {
 		p->static_prio = NICE_TO_PRIO(attr->sched_nice);
+		if (attr->sched_runtime) {
+			p->se.slice = clamp_t(u64, attr->sched_runtime,
+					      NSEC_PER_MSEC/10,   /* HZ=1000 * 10 */
+					      NSEC_PER_MSEC*100); /* HZ=100 / 10 */
+		} else {
+			p->se.slice = sysctl_sched_base_slice;
+		}
+	}

 	/*
 	 * __sched_setscheduler() ensures attr->sched_priority == 0 when
@@ -7689,7 +7697,9 @@ static int __sched_setscheduler(struct t
 	 * but store a possible modification of reset_on_fork.
 	 */
 	if (unlikely(policy == p->policy)) {
-		if (fair_policy(policy) && attr->sched_nice != task_nice(p))
+		if (fair_policy(policy) &&
+		    (attr->sched_nice != task_nice(p) ||
+		     (attr->sched_runtime && attr->sched_runtime != p->se.slice)))
 			goto change;
 		if (rt_policy(policy) && attr->sched_priority != p->rt_priority)
 			goto change;
@@ -8017,12 +8027,14 @@ static int sched_copy_attr(struct sched_

 static void get_params(struct task_struct *p, struct sched_attr *attr)
 {
-	if (task_has_dl_policy(p))
+	if (task_has_dl_policy(p)) {
 		__getparam_dl(p, attr);
-	else if (task_has_rt_policy(p))
+	} else if (task_has_rt_policy(p)) {
 		attr->sched_priority = p->rt_priority;
-	else
+	} else {
 		attr->sched_nice = task_nice(p);
+		attr->sched_runtime = p->se.slice;
+	}
 }

 /**