Message ID | 20230531115839.089944915@infradead.org |
---|---|
Headers |
From: Peter Zijlstra <peterz@infradead.org>
Date: Wed, 31 May 2023 13:58:39 +0200
To: mingo@kernel.org, vincent.guittot@linaro.org
Cc: linux-kernel@vger.kernel.org, peterz@infradead.org, juri.lelli@redhat.com, dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de, bristot@redhat.com, corbet@lwn.net, qyousef@layalina.io, chris.hyser@oracle.com, patrick.bellasi@matbug.net, pjt@google.com, pavel@ucw.cz, qperret@google.com, tim.c.chen@linux.intel.com, joshdon@google.com, timj@gnu.org, kprateek.nayak@amd.com, yu.c.chen@intel.com, youssefesmat@chromium.org, joel@joelfernandes.org, efault@gmx.de, tglx@linutronix.de
Subject: [PATCH 00/15] sched: EEVDF and latency-nice and/or slice-attr
Message-ID: <20230531115839.089944915@infradead.org> |
Series | sched: EEVDF and latency-nice and/or slice-attr |
Message
Peter Zijlstra
May 31, 2023, 11:58 a.m. UTC
Hi!

Latest version of the EEVDF [1] patches.

The only real change since last time is the fix for tick-preemption [2], and a
simple safe-guard for the mixed slice heuristic.

Other than that, I've re-arranged the patches to make EEVDF come first and have
the latency-nice or slice-attribute patches on top.

Results should not be different from last time around, lots of people ran them
and found no major performance issues; what was found was better latency and
smaller variance (probably due to the more stable latency).

I'm hoping we can start queueing this part.

The big question is what additional interface to expose; some people have
voiced objections to the latency-nice interface, the 'obvious' alternative
is to directly expose the slice length as a request/hint.

The very last patch implements this alternative using sched_attr::sched_runtime
but is untested.

Diffstat for the base patches [1-11]:

 include/linux/rbtree_augmented.h |   26 +
 include/linux/sched.h            |    7 +-
 kernel/sched/core.c              |    2 +
 kernel/sched/debug.c             |   48 +-
 kernel/sched/fair.c              | 1105 ++++++++++++++++++--------------------
 kernel/sched/features.h          |   24 +-
 kernel/sched/sched.h             |   16 +-
 7 files changed, 587 insertions(+), 641 deletions(-)

[1] https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=805acf7726282721504c8f00575d91ebfd750564
[2] https://lkml.kernel.org/r/20230420150537.GC4253%40hirez.programming.kicks-ass.net
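[Editor's illustration] A minimal userspace sketch of how the proposed slice hint would be requested, assuming the semantics described above (the slice carried in sched_attr::sched_runtime for a fair-class task). The last patch in the series is explicitly untested, so this is not a stable ABI; the struct layout follows the sched_setattr(2) man page.

  /*
   * Sketch only: ask for a ~3ms slice for the calling SCHED_OTHER task via
   * sched_attr::sched_runtime, per the (untested) last patch in this series.
   */
  #include <stdio.h>
  #include <string.h>
  #include <unistd.h>
  #include <sched.h>
  #include <sys/syscall.h>
  #include <linux/types.h>

  struct sched_attr {
          __u32 size;
          __u32 sched_policy;
          __u64 sched_flags;
          __s32 sched_nice;
          __u32 sched_priority;
          __u64 sched_runtime;    /* here: requested slice, in nanoseconds */
          __u64 sched_deadline;
          __u64 sched_period;
  };

  int main(void)
  {
          struct sched_attr attr;

          memset(&attr, 0, sizeof(attr));
          attr.size          = sizeof(attr);
          attr.sched_policy  = SCHED_OTHER;        /* stay in the fair class */
          attr.sched_runtime = 3 * 1000 * 1000;    /* 3ms slice hint */

          if (syscall(SYS_sched_setattr, 0, &attr, 0))
                  perror("sched_setattr");
          return 0;
  }

How (and whether) such a hint is clamped and honoured is exactly the open interface question raised in the cover letter.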
Comments
On 05/31/23 13:58, Peter Zijlstra wrote:
> Hi!
>
> Latest version of the EEVDF [1] patches.
>
> The only real change since last time is the fix for tick-preemption [2], and a
> simple safe-guard for the mixed slice heuristic.
>
> Other than that, I've re-arranged the patches to make EEVDF come first and have
> the latency-nice or slice-attribute patches on top.
>
> Results should not be different from last time around, lots of people ran them
> and found no major performance issues; what was found was better latency and
> smaller variance (probably due to the more stable latency).
>
> I'm hoping we can start queueing this part.
>
> The big question is what additional interface to expose; some people have
> voiced objections to the latency-nice interface, the 'obvious' alternative
> is to directly expose the slice length as a request/hint.

I haven't thought much about this, but..

There are two desired properties from user space that I hope we can achieve
with this new hint:

1. The obvious one: improve the wake-up latencies of some tasks.
2. Hint that there are some tasks that are happy to suffer high latencies.

2 is important because one thing user space wants to do, but we lack a
mechanism for, is to hint that some tasks are background tasks and can be
shuffled around and preempted more often at the expense of keeping other tasks
happier. I'm hoping this + uclamp_max would be a reasonable way to tag these
tasks so that they consume less power and can be kept 'out of the way' when
necessary.

Slice length seems a bit counter-intuitive for these use cases. Maybe expose
the lag? Not sure if that is neutral enough to allow moving away from EEVDF in
the future if we ever need to. A deadline probably sounds too promising.

Also, we had in mind to use these hints in EAS to avoid packing (though this
has to be re-evaluated if still needed), which is another source of latency.
I think Oracle wanted to use this to control the search depth at load balance
IIRC - not sure if they still want that. Not sure where we stand on multiple
users of this hint now. Should they be new hints? If so, are we okay with
continuing to extend sched_setattr(), or would it be better to create a proper
QoS framework?

As a side note - I started working on some patches to generalize the load
balancer misfit path. Issues we see:

1. Latency-sensitive tasks ending up on the same CPU can end up suffering. We
   need both the wake-up path and the load balancer to be aware of this and
   help with spreading.
2. Busy uclamp_max tasks can end up stuck on a wrong core with no ability to
   move them around. I still have to see this in practice but it's a concern
   for wider deployment (which hasn't happened yet).

I think the misfit path is the right way to handle these cases. But I could be
wrong. (Or maybe you already take care of this and I just need to read the
patches more closely.)

For the wake-up side of things I think we need to be careful about cramming
latency-sensitive tasks onto the same rq if we can avoid it. I didn't spot
that in Vincent's patches, nor in yours (which I have yet to look at more
closely).

If I may piggy-back a bit more: we seem to reserve a policy for SCHED_ISO,
which AFAICT had an old proposal to better tag interactive tasks. Is this
something waiting for someone to come forward with a proposal again? I have to
say, the choice of an interactive policy looks attractive. I wonder sometimes
if we need to go vertical (new policy/sched_class) instead of horizontal - or
maybe both.

Sorry, a bit of a brain dump here. HTH.

Cheers

--
Qais Yousef

> The very last patch implements this alternative using sched_attr::sched_runtime
> but is untested.
>
> Diffstat for the base patches [1-11]:
>
>  include/linux/rbtree_augmented.h |   26 +
>  include/linux/sched.h            |    7 +-
>  kernel/sched/core.c              |    2 +
>  kernel/sched/debug.c             |   48 +-
>  kernel/sched/fair.c              | 1105 ++++++++++++++++++--------------------
>  kernel/sched/features.h          |   24 +-
>  kernel/sched/sched.h             |   16 +-
>  7 files changed, 587 insertions(+), 641 deletions(-)
>
> [1] https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=805acf7726282721504c8f00575d91ebfd750564
>
> [2] https://lkml.kernel.org/r/20230420150537.GC4253%40hirez.programming.kicks-ass.net
> EEVDF fundamentally supports per task request/slice sizes, which is the
> primary motivator for finally finishing these patches.
>
> So the plan is to extend sched_setattr() to allow tasks setting their
> own ideal slice length. But we're not quite there yet.
>
> Having just returned from PTO the mailbox is an utter trainwreck, but
> I'll try and refresh those few patches this week for consideration.
>
> In the meantime I think you found the right knob to twiddle.

Hello Peter,

I am trying to understand a little better the need for the eligibility checks
(entity_eligible). I understand the general concept, but I am trying to find a
scenario where it is necessary, and maybe propose to have it toggled by a
feature flag.

Some of my testing:

All my testing was done on a two-core 1.1GHz Celeron N400 system, on the
6.5-rc3 kernel with the EEVDF changes ported.

I have two CPU-bound tasks, one with a nice of -4 and the other with a nice of
0. They are both affinitized to CPU 0. (while 1 { i++ })

With entity_eligible *enabled* and with entity_eligible *disabled* (always
returns 1): top showed consistent results, one task at ~70% and the other at
~30%.

So it seems the deadline adjustment will naturally achieve fairness.

I also added a few trace_printks to see if there is a case where
entity_eligible would have returned 0 before the deadline forced us to
reschedule. There were a few such cases. The following snippet of prints shows
that an entity became ineligible 2 slices before its deadline expired. It
seems like this will add more context switching but still achieve a similar
result in the end.

  bprint: pick_eevdf: eligibility check: tid=4568, eligible=0, deadline=26577257249, vruntime=26575761118
  bprint: pick_eevdf: found best deadline: tid=4573, deadline=26575451399, vruntime=26574838855
  sched_switch: prev_comm=loop prev_pid=4568 prev_prio=120 prev_state=R ==> next_comm=loop next_pid=4573 next_prio=116
  bputs: task_tick_fair: tick
  bputs: task_tick_fair: tick
  bprint: pick_eevdf: eligibility check: tid=4573, eligible=1, deadline=26576270304, vruntime=26575657159
  bprint: pick_eevdf: found best deadline: tid=4573, deadline=26576270304, vruntime=26575657159
  bputs: task_tick_fair: tick
  bputs: task_tick_fair: tick
  bprint: pick_eevdf: eligibility check: tid=4573, eligible=0, deadline=26577089170, vruntime=26576476006
  bprint: pick_eevdf: found best deadline: tid=4573, deadline=26577089170, vruntime=26576476006
  bputs: task_tick_fair: tick
  bputs: task_tick_fair: tick
  bprint: pick_eevdf: eligibility check: tid=4573, eligible=0, deadline=26577908042, vruntime=26577294838
  bprint: pick_eevdf: found best deadline: tid=4568, deadline=26577257249, vruntime=26575761118
  sched_switch: prev_comm=loop prev_pid=4573 prev_prio=116 prev_state=R ==> next_comm=loop next_pid=4568 next_prio=120

In a more practical example, I tried this with one of our benchmarks that
involves running Meet and Docs side by side and measuring the input latency in
the Docs document. The following is the average latency for 5 runs (these
numbers are after removing our cgroup hierarchy - that might be a discussion
for a later time):

  CFS: 168ms
  EEVDF with eligibility: 206ms (regression from CFS)
  EEVDF *without* eligibility: 143ms (improvement over CFS)
  EEVDF *without* eligibility and with a 6ms base_slice_ns (was 1.5ms): 104ms (great improvement)

Removing the eligibility check for this workload seemed to result in a great
improvement. I haven't dug deeper but I suspect it's related to reduced
context switches on our 2-core system. As an extra test I also increased the
base_slice_ns and it further improved the input latency significantly.

I would love to hear your thoughts. Thanks!
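[Editor's illustration] The check being discussed compares a task's vruntime against the load-weighted average vruntime V of the runqueue: a task is eligible when its lag (V minus its vruntime) is non-negative. A standalone toy version of that test, not kernel code, mirroring the W*vruntime <= V comparison that also appears in the userspace eevdf.c program quoted later in this thread:

  #include <stdio.h>

  struct entity { unsigned long weight, vruntime; };

  /*
   * eligible  <=>  lag_i = V/W - v_i >= 0  <=>  W * v_i <= sum_j(w_j * v_j)
   * (the multiplication form avoids the division, as in eevdf.c)
   */
  static int eligible(const struct entity *es, int nr, int i)
  {
          unsigned long W = 0, V = 0;

          for (int j = 0; j < nr; j++) {
                  V += es[j].weight * es[j].vruntime;
                  W += es[j].weight;
          }
          return W * es[i].vruntime <= V;
  }

  int main(void)
  {
          struct entity es[] = { { 1024, 100 }, { 1024, 90 }, { 1024, 130 } };

          for (int i = 0; i < 3; i++)
                  printf("entity %d eligible=%d\n", i, eligible(es, 3, i));
          return 0;
  }

With these sample values the average vruntime is about 106.7, so the first two entities come out eligible and the third (vruntime 130, i.e. ahead of the average) does not.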
On Fri, Sep 29, 2023 at 11:54 AM Youssef Esmat <youssefesmat@chromium.org> wrote: > > > > > EEVDF fundamentally supports per task request/slice sizes, which is the > > primary motivator for finally finishing these patches. > > > > So the plan is to extend sched_setattr() to allow tasks setting their > > own ideal slice length. But we're not quite there yet. > > > > Having just returned from PTO the mailbox is an utter trainwreck, but > > I'll try and refresh those few patches this week for consideration. > > > > In the meantime I think you found the right knob to twiddle. > > Hello Peter, > > I am trying to understand a little better the need for the eligibility > checks (entity_eligible). I understand the general concept, but I am > trying to find a scenario where it is necessary. And maybe propose to > have it toggled by a feature flag. > > Some of my testing: > > All my testing was done on a two core Celeron N400 cpu system 1.1Ghz. > It was done on the 6.5-rc3 kernel with EEVDF changes ported. > > I have two CPU bound tasks one with a nice of -4 and the other with a > nice of 0. They are both affinitized to CPU 0. (while 1 { i++ }) > > With entity_eligible *enabled* and with entity_eligible *disabled* > (always returns 1): > Top showed consistent results, one at ~70% and the other at ~30% > > So it seems the deadline adjustment will naturally achieve fairness. > > I also added a few trace_printks to see if there is a case where > entity_eligible would have returned 0 before the deadline forced us to > reschedule. There were a few such cases. The following snippet of > prints shows that an entity became ineligible 2 slices before its > deadline expired. It seems like this will add more context switching > but still achieve a similar result at the end. > > bprint: pick_eevdf: eligibility check: tid=4568, > eligible=0, deadline=26577257249, vruntime=26575761118 > bprint: pick_eevdf: found best deadline: tid=4573, > deadline=26575451399, vruntime=26574838855 > sched_switch: prev_comm=loop prev_pid=4568 prev_prio=120 > prev_state=R ==> next_comm=loop next_pid=4573 next_prio=116 > bputs: task_tick_fair: tick > bputs: task_tick_fair: tick > bprint: pick_eevdf: eligibility check: tid=4573, > eligible=1, deadline=26576270304, vruntime=26575657159 > bprint: pick_eevdf: found best deadline: tid=4573, > deadline=26576270304, vruntime=26575657159 > bputs: task_tick_fair: tick > bputs: task_tick_fair: tick > bprint: pick_eevdf: eligibility check: tid=4573, > eligible=0, deadline=26577089170, vruntime=26576476006 > bprint: pick_eevdf: found best deadline: tid=4573, > deadline=26577089170, vruntime=26576476006 > bputs: task_tick_fair: tick > bputs: task_tick_fair: tick > bprint: pick_eevdf: eligibility check: tid=4573, > eligible=0, deadline=26577908042, vruntime=26577294838 > bprint: pick_eevdf: found best deadline: tid=4568, > deadline=26577257249, vruntime=26575761118 > sched_switch: prev_comm=loop prev_pid=4573 prev_prio=116 > prev_state=R ==> next_comm=loop next_pid=4568 next_prio=120 > > In a more practical example, I tried this with one of our benchmarks > that involves running Meet and Docs side by side and measuring the > input latency in the Docs document. The following is the average > latency for 5 runs: > > (These numbers are after removing our cgroup hierarchy - that might be > a discussion for a later time). 
> CFS: 168ms
> EEVDF with eligibility: 206ms (regression from CFS)
> EEVDF *without* eligibility: 143ms (improvement to CFS)
> EEVDF *without* eligibility and with a 6ms base_slice_ns (was 1.5ms):
> 104ms (great improvement)
>
> Removing the eligibility check for this workload seemed to result in a
> great improvement. I haven't dug deeper but I suspect it's related to
> reduced context switches on our 2 core system.
> As an extra test I also increased the base_slice_ns and it further
> improved the input latency significantly.
>
> I would love to hear your thoughts. Thanks!

For completeness I ran two more tests:

1. EEVDF with eligibility and 6ms base_slice_ns.
2. EEVDF with eligibility with Benjamin Segall's patch
   (https://lore.kernel.org/all/xm261qego72d.fsf_-_@google.com/).

I copied over all the previous results for easier comparison:

  CFS:                            168ms
  EEVDF, eligib, 1.5ms slice:     206ms
  EEVDF, eligib, 6ms slice:       167ms
  EEVDF_Fix, eligib, 1.5ms slice: 190ms
  EEVDF, *no*eligib, 1.5ms slice: 143ms
  EEVDF, *no*eligib, 6ms slice:   104ms

It does seem like Benjamin's fix did have an improvement.
On Fri, Sep 29, 2023 at 11:54:25AM -0500, Youssef Esmat wrote:

> > EEVDF fundamentally supports per task request/slice sizes, which is the
> > primary motivator for finally finishing these patches.
> >
> > So the plan is to extend sched_setattr() to allow tasks setting their
> > own ideal slice length. But we're not quite there yet.
> >
> > Having just returned from PTO the mailbox is an utter trainwreck, but
> > I'll try and refresh those few patches this week for consideration.
> >
> > In the meantime I think you found the right knob to twiddle.
>
> Hello Peter,
>
> I am trying to understand a little better the need for the eligibility
> checks (entity_eligible). I understand the general concept, but I am
> trying to find a scenario where it is necessary. And maybe propose to
> have it toggled by a feature flag.

My initial response was that it ensures fairness, but thinking a little about
it I'm not so sure anymore. I do think it features in section 6 lemmas 4 and
5, which provide interference bounds. But I'd have to think a little more
about it.

In the current setup, where all requests are of equal size, the virtual
deadline tree is basically identical to the virtual runtime tree, just
transposed (in the static state scenario).

When mixing request sizes things become a little more interesting.

Let me ponder this a little bit more.
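[Editor's note] To make the "transposed" remark concrete, using the deadline definition d_i = v_i + r_i / w_i that Peter spells out further down the thread: for entities with equal weight w and equal request r, d_i - d_j = v_i - v_j, so sorting by virtual deadline produces exactly the same order as sorting by vruntime. The two trees can only rank entities differently once requests (or weights) differ, which is where the eligibility question starts to matter.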
On Mon, Oct 02, 2023 at 08:41:36PM +0200, Peter Zijlstra wrote: > When mixing request sizes things become a little more interesting. > > Let me ponder this a little bit more. Using the attached program (I got *REALLY* fed up trying to draw these diagrams by hand), let us illustrate the difference between Earliest *Eligible* Virtual Deadline First and the one with the Eligible test taken out: EVDF. Specifically, the program was used with the following argument for EEVDF: ./eevdf -e "0,1024,6" -e "1,1024,2" -e "2,1024,18" -v 19 and with an additional '-n' for the EVDF column. EEVDF EVDF d = 6 d = 6 d = 2 d = 2 d = 18 d = 18 q = 2 q = 2 t=0 V=1 t=0 V=1 A |----< A |----< >B |< >B |< C |----------------< C |----------------< |*--------|---------|---------|---------|---- |*--------|---------|---------|---------|---- t=2 V=1 t=2 V=1 >A |----< A |----< B |< >B |< C |----------------< C |----------------< |*--------|---------|---------|---------|---- |*--------|---------|---------|---------|---- t=8 V=3 t=4 V=2 A |----< >A |----< >B |< B |< C |----------------< C |----------------< |--*------|---------|---------|---------|---- |-*-------|---------|---------|---------|---- t=10 V=4 t=10 V=4 A |----< A |----< B |< >B |< >C |----------------< C |----------------< |---*-----|---------|---------|---------|---- |---*-----|---------|---------|---------|---- t=28 V=10 t=12 V=5 A |----< A |----< >B |< >B |< C |----------------< C |----------------< |---------*---------|---------|---------|---- |----*----|---------|---------|---------|---- t=30 V=11 t=14 V=5 A |----< A |----< >B |< >B |< C |----------------< C |----------------< |---------|*--------|---------|---------|---- |----*----|---------|---------|---------|---- t=32 V=11 t=16 V=6 A |----< >A |----< >B |< B |< C |----------------< C |----------------< |---------|*--------|---------|---------|---- |-----*---|---------|---------|---------|---- t=34 V=12 t=22 V=8 >A |----< A |----< B |< >B |< C |----------------< C |----------------< |---------|-*-------|---------|---------|---- |-------*-|---------|---------|---------|---- t=40 V=14 t=24 V=9 A |----< A |----< >B |< >B |< C |----------------< C |----------------< |---------|---*-----|---------|---------|---- |--------*|---------|---------|---------|---- t=42 V=15 t=26 V=9 A |----< A |----< >B |< >B |< C |----------------< C |----------------< |---------|----*----|---------|---------|---- |--------*|---------|---------|---------|---- t=44 V=15 t=28 V=10 A |----< >A |----< >B |< B |< C |----------------< C |----------------< |---------|----*----|---------|---------|---- |---------*---------|---------|---------|---- t=46 V=16 t=34 V=12 >A |----< A |----< B |< >B |< C |----------------< C |----------------< |---------|-----*---|---------|---------|---- |---------|-*-------|---------|---------|---- t=52 V=18 t=36 V=13 A |----< A |----< >B |< B |< C |----------------< >C |----------------< |---------|-------*-|---------|---------|---- |---------|--*------|---------|---------|---- t=54 V=19 t=54 V=19 A |----< A |----< >B |< >B |< C |----------------< C |----------------< |---------|--------*|---------|---------|---- |---------|--------*|---------|---------|---- lags: -10, 6 lags: -7, 11 BAaaBCccccccccBBBAaaBBBAaaBB BBAaaBBBAaaBBBAaaBCccccccccB As I wrote before; EVDF has worse lag bounds, but this is not insurmountable. The biggest problem that I can see is that of wakeup preemption. Currently we allow to preempt when 'current' has reached V (RUN_TO_PARITY in pick_eevdf()). 
With these rules, when EEVDF schedules C (our large slice task) at t=10 above,
it is only a little behind C and can be readily preempted after about 2 time
units.

However, EVDF will delay scheduling C until much later, see how A and B walk
far ahead of V until t=36. Only then will we pick C. But this means that we're
firmly stuck with C for at least 11 time units. A newly placed task will be
around V and will have no chance to preempt.

That said, I do have me a patch to cure some of that:

  https://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git/commit/?h=sched/eevdf&id=d7edbe431f31762e516f2730196f41322edcc621

That would allow a task with a shorter request time to preempt in spite of
RUN_TO_PARITY.

However, in this example V is only 2/3 of the way to C's deadline, but if we
were to have many more tasks, you'll see V gets closer and closer to C's
deadline and it will become harder and harder to place such that preemption
becomes viable.

Adding 4 more tasks:

  ./eevdf -e "0,1024,6" -e "1,1024,2" -e "2,1024,18" -v 19 -n -e "3,1024,2" -e "4,1024,2" -e "5,1024,2" -e "6,1024,2"

  t=92 V=16
   A |----<
   B |<
  >C |----------------<
   D |<
   E |<
   F |<
   G |<
   |---------|-----*---|---------|---------|----

And I worry this will create very real latency spikes.

That said; I do see not having the eligibility check can help. So I'm not
opposed to having a sched_feat for this, but I would not want to default to
EVDF.
On Thu, Oct 05, 2023 at 02:05:57PM +0200, Peter Zijlstra wrote:

> Using the attached program (I got *REALLY* fed up trying to draw these
> diagrams by hand), let us illustrate the difference between Earliest
> *Eligible* Virtual Deadline First and the one with the Eligible test
> taken out: EVDF.
>
> Specifically, the program was used with the following argument for
> EEVDF:
>
>   ./eevdf -e "0,1024,6" -e "1,1024,2" -e "2,1024,18" -v 19
>
> and with an additional '-n' for the EVDF column.

<snip a metric ton of diagrams>

> As I wrote before; EVDF has worse lag bounds, but this is not
> insurmountable. The biggest problem that I can see is that of wakeup
> preemption. Currently we allow to preempt when 'current' has reached V
> (RUN_TO_PARITY in pick_eevdf()).
>
> With these rules, when EEVDF schedules C (our large slice task) at t=10
> above, it is only a little behind C and can be readily preempted after
> about 2 time units.
>
> However, EVDF will delay scheduling C until much later, see how A and B
> walk far ahead of V until t=36. Only then will we pick C. But this means
> that we're firmly stuck with C for at least 11 time units. A newly
> placed task will be around V and will have no chance to preempt.
>
> That said, I do have me a patch to cure some of that:
>
>   https://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git/commit/?h=sched/eevdf&id=d7edbe431f31762e516f2730196f41322edcc621
>
> That would allow a task with a shorter request time to preempt in spite
> of RUN_TO_PARITY.

So, doing all that gave me an idea, see queue/sched/eevdf BIAS_ELIGIBLE. It
pushes the eligibility threshold (V) right by one average request.

The below patch is against eevdf.c. I've not yet run it through the normal set
of hackbench, netperf etc., so it might eat your pet and set your granny on
fire.

--- eevdf.c.orig	2023-10-05 16:11:35.821114320 +0200
+++ eevdf.c	2023-10-05 16:08:38.387080327 +0200
@@ -7,6 +7,7 @@
 #include <sys/param.h>
 
 bool eligible = true;
+bool bias = false;
 unsigned long V_lim = 20;
 
 struct entity {
@@ -79,16 +80,17 @@
 
 struct entity *pick_entity(int nr, struct entity *es)
 {
-	unsigned long W = 0, V = 0;
+	unsigned long W = 0, V = 0, R = 0;
 	struct entity *e = NULL;
 
 	for (int i = 0; i < nr; i++) {
 		V += es[i].weight * es[i].vruntime;
+		R += es[i].request;
 		W += es[i].weight;
 	}
 
 	for (int i = 0; i < nr; i++) {
-		if (eligible && W*es[i].vruntime > V)
+		if (eligible && W*es[i].vruntime > V + (bias * R))
 			continue;
 
 		if (!e || es[i].vdeadline < e->vdeadline)
@@ -169,10 +171,14 @@
 
 	const int N = sizeof(es) / sizeof(es[0]);
 
-	while ((opt = getopt(argc, argv, "nv:e:")) != -1) {
+	while ((opt = getopt(argc, argv, "bnv:e:")) != -1) {
 		unsigned int v,w,r;
 
 		switch (opt) {
+		case 'b':
+			bias = true;
+			break;
+
 		case 'n':
 			eligible = false;
 			break;
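[Editor's note] With the patch applied, the new bias is toggled from the command line in the same way as the existing '-n' switch, for example by reusing the invocation from earlier in the thread:

  ./eevdf -b -e "0,1024,6" -e "1,1024,2" -e "2,1024,18" -v 19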
On Thu, Oct 05, 2023 at 04:14:08PM +0200, Peter Zijlstra wrote: > --- eevdf.c.orig 2023-10-05 16:11:35.821114320 +0200 > +++ eevdf.c 2023-10-05 16:08:38.387080327 +0200 > @@ -7,6 +7,7 @@ > #include <sys/param.h> > > bool eligible = true; > +bool bias = false; > unsigned long V_lim = 20; > > struct entity { > @@ -79,16 +80,17 @@ > > struct entity *pick_entity(int nr, struct entity *es) > { > - unsigned long W = 0, V = 0; > + unsigned long W = 0, V = 0, R = 0; > struct entity *e = NULL; > > for (int i = 0; i < nr; i++) { > V += es[i].weight * es[i].vruntime; > + R += es[i].request; * 1024 Also, average seems too much, one large value lifts it too easily. Need to come up with something better :/ > W += es[i].weight; > } > > for (int i = 0; i < nr; i++) { > - if (eligible && W*es[i].vruntime > V) > + if (eligible && W*es[i].vruntime > V + (bias * R)) > continue; > > if (!e || es[i].vdeadline < e->vdeadline) > @@ -169,10 +171,14 @@ > > const int N = sizeof(es) / sizeof(es[0]); > > - while ((opt = getopt(argc, argv, "nv:e:")) != -1) { > + while ((opt = getopt(argc, argv, "bnv:e:")) != -1) { > unsigned int v,w,r; > > switch (opt) { > + case 'b': > + bias = true; > + break; > + > case 'n': > eligible = false; > break;
On Thu, Oct 5, 2023 at 7:06 AM Peter Zijlstra <peterz@infradead.org> wrote: > > On Mon, Oct 02, 2023 at 08:41:36PM +0200, Peter Zijlstra wrote: > > > When mixing request sizes things become a little more interesting. > > > > Let me ponder this a little bit more. > > Using the attached program (I got *REALLY* fed up trying to draw these > diagrams by hand), let us illustrate the difference between Earliest > *Eligible* Virtual Deadline First and the one with the Eligible test > taken out: EVDF. > > Specifically, the program was used with the following argument for > EEVDF: > > ./eevdf -e "0,1024,6" -e "1,1024,2" -e "2,1024,18" -v 19 > > and with an additional '-n' for the EVDF column. > > > EEVDF EVDF > > > d = 6 d = 6 > d = 2 d = 2 > d = 18 d = 18 > q = 2 q = 2 > > t=0 V=1 t=0 V=1 > A |----< A |----< > >B |< >B |< > C |----------------< C |----------------< > |*--------|---------|---------|---------|---- |*--------|---------|---------|---------|---- > > > t=2 V=1 t=2 V=1 > >A |----< A |----< > B |< >B |< > C |----------------< C |----------------< > |*--------|---------|---------|---------|---- |*--------|---------|---------|---------|---- > > > t=8 V=3 t=4 V=2 > A |----< >A |----< > >B |< B |< > C |----------------< C |----------------< > |--*------|---------|---------|---------|---- |-*-------|---------|---------|---------|---- > > > t=10 V=4 t=10 V=4 > A |----< A |----< > B |< >B |< > >C |----------------< C |----------------< > |---*-----|---------|---------|---------|---- |---*-----|---------|---------|---------|---- > > > t=28 V=10 t=12 V=5 > A |----< A |----< > >B |< >B |< > C |----------------< C |----------------< > |---------*---------|---------|---------|---- |----*----|---------|---------|---------|---- > > > t=30 V=11 t=14 V=5 > A |----< A |----< > >B |< >B |< > C |----------------< C |----------------< > |---------|*--------|---------|---------|---- |----*----|---------|---------|---------|---- > > > t=32 V=11 t=16 V=6 > A |----< >A |----< > >B |< B |< > C |----------------< C |----------------< > |---------|*--------|---------|---------|---- |-----*---|---------|---------|---------|---- > > > t=34 V=12 t=22 V=8 > >A |----< A |----< > B |< >B |< > C |----------------< C |----------------< > |---------|-*-------|---------|---------|---- |-------*-|---------|---------|---------|---- > > > t=40 V=14 t=24 V=9 > A |----< A |----< > >B |< >B |< > C |----------------< C |----------------< > |---------|---*-----|---------|---------|---- |--------*|---------|---------|---------|---- > > > t=42 V=15 t=26 V=9 > A |----< A |----< > >B |< >B |< > C |----------------< C |----------------< > |---------|----*----|---------|---------|---- |--------*|---------|---------|---------|---- > > > t=44 V=15 t=28 V=10 > A |----< >A |----< > >B |< B |< > C |----------------< C |----------------< > |---------|----*----|---------|---------|---- |---------*---------|---------|---------|---- > > > t=46 V=16 t=34 V=12 > >A |----< A |----< > B |< >B |< > C |----------------< C |----------------< > |---------|-----*---|---------|---------|---- |---------|-*-------|---------|---------|---- > > > t=52 V=18 t=36 V=13 > A |----< A |----< > >B |< B |< > C |----------------< >C |----------------< > |---------|-------*-|---------|---------|---- |---------|--*------|---------|---------|---- > > > t=54 V=19 t=54 V=19 > A |----< A |----< > >B |< >B |< > C |----------------< C |----------------< > |---------|--------*|---------|---------|---- |---------|--------*|---------|---------|---- > > > lags: -10, 6 lags: -7, 11 
> > BAaaBCccccccccBBBAaaBBBAaaBB BBAaaBBBAaaBBBAaaBCccccccccB > > > > As I wrote before; EVDF has worse lag bounds, but this is not > insurmountable. The biggest problem that I can see is that of wakeup > preemption. Currently we allow to preempt when 'current' has reached V > (RUN_TO_PARITY in pick_eevdf()). > > With these rules, when EEVDF schedules C (our large slice task) at t=10 > above, it is only a little behind C and can be reaily preempted after > about 2 time units. > > However, EVDF will delay scheduling C until much later, see how A and B > walk far ahead of V until t=36. Only when will we pick C. But this means > that we're firmly stuck with C for at least 11 time units. A newly > placed task will be around V and will have no chance to preempt. > Thank you for the detailed analysis! I am still in the process of digesting everything. I do have a quick question, this will only be the case if we adjust C's runtime without adjusting nice value, correct? So it does not currently apply to the submitted code where the only way to change the deadline is to also change the nice value and thus how fast/slow vruntime accumulates. In other words without the sched_runtime patch[1] we should not run into this scenario, correct? [1] https://lore.kernel.org/lkml/20230915124354.416936110@noisy.programming.kicks-ass.net/ > That said, I do have me a patch to cure some of that: > > https://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git/commit/?h=sched/eevdf&id=d7edbe431f31762e516f2730196f41322edcc621 > > That would allow a task with a shorter request time to preempt in spite > of RUN_TO_PARITY. > > However, in this example V is only 2/3 of the way to C's deadline, but > it we were to have many more tasks, you'll see V gets closer and closer > to C's deadline and it will become harder and harder to place such that > preemption becomes viable. > > Adding 4 more tasks: > > ./eevdf -e "0,1024,6" -e "1,1024,2" -e "2,1024,18" -v 19 -n -e "3,1024,2" -e "4,1024,2" -e "5,1024,2" -e "6,1024,2" > > t=92 V=16 > A |----< > B |< > >C |----------------< > D |< > E |< > F |< > G |< > |---------|-----*---|---------|---------|---- > > > And I worry this will create very real latency spikes. > > That said; I do see not having the eligibility check can help. So I'm > not opposed to having a sched_feat for this, but I would not want to > default to EVDF.
On Thu, Oct 5, 2023 at 1:23 PM Youssef Esmat <youssefesmat@chromium.org> wrote: > > On Thu, Oct 5, 2023 at 7:06 AM Peter Zijlstra <peterz@infradead.org> wrote: > > > > On Mon, Oct 02, 2023 at 08:41:36PM +0200, Peter Zijlstra wrote: > > > > > When mixing request sizes things become a little more interesting. > > > > > > Let me ponder this a little bit more. > > > > Using the attached program (I got *REALLY* fed up trying to draw these > > diagrams by hand), let us illustrate the difference between Earliest > > *Eligible* Virtual Deadline First and the one with the Eligible test > > taken out: EVDF. > > > > Specifically, the program was used with the following argument for > > EEVDF: > > > > ./eevdf -e "0,1024,6" -e "1,1024,2" -e "2,1024,18" -v 19 > > > > and with an additional '-n' for the EVDF column. > > <snip diagrams> > > > > > > As I wrote before; EVDF has worse lag bounds, but this is not > > insurmountable. The biggest problem that I can see is that of wakeup > > preemption. Currently we allow to preempt when 'current' has reached V > > (RUN_TO_PARITY in pick_eevdf()). > > > > With these rules, when EEVDF schedules C (our large slice task) at t=10 > > above, it is only a little behind C and can be reaily preempted after > > about 2 time units. > > > > However, EVDF will delay scheduling C until much later, see how A and B > > walk far ahead of V until t=36. Only when will we pick C. But this means > > that we're firmly stuck with C for at least 11 time units. A newly > > placed task will be around V and will have no chance to preempt. > > > > Thank you for the detailed analysis! I am still in the process of > digesting everything. > I do have a quick question, this will only be the case if we adjust > C's runtime without adjusting nice value, correct? So it does not > currently apply to the submitted code where the only way to change the > deadline is to also change the nice value and thus how fast/slow > vruntime accumulates. In other words without the sched_runtime > patch[1] we should not run into this scenario, correct? > > [1] https://lore.kernel.org/lkml/20230915124354.416936110@noisy.programming.kicks-ass.net/ Sorry, to clarify, by "this" I meant "that we're firmly stuck with C for at least 11 time units". > > > That said, I do have me a patch to cure some of that: > > > > https://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git/commit/?h=sched/eevdf&id=d7edbe431f31762e516f2730196f41322edcc621 > > > > That would allow a task with a shorter request time to preempt in spite > > of RUN_TO_PARITY. > > > > However, in this example V is only 2/3 of the way to C's deadline, but > > it we were to have many more tasks, you'll see V gets closer and closer > > to C's deadline and it will become harder and harder to place such that > > preemption becomes viable. > > > > Adding 4 more tasks: > > > > ./eevdf -e "0,1024,6" -e "1,1024,2" -e "2,1024,18" -v 19 -n -e "3,1024,2" -e "4,1024,2" -e "5,1024,2" -e "6,1024,2" > > > > t=92 V=16 > > A |----< > > B |< > > >C |----------------< > > D |< > > E |< > > F |< > > G |< > > |---------|-----*---|---------|---------|---- > > > > > > And I worry this will create very real latency spikes. > > > > That said; I do see not having the eligibility check can help. So I'm > > not opposed to having a sched_feat for this, but I would not want to > > default to EVDF.
On Thu, Oct 05, 2023 at 02:05:57PM +0200, Peter Zijlstra wrote:

> t=10 V=4                                          t=10 V=4
>  A |----<                                          A |----<
>  B |<                                             >B |<
> >C |----------------<                              C |----------------<
>  |---*-----|---------|---------|---------|----     |---*-----|---------|---------|---------|----
>
>
> t=52 V=18                                         t=36 V=13
>  A |----<                                          A |----<
> >B |<                                              B |<
>  C |----------------<                             >C |----------------<
>  |---------|-------*-|---------|---------|----     |---------|--*------|---------|---------|----
>
>
> BAaaBCccccccccBBBAaaBBBAaaBB                       BBAaaBBBAaaBBBAaaBCccccccccB
>
>
> As I wrote before; EVDF has worse lag bounds, but this is not
> insurmountable. The biggest problem that I can see is that of wakeup
> preemption. Currently we allow to preempt when 'current' has reached V
> (RUN_TO_PARITY in pick_eevdf()).
>
> With these rules, when EEVDF schedules C (our large slice task) at t=10
> above, it is only a little behind C and can be readily preempted after
> about 2 time units.
>
> However, EVDF will delay scheduling C until much later, see how A and B
> walk far ahead of V until t=36. Only then will we pick C. But this means
> that we're firmly stuck with C for at least 11 time units. A newly
> placed task will be around V and will have no chance to preempt.

Playing around with it a little:

        EEVDF                           EVDF

slice 30000000                  slice 30000000
# Min Latencies: 00014          # Min Latencies: 00048
# Avg Latencies: 00692          # Avg Latencies: 188239
# Max Latencies: 94633          # Max Latencies: 961241

slice 3000000                   slice 3000000
# Min Latencies: 00054          # Min Latencies: 00055
# Avg Latencies: 00522          # Avg Latencies: 00673
# Max Latencies: 41475          # Max Latencies: 13297

slice 300000                    slice 300000
# Min Latencies: 00018          # Min Latencies: 00024
# Avg Latencies: 00344          # Avg Latencies: 00056
# Max Latencies: 20061          # Max Latencies: 00860

So while it improves the short slices, it completely blows up the large
slices -- utterly slaughters the large slices in fact.

And all the many variants of BIAS_ELIGIBLE that I've tried so far only manage
to murder the high end while simultaneously not actually helping the low end
-- so that's a complete write off.

By far the sanest option so far is PLACE_SLEEPER -- and that is very much not
a nice option either :-(
On Sun, Oct 08, 2023 at 12:04:00AM +0200, Peter Zijlstra wrote: > On Thu, Oct 05, 2023 at 02:05:57PM +0200, Peter Zijlstra wrote: > > > t=10 V=4 t=10 V=4 > > A |----< A |----< > > B |< >B |< > > >C |----------------< C |----------------< > > |---*-----|---------|---------|---------|---- |---*-----|---------|---------|---------|---- > > > > > > > t=52 V=18 t=36 V=13 > > A |----< A |----< > > >B |< B |< > > C |----------------< >C |----------------< > > |---------|-------*-|---------|---------|---- |---------|--*------|---------|---------|---- > > > > > > > BAaaBCccccccccBBBAaaBBBAaaBB BBAaaBBBAaaBBBAaaBCccccccccB > > > > > > > > As I wrote before; EVDF has worse lag bounds, but this is not > > insurmountable. The biggest problem that I can see is that of wakeup > > preemption. Currently we allow to preempt when 'current' has reached V > > (RUN_TO_PARITY in pick_eevdf()). > > > > With these rules, when EEVDF schedules C (our large slice task) at t=10 > > above, it is only a little behind C and can be reaily preempted after > > about 2 time units. > > > > However, EVDF will delay scheduling C until much later, see how A and B > > walk far ahead of V until t=36. Only when will we pick C. But this means > > that we're firmly stuck with C for at least 11 time units. A newly > > placed task will be around V and will have no chance to preempt. > > Playing around with it a little: > > EEVDF EVDF > > slice 30000000 slice 30000000 > # Min Latencies: 00014 # Min Latencies: 00048 > # Avg Latencies: 00692 # Avg Latencies: 188239 > # Max Latencies: 94633 # Max Latencies: 961241 > > slice 3000000 slice 3000000 > # Min Latencies: 00054 # Min Latencies: 00055 > # Avg Latencies: 00522 # Avg Latencies: 00673 > # Max Latencies: 41475 # Max Latencies: 13297 > > slice 300000 slice 300000 > # Min Latencies: 00018 # Min Latencies: 00024 > # Avg Latencies: 00344 # Avg Latencies: 00056 > # Max Latencies: 20061 # Max Latencies: 00860 > > > So while it improves the short slices, it completely blows up the large > slices -- utterly slaughters the large slices in fact. > > And all the many variants of BIAS_ELIGIBLE that I've tried so far only > manage to murder the high end while simultaneously not actually helping > the low end -- so that's a complete write off. > > > By far the sanest option so far is PLACE_SLEEPER -- and that is very > much not a nice option either :-( And this can be easily explained by the fact that we insert tasks around 0-lag, so if we delay execution past this point we create an effective DoS window.
On Sat, Oct 7, 2023 at 5:04 PM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Thu, Oct 05, 2023 at 02:05:57PM +0200, Peter Zijlstra wrote:
>
> > t=10 V=4                                          t=10 V=4
> >  A |----<                                          A |----<
> >  B |<                                             >B |<
> > >C |----------------<                              C |----------------<
> >  |---*-----|---------|---------|---------|----     |---*-----|---------|---------|---------|----
> >
> >
> > t=52 V=18                                         t=36 V=13
> >  A |----<                                          A |----<
> > >B |<                                              B |<
> >  C |----------------<                             >C |----------------<
> >  |---------|-------*-|---------|---------|----     |---------|--*------|---------|---------|----
> >
> >
> > BAaaBCccccccccBBBAaaBBBAaaBB                       BBAaaBBBAaaBBBAaaBCccccccccB
> >
> >
> > As I wrote before; EVDF has worse lag bounds, but this is not
> > insurmountable. The biggest problem that I can see is that of wakeup
> > preemption. Currently we allow to preempt when 'current' has reached V
> > (RUN_TO_PARITY in pick_eevdf()).
> >
> > With these rules, when EEVDF schedules C (our large slice task) at t=10
> > above, it is only a little behind C and can be reaily preempted after
> > about 2 time units.
> >
> > However, EVDF will delay scheduling C until much later, see how A and B
> > walk far ahead of V until t=36. Only when will we pick C. But this means
> > that we're firmly stuck with C for at least 11 time units. A newly
> > placed task will be around V and will have no chance to preempt.
>
> Playing around with it a little:
>
>         EEVDF                           EVDF
>
> slice 30000000                  slice 30000000
> # Min Latencies: 00014          # Min Latencies: 00048
> # Avg Latencies: 00692          # Avg Latencies: 188239
> # Max Latencies: 94633          # Max Latencies: 961241
>
> slice 3000000                   slice 3000000
> # Min Latencies: 00054          # Min Latencies: 00055
> # Avg Latencies: 00522          # Avg Latencies: 00673
> # Max Latencies: 41475          # Max Latencies: 13297
>
> slice 300000                    slice 300000
> # Min Latencies: 00018          # Min Latencies: 00024
> # Avg Latencies: 00344          # Avg Latencies: 00056
> # Max Latencies: 20061          # Max Latencies: 00860

Thanks for sharing. Which workload was used to generate these numbers?

I think looking at the sched latency numbers alone does not show the complete
picture. I ran the same input latency test again and tried to capture some of
these numbers for the chrome processes.

EEVDF 1.5ms slice:
  Input latency test result: 226ms
  perf sched latency:
    switches: 1,084,694
    avg: 1.139 ms
    max: 408.397 ms

EEVDF 6.0ms slice:
  Input latency test result: 178ms
  perf sched latency:
    switches: 892,306
    avg: 1.145 ms
    max: 354.344 ms

EVDF 6.0ms slice:
  Input latency test result: 112ms
  perf sched latency:
    switches: 134,200
    avg: 2.610 ms
    max: 374.888 ms

EVDF 6.0ms slice (no run_to_parity, no place_lag, no place_deadline_initial):
  Input latency test result: 110ms
  perf sched latency:
    switches: 531,656
    avg: 0.830 ms
    max: 520.463 ms

For our scenario, it is very expensive to interrupt UI threads. It will
increase the input latency significantly. Lowering the scheduling latency at
the cost of switching out important threads can be very detrimental in this
workload. UI and input threads run with a nice value of -8.

This also seems to match Daniel's message earlier in this thread where using a
12ms base slice improved their benchmarks.

That said, this might not be beneficial for all workloads, and we are still
trying our other workloads out.

> So while it improves the short slices, it completely blows up the large
> slices -- utterly slaughters the large slices in fact.
>
> And all the many variants of BIAS_ELIGIBLE that I've tried so far only
> manage to murder the high end while simultaneously not actually helping
> the low end -- so that's a complete write off.
>
> By far the sanest option so far is PLACE_SLEEPER -- and that is very
> much not a nice option either :-(
On Mon, Oct 09, 2023 at 07:51:03PM -0500, Youssef Esmat wrote:

> > Playing around with it a little:
> >
> >         EEVDF                           EVDF
> >
> > slice 30000000                  slice 30000000
> > # Min Latencies: 00014          # Min Latencies: 00048
> > # Avg Latencies: 00692          # Avg Latencies: 188239
> > # Max Latencies: 94633          # Max Latencies: 961241
> >
> > slice 3000000                   slice 3000000
> > # Min Latencies: 00054          # Min Latencies: 00055
> > # Avg Latencies: 00522          # Avg Latencies: 00673
> > # Max Latencies: 41475          # Max Latencies: 13297
> >
> > slice 300000                    slice 300000
> > # Min Latencies: 00018          # Min Latencies: 00024
> > # Avg Latencies: 00344          # Avg Latencies: 00056
> > # Max Latencies: 20061          # Max Latencies: 00860
>
> Thanks for sharing. Which workload was used to generate these numbers?

This is hackbench vs cyclictest, where cyclictest gets a custom slice set, the
big slice is 10 * normal, the middle slice is normal (equal to hackbench) and
the short slice is normal / 10.
On Thu, Oct 05, 2023 at 01:23:14PM -0500, Youssef Esmat wrote:
> On Thu, Oct 5, 2023 at 7:06 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> > BAaaBCccccccccBBBAaaBBBAaaBB                       BBAaaBBBAaaBBBAaaBCccccccccB
> >
> >
> > As I wrote before; EVDF has worse lag bounds, but this is not
> > insurmountable. The biggest problem that I can see is that of wakeup
> > preemption. Currently we allow to preempt when 'current' has reached V
> > (RUN_TO_PARITY in pick_eevdf()).
> >
> > With these rules, when EEVDF schedules C (our large slice task) at t=10
> > above, it is only a little behind C and can be readily preempted after
> > about 2 time units.
> >
> > However, EVDF will delay scheduling C until much later, see how A and B
> > walk far ahead of V until t=36. Only then will we pick C. But this means
> > that we're firmly stuck with C for at least 11 time units. A newly
> > placed task will be around V and will have no chance to preempt.
>
> Thank you for the detailed analysis! I am still in the process of
> digesting everything.
>
> I do have a quick question, this will only be the case if we adjust
> C's runtime without adjusting nice value, correct? So it does not
> currently apply to the submitted code where the only way to change the
> deadline is to also change the nice value and thus how fast/slow
> vruntime accumulates. In other words without the sched_runtime
> patch[1] we should not run into this scenario, correct?
>
> [1] https://lore.kernel.org/lkml/20230915124354.416936110@noisy.programming.kicks-ass.net/

Much harder to run into it, but you can still hit it using nice.

  d_i = v_i + r_i / w_i

So for very light tasks, the effective v_deadline goes out lots.

And as I wrote yesterday, by delaying scheduling (far) past 0-lag you
effectively open up a denial of service window, since new tasks are placed
around 0-lag.

Now, nobody will likely care much if very light tasks get horrific treatment,
but still, it's there.
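[Editor's note] To put a rough number on "goes out lots": with the kernel's sched_prio_to_weight table (1024 at nice 0, 15 at nice 19), the same request r maps to a virtual deadline offset r/w that is roughly 1024/15, about 68 times, larger for a nice-19 task than for a nice-0 task. Its virtual deadline, and hence the point at which plain EVDF would finally pick it, can therefore sit far beyond the 0-lag point where newly woken tasks are placed, which is the denial-of-service window described above.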
Sorry, I seem to have forgotten to reply to this part...

On Mon, Oct 09, 2023 at 07:51:03PM -0500, Youssef Esmat wrote:

> I think looking at the sched latency numbers alone does not show the
> complete picture. I ran the same input latency test again and tried to
> capture some of these numbers for the chrome processes.
>
> EEVDF 1.5ms slice:
>   Input latency test result: 226ms
>   perf sched latency:
>     switches: 1,084,694
>     avg: 1.139 ms
>     max: 408.397 ms
>
> EEVDF 6.0ms slice:
>   Input latency test result: 178ms
>   perf sched latency:
>     switches: 892,306
>     avg: 1.145 ms
>     max: 354.344 ms

> For our scenario, it is very expensive to interrupt UI threads. It
> will increase the input latency significantly. Lowering the scheduling
> latency at the cost of switching out important threads can be very
> detrimental in this workload. UI and input threads run with a nice
> value of -8.

> That said, this might not be beneficial for all workloads, and we are
> still trying our other workloads out.

Right, this seems to suggest something on your critical path (you should trace
that) has more than 3ms of compute in a single activation. Basically this
means chrome is fairly fat on this critical path. But it seems you know about
that.

Anyway, once you know the 95% length of the longest activation on your
critical path, you can indeed set your slice to that. This should be readily
available from trace data.
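[Editor's note] One possible way to act on that, sketched under stated assumptions: per-activation run times can be pulled from the scheduler trace (for example, 'perf sched record' followed by 'perf sched timehist' reports a run time per scheduling interval, which can be post-processed for a 95th percentile), and the resulting value can then be applied system-wide via the base_slice_ns tunable mentioned earlier in the thread, or per task once the sched_runtime interface from the cover letter lands. The debugfs path below is assumed, not taken from this thread:

  # value in nanoseconds; path assumed for the base_slice_ns knob
  echo 6000000 > /sys/kernel/debug/sched/base_slice_ns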