From patchwork Tue Nov 14 21:57:13 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: tip-bot2 for Thomas Gleixner X-Patchwork-Id: 165106 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a05:6358:a59:b0:164:83eb:24d7 with SMTP id 25csp2226963rwb; Tue, 14 Nov 2023 13:58:30 -0800 (PST) X-Google-Smtp-Source: AGHT+IGBD1kPF5SGddC/sHaoxDdarteQ6KYrn7wXqOXIFcsPsPa7wRUObMgUyN/JkUcF990erI5c X-Received: by 2002:a05:6830:20b:b0:6d6:4c54:72e4 with SMTP id em11-20020a056830020b00b006d64c5472e4mr3501462otb.36.1699999109986; Tue, 14 Nov 2023 13:58:29 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1699999109; cv=none; d=google.com; s=arc-20160816; b=DX54iTCA05GiItlUP4951LHQrK9e7F/TSHcXbi//UwxQydO2mHRbQI0yg0E9AkEduq ia3Dv8bE6o0v8/9LtRiQq+RHNBe8T8PAYK6yoiL3dyIqabXik3OpQfV3OlDQBVSsL5WC G5yQx8hkpyqWPBK+1gVkJ9s6w20DXc90t0N/CLRZU5+plnAPmHN5LWazC8XrwcJJFK7R Hds69v+4nWcVM14NrYkdsH+qCObRQ/Ml448Nw6uEXqlGL5dEcNEsY30VixzKE/0MJuwC NF+YQu8mtBHhxgrMiU7bI7AEKXBCMm1opKnJgdxnbLnpTmfEIjsFShumbSpcb5GeHEkd VHYA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:robot-unsubscribe :robot-id:message-id:mime-version:references:in-reply-to:cc:subject :to:reply-to:sender:from:dkim-signature:dkim-signature:date; bh=CUJpXWysZAAxb9xNg7ygyvbvxjeUdaOSci8Z2b9XVsk=; fh=YIX/3xudK9Lq2PV2FGrJBffVY0XIymHnWmj078pIWkQ=; b=TJaSMoQi4yG/BEHRatv8RDMCtO+jMTCHFQPflZ2CHQy5cKyrpdQYTvmIb2lLTYpY5M zSSUC4zbnjmq10tQALuXVNdyWY8u2nsBWe0Kl3UYW9mZ1opN4AoeGq5IUQMqKVDxWaOp c0GE4s5iRu89pf5lDInJw8/2jv5/XSNWtuGQmp0Opx/8SgYDVHhHxN4AFd6hdF7JO0Lz 5CQxzFr825OP7PQrzK5lGOU8TzyhLi07tiHzyob56r9SUnKmopLgWOngxn+xOr1aLIU2 1vgjrkQLhUV5UusRYoJcxOyxCW2sfv68apl09u2HzMwmYCh3GUKAdWlQKbRL6WxyWPIx ckyQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linutronix.de header.s=2020 header.b=pN4oIfHA; dkim=neutral (no key) header.i=@linutronix.de header.b=wDeJfjQn; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:8 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=linutronix.de Received: from fry.vger.email (fry.vger.email. [2620:137:e000::3:8]) by mx.google.com with ESMTPS id bn21-20020a056a02031500b005bd42f65dc5si9106089pgb.285.2023.11.14.13.58.29 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 14 Nov 2023 13:58:29 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:8 as permitted sender) client-ip=2620:137:e000::3:8; Authentication-Results: mx.google.com; dkim=pass header.i=@linutronix.de header.s=2020 header.b=pN4oIfHA; dkim=neutral (no key) header.i=@linutronix.de header.b=wDeJfjQn; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:8 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=linutronix.de Received: from out1.vger.email (depot.vger.email [IPv6:2620:137:e000::3:0]) by fry.vger.email (Postfix) with ESMTP id F211280239E5; Tue, 14 Nov 2023 13:58:26 -0800 (PST) X-Virus-Status: Clean X-Virus-Scanned: clamav-milter 0.103.11 at fry.vger.email Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234133AbjKNV5Z (ORCPT + 29 others); Tue, 14 Nov 2023 16:57:25 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:52476 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233873AbjKNV5T (ORCPT ); Tue, 14 Nov 2023 16:57:19 -0500 Received: from galois.linutronix.de (Galois.linutronix.de [193.142.43.55]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id B211DD6; Tue, 14 Nov 2023 13:57:15 -0800 (PST) Date: Tue, 14 Nov 2023 21:57:13 -0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020; t=1699999034; h=from:from:sender:sender:reply-to:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=CUJpXWysZAAxb9xNg7ygyvbvxjeUdaOSci8Z2b9XVsk=; b=pN4oIfHA1tc9V8WVcyCOXXys4vdfCsNcas7u9CnptvQQAoa3S4+XiyuQnqPFl0yi9Bq+pR EccmPt4GoVUfihsRMYfBxemZ3cPlLyp3Gu1XYdVmg7Y1BiTds22MwZlC/+6IOo3mqBSwwD T5yJ14VENbN+QPt1Q899CIs1oaNgns2OiwrzrfsewjDbh3y5ctEEYDrPcDP6TYCh3gy+w5 cAoY2o7tTTPh4ustE55laYqPzP2vwE5Z4FAMg1wAt8JZtGhJWPYlpR+r1iSueeS0hhAsW/ 9eDR3N6G5fAEPAWO+xxB/85wTYecKd3iXMzuAyySabLo6oEBF7lXq1elB0swpA== DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020e; t=1699999034; h=from:from:sender:sender:reply-to:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=CUJpXWysZAAxb9xNg7ygyvbvxjeUdaOSci8Z2b9XVsk=; b=wDeJfjQn82EgJ89yjrwdRFjOC3A+7HHrClYa3gUlL8v6JV+qmhez76oZQ2RTMFZA34RgPP S3g15P27ZdUzeUAg== From: "tip-bot2 for Abel Wu" Sender: tip-bot2@linutronix.de Reply-to: linux-kernel@vger.kernel.org To: linux-tip-commits@vger.kernel.org Subject: [tip: sched/urgent] sched/eevdf: Fix vruntime adjustment on reweight Cc: Abel Wu , "Peter Zijlstra (Intel)" , x86@kernel.org, linux-kernel@vger.kernel.org In-Reply-To: <20231107090510.71322-2-wuyun.abel@bytedance.com> References: <20231107090510.71322-2-wuyun.abel@bytedance.com> MIME-Version: 1.0 Message-ID: <169999903332.391.13801155435294949562.tip-bot2@tip-bot2> Robot-ID: Robot-Unsubscribe: Contact to get blacklisted from these emails X-Spam-Status: No, score=-0.9 required=5.0 tests=DKIM_SIGNED,DKIM_VALID, DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI, SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on fry.vger.email Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.6.4 (fry.vger.email [0.0.0.0]); Tue, 14 Nov 2023 13:58:27 -0800 (PST) X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1781895589928971159 X-GMAIL-MSGID: 1782578266630211866 The following commit has been merged into the sched/urgent branch of tip: Commit-ID: eab03c23c2a162085b13200d7942fc5a00b5ccc8 Gitweb: https://git.kernel.org/tip/eab03c23c2a162085b13200d7942fc5a00b5ccc8 Author: Abel Wu AuthorDate: Tue, 07 Nov 2023 17:05:07 +08:00 Committer: Peter Zijlstra CommitterDate: Tue, 14 Nov 2023 22:27:00 +01:00 sched/eevdf: Fix vruntime adjustment on reweight vruntime of the (on_rq && !0-lag) entity needs to be adjusted when it gets re-weighted, and the calculations can be simplified based on the fact that re-weight won't change the w-average of all the entities. Please check the proofs in comments. But adjusting vruntime can also cause position change in RB-tree hence require re-queue to fix up which might be costly. This might be avoided by deferring adjustment to the time the entity actually leaves tree (dequeue/pick), but that will negatively affect task selection and probably not good enough either. Fixes: 147f3efaa241 ("sched/fair: Implement an EEVDF-like scheduling policy") Signed-off-by: Abel Wu Signed-off-by: Peter Zijlstra (Intel) Link: https://lkml.kernel.org/r/20231107090510.71322-2-wuyun.abel@bytedance.com --- kernel/sched/fair.c | 151 ++++++++++++++++++++++++++++++++++++------- 1 file changed, 128 insertions(+), 23 deletions(-) diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 2048138..025d909 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -3666,41 +3666,140 @@ static inline void dequeue_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se) { } #endif +static void reweight_eevdf(struct cfs_rq *cfs_rq, struct sched_entity *se, + unsigned long weight) +{ + unsigned long old_weight = se->load.weight; + u64 avruntime = avg_vruntime(cfs_rq); + s64 vlag, vslice; + + /* + * VRUNTIME + * ======== + * + * COROLLARY #1: The virtual runtime of the entity needs to be + * adjusted if re-weight at !0-lag point. + * + * Proof: For contradiction assume this is not true, so we can + * re-weight without changing vruntime at !0-lag point. + * + * Weight VRuntime Avg-VRuntime + * before w v V + * after w' v' V' + * + * Since lag needs to be preserved through re-weight: + * + * lag = (V - v)*w = (V'- v')*w', where v = v' + * ==> V' = (V - v)*w/w' + v (1) + * + * Let W be the total weight of the entities before reweight, + * since V' is the new weighted average of entities: + * + * V' = (WV + w'v - wv) / (W + w' - w) (2) + * + * by using (1) & (2) we obtain: + * + * (WV + w'v - wv) / (W + w' - w) = (V - v)*w/w' + v + * ==> (WV-Wv+Wv+w'v-wv)/(W+w'-w) = (V - v)*w/w' + v + * ==> (WV - Wv)/(W + w' - w) + v = (V - v)*w/w' + v + * ==> (V - v)*W/(W + w' - w) = (V - v)*w/w' (3) + * + * Since we are doing at !0-lag point which means V != v, we + * can simplify (3): + * + * ==> W / (W + w' - w) = w / w' + * ==> Ww' = Ww + ww' - ww + * ==> W * (w' - w) = w * (w' - w) + * ==> W = w (re-weight indicates w' != w) + * + * So the cfs_rq contains only one entity, hence vruntime of + * the entity @v should always equal to the cfs_rq's weighted + * average vruntime @V, which means we will always re-weight + * at 0-lag point, thus breach assumption. Proof completed. + * + * + * COROLLARY #2: Re-weight does NOT affect weighted average + * vruntime of all the entities. + * + * Proof: According to corollary #1, Eq. (1) should be: + * + * (V - v)*w = (V' - v')*w' + * ==> v' = V' - (V - v)*w/w' (4) + * + * According to the weighted average formula, we have: + * + * V' = (WV - wv + w'v') / (W - w + w') + * = (WV - wv + w'(V' - (V - v)w/w')) / (W - w + w') + * = (WV - wv + w'V' - Vw + wv) / (W - w + w') + * = (WV + w'V' - Vw) / (W - w + w') + * + * ==> V'*(W - w + w') = WV + w'V' - Vw + * ==> V' * (W - w) = (W - w) * V (5) + * + * If the entity is the only one in the cfs_rq, then reweight + * always occurs at 0-lag point, so V won't change. Or else + * there are other entities, hence W != w, then Eq. (5) turns + * into V' = V. So V won't change in either case, proof done. + * + * + * So according to corollary #1 & #2, the effect of re-weight + * on vruntime should be: + * + * v' = V' - (V - v) * w / w' (4) + * = V - (V - v) * w / w' + * = V - vl * w / w' + * = V - vl' + */ + if (avruntime != se->vruntime) { + vlag = (s64)(avruntime - se->vruntime); + vlag = div_s64(vlag * old_weight, weight); + se->vruntime = avruntime - vlag; + } + + /* + * DEADLINE + * ======== + * + * When the weight changes, the virtual time slope changes and + * we should adjust the relative virtual deadline accordingly. + * + * d' = v' + (d - v)*w/w' + * = V' - (V - v)*w/w' + (d - v)*w/w' + * = V - (V - v)*w/w' + (d - v)*w/w' + * = V + (d - V)*w/w' + */ + vslice = (s64)(se->deadline - avruntime); + vslice = div_s64(vslice * old_weight, weight); + se->deadline = avruntime + vslice; +} + static void reweight_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, unsigned long weight) { - unsigned long old_weight = se->load.weight; + bool curr = cfs_rq->curr == se; if (se->on_rq) { /* commit outstanding execution time */ - if (cfs_rq->curr == se) + if (curr) update_curr(cfs_rq); else - avg_vruntime_sub(cfs_rq, se); + __dequeue_entity(cfs_rq, se); update_load_sub(&cfs_rq->load, se->load.weight); } dequeue_load_avg(cfs_rq, se); - update_load_set(&se->load, weight); - if (!se->on_rq) { /* * Because we keep se->vlag = V - v_i, while: lag_i = w_i*(V - v_i), * we need to scale se->vlag when w_i changes. */ - se->vlag = div_s64(se->vlag * old_weight, weight); + se->vlag = div_s64(se->vlag * se->load.weight, weight); } else { - s64 deadline = se->deadline - se->vruntime; - /* - * When the weight changes, the virtual time slope changes and - * we should adjust the relative virtual deadline accordingly. - */ - deadline = div_s64(deadline * old_weight, weight); - se->deadline = se->vruntime + deadline; - if (se != cfs_rq->curr) - min_deadline_cb_propagate(&se->run_node, NULL); + reweight_eevdf(cfs_rq, se, weight); } + update_load_set(&se->load, weight); + #ifdef CONFIG_SMP do { u32 divider = get_pelt_divider(&se->avg); @@ -3712,8 +3811,17 @@ static void reweight_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, enqueue_load_avg(cfs_rq, se); if (se->on_rq) { update_load_add(&cfs_rq->load, se->load.weight); - if (cfs_rq->curr != se) - avg_vruntime_add(cfs_rq, se); + if (!curr) { + /* + * The entity's vruntime has been adjusted, so let's check + * whether the rq-wide min_vruntime needs updated too. Since + * the calculations above require stable min_vruntime rather + * than up-to-date one, we do the update at the end of the + * reweight process. + */ + __enqueue_entity(cfs_rq, se); + update_min_vruntime(cfs_rq); + } } } @@ -3857,14 +3965,11 @@ static void update_cfs_group(struct sched_entity *se) #ifndef CONFIG_SMP shares = READ_ONCE(gcfs_rq->tg->shares); - - if (likely(se->load.weight == shares)) - return; #else - shares = calc_group_shares(gcfs_rq); + shares = calc_group_shares(gcfs_rq); #endif - - reweight_entity(cfs_rq_of(se), se, shares); + if (unlikely(se->load.weight != shares)) + reweight_entity(cfs_rq_of(se), se, shares); } #else /* CONFIG_FAIR_GROUP_SCHED */