From patchwork Mon Mar 6 13:24:18 2023
X-Patchwork-Submitter: Zhang Qiao
X-Patchwork-Id: 64649
From: Zhang Qiao
Subject: [PATCH v2] sched/fair: sanitize vruntime of entity being migrated
Date: Mon, 6 Mar 2023 21:24:18 +0800
Message-ID: <20230306132418.50389-1-zhangqiao22@huawei.com>
X-Mailer: git-send-email 2.17.1
X-Mailing-List: linux-kernel@vger.kernel.org

Commit 829c1651e9c4 ("sched/fair: sanitize vruntime of entity being placed")
fixed an overflow problem, but it missed the case where se->exec_start is
reset after a migration. To handle that case, reset the vruntime of a
long-sleeping task in migrate_task_rq_fair().
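The overflow being guarded against comes from the signed-delta comparison
inside max_vruntime(), which place_entity() uses to pick between the task's
old vruntime and the freshly computed base. The standalone userspace sketch
below (not part of the patch; it mirrors the kernel's max_vruntime() helper
and assumes the usual two's-complement conversion) shows how that comparison
inverts once a long sleeper's vruntime falls more than 2^63 behind the base:

#include <stdint.h>
#include <stdio.h>

/* Userspace mirror of the kernel's max_vruntime(): pick the later of two
 * u64 vruntimes by looking at the sign of their difference. */
static uint64_t max_vruntime(uint64_t max_vruntime, uint64_t vruntime)
{
	int64_t delta = (int64_t)(vruntime - max_vruntime);

	if (delta > 0)
		max_vruntime = vruntime;

	return max_vruntime;
}

int main(void)
{
	/* vruntime of a task that slept "forever" ... */
	uint64_t sleeper_vruntime = 100;
	/* ... versus a base that has advanced by more than 2^63 since. */
	uint64_t base = 200 + (1ULL << 63);

	/* The s64 delta wraps negative, so the stale value wins and the
	 * woken task keeps an enormous head start over everyone else. */
	printf("picked %llu, expected %llu\n",
	       (unsigned long long)max_vruntime(sleeper_vruntime, base),
	       (unsigned long long)base);
	return 0;
}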
Fixes: 829c1651e9c4 ("sched/fair: sanitize vruntime of entity being placed")
Suggested-by: Vincent Guittot
Signed-off-by: Zhang Qiao
Reviewed-by: Vincent Guittot
---
v1 -> v2:
- fix some typos and update comments
- reformat the patch
---
 kernel/sched/fair.c | 76 ++++++++++++++++++++++++++++++++-------------
 1 file changed, 55 insertions(+), 21 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 7a1b1f855b96..74c9918ffe76 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4648,11 +4648,45 @@ static void check_spread(struct cfs_rq *cfs_rq, struct sched_entity *se)
 #endif
 }
 
+static inline bool entity_is_long_sleep(struct sched_entity *se)
+{
+	struct cfs_rq *cfs_rq;
+	u64 sleep_time;
+
+	if (se->exec_start == 0)
+		return false;
+
+	cfs_rq = cfs_rq_of(se);
+	sleep_time = rq_clock_task(rq_of(cfs_rq)) - se->exec_start;
+	if ((s64)sleep_time > 60LL * NSEC_PER_SEC)
+		return true;
+
+	return false;
+}
+
+static inline u64 sched_sleeper_credit(struct sched_entity *se)
+{
+	unsigned long thresh;
+
+	if (se_is_idle(se))
+		thresh = sysctl_sched_min_granularity;
+	else
+		thresh = sysctl_sched_latency;
+
+	/*
+	 * Halve their sleep time's effect, to allow
+	 * for a gentler effect of sleepers:
+	 */
+	if (sched_feat(GENTLE_FAIR_SLEEPERS))
+		thresh >>= 1;
+
+	return thresh;
+}
+
 static void
 place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int initial)
 {
 	u64 vruntime = cfs_rq->min_vruntime;
-	u64 sleep_time;
 
 	/*
 	 * The 'current' period is already promised to the current tasks,
@@ -4664,23 +4698,8 @@ place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int initial)
 		vruntime += sched_vslice(cfs_rq, se);
 
 	/* sleeps up to a single latency don't count. */
-	if (!initial) {
-		unsigned long thresh;
-
-		if (se_is_idle(se))
-			thresh = sysctl_sched_min_granularity;
-		else
-			thresh = sysctl_sched_latency;
-
-		/*
-		 * Halve their sleep time's effect, to allow
-		 * for a gentler effect of sleepers:
-		 */
-		if (sched_feat(GENTLE_FAIR_SLEEPERS))
-			thresh >>= 1;
-
-		vruntime -= thresh;
-	}
+	if (!initial)
+		vruntime -= sched_sleeper_credit(se);
 
 	/*
 	 * Pull vruntime of the entity being placed to the base level of
@@ -4689,8 +4708,7 @@ place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int initial)
 	 * the base as it may be too far off and the comparison may get
 	 * inversed due to s64 overflow.
	 */
-	sleep_time = rq_clock_task(rq_of(cfs_rq)) - se->exec_start;
-	if ((s64)sleep_time > 60LL * NSEC_PER_SEC)
+	if (entity_is_long_sleep(se))
 		se->vruntime = vruntime;
 	else
 		se->vruntime = max_vruntime(se->vruntime, vruntime);
@@ -7635,7 +7653,23 @@ static void migrate_task_rq_fair(struct task_struct *p, int new_cpu)
 	if (READ_ONCE(p->__state) == TASK_WAKING) {
 		struct cfs_rq *cfs_rq = cfs_rq_of(se);
 
-		se->vruntime -= u64_u32_load(cfs_rq->min_vruntime);
+		/*
+		 * We detect a long sleep via se->exec_start and, if so,
+		 * sanitize the task's vruntime in place_entity(). However,
+		 * after a migration this detection fails because
+		 * se->exec_start has been reset.
+		 *
+		 * To cover that case, add the same check here. The vruntime
+		 * of a task that slept for a long time should be reset to
+		 * cfs_rq->min_vruntime minus a sleep credit. Since the waking
+		 * task's vruntime gets cfs_rq->min_vruntime added back at
+		 * enqueue time, it is enough to set se->vruntime to the
+		 * negative credit here.
+		 */
+		if (entity_is_long_sleep(se))
+			se->vruntime = -sched_sleeper_credit(se);
+		else
+			se->vruntime -= u64_u32_load(cfs_rq->min_vruntime);
 	}
 
 	if (!task_on_rq_migrating(p)) {
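
To illustrate the renormalization that the new comment in
migrate_task_rq_fair() relies on: during a TASK_WAKING migration the vruntime
is kept relative to the old cfs_rq's min_vruntime, and the destination
cfs_rq's min_vruntime is added back at enqueue time. The standalone userspace
sketch below (illustrative names and values only, not kernel code) shows why
storing just the negative sleeper credit lands a long sleeper at min_vruntime
minus the credit:

#include <stdint.h>
#include <stdio.h>

/* Stand-in value for sched_sleeper_credit(); the exact number is irrelevant. */
#define SLEEPER_CREDIT 3000000ULL

/* What enqueue conceptually does with a migrated waking task: add the
 * destination cfs_rq's min_vruntime back onto the relative vruntime. */
static uint64_t enqueue_renorm(uint64_t relative_vruntime, uint64_t dst_min_vruntime)
{
	return relative_vruntime + dst_min_vruntime;
}

int main(void)
{
	uint64_t dst_min_vruntime = 987654321ULL;

	/* Long sleeper: migrate_task_rq_fair() now stores only -credit ... */
	uint64_t relative = (uint64_t)0 - SLEEPER_CREDIT;

	/* ... so after enqueue it sits at min_vruntime - credit, the same
	 * base place_entity() gives a long sleeper on the wakeup path. */
	printf("%llu == %llu\n",
	       (unsigned long long)enqueue_renorm(relative, dst_min_vruntime),
	       (unsigned long long)(dst_min_vruntime - SLEEPER_CREDIT));
	return 0;
}

Both paths therefore sanitize a long sleeper to the same base, which is the
point of reusing entity_is_long_sleep() and sched_sleeper_credit() in the
migration path.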