Message ID | 20240228161018.14253-1-huschle@linux.ibm.com |
---|---|
State | New |
Headers |
From: Tobias Huschle <huschle@linux.ibm.com>
To: linux-kernel@vger.kernel.org
Cc: mingo@redhat.com, peterz@infradead.org, juri.lelli@redhat.com, vincent.guittot@linaro.org, dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de, bristot@redhat.com, vschneid@redhat.com, sshegde@linux.vnet.ibm.com, srikar@linux.vnet.ibm.com, linuxppc-dev@lists.ozlabs.org
Subject: [RFC] sched/eevdf: sched feature to dismiss lag on wakeup
Date: Wed, 28 Feb 2024 17:10:18 +0100
Message-Id: <20240228161018.14253-1-huschle@linux.ibm.com> |
Series |
[RFC] sched/eevdf: sched feature to dismiss lag on wakeup
|
|
Commit Message
Tobias Huschle
Feb. 28, 2024, 4:10 p.m. UTC
The previously used CFS scheduler gave woken-up tasks an enhanced
chance to get runtime immediately by deducting a certain value from
their vruntime when placing them on the runqueue during wakeup.
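For reference, the CFS wakeup credit described above can be sketched as a simplified user-space model. It is based on a reading of place_entity() in kernels before the EEVDF switch in v6.6: the credit is sysctl_sched_latency, halved when GENTLE_FAIR_SLEEPERS is enabled, and the result is never placed earlier than the task's own vruntime. The function name is hypothetical, the 6 ms latency used in the usage note is an assumed value, and the model ignores idle entities, vruntime normalization and 64-bit wraparound.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified model of the pre-EEVDF CFS wakeup placement (hypothetical
 * helper, not kernel code). A waking task may be placed up to `thresh`
 * below min_vruntime; with GENTLE_FAIR_SLEEPERS the credit is half the
 * latency target. */
static int64_t cfs_place_wakeup(int64_t min_vruntime, int64_t se_vruntime,
				int64_t sched_latency_ns,
				bool gentle_fair_sleepers)
{
	int64_t thresh = sched_latency_ns;

	assert(sched_latency_ns > 0);
	if (gentle_fair_sleepers)
		thresh >>= 1;

	int64_t vruntime = min_vruntime - thresh;

	/* Like max_vruntime(): never place the task earlier than its own
	 * vruntime, so only tasks that slept long enough gain credit. */
	return se_vruntime > vruntime ? se_vruntime : vruntime;
}
```

With min_vruntime at 100 ms, a long sleeper whose vruntime is 20 ms is placed at 97 ms, ahead of the tasks already queued, which made it likely to be picked right away; a task whose vruntime is already 101 ms keeps it.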
This property was relied upon by some users, at least vhost, to ensure
that certain kworkers are scheduled immediately after being woken up.
The EEVDF scheduler does not support this so far. Instead, if such a
woken-up entity carries a negative lag from its previous execution, it
has to wait for the current time slice to finish, which hurts the
performance of the process that expects the immediate execution.
To address this issue, implement EEVDF strategy #2 for rejoining
entities, which dismisses the lag from the previous execution and allows
the woken-up task to run immediately (unless EEVDF deems other entities
preferable for scheduling).
The vruntime is decremented by an additional value of 1 to make sure
that the woken-up task actually gets to run. This does not follow
strategy #2 exactly, but it guarantees the expected behavior for the
scenario described above. Without the additional decrement, performance
degrades even further, so there are side effects I have not been able
to figure out yet.
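The three ways of treating a waking entity's lag (keeping it, strategy #2, and this RFC's `lag = 1`) can be compared with a small model. It is a sketch assuming a single runqueue and integer virtual time; the policy names are hypothetical labels, and the lag inflation by (W + w) / W follows a reading of the PLACE_LAG branch of place_entity(), where W is the load already queued. An entity counts as eligible when its vruntime does not exceed the load-weighted average V.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical labels for the three treatments of a waking entity's lag. */
enum wakeup_policy {
	KEEP_LAG,     /* PLACE_LAG: reuse the vlag saved at dequeue */
	DISMISS_LAG,  /* EEVDF strategy #2: rejoin with lag 0 */
	RFC_LAG_ONE,  /* this RFC: lag forced to 1 */
};

/* Lag used for placement. For KEEP_LAG the saved vlag is inflated by
 * (W + w) / W so that the entity still has vlag after joining a queue
 * of load W; this requires a non-empty queue. */
static int64_t placement_lag(enum wakeup_policy policy, int64_t vlag,
			     int64_t queue_load, int64_t se_weight)
{
	switch (policy) {
	case DISMISS_LAG:
		return 0;
	case RFC_LAG_ONE:
		return 1;
	case KEEP_LAG:
		break;
	}
	assert(queue_load > 0);
	return vlag * (queue_load + se_weight) / queue_load;
}

/* se->vruntime = V - lag */
static int64_t place_vruntime(int64_t avg_vruntime, int64_t lag)
{
	return avg_vruntime - lag;
}

/* Eligible iff v_i <= V. The V after insertion is a weighted average of
 * the old V and v_i, so checking against the old V gives the same answer. */
static bool eligible(int64_t avg_vruntime, int64_t vruntime)
{
	return vruntime <= avg_vruntime;
}
```

With V = 1000, equal weights of 1024 and a saved lag of -2500, keeping the lag places the kworker at 6000 (not eligible), strategy #2 places it at 1000 and the RFC places it at 999; both of the latter are eligible.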
Questions:
1. The kworker acquires its negative lag in the following scenario:
- the kworker and a cgroup are supposed to execute on the same CPU
- one task within the cgroup is executing and wakes up the kworker
- the kworker, with 0 lag, gets picked immediately and finishes its
execution within ~5000 ns
- on dequeue, the kworker gets assigned a negative lag
Is this expected behavior? With this short execution time, I would
expect the kworker to be fine.
For a more detailed discussion on this symptom, please see:
https://lore.kernel.org/netdev/ZWbapeL34Z8AMR5f@DESKTOP-2CCOB1S./T/
2. The proposed code change of course only addresses the symptom. Am I
assuming correctly that this is, in general, the expected behavior and
that the task waking up the kworker should rather explicitly reschedule
itself to grant the kworker time to execute?
In the vhost case, this is currently attempted through cond_resched(),
which does nothing because the need_resched flag is not set.
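The scenario from question 1, and its link to question 2, can be replayed in a toy model with two equal-weight entities: the cgroup's group entity and the kworker. This is a sketch under simplifying assumptions: slices, deadlines and timings are hypothetical, lag clamping and RUN_TO_PARITY are ignored, and pick_eevdf() is reduced to "earliest virtual deadline among entities with v_i <= V". In this model, any runtime charged to one entity while the other waits moves it past the average, so a negative lag after even ~5000 ns is the expected outcome. The understanding that check_preempt_wakeup_fair() only sets need_resched when pick_eevdf() would select the woken entity is an assumption here; if it holds, a kworker that is not picked leaves need_resched clear and cond_resched() has nothing to do.

```c
#include <assert.h>
#include <stdint.h>

#define NICE_0_WEIGHT 1024

struct entity {
	int64_t vruntime;  /* virtual runtime, ns */
	int64_t deadline;  /* virtual deadline, ns */
	int64_t weight;    /* load weight, 1024 == nice 0 */
};

/* V: load-weighted average vruntime of the queued entities. */
static int64_t avg_vruntime(const struct entity *es, int n)
{
	int64_t sum = 0, load = 0;

	for (int i = 0; i < n; i++) {
		sum += es[i].vruntime * es[i].weight;
		load += es[i].weight;
	}
	assert(load > 0);
	return sum / load;
}

/* Charge delta_exec_ns of wall-clock runtime as virtual time. */
static void account_runtime(struct entity *se, int64_t delta_exec_ns)
{
	se->vruntime += delta_exec_ns * NICE_0_WEIGHT / se->weight;
}

/* Unclamped lag of entity i: V - v_i. */
static int64_t entity_lag(const struct entity *es, int n, int i)
{
	return avg_vruntime(es, n) - es[i].vruntime;
}

/* Simplified pick: earliest deadline among entities with v_i <= V.
 * Returns -1 only for an empty queue. */
static int pick_eevdf(const struct entity *es, int n)
{
	int64_t V = avg_vruntime(es, n);
	int best = -1;

	for (int i = 0; i < n; i++) {
		if (es[i].vruntime > V)
			continue;
		if (best < 0 || es[i].deadline < es[best].deadline)
			best = i;
	}
	return best;
}
```

Replaying it with index 0 as the cgroup entity and index 1 as the kworker: after the kworker runs 5000 ns from V = 0 its lag is -2500. When it wakes again at V = 1 ms with that lag preserved, it lands at 1005000, is not eligible, and the cgroup entity is picked. With lag 0 both are eligible and the cgroup entity's earlier deadline wins; with lag 1, in this two-entity model, the cgroup entity sits just above V and only the kworker remains eligible.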
Feedback and opinions would be highly appreciated.
Signed-off-by: Tobias Huschle <huschle@linux.ibm.com>
---
kernel/sched/fair.c | 5 +++++
kernel/sched/features.h | 1 +
2 files changed, 6 insertions(+)
Comments
(+ Xuewen Yan, Ke Wang)

Hello Tobias,

On 2/28/2024 9:40 PM, Tobias Huschle wrote:
> [...]
> Questions:
> 1. The kworker getting its negative lag occurs in the following scenario
> - kworker and a cgroup are supposed to execute on the same CPU
> - one task within the cgroup is executing and wakes up the kworker
> - kworker with 0 lag, gets picked immediately and finishes its
> execution within ~5000ns
> - on dequeue, kworker gets assigned a negative lag
> Is this expected behavior? With this short execution time, I would
> expect the kworker to be fine.
> For a more detailed discussion on this symptom, please see:
> https://lore.kernel.org/netdev/ZWbapeL34Z8AMR5f@DESKTOP-2CCOB1S./T/

Does the lag clamping patch from Xuewen Yan [1] work for the vhost case
mentioned in the thread? Instead of placing the task just behind the
0-lag point, clamping the lag seems to be a more principled approach,
since EEVDF already does this in update_entity_lag().

If the lag is still too large, maybe the above, coupled with Peter's
delayed dequeue patch [2], can help (note: the tree is prone to force
updates).

[1] https://lore.kernel.org/lkml/20240130080643.1828-1-xuewen.yan@unisoc.com/
[2] https://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git/commit/?h=sched/eevdf&id=e62ef63a888c97188a977daddb72b61548da8417
--
Thanks and Regards,
Prateek
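Prateek's point about clamping can be made concrete with a simplified model of the clamp that update_entity_lag() is understood to apply: the lag V - v_i is limited to plus or minus the virtual-time equivalent of max(2 * slice, TICK_NSEC). This sketch does not reproduce the patch in [1]; the helper names are hypothetical, and HZ = 250 (a 4 ms tick) and the 3 ms slice used in the usage note are assumptions.

```c
#include <assert.h>
#include <stdint.h>

#define NICE_0_WEIGHT 1024
#define TICK_NSEC_ASSUMED 4000000  /* assumption: HZ=250 */

static int64_t max_s64(int64_t a, int64_t b)
{
	return a > b ? a : b;
}

/* Wall time to virtual time for an entity of the given weight
 * (simplified stand-in for calc_delta_fair()). */
static int64_t delta_fair(int64_t delta_ns, int64_t weight)
{
	assert(weight > 0);
	return delta_ns * NICE_0_WEIGHT / weight;
}

/* Simplified update_entity_lag(): lag = V - v_i, clamped to
 * +/- delta_fair(max(2 * slice, tick)). */
static int64_t clamped_lag(int64_t avg_vruntime, int64_t vruntime,
			   int64_t slice_ns, int64_t weight)
{
	int64_t lag = avg_vruntime - vruntime;
	int64_t limit = delta_fair(max_s64(2 * slice_ns, TICK_NSEC_ASSUMED),
				   weight);

	if (lag < -limit)
		return -limit;
	if (lag > limit)
		return limit;
	return lag;
}
```

With a 3 ms slice and nice-0 weight the bound is 6 ms of virtual time, so a lag of -2500 ns, as in the kworker scenario, passes through this clamp unchanged; a heavier entity (weight 2048) gets a proportionally smaller bound of 3 ms.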
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 533547e3c90a..c20ae6d62961 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5239,6 +5239,11 @@ place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
 		lag = div_s64(lag, load);
 	}
 
+	if (sched_feat(NOLAG_WAKEUP) && (flags & ENQUEUE_WAKEUP)) {
+		se->vlag = 0;
+		lag = 1;
+	}
+
 	se->vruntime = vruntime - lag;
 
 	/*
diff --git a/kernel/sched/features.h b/kernel/sched/features.h
index 143f55df890b..d3118e7568b4 100644
--- a/kernel/sched/features.h
+++ b/kernel/sched/features.h
@@ -7,6 +7,7 @@
 SCHED_FEAT(PLACE_LAG, true)
 SCHED_FEAT(PLACE_DEADLINE_INITIAL, true)
 SCHED_FEAT(RUN_TO_PARITY, true)
+SCHED_FEAT(NOLAG_WAKEUP, true)
 
 /*
  * Prefer to schedule the task we woke last (assuming it failed
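The features.h hunk is a single line because the scheduler generates its feature enum and default bitmask from that list with an X-macro. The sketch below mimics the pattern in user space with a local FEATURE_LIST macro instead of including features.h. The FEAT_ prefix, the plain bitmask and the ENQUEUE_WAKEUP_FLAG value are simplifications or hypothetical names (the kernel uses __SCHED_FEAT_ names and may back sched_feat() with static keys). On a CONFIG_SCHED_DEBUG kernel, writing NO_NOLAG_WAKEUP to /sys/kernel/debug/sched/features should turn the feature off at runtime; note that the patch enables it by default.

```c
#include <assert.h>
#include <stdbool.h>

/* Local stand-in for kernel/sched/features.h after this patch (subset). */
#define FEATURE_LIST(F)                 \
	F(PLACE_LAG, true)              \
	F(PLACE_DEADLINE_INITIAL, true) \
	F(RUN_TO_PARITY, true)          \
	F(NOLAG_WAKEUP, true)

/* One enum constant per feature: FEAT_PLACE_LAG, ..., then FEAT_NR. */
#define AS_ENUM(name, enabled) FEAT_##name,
enum { FEATURE_LIST(AS_ENUM) FEAT_NR };

static_assert(FEAT_NR <= 8 * sizeof(unsigned long),
	      "too many features for the mask");

/* Default mask: one bit set for every feature declared `true`. */
#define AS_BIT(name, enabled) ((1UL << FEAT_##name) * (enabled)) |
static unsigned long sched_features_mask = FEATURE_LIST(AS_BIT) 0;

#define sched_feat(x) (!!(sched_features_mask & (1UL << FEAT_##x)))

/* Hypothetical flag value standing in for ENQUEUE_WAKEUP. */
#define ENQUEUE_WAKEUP_FLAG 0x01

/* Mirrors the condition the patch adds to place_entity(). */
static long long nolag_wakeup_lag(long long lag, int flags)
{
	if (sched_feat(NOLAG_WAKEUP) && (flags & ENQUEUE_WAKEUP_FLAG))
		return 1;
	return lag;
}
```

Clearing the NOLAG_WAKEUP bit, which is what writing NO_NOLAG_WAKEUP does to the kernel's mask, restores the PLACE_LAG placement for wakeups.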