Message ID | 20240205022500.2232124-1-qyousef@layalina.io |
---|---|
State | New |
Headers |
From: Qais Yousef <qyousef@layalina.io>
To: "Rafael J. Wysocki" <rafael@kernel.org>, Viresh Kumar <viresh.kumar@linaro.org>
Cc: linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org, Ingo Molnar <mingo@kernel.org>, Peter Zijlstra <peterz@infradead.org>, Vincent Guittot <vincent.guittot@linaro.org>, Dietmar Eggemann <dietmar.eggemann@arm.com>, Qais Yousef <qyousef@layalina.io>
Subject: [PATCH] cpufreq: Change default transition delay to 2ms
Date: Mon, 5 Feb 2024 02:25:00 +0000
Message-Id: <20240205022500.2232124-1-qyousef@layalina.io>
List-Id: <linux-kernel.vger.kernel.org> |
Series | cpufreq: Change default transition delay to 2ms |
Commit Message
Qais Yousef
Feb. 5, 2024, 2:25 a.m. UTC
10ms is too high for today's hardware, even low-end ones. This default
ends up being used a lot, on Arm machines at least: Pine64, Mac mini and
Pixel 6 all end up with a 10ms rate_limit_us when using schedutil, and
it is too high for all of them.
Change the default to 2ms, which should be 'pessimistic' enough for the
worst case but not too high for platforms with fast DVFS hardware.
Signed-off-by: Qais Yousef <qyousef@layalina.io>
---
drivers/cpufreq/cpufreq.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
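For reference, here is a condensed sketch of the whole function as it reads with this patch applied; the two changed lines come from the diff quoted in the comments below, while the surrounding lines are paraphrased from mainline and slightly abridged (LATENCY_MULTIPLIER is 1000 and the values are in microseconds, so the cap works out to 2 ms):

```c
/* drivers/cpufreq/cpufreq.c (condensed sketch, not the verbatim source) */
unsigned int cpufreq_policy_transition_delay_us(struct cpufreq_policy *policy)
{
	unsigned int latency;

	/* A delay provided by the driver always wins over the heuristic below. */
	if (policy->transition_delay_us)
		return policy->transition_delay_us;

	latency = policy->cpuinfo.transition_latency / NSEC_PER_USEC;
	if (latency) {
		/*
		 * Cap the default transition delay to 2 ms, which seems to be
		 * a reasonable amount of time after which we should reevaluate
		 * the frequency.
		 */
		return min(latency * LATENCY_MULTIPLIER, (unsigned int)(2 * MSEC_PER_SEC));
	}

	return LATENCY_MULTIPLIER;
}
```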
Comments
On 05-02-24, 02:25, Qais Yousef wrote: > 10ms is too high for today's hardware, even low end ones. This default > end up being used a lot on Arm machines at least. Pine64, mac mini and > pixel 6 all end up with 10ms rate_limit_us when using schedutil, and > it's too high for all of them. > > Change the default to 2ms which should be 'pessimistic' enough for worst > case scenario, but not too high for platforms with fast DVFS hardware. > > Signed-off-by: Qais Yousef <qyousef@layalina.io> > --- > drivers/cpufreq/cpufreq.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c > index 44db4f59c4cc..8207f7294cb6 100644 > --- a/drivers/cpufreq/cpufreq.c > +++ b/drivers/cpufreq/cpufreq.c > @@ -582,11 +582,11 @@ unsigned int cpufreq_policy_transition_delay_us(struct cpufreq_policy *policy) > * for platforms where transition_latency is in milliseconds, it > * ends up giving unrealistic values. > * > - * Cap the default transition delay to 10 ms, which seems to be > + * Cap the default transition delay to 2 ms, which seems to be > * a reasonable amount of time after which we should reevaluate > * the frequency. > */ > - return min(latency * LATENCY_MULTIPLIER, (unsigned int)10000); > + return min(latency * LATENCY_MULTIPLIER, (unsigned int)(2*MSEC_PER_SEC)); Please add spaces around '*'. Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
On 05/02/2024 02:25, Qais Yousef wrote: > 10ms is too high for today's hardware, even low end ones. This default > end up being used a lot on Arm machines at least. Pine64, mac mini and > pixel 6 all end up with 10ms rate_limit_us when using schedutil, and > it's too high for all of them. > > Change the default to 2ms which should be 'pessimistic' enough for worst > case scenario, but not too high for platforms with fast DVFS hardware. > > Signed-off-by: Qais Yousef <qyousef@layalina.io> > --- > drivers/cpufreq/cpufreq.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c > index 44db4f59c4cc..8207f7294cb6 100644 > --- a/drivers/cpufreq/cpufreq.c > +++ b/drivers/cpufreq/cpufreq.c > @@ -582,11 +582,11 @@ unsigned int cpufreq_policy_transition_delay_us(struct cpufreq_policy *policy) > * for platforms where transition_latency is in milliseconds, it > * ends up giving unrealistic values. > * > - * Cap the default transition delay to 10 ms, which seems to be > + * Cap the default transition delay to 2 ms, which seems to be > * a reasonable amount of time after which we should reevaluate > * the frequency. > */ > - return min(latency * LATENCY_MULTIPLIER, (unsigned int)10000); > + return min(latency * LATENCY_MULTIPLIER, (unsigned int)(2*MSEC_PER_SEC)); > } > > return LATENCY_MULTIPLIER; Hi Qais, as previously mentioned I'm working on improving iowait boost and while I'm not against this patch per se it does make iowait boosting more aggressive. ((Doubling limited by rate_limit_us) Since the boost is often applied when not useful (for Android e.g. periodic f2fs writebacks), this might have some side effects. Please give me a couple of days for verifying any impact, or did you do that already? Kind Regards, Christian
Hi Christian On 02/05/24 09:17, Christian Loehle wrote: > On 05/02/2024 02:25, Qais Yousef wrote: > > 10ms is too high for today's hardware, even low end ones. This default > > end up being used a lot on Arm machines at least. Pine64, mac mini and > > pixel 6 all end up with 10ms rate_limit_us when using schedutil, and > > it's too high for all of them. > > > > Change the default to 2ms which should be 'pessimistic' enough for worst > > case scenario, but not too high for platforms with fast DVFS hardware. > > > > Signed-off-by: Qais Yousef <qyousef@layalina.io> > > --- > > drivers/cpufreq/cpufreq.c | 4 ++-- > > 1 file changed, 2 insertions(+), 2 deletions(-) > > > > diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c > > index 44db4f59c4cc..8207f7294cb6 100644 > > --- a/drivers/cpufreq/cpufreq.c > > +++ b/drivers/cpufreq/cpufreq.c > > @@ -582,11 +582,11 @@ unsigned int cpufreq_policy_transition_delay_us(struct cpufreq_policy *policy) > > * for platforms where transition_latency is in milliseconds, it > > * ends up giving unrealistic values. > > * > > - * Cap the default transition delay to 10 ms, which seems to be > > + * Cap the default transition delay to 2 ms, which seems to be > > * a reasonable amount of time after which we should reevaluate > > * the frequency. > > */ > > - return min(latency * LATENCY_MULTIPLIER, (unsigned int)10000); > > + return min(latency * LATENCY_MULTIPLIER, (unsigned int)(2*MSEC_PER_SEC)); > > } > > > > return LATENCY_MULTIPLIER; > > Hi Qais, > as previously mentioned I'm working on improving iowait boost and while I'm not against > this patch per se it does make iowait boosting more aggressive. ((Doubling limited by rate_limit_us) > Since the boost is often applied when not useful (for Android e.g. periodic f2fs writebacks), > this might have some side effects. Please give me a couple of days for verifying any impact, > or did you do that already? I don't understand the concern, could you elaborate more please? Products already ship with 500us and 1ms which is lower than this 2ms. On my AMD desktop it is already 1ms. And I think I've seen Intel systems defaulting to 500us or something low too. Ideally cpufreq drivers should set policy->transition_delay_us; so this path is taken if the driver didn't populate that. Which seems to be more common than I'd like tbh. I never run with 10ms. It's too slow. But I had several tests in the past against 2ms posted for those margin and removal of uclamp-max aggregation series. Anyway. I ran PCMark storage on Pixel 6 (running mainlinish kernel) and I see 10ms: 27600 2ms: 29750 HTH Cheers -- Qais Yousef
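As a small illustration of the "cpufreq drivers should set policy->transition_delay_us" point above, this is roughly what that looks like in a driver's ->init() callback. The driver and function names are made up; only the policy field and the callback shape come from include/linux/cpufreq.h:

```c
#include <linux/cpufreq.h>

/* Hypothetical driver whose DVFS completes well under 1 ms. */
static int example_cpufreq_init(struct cpufreq_policy *policy)
{
	/* ... usual setup: freq table, cpuinfo.transition_latency, ... */

	/*
	 * Tell governors how often re-evaluating the frequency makes sense,
	 * so the fallback in cpufreq_policy_transition_delay_us() is never
	 * used for this policy.
	 */
	policy->transition_delay_us = 500;

	return 0;
}
```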
On 05/02/2024 12:01, Qais Yousef wrote: > Hi Christian > > On 02/05/24 09:17, Christian Loehle wrote: >> On 05/02/2024 02:25, Qais Yousef wrote: >>> 10ms is too high for today's hardware, even low end ones. This default >>> end up being used a lot on Arm machines at least. Pine64, mac mini and >>> pixel 6 all end up with 10ms rate_limit_us when using schedutil, and >>> it's too high for all of them. >>> >>> Change the default to 2ms which should be 'pessimistic' enough for worst >>> case scenario, but not too high for platforms with fast DVFS hardware. >>> >>> Signed-off-by: Qais Yousef <qyousef@layalina.io> >>> --- >>> drivers/cpufreq/cpufreq.c | 4 ++-- >>> 1 file changed, 2 insertions(+), 2 deletions(-) >>> >>> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c >>> index 44db4f59c4cc..8207f7294cb6 100644 >>> --- a/drivers/cpufreq/cpufreq.c >>> +++ b/drivers/cpufreq/cpufreq.c >>> @@ -582,11 +582,11 @@ unsigned int cpufreq_policy_transition_delay_us(struct cpufreq_policy *policy) >>> * for platforms where transition_latency is in milliseconds, it >>> * ends up giving unrealistic values. >>> * >>> - * Cap the default transition delay to 10 ms, which seems to be >>> + * Cap the default transition delay to 2 ms, which seems to be >>> * a reasonable amount of time after which we should reevaluate >>> * the frequency. >>> */ >>> - return min(latency * LATENCY_MULTIPLIER, (unsigned int)10000); >>> + return min(latency * LATENCY_MULTIPLIER, (unsigned int)(2*MSEC_PER_SEC)); >>> } >>> >>> return LATENCY_MULTIPLIER; >> >> Hi Qais, >> as previously mentioned I'm working on improving iowait boost and while I'm not against >> this patch per se it does make iowait boosting more aggressive. ((Doubling limited by rate_limit_us) >> Since the boost is often applied when not useful (for Android e.g. periodic f2fs writebacks), >> this might have some side effects. Please give me a couple of days for verifying any impact, >> or did you do that already? > > I don't understand the concern, could you elaborate more please? > > Products already ship with 500us and 1ms which is lower than this 2ms. > > On my AMD desktop it is already 1ms. And I think I've seen Intel systems > defaulting to 500us or something low too. Ideally cpufreq drivers should set > policy->transition_delay_us; so this path is taken if the driver didn't > populate that. Which seems to be more common than I'd like tbh. I'm not disagreeing with you on that part. I'm just worried about the side effects w.r.t iowait boosting. > > I never run with 10ms. It's too slow. But I had several tests in the past > against 2ms posted for those margin and removal of uclamp-max aggregation > series. Anyway. I ran PCMark storage on Pixel 6 (running mainlinish kernel) and > I see > > 10ms: 27600 > 2ms: 29750 Yes, decreasing the rate limit makes it more aggressive, nothing unexpected here. But let's be frank, the scenarios in which iowait boosting actually shows its biggest benefit you are either doing: - Random Read (small iosize), single-threaded, synchronous IO - anything O_DIRECT and I'd argue more than likely you are doing something wrong if you're actually in such a case in the real world. So I'm much more worried about boosting in scenarios where it doesn't help (e.g. on an Android quite frequently: f2fs page cache writeback). Decreasing the default transition latency makes (sugov) iowait boosting much more aggressive, so I'm curious if this patch increases power consumption on systems that were at 10ms previously when in non-IO workloads. 
Hope that clears that up. Again, this is not an argument against your change, just being cautious about the potential side effects and whether they need some mitigations. Kind Regards, Christian
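To put a rough number on the interaction Christian describes, the sketch below is a plain user-space illustration, not kernel code. It assumes schedutil's iowait boost starts at 1/8 of capacity, roughly doubles on each wakeup after IO, and that a doubling only takes effect once per rate_limit_us window, i.e. the "doubling limited by rate_limit_us" behaviour mentioned above:

```c
#include <stdio.h>

#define SCHED_CAPACITY_SCALE	1024
#define IOWAIT_BOOST_MIN	(SCHED_CAPACITY_SCALE / 8)

/* How long sustained in-IO wakeups take to saturate the boost. */
static unsigned int ramp_time_us(unsigned int rate_limit_us)
{
	unsigned int boost = IOWAIT_BOOST_MIN;
	unsigned int t = 0;

	while (boost < SCHED_CAPACITY_SCALE) {
		boost = 2 * boost > SCHED_CAPACITY_SCALE ?
			SCHED_CAPACITY_SCALE : 2 * boost;
		t += rate_limit_us;
	}
	return t;
}

int main(void)
{
	printf("10 ms rate limit: boost saturates after ~%u us\n", ramp_time_us(10000));
	printf(" 2 ms rate limit: boost saturates after ~%u us\n", ramp_time_us(2000));
	return 0;
}
```

Under these assumptions the boost needs roughly 30 ms of back-to-back IO wakeups to saturate with the old 10 ms default, versus roughly 6 ms with a 2 ms default, which is the "more aggressive" effect being debated here.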
On 02/05/24 17:35, Christian Loehle wrote: > On 05/02/2024 12:01, Qais Yousef wrote: > > Hi Christian > > > > On 02/05/24 09:17, Christian Loehle wrote: > >> On 05/02/2024 02:25, Qais Yousef wrote: > >>> 10ms is too high for today's hardware, even low end ones. This default > >>> end up being used a lot on Arm machines at least. Pine64, mac mini and > >>> pixel 6 all end up with 10ms rate_limit_us when using schedutil, and > >>> it's too high for all of them. > >>> > >>> Change the default to 2ms which should be 'pessimistic' enough for worst > >>> case scenario, but not too high for platforms with fast DVFS hardware. > >>> > >>> Signed-off-by: Qais Yousef <qyousef@layalina.io> > >>> --- > >>> drivers/cpufreq/cpufreq.c | 4 ++-- > >>> 1 file changed, 2 insertions(+), 2 deletions(-) > >>> > >>> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c > >>> index 44db4f59c4cc..8207f7294cb6 100644 > >>> --- a/drivers/cpufreq/cpufreq.c > >>> +++ b/drivers/cpufreq/cpufreq.c > >>> @@ -582,11 +582,11 @@ unsigned int cpufreq_policy_transition_delay_us(struct cpufreq_policy *policy) > >>> * for platforms where transition_latency is in milliseconds, it > >>> * ends up giving unrealistic values. > >>> * > >>> - * Cap the default transition delay to 10 ms, which seems to be > >>> + * Cap the default transition delay to 2 ms, which seems to be > >>> * a reasonable amount of time after which we should reevaluate > >>> * the frequency. > >>> */ > >>> - return min(latency * LATENCY_MULTIPLIER, (unsigned int)10000); > >>> + return min(latency * LATENCY_MULTIPLIER, (unsigned int)(2*MSEC_PER_SEC)); > >>> } > >>> > >>> return LATENCY_MULTIPLIER; > >> > >> Hi Qais, > >> as previously mentioned I'm working on improving iowait boost and while I'm not against > >> this patch per se it does make iowait boosting more aggressive. ((Doubling limited by rate_limit_us) > >> Since the boost is often applied when not useful (for Android e.g. periodic f2fs writebacks), > >> this might have some side effects. Please give me a couple of days for verifying any impact, > >> or did you do that already? > > > > I don't understand the concern, could you elaborate more please? > > > > Products already ship with 500us and 1ms which is lower than this 2ms. > > > > On my AMD desktop it is already 1ms. And I think I've seen Intel systems > > defaulting to 500us or something low too. Ideally cpufreq drivers should set > > policy->transition_delay_us; so this path is taken if the driver didn't > > populate that. Which seems to be more common than I'd like tbh. > > I'm not disagreeing with you on that part. I'm just worried about the side > effects w.r.t iowait boosting. I don't see a reason for that. This value should represent hardware's ability to change frequencies. It is not designed for iowait boost. And it being high means folks with good hardware are getting crap performance as changing frequency once every 10ms with today's bursty workloads means we leave a lot of perf on the floor for no good reason. And as I tried to explain, already platforms ship with low value as this is how the hardware behaves. We are not making iowait boost more aggressive; but bringing the fallback behavior more inline to what properly configured platforms behave already today. > > > > > I never run with 10ms. It's too slow. But I had several tests in the past > > against 2ms posted for those margin and removal of uclamp-max aggregation > > series. Anyway. 
I ran PCMark storage on Pixel 6 (running mainlinish kernel) and > > I see > > > > 10ms: 27600 > > 2ms: 29750 > > Yes, decreasing the rate limit makes it more aggressive, nothing unexpected here. > But let's be frank, the scenarios in which iowait boosting actually shows its > biggest benefit you are either doing: > - Random Read (small iosize), single-threaded, synchronous IO > - anything O_DIRECT > and I'd argue more than likely you are doing something wrong if you're actually in > such a case in the real world. So I'm much more worried about boosting in scenarios > where it doesn't help (e.g. on an Android quite frequently: f2fs page cache writeback). > > Decreasing the default transition latency makes (sugov) iowait boosting much more aggressive, > so I'm curious if this patch increases power consumption on systems that were at 10ms previously > when in non-IO workloads. > > Hope that clears that up. Again, not an argument against your change, just being cautious of > the potential side effects and if they need some mitigations. > > Kind Regards, > Christian
On Mon, Feb 5, 2024 at 8:45 AM Viresh Kumar <viresh.kumar@linaro.org> wrote: > > On 05-02-24, 02:25, Qais Yousef wrote: > > 10ms is too high for today's hardware, even low end ones. This default > > end up being used a lot on Arm machines at least. Pine64, mac mini and > > pixel 6 all end up with 10ms rate_limit_us when using schedutil, and > > it's too high for all of them. > > > > Change the default to 2ms which should be 'pessimistic' enough for worst > > case scenario, but not too high for platforms with fast DVFS hardware. > > > > Signed-off-by: Qais Yousef <qyousef@layalina.io> > > --- > > drivers/cpufreq/cpufreq.c | 4 ++-- > > 1 file changed, 2 insertions(+), 2 deletions(-) > > > > diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c > > index 44db4f59c4cc..8207f7294cb6 100644 > > --- a/drivers/cpufreq/cpufreq.c > > +++ b/drivers/cpufreq/cpufreq.c > > @@ -582,11 +582,11 @@ unsigned int cpufreq_policy_transition_delay_us(struct cpufreq_policy *policy) > > * for platforms where transition_latency is in milliseconds, it > > * ends up giving unrealistic values. > > * > > - * Cap the default transition delay to 10 ms, which seems to be > > + * Cap the default transition delay to 2 ms, which seems to be > > * a reasonable amount of time after which we should reevaluate > > * the frequency. > > */ > > - return min(latency * LATENCY_MULTIPLIER, (unsigned int)10000); > > + return min(latency * LATENCY_MULTIPLIER, (unsigned int)(2*MSEC_PER_SEC)); > > Please add spaces around '*'. > > Acked-by: Viresh Kumar <viresh.kumar@linaro.org> I've adjusted the whitespace as suggested above and applied the patch as 5.9 material. Thanks!
Hello, On 2/12/24 16:53, Rafael J. Wysocki wrote: > On Mon, Feb 5, 2024 at 8:45 AM Viresh Kumar <viresh.kumar@linaro.org> wrote: >> >> On 05-02-24, 02:25, Qais Yousef wrote: >>> 10ms is too high for today's hardware, even low end ones. This default >>> end up being used a lot on Arm machines at least. Pine64, mac mini and >>> pixel 6 all end up with 10ms rate_limit_us when using schedutil, and >>> it's too high for all of them. >>> >>> Change the default to 2ms which should be 'pessimistic' enough for worst >>> case scenario, but not too high for platforms with fast DVFS hardware. >>> >>> Signed-off-by: Qais Yousef <qyousef@layalina.io> >>> --- >>> drivers/cpufreq/cpufreq.c | 4 ++-- >>> 1 file changed, 2 insertions(+), 2 deletions(-) >>> >>> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c >>> index 44db4f59c4cc..8207f7294cb6 100644 >>> --- a/drivers/cpufreq/cpufreq.c >>> +++ b/drivers/cpufreq/cpufreq.c >>> @@ -582,11 +582,11 @@ unsigned int cpufreq_policy_transition_delay_us(struct cpufreq_policy *policy) >>> * for platforms where transition_latency is in milliseconds, it >>> * ends up giving unrealistic values. >>> * >>> - * Cap the default transition delay to 10 ms, which seems to be >>> + * Cap the default transition delay to 2 ms, which seems to be >>> * a reasonable amount of time after which we should reevaluate >>> * the frequency. >>> */ >>> - return min(latency * LATENCY_MULTIPLIER, (unsigned int)10000); >>> + return min(latency * LATENCY_MULTIPLIER, (unsigned int)(2*MSEC_PER_SEC)); >> >> Please add spaces around '*'. >> >> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> > > I've adjusted the whitespace as suggested above and applied the patch > as 5.9 material. > > Thanks! > To add some numbers, on a Juno-r2, with latency measured between the frequency request on the kernel side and the SCP actually making the frequency update. The SCP is the firmware responsible of making the frequency updates. It receives the kernel requests and coordinate them/make the actual changes. The SCP also has a mechanism called 'fast channel' (FC) where the kernel writes the requested frequency to a memory area shared with the SCP. Every 4ms, the SCP polls/reads these memory area and make the required modifications. Latency values (in ms) Workload: Idle system, during ~30s +---------------------------------------+ | | Without FC | With FC | +-------+---------------+---------------+ | count | 1663 | 1102 | | mean | 2.92 | 2.10 | | std | 1.90 | 1.58 | | min | 0.21 | 0.00 | | 25% | 1.64 | 0.91 | | 50% | 2.57 | 1.68 | | 75% | 3.66 | 2.97 | | max | 14.37 | 13.50 | +-------+---------------+---------------+ Latency values (in ms) Workload: One 1% task per CPU, period = 32ms. This allows to wake up the CPU every 32ms and send more requests/give more work to the SCP. Indeed the SCP is also responsible of idle state transitions. Test duration ~=30s. +---------------------------------------+ | | Without FC | With FC | +-------+---------------+---------------+ | count | 1629 | 1446 | | mean | 3.23 | 2.31 | | std | 2.40 | 1.73 | | min | 0.05 | 0.02 | | 25% | 1.91 | 0.98 | | 50% | 2.65 | 2.00 | | 75% | 3.65 | 3.23 | | max | 20.56 | 16.73 | +-------+---------------+---------------+ --- The latency increases when fast channels are not used and when there is an actual workload. On average it is always > 2ms. Juno's release date seems to be 2014, so the platform is quite old, but it should also have benefited from regular firmware updates. Regards, Pierre
On 02/12/24 16:53, Rafael J. Wysocki wrote: > On Mon, Feb 5, 2024 at 8:45 AM Viresh Kumar <viresh.kumar@linaro.org> wrote: > > > > On 05-02-24, 02:25, Qais Yousef wrote: > > > 10ms is too high for today's hardware, even low end ones. This default > > > end up being used a lot on Arm machines at least. Pine64, mac mini and > > > pixel 6 all end up with 10ms rate_limit_us when using schedutil, and > > > it's too high for all of them. > > > > > > Change the default to 2ms which should be 'pessimistic' enough for worst > > > case scenario, but not too high for platforms with fast DVFS hardware. > > > > > > Signed-off-by: Qais Yousef <qyousef@layalina.io> > > > --- > > > drivers/cpufreq/cpufreq.c | 4 ++-- > > > 1 file changed, 2 insertions(+), 2 deletions(-) > > > > > > diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c > > > index 44db4f59c4cc..8207f7294cb6 100644 > > > --- a/drivers/cpufreq/cpufreq.c > > > +++ b/drivers/cpufreq/cpufreq.c > > > @@ -582,11 +582,11 @@ unsigned int cpufreq_policy_transition_delay_us(struct cpufreq_policy *policy) > > > * for platforms where transition_latency is in milliseconds, it > > > * ends up giving unrealistic values. > > > * > > > - * Cap the default transition delay to 10 ms, which seems to be > > > + * Cap the default transition delay to 2 ms, which seems to be > > > * a reasonable amount of time after which we should reevaluate > > > * the frequency. > > > */ > > > - return min(latency * LATENCY_MULTIPLIER, (unsigned int)10000); > > > + return min(latency * LATENCY_MULTIPLIER, (unsigned int)(2*MSEC_PER_SEC)); > > > > Please add spaces around '*'. > > > > Acked-by: Viresh Kumar <viresh.kumar@linaro.org> > > I've adjusted the whitespace as suggested above and applied the patch > as 5.9 material. Sorry I missed that remark about the white space. Thanks for the fixup! > > Thanks!
On 02/14/24 10:19, Pierre Gondois wrote: > Hello, > > On 2/12/24 16:53, Rafael J. Wysocki wrote: > > On Mon, Feb 5, 2024 at 8:45 AM Viresh Kumar <viresh.kumar@linaro.org> wrote: > > > > > > On 05-02-24, 02:25, Qais Yousef wrote: > > > > 10ms is too high for today's hardware, even low end ones. This default > > > > end up being used a lot on Arm machines at least. Pine64, mac mini and > > > > pixel 6 all end up with 10ms rate_limit_us when using schedutil, and > > > > it's too high for all of them. > > > > > > > > Change the default to 2ms which should be 'pessimistic' enough for worst > > > > case scenario, but not too high for platforms with fast DVFS hardware. > > > > > > > > Signed-off-by: Qais Yousef <qyousef@layalina.io> > > > > --- > > > > drivers/cpufreq/cpufreq.c | 4 ++-- > > > > 1 file changed, 2 insertions(+), 2 deletions(-) > > > > > > > > diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c > > > > index 44db4f59c4cc..8207f7294cb6 100644 > > > > --- a/drivers/cpufreq/cpufreq.c > > > > +++ b/drivers/cpufreq/cpufreq.c > > > > @@ -582,11 +582,11 @@ unsigned int cpufreq_policy_transition_delay_us(struct cpufreq_policy *policy) > > > > * for platforms where transition_latency is in milliseconds, it > > > > * ends up giving unrealistic values. > > > > * > > > > - * Cap the default transition delay to 10 ms, which seems to be > > > > + * Cap the default transition delay to 2 ms, which seems to be > > > > * a reasonable amount of time after which we should reevaluate > > > > * the frequency. > > > > */ > > > > - return min(latency * LATENCY_MULTIPLIER, (unsigned int)10000); > > > > + return min(latency * LATENCY_MULTIPLIER, (unsigned int)(2*MSEC_PER_SEC)); > > > > > > Please add spaces around '*'. > > > > > > Acked-by: Viresh Kumar <viresh.kumar@linaro.org> > > > > I've adjusted the whitespace as suggested above and applied the patch > > as 5.9 material. > > > > Thanks! > > > > To add some numbers, on a Juno-r2, with latency measured between the frequency > request on the kernel side and the SCP actually making the frequency update. > > The SCP is the firmware responsible of making the frequency updates. It receives > the kernel requests and coordinate them/make the actual changes. The SCP also has > a mechanism called 'fast channel' (FC) where the kernel writes the requested > frequency to a memory area shared with the SCP. Every 4ms, the SCP polls/reads > these memory area and make the required modifications. > > Latency values (in ms) > Workload: > Idle system, during ~30s > +---------------------------------------+ > | | Without FC | With FC | > +-------+---------------+---------------+ > | count | 1663 | 1102 | > | mean | 2.92 | 2.10 | > | std | 1.90 | 1.58 | > | min | 0.21 | 0.00 | > | 25% | 1.64 | 0.91 | > | 50% | 2.57 | 1.68 | > | 75% | 3.66 | 2.97 | > | max | 14.37 | 13.50 | > +-------+---------------+---------------+ > > Latency values (in ms) > Workload: > One 1% task per CPU, period = 32ms. This allows to wake up the CPU > every 32ms and send more requests/give more work to the SCP. Indeed > the SCP is also responsible of idle state transitions. > Test duration ~=30s. 
> +---------------------------------------+ > | | Without FC | With FC | > +-------+---------------+---------------+ > | count | 1629 | 1446 | > | mean | 3.23 | 2.31 | > | std | 2.40 | 1.73 | > | min | 0.05 | 0.02 | > | 25% | 1.91 | 0.98 | > | 50% | 2.65 | 2.00 | > | 75% | 3.65 | 3.23 | > | max | 20.56 | 16.73 | > +-------+---------------+---------------+ > > --- > > The latency increases when fast channels are not used and when there is an actual > workload. On average it is always > 2ms. Juno's release date seems to be 2014, > so the platform is quite old, but it should also have benefited from regular > firmware updates. Thanks for sharing the numbers > > Regards, > Pierre
Hello Qais, I added some other remarks, On 2/20/24 14:50, Qais Yousef wrote: > On 02/14/24 10:19, Pierre Gondois wrote: >> Hello, >> >> On 2/12/24 16:53, Rafael J. Wysocki wrote: >>> On Mon, Feb 5, 2024 at 8:45 AM Viresh Kumar <viresh.kumar@linaro.org> wrote: >>>> >>>> On 05-02-24, 02:25, Qais Yousef wrote: >>>>> 10ms is too high for today's hardware, even low end ones. This default >>>>> end up being used a lot on Arm machines at least. Pine64, mac mini and >>>>> pixel 6 all end up with 10ms rate_limit_us when using schedutil, and >>>>> it's too high for all of them. >>>>> >>>>> Change the default to 2ms which should be 'pessimistic' enough for worst >>>>> case scenario, but not too high for platforms with fast DVFS hardware. >>>>> >>>>> Signed-off-by: Qais Yousef <qyousef@layalina.io> >>>>> --- >>>>> drivers/cpufreq/cpufreq.c | 4 ++-- >>>>> 1 file changed, 2 insertions(+), 2 deletions(-) >>>>> >>>>> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c >>>>> index 44db4f59c4cc..8207f7294cb6 100644 >>>>> --- a/drivers/cpufreq/cpufreq.c >>>>> +++ b/drivers/cpufreq/cpufreq.c >>>>> @@ -582,11 +582,11 @@ unsigned int cpufreq_policy_transition_delay_us(struct cpufreq_policy *policy) >>>>> * for platforms where transition_latency is in milliseconds, it >>>>> * ends up giving unrealistic values. >>>>> * >>>>> - * Cap the default transition delay to 10 ms, which seems to be >>>>> + * Cap the default transition delay to 2 ms, which seems to be >>>>> * a reasonable amount of time after which we should reevaluate >>>>> * the frequency. >>>>> */ >>>>> - return min(latency * LATENCY_MULTIPLIER, (unsigned int)10000); >>>>> + return min(latency * LATENCY_MULTIPLIER, (unsigned int)(2*MSEC_PER_SEC)); >>>> >>>> Please add spaces around '*'. >>>> >>>> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> >>> >>> I've adjusted the whitespace as suggested above and applied the patch >>> as 5.9 material. >>> >>> Thanks! >>> >> >> To add some numbers, on a Juno-r2, with latency measured between the frequency >> request on the kernel side and the SCP actually making the frequency update. >> >> The SCP is the firmware responsible of making the frequency updates. It receives >> the kernel requests and coordinate them/make the actual changes. The SCP also has >> a mechanism called 'fast channel' (FC) where the kernel writes the requested >> frequency to a memory area shared with the SCP. Every 4ms, the SCP polls/reads >> these memory area and make the required modifications. >> >> Latency values (in ms) >> Workload: >> Idle system, during ~30s >> +---------------------------------------+ >> | | Without FC | With FC | >> +-------+---------------+---------------+ >> | count | 1663 | 1102 | >> | mean | 2.92 | 2.10 | >> | std | 1.90 | 1.58 | >> | min | 0.21 | 0.00 | >> | 25% | 1.64 | 0.91 | >> | 50% | 2.57 | 1.68 | >> | 75% | 3.66 | 2.97 | >> | max | 14.37 | 13.50 | >> +-------+---------------+---------------+ >> >> Latency values (in ms) >> Workload: >> One 1% task per CPU, period = 32ms. This allows to wake up the CPU >> every 32ms and send more requests/give more work to the SCP. Indeed >> the SCP is also responsible of idle state transitions. >> Test duration ~=30s. 
>> +---------------------------------------+ >> | | Without FC | With FC | >> +-------+---------------+---------------+ >> | count | 1629 | 1446 | >> | mean | 3.23 | 2.31 | >> | std | 2.40 | 1.73 | >> | min | 0.05 | 0.02 | >> | 25% | 1.91 | 0.98 | >> | 50% | 2.65 | 2.00 | >> | 75% | 3.65 | 3.23 | >> | max | 20.56 | 16.73 | >> +-------+---------------+---------------+ >> >> --- 1. With this patch, platforms like the Juno which: - don't set a `transition_delay_us` - have a high `transition_latency` (> 1ms) can request freq. changes every 2ms. If a platform has a `transition_latency` > 2ms, this means: `transition_latency` > `transition_delay_us` I.e. a second freq. requests might be emitted before the first one will be completed. On the Juno, this doesn't cause any 'real' issue as the SCMI/mailbox mechanism works well, but this doesn't seem correct. If the util of CPUs is in between OPPs (i.e. freq. changes are often required), the Juno: - sends a freq. request - waits for completion and schedules another task in the meantime - upon completion, immediately sends a new freq. I think that the following should be respected/checked: - `transition_latency` < `transition_delay_us` (it might also make sense to have, with K being any factor:) - `transition_latency` * K < `transition_delay_us` 2. There are references to the 10ms values at other places in the code: include/linux/cpufreq.h * ondemand governor will work on any processor with transition latency <= 10ms, drivers/cpufreq/cpufreq.c * For platforms that can change the frequency very fast (< 10 * us), the above formula gives a decent transition delay. But -> the 10us value matches 10ms = 10us * LATENCY_MULTIPLIER Documentation/admin-guide/pm/cpufreq.rst Typically, it is set to values of the order of 10000 (10 ms). Its default value is equal to the value of ``cpuinfo_transition_latency`` 3. There seems to be a dependency of the conservative/ondemand governors over the the value returned by `cpufreq_policy_transition_delay_us()`: drivers/cpufreq/cpufreq_governor.c dbs_data->sampling_rate = max_t(unsigned int, CPUFREQ_DBS_MIN_SAMPLING_INTERVAL, // = 2 * tick period = 8ms cpufreq_policy_transition_delay_us(policy)); // [1]: val <= 2ms [1] if `transition_latency` is not set and `transition_delay_us` is, which is the case for the Juno. The `sampling_rate` is, FYIU, the period used to evaluate the ratio of the idle/busy time, and if necessary increase/decrease the freq. This patch will likely reduce this sampling rate from 10ms -> 8ms (if `cpufreq_policy_transition_delay_us()`` now returns 2ms for some platforms). This is not much, but just wanted to note it. Regards, Pierre >> >> The latency increases when fast channels are not used and when there is an actual >> workload. On average it is always > 2ms. Juno's release date seems to be 2014, >> so the platform is quite old, but it should also have benefited from regular >> firmware updates. > > Thanks for sharing the numbers > >> >> Regards, >> Pierre
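To make the arithmetic behind point 3 concrete, a small stand-alone illustration (assuming a 4 ms tick, i.e. HZ=250, so CPUFREQ_DBS_MIN_SAMPLING_INTERVAL comes out to the 8 ms quoted above):

```c
#include <stdio.h>

int main(void)
{
	unsigned int tick_us = 4000;			/* HZ=250 */
	unsigned int dbs_min_us = 2 * tick_us;		/* 8 ms floor */
	unsigned int delay_us[] = { 10000, 2000 };	/* old cap, new cap */

	for (int i = 0; i < 2; i++) {
		unsigned int rate = delay_us[i] > dbs_min_us ? delay_us[i] : dbs_min_us;
		printf("transition delay %5u us -> ondemand sampling_rate %5u us\n",
		       delay_us[i], rate);
	}
	return 0;
}
```

So for platforms hitting the fallback path, the ondemand/conservative sampling period would drop from 10 ms to the 8 ms floor, the small change Pierre notes.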
On 02/20/24 18:38, Pierre Gondois wrote: > Hello Qais, > > I added some other remarks, > > On 2/20/24 14:50, Qais Yousef wrote: > > On 02/14/24 10:19, Pierre Gondois wrote: > > > Hello, > > > > > > On 2/12/24 16:53, Rafael J. Wysocki wrote: > > > > On Mon, Feb 5, 2024 at 8:45 AM Viresh Kumar <viresh.kumar@linaro.org> wrote: > > > > > > > > > > On 05-02-24, 02:25, Qais Yousef wrote: > > > > > > 10ms is too high for today's hardware, even low end ones. This default > > > > > > end up being used a lot on Arm machines at least. Pine64, mac mini and > > > > > > pixel 6 all end up with 10ms rate_limit_us when using schedutil, and > > > > > > it's too high for all of them. > > > > > > > > > > > > Change the default to 2ms which should be 'pessimistic' enough for worst > > > > > > case scenario, but not too high for platforms with fast DVFS hardware. > > > > > > > > > > > > Signed-off-by: Qais Yousef <qyousef@layalina.io> > > > > > > --- > > > > > > drivers/cpufreq/cpufreq.c | 4 ++-- > > > > > > 1 file changed, 2 insertions(+), 2 deletions(-) > > > > > > > > > > > > diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c > > > > > > index 44db4f59c4cc..8207f7294cb6 100644 > > > > > > --- a/drivers/cpufreq/cpufreq.c > > > > > > +++ b/drivers/cpufreq/cpufreq.c > > > > > > @@ -582,11 +582,11 @@ unsigned int cpufreq_policy_transition_delay_us(struct cpufreq_policy *policy) > > > > > > * for platforms where transition_latency is in milliseconds, it > > > > > > * ends up giving unrealistic values. > > > > > > * > > > > > > - * Cap the default transition delay to 10 ms, which seems to be > > > > > > + * Cap the default transition delay to 2 ms, which seems to be > > > > > > * a reasonable amount of time after which we should reevaluate > > > > > > * the frequency. > > > > > > */ > > > > > > - return min(latency * LATENCY_MULTIPLIER, (unsigned int)10000); > > > > > > + return min(latency * LATENCY_MULTIPLIER, (unsigned int)(2*MSEC_PER_SEC)); > > > > > > > > > > Please add spaces around '*'. > > > > > > > > > > Acked-by: Viresh Kumar <viresh.kumar@linaro.org> > > > > > > > > I've adjusted the whitespace as suggested above and applied the patch > > > > as 5.9 material. > > > > > > > > Thanks! > > > > > > > > > > To add some numbers, on a Juno-r2, with latency measured between the frequency > > > request on the kernel side and the SCP actually making the frequency update. > > > > > > The SCP is the firmware responsible of making the frequency updates. It receives > > > the kernel requests and coordinate them/make the actual changes. The SCP also has > > > a mechanism called 'fast channel' (FC) where the kernel writes the requested > > > frequency to a memory area shared with the SCP. Every 4ms, the SCP polls/reads > > > these memory area and make the required modifications. > > > > > > Latency values (in ms) > > > Workload: > > > Idle system, during ~30s > > > +---------------------------------------+ > > > | | Without FC | With FC | > > > +-------+---------------+---------------+ > > > | count | 1663 | 1102 | > > > | mean | 2.92 | 2.10 | > > > | std | 1.90 | 1.58 | > > > | min | 0.21 | 0.00 | > > > | 25% | 1.64 | 0.91 | > > > | 50% | 2.57 | 1.68 | > > > | 75% | 3.66 | 2.97 | > > > | max | 14.37 | 13.50 | > > > +-------+---------------+---------------+ > > > > > > Latency values (in ms) > > > Workload: > > > One 1% task per CPU, period = 32ms. This allows to wake up the CPU > > > every 32ms and send more requests/give more work to the SCP. 
Indeed > > > the SCP is also responsible of idle state transitions. > > > Test duration ~=30s. > > > +---------------------------------------+ > > > | | Without FC | With FC | > > > +-------+---------------+---------------+ > > > | count | 1629 | 1446 | > > > | mean | 3.23 | 2.31 | > > > | std | 2.40 | 1.73 | > > > | min | 0.05 | 0.02 | > > > | 25% | 1.91 | 0.98 | > > > | 50% | 2.65 | 2.00 | > > > | 75% | 3.65 | 3.23 | > > > | max | 20.56 | 16.73 | > > > +-------+---------------+---------------+ > > > > > > --- > > 1. > With this patch, platforms like the Juno which: > - don't set a `transition_delay_us` > - have a high `transition_latency` (> 1ms) > can request freq. changes every 2ms. > > If a platform has a `transition_latency` > 2ms, this means: > `transition_latency` > `transition_delay_us` > I.e. a second freq. requests might be emitted before the first one > will be completed. On the Juno, this doesn't cause any 'real' issue > as the SCMI/mailbox mechanism works well, but this doesn't seem > correct. > If the util of CPUs is in between OPPs (i.e. freq. changes are often > required), the Juno: > - sends a freq. request > - waits for completion and schedules another task in the meantime > - upon completion, immediately sends a new freq. > > I think that the following should be respected/checked: > - `transition_latency` < `transition_delay_us` > (it might also make sense to have, with K being any factor:) > - `transition_latency` * K < `transition_delay_us` Makes sense. How about this? I am not sure it is better to multiply with a factor if the platform is already slow. Even the current 1000 multiply factor is high but this is a territory I am not ready to step into yet. diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c index 66cef33c4ec7..68a5ba24a5e0 100644 --- a/drivers/cpufreq/cpufreq.c +++ b/drivers/cpufreq/cpufreq.c @@ -576,6 +576,15 @@ unsigned int cpufreq_policy_transition_delay_us(struct cpufreq_policy *policy) latency = policy->cpuinfo.transition_latency / NSEC_PER_USEC; if (latency) { + unsigned int max_delay_us = 2 * MSEC_PER_SEC;; + + /* + * If the platform already has high transition_latency, use it + * as-is. + */ + if (latency > max_delay_us) + return latency; + /* * For platforms that can change the frequency very fast (< 10 * us), the above formula gives a decent transition delay. But @@ -586,7 +595,7 @@ unsigned int cpufreq_policy_transition_delay_us(struct cpufreq_policy *policy) * a reasonable amount of time after which we should reevaluate * the frequency. */ - return min(latency * LATENCY_MULTIPLIER, (unsigned int)(2 * MSEC_PER_SEC)); + return min(latency * LATENCY_MULTIPLIER, max_delay_us); } return LATENCY_MULTIPLIER; > > > 2. > There are references to the 10ms values at other places in the code: > > include/linux/cpufreq.h > * ondemand governor will work on any processor with transition latency <= 10ms, Not sure this one needs updating. Especially with the change above which means 10ms could theoretically happen. But if there are suggestions happy to take them. > > drivers/cpufreq/cpufreq.c > * For platforms that can change the frequency very fast (< 10 > * us), the above formula gives a decent transition delay. But > -> the 10us value matches 10ms = 10us * LATENCY_MULTIPLIER I can't find this one. > > Documentation/admin-guide/pm/cpufreq.rst > Typically, it is set to values of the order of 10000 (10 ms). Its > default value is equal to the value of ``cpuinfo_transition_latency`` I am not sure about this one. 
It refers to cpuinfo_transition_latency not the delay and uses a formula to calculate it based on that. Seems the paragraph needs updating in general to reflect other changes? > > > 3. > There seems to be a dependency of the conservative/ondemand governors > over the the value returned by `cpufreq_policy_transition_delay_us()`: > > drivers/cpufreq/cpufreq_governor.c > dbs_data->sampling_rate = max_t(unsigned int, > CPUFREQ_DBS_MIN_SAMPLING_INTERVAL, // = 2 * tick period = 8ms > cpufreq_policy_transition_delay_us(policy)); // [1]: val <= 2ms > > [1] > if `transition_latency` is not set and `transition_delay_us` is, > which is the case for the Juno. > > The `sampling_rate` is, FYIU, the period used to evaluate the ratio > of the idle/busy time, and if necessary increase/decrease the freq. > > This patch will likely reduce this sampling rate from 10ms -> 8ms > (if `cpufreq_policy_transition_delay_us()`` now returns 2ms for some > platforms). This is not much, but just wanted to note it. I don't think this is a problem as tick being 1ms is common and transition_delay_us is not 10ms on all platforms. On my amd system it is 1ms and on another intel i5 it is 5ms. So it should have already been coping with various combination. Thanks! -- Qais Yousef
Hello Qais, On 2/22/24 12:55, Qais Yousef wrote: > On 02/20/24 18:38, Pierre Gondois wrote: >> Hello Qais, >> >> I added some other remarks, >> >> On 2/20/24 14:50, Qais Yousef wrote: >>> On 02/14/24 10:19, Pierre Gondois wrote: >>>> Hello, >>>> >>>> On 2/12/24 16:53, Rafael J. Wysocki wrote: >>>>> On Mon, Feb 5, 2024 at 8:45 AM Viresh Kumar <viresh.kumar@linaro.org> wrote: >>>>>> >>>>>> On 05-02-24, 02:25, Qais Yousef wrote: >>>>>>> 10ms is too high for today's hardware, even low end ones. This default >>>>>>> end up being used a lot on Arm machines at least. Pine64, mac mini and >>>>>>> pixel 6 all end up with 10ms rate_limit_us when using schedutil, and >>>>>>> it's too high for all of them. >>>>>>> >>>>>>> Change the default to 2ms which should be 'pessimistic' enough for worst >>>>>>> case scenario, but not too high for platforms with fast DVFS hardware. >>>>>>> >>>>>>> Signed-off-by: Qais Yousef <qyousef@layalina.io> >>>>>>> --- >>>>>>> drivers/cpufreq/cpufreq.c | 4 ++-- >>>>>>> 1 file changed, 2 insertions(+), 2 deletions(-) >>>>>>> >>>>>>> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c >>>>>>> index 44db4f59c4cc..8207f7294cb6 100644 >>>>>>> --- a/drivers/cpufreq/cpufreq.c >>>>>>> +++ b/drivers/cpufreq/cpufreq.c >>>>>>> @@ -582,11 +582,11 @@ unsigned int cpufreq_policy_transition_delay_us(struct cpufreq_policy *policy) >>>>>>> * for platforms where transition_latency is in milliseconds, it >>>>>>> * ends up giving unrealistic values. >>>>>>> * >>>>>>> - * Cap the default transition delay to 10 ms, which seems to be >>>>>>> + * Cap the default transition delay to 2 ms, which seems to be >>>>>>> * a reasonable amount of time after which we should reevaluate >>>>>>> * the frequency. >>>>>>> */ >>>>>>> - return min(latency * LATENCY_MULTIPLIER, (unsigned int)10000); >>>>>>> + return min(latency * LATENCY_MULTIPLIER, (unsigned int)(2*MSEC_PER_SEC)); >>>>>> >>>>>> Please add spaces around '*'. >>>>>> >>>>>> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> >>>>> >>>>> I've adjusted the whitespace as suggested above and applied the patch >>>>> as 5.9 material. >>>>> >>>>> Thanks! >>>>> >>>> >>>> To add some numbers, on a Juno-r2, with latency measured between the frequency >>>> request on the kernel side and the SCP actually making the frequency update. >>>> >>>> The SCP is the firmware responsible of making the frequency updates. It receives >>>> the kernel requests and coordinate them/make the actual changes. The SCP also has >>>> a mechanism called 'fast channel' (FC) where the kernel writes the requested >>>> frequency to a memory area shared with the SCP. Every 4ms, the SCP polls/reads >>>> these memory area and make the required modifications. >>>> >>>> Latency values (in ms) >>>> Workload: >>>> Idle system, during ~30s >>>> +---------------------------------------+ >>>> | | Without FC | With FC | >>>> +-------+---------------+---------------+ >>>> | count | 1663 | 1102 | >>>> | mean | 2.92 | 2.10 | >>>> | std | 1.90 | 1.58 | >>>> | min | 0.21 | 0.00 | >>>> | 25% | 1.64 | 0.91 | >>>> | 50% | 2.57 | 1.68 | >>>> | 75% | 3.66 | 2.97 | >>>> | max | 14.37 | 13.50 | >>>> +-------+---------------+---------------+ >>>> >>>> Latency values (in ms) >>>> Workload: >>>> One 1% task per CPU, period = 32ms. This allows to wake up the CPU >>>> every 32ms and send more requests/give more work to the SCP. Indeed >>>> the SCP is also responsible of idle state transitions. >>>> Test duration ~=30s. 
>>>> +---------------------------------------+ >>>> | | Without FC | With FC | >>>> +-------+---------------+---------------+ >>>> | count | 1629 | 1446 | >>>> | mean | 3.23 | 2.31 | >>>> | std | 2.40 | 1.73 | >>>> | min | 0.05 | 0.02 | >>>> | 25% | 1.91 | 0.98 | >>>> | 50% | 2.65 | 2.00 | >>>> | 75% | 3.65 | 3.23 | >>>> | max | 20.56 | 16.73 | >>>> +-------+---------------+---------------+ >>>> >>>> --- >> >> 1. >> With this patch, platforms like the Juno which: >> - don't set a `transition_delay_us` >> - have a high `transition_latency` (> 1ms) >> can request freq. changes every 2ms. >> >> If a platform has a `transition_latency` > 2ms, this means: >> `transition_latency` > `transition_delay_us` >> I.e. a second freq. requests might be emitted before the first one >> will be completed. On the Juno, this doesn't cause any 'real' issue >> as the SCMI/mailbox mechanism works well, but this doesn't seem >> correct. >> If the util of CPUs is in between OPPs (i.e. freq. changes are often >> required), the Juno: >> - sends a freq. request >> - waits for completion and schedules another task in the meantime >> - upon completion, immediately sends a new freq. >> >> I think that the following should be respected/checked: >> - `transition_latency` < `transition_delay_us` >> (it might also make sense to have, with K being any factor:) >> - `transition_latency` * K < `transition_delay_us` > > Makes sense. How about this? I am not sure it is better to multiply with > a factor if the platform is already slow. Even the current 1000 multiply factor > is high but this is a territory I am not ready to step into yet. > > diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c > index 66cef33c4ec7..68a5ba24a5e0 100644 > --- a/drivers/cpufreq/cpufreq.c > +++ b/drivers/cpufreq/cpufreq.c > @@ -576,6 +576,15 @@ unsigned int cpufreq_policy_transition_delay_us(struct cpufreq_policy *policy) > > latency = policy->cpuinfo.transition_latency / NSEC_PER_USEC; > if (latency) { > + unsigned int max_delay_us = 2 * MSEC_PER_SEC;; > + > + /* > + * If the platform already has high transition_latency, use it > + * as-is. > + */ > + if (latency > max_delay_us) [1] return min(latency, 10ms); > + return latency; > + > /* > * For platforms that can change the frequency very fast (< 10 > * us), the above formula gives a decent transition delay. But > @@ -586,7 +595,7 @@ unsigned int cpufreq_policy_transition_delay_us(struct cpufreq_policy *policy) > * a reasonable amount of time after which we should reevaluate > * the frequency. > */ > - return min(latency * LATENCY_MULTIPLIER, (unsigned int)(2 * MSEC_PER_SEC)); > + return min(latency * LATENCY_MULTIPLIER, max_delay_us); > } > > return LATENCY_MULTIPLIER; > A policy with these values: - transition_delay_us = 0 - transition_latency = 30ms would get a transition_delay of 30ms I think. Maybe it would be better to default to the old value in this case [1]. --- Also a note that on the Pixel6 I have, transition_latency=5ms, so the platform's policies would end up with transition_delay=5ms >> >> >> 2. >> There are references to the 10ms values at other places in the code: >> >> include/linux/cpufreq.h >> * ondemand governor will work on any processor with transition latency <= 10ms, > > Not sure this one needs updating. Especially with the change above which means > 10ms could theoretically happen. But if there are suggestions happy to take > them. a. LATENCY_MULTIPLIER introduction: 112124ab0a9f ("[CPUFREQ] ondemand/conservative: sanitize sampling_rate restrictions") b. 
max_transition_latency removal: ed4676e25463 ("cpufreq: Replace "max_transition_latency" with "dynamic_switching"") c. dynamic_switching removal: 9a2a9ebc0a75 ("cpufreq: Introduce governor flags") The value could be removed independently from this patch indeed, as this is not related to cpufreq_policy_transition_delay_us() since b. > >> >> drivers/cpufreq/cpufreq.c >> * For platforms that can change the frequency very fast (< 10 >> * us), the above formula gives a decent transition delay. But >> -> the 10us value matches 10ms = 10us * LATENCY_MULTIPLIER > > I can't find this one. It's in cpufreq_policy_transition_delay_us(), "the 10us value matches 10ms = 10us * LATENCY_MULTIPLIER" is a sentence I wrote, the comment to modify would be: """ * For platforms that can change the frequency very fast (< 10 * us), the above formula gives a decent transition delay. But """ > >> >> Documentation/admin-guide/pm/cpufreq.rst >> Typically, it is set to values of the order of 10000 (10 ms). Its >> default value is equal to the value of ``cpuinfo_transition_latency`` > > I am not sure about this one. It refers to cpuinfo_transition_latency not the > delay and uses a formula to calculate it based on that. > > Seems the paragraph needs updating in general to reflect other changes? aa7519af450d ("cpufreq: Use transition_delay_us for legacy governors as well") The cpufreq_policy_transition_delay_us() was introduced there and integrates the 10ms value related to this paragraph. --- I assume that if we keep the 10ms value in the code, it should be ok to let the comment as is. I'll send a patch to remove the first one as it can be done independently. > >> >> >> 3. >> There seems to be a dependency of the conservative/ondemand governors >> over the the value returned by `cpufreq_policy_transition_delay_us()`: >> >> drivers/cpufreq/cpufreq_governor.c >> dbs_data->sampling_rate = max_t(unsigned int, >> CPUFREQ_DBS_MIN_SAMPLING_INTERVAL, // = 2 * tick period = 8ms >> cpufreq_policy_transition_delay_us(policy)); // [1]: val <= 2ms >> >> [1] >> if `transition_latency` is not set and `transition_delay_us` is, >> which is the case for the Juno. >> >> The `sampling_rate` is, FYIU, the period used to evaluate the ratio >> of the idle/busy time, and if necessary increase/decrease the freq. >> >> This patch will likely reduce this sampling rate from 10ms -> 8ms >> (if `cpufreq_policy_transition_delay_us()`` now returns 2ms for some >> platforms). This is not much, but just wanted to note it. > > I don't think this is a problem as tick being 1ms is common and > transition_delay_us is not 10ms on all platforms. On my amd system it is 1ms > and on another intel i5 it is 5ms. So it should have already been coping with > various combination. Ok right, Regards, Pierre
On 02/22/24 16:15, Pierre Gondois wrote:
> > diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> > index 66cef33c4ec7..68a5ba24a5e0 100644
> > --- a/drivers/cpufreq/cpufreq.c
> > +++ b/drivers/cpufreq/cpufreq.c
> > @@ -576,6 +576,15 @@ unsigned int cpufreq_policy_transition_delay_us(struct cpufreq_policy *policy)
> >
> >         latency = policy->cpuinfo.transition_latency / NSEC_PER_USEC;
> >         if (latency) {
> > +               unsigned int max_delay_us = 2 * MSEC_PER_SEC;
> > +
> > +               /*
> > +                * If the platform already has high transition_latency, use it
> > +                * as-is.
> > +                */
> > +               if (latency > max_delay_us)
> [1] return min(latency, 10ms);
> > +                       return latency;
> > +
> >                 /*
> >                  * For platforms that can change the frequency very fast (< 10
> >                  * us), the above formula gives a decent transition delay. But
> > @@ -586,7 +595,7 @@ unsigned int cpufreq_policy_transition_delay_us(struct cpufreq_policy *policy)
> >                  * a reasonable amount of time after which we should reevaluate
> >                  * the frequency.
> >                  */
> > -               return min(latency * LATENCY_MULTIPLIER, (unsigned int)(2 * MSEC_PER_SEC));
> > +               return min(latency * LATENCY_MULTIPLIER, max_delay_us);
> >         }
> >
> >         return LATENCY_MULTIPLIER;
> >
>
> A policy with these values:
> - transition_delay_us = 0
> - transition_latency = 30ms
> would get a transition_delay of 30ms I think.
>
> Maybe it would be better to default to the old value in this case [1].

Hmm. I think it wouldn't make sense to have 2 levels of capping. It's either we
cap to 2ms, or honour the transition latency from the driver if it is higher
and let it lower it if it can truly handle smaller values?

Rafael, should I send a new version of the patch, a new patch on top or would
you like to take a fixup if you can amend the commit? If you and Viresh think
the two levels of capping make sense as suggested above let me know. I think
it is better to delegate to the drivers if they report transition_latency higher
than 2ms.

-->8--

diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 66cef33c4ec7..668263c53810 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -576,8 +576,17 @@ unsigned int cpufreq_policy_transition_delay_us(struct cpufreq_policy *policy)

        latency = policy->cpuinfo.transition_latency / NSEC_PER_USEC;
        if (latency) {
+               unsigned int max_delay_us = 2 * MSEC_PER_SEC;
+
+               /*
+                * If the platform already has high transition_latency, use it
+                * as-is.
+                */
+               if (latency > max_delay_us)
+                       return latency;
+
                /*
-                * For platforms that can change the frequency very fast (< 10
+                * For platforms that can change the frequency very fast (< 20
                 * us), the above formula gives a decent transition delay. But
                 * for platforms where transition_latency is in milliseconds, it
                 * ends up giving unrealistic values.
@@ -586,7 +595,7 @@ unsigned int cpufreq_policy_transition_delay_us(struct cpufreq_policy *policy)
                 * a reasonable amount of time after which we should reevaluate
                 * the frequency.
                 */
-               return min(latency * LATENCY_MULTIPLIER, (unsigned int)(2 * MSEC_PER_SEC));
+               return min(latency * LATENCY_MULTIPLIER, max_delay_us);
        }

        return LATENCY_MULTIPLIER;

-->8--

>
> ---
>
> Also a note that on the Pixel6 I have, transition_latency=5ms,
> so the platform's policies would end up with transition_delay=5ms

Yes I know. But at this stage it's a driver issue. I think this value is not
correct and there's a typo.

>
> > >
> > >
> > > 2.
> > > There are references to the 10ms values at other places in the code:
> > >
> > > include/linux/cpufreq.h
> > >  * ondemand governor will work on any processor with transition latency <= 10ms,
> >
> > Not sure this one needs updating. Especially with the change above which means
> > 10ms could theoretically happen. But if there are suggestions happy to take
> > them.
>
> a.
> LATENCY_MULTIPLIER introduction:
> 112124ab0a9f ("[CPUFREQ] ondemand/conservative: sanitize sampling_rate restrictions")
>
> b.
> max_transition_latency removal:
> ed4676e25463 ("cpufreq: Replace "max_transition_latency" with "dynamic_switching"")
>
> c.
> dynamic_switching removal:
> 9a2a9ebc0a75 ("cpufreq: Introduce governor flags")
>
> The value could be removed independently from this patch indeed, as this is not
> related to cpufreq_policy_transition_delay_us() since b.

Thanks for sending the patch.

>
>
> >
> > >
> > > drivers/cpufreq/cpufreq.c
> > >  * For platforms that can change the frequency very fast (< 10
> > >  * us), the above formula gives a decent transition delay. But
> > > -> the 10us value matches 10ms = 10us * LATENCY_MULTIPLIER
> >
> > I can't find this one.
>
> It's in cpufreq_policy_transition_delay_us(),
> "the 10us value matches 10ms = 10us * LATENCY_MULTIPLIER"
> is a sentence I wrote, the comment to modify would be:
> """
>  * For platforms that can change the frequency very fast (< 10
>  * us), the above formula gives a decent transition delay. But
> """

Ah you were referring to s/10/20/. Done.

>
> >
> > >
> > > Documentation/admin-guide/pm/cpufreq.rst
> > >   Typically, it is set to values of the order of 10000 (10 ms).  Its
> > >   default value is equal to the value of ``cpuinfo_transition_latency``
> >
> > I am not sure about this one. It refers to cpuinfo_transition_latency not the
> > delay and uses a formula to calculate it based on that.
> >
> > Seems the paragraph needs updating in general to reflect other changes?
>
> aa7519af450d ("cpufreq: Use transition_delay_us for legacy governors as well")
>
> The cpufreq_policy_transition_delay_us() was introduced there and integrates the
> 10ms value related to this paragraph.
>
> ---
>
> I assume that if we keep the 10ms value in the code, it should be ok to leave
> the comment as is. I'll send a patch to remove the first one as it can be
> done independently.

Thanks!

--
Qais Yousef
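[ For contrast, a short sketch, an assumption for illustration and not kernel
  code, of the two options weighed above when a driver reports a latency above
  2 ms; honour_driver() and two_level_cap() are hypothetical helper names, and
  10 ms is the old default cap that Pierre's [1] note refers to. ]

#include <stdio.h>

#define MAX_DELAY_US    2000U   /* 2 ms cap from the patch */
#define OLD_CAP_US      10000U  /* previous 10 ms default cap */

/* Option kept in the fixup above: trust the driver-reported latency */
static unsigned int honour_driver(unsigned int latency_us)
{
        return latency_us;
}

/* Pierre's [1] alternative: add a second cap at the old 10 ms value */
static unsigned int two_level_cap(unsigned int latency_us)
{
        return latency_us < OLD_CAP_US ? latency_us : OLD_CAP_US;
}

int main(void)
{
        unsigned int latency_us = 30000;        /* Pierre's 30 ms example */

        if (latency_us > MAX_DELAY_US) {
                printf("honour driver : %u us\n", honour_driver(latency_us));
                printf("two-level cap : %u us\n", two_level_cap(latency_us));
        }
        return 0;
}

[ With the 30 ms example the first option yields 30000 us and the second
  10000 us; the fixup above deliberately keeps the first and leaves anything
  higher than 2 ms to the driver. ]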
On 2/23/24 00:39, Qais Yousef wrote:
> On 02/22/24 16:15, Pierre Gondois wrote:
>
>>> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
>>> index 66cef33c4ec7..68a5ba24a5e0 100644
>>> --- a/drivers/cpufreq/cpufreq.c
>>> +++ b/drivers/cpufreq/cpufreq.c
>>> @@ -576,6 +576,15 @@ unsigned int cpufreq_policy_transition_delay_us(struct cpufreq_policy *policy)
>>>
>>>         latency = policy->cpuinfo.transition_latency / NSEC_PER_USEC;
>>>         if (latency) {
>>> +               unsigned int max_delay_us = 2 * MSEC_PER_SEC;
>>> +
>>> +               /*
>>> +                * If the platform already has high transition_latency, use it
>>> +                * as-is.
>>> +                */
>>> +               if (latency > max_delay_us)
>> [1] return min(latency, 10ms);
>>> +                       return latency;
>>> +
>>>                 /*
>>>                  * For platforms that can change the frequency very fast (< 10
>>>                  * us), the above formula gives a decent transition delay. But
>>> @@ -586,7 +595,7 @@ unsigned int cpufreq_policy_transition_delay_us(struct cpufreq_policy *policy)
>>>                  * a reasonable amount of time after which we should reevaluate
>>>                  * the frequency.
>>>                  */
>>> -               return min(latency * LATENCY_MULTIPLIER, (unsigned int)(2 * MSEC_PER_SEC));
>>> +               return min(latency * LATENCY_MULTIPLIER, max_delay_us);
>>>         }
>>>
>>>         return LATENCY_MULTIPLIER;
>>>
>>
>> A policy with these values:
>> - transition_delay_us = 0
>> - transition_latency = 30ms
>> would get a transition_delay of 30ms I think.
>>
>> Maybe it would be better to default to the old value in this case [1].
>
> Hmm. I think it wouldn't make sense to have 2 levels of capping. It's either we
> cap to 2ms, or honour the transition latency from the driver if it is higher
> and let it lower it if it can truly handle smaller values?
>
> Rafael, should I send a new version of the patch, a new patch on top or would
> you like to take a fixup if you can amend the commit? If you and Viresh think
> the two levels of capping make sense as suggested above let me know. I think
> it is better to delegate to the drivers if they report transition_latency higher
> than 2ms.

The latency can be computed by dev_pm_opp_get_max_transition_latency() and one
of its components comes from the `clock-latency-ns` DT binding. The maximum
value I saw is 10ms, so it seems it should be ok not to add:
`min(latency, 10ms)`

>
> -->8--
>
> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> index 66cef33c4ec7..668263c53810 100644
> --- a/drivers/cpufreq/cpufreq.c
> +++ b/drivers/cpufreq/cpufreq.c
> @@ -576,8 +576,17 @@ unsigned int cpufreq_policy_transition_delay_us(struct cpufreq_policy *policy)
>
>         latency = policy->cpuinfo.transition_latency / NSEC_PER_USEC;
>         if (latency) {
> +               unsigned int max_delay_us = 2 * MSEC_PER_SEC;
> +
> +               /*
> +                * If the platform already has high transition_latency, use it
> +                * as-is.
> +                */
> +               if (latency > max_delay_us)
> +                       return latency;
> +
>                 /*
> -                * For platforms that can change the frequency very fast (< 10
> +                * For platforms that can change the frequency very fast (< 20

I think it should be 10->2us as we do:
min(latency * 1000, 2ms)

>                  * us), the above formula gives a decent transition delay. But
>                  * for platforms where transition_latency is in milliseconds, it
>                  * ends up giving unrealistic values.
> @@ -586,7 +595,7 @@ unsigned int cpufreq_policy_transition_delay_us(struct cpufreq_policy *policy)
>                  * a reasonable amount of time after which we should reevaluate
>                  * the frequency.
>                  */
> -               return min(latency * LATENCY_MULTIPLIER, (unsigned int)(2 * MSEC_PER_SEC));
> +               return min(latency * LATENCY_MULTIPLIER, max_delay_us);
>         }
>
>         return LATENCY_MULTIPLIER;
>
> -->8--
>
>>
>> ---
>>
>> Also a note that on the Pixel6 I have, transition_latency=5ms,
>> so the platform's policies would end up with transition_delay=5ms
>
> Yes I know. But at this stage it's a driver issue. I think this value is not
> correct and there's a typo.
>
>>
>>
>>>>
>>>>
>>>> 2.
>>>> There are references to the 10ms values at other places in the code:
>>>>
>>>> include/linux/cpufreq.h
>>>>  * ondemand governor will work on any processor with transition latency <= 10ms,
>>>
>>> Not sure this one needs updating. Especially with the change above which means
>>> 10ms could theoretically happen. But if there are suggestions happy to take
>>> them.
>>
>> a.
>> LATENCY_MULTIPLIER introduction:
>> 112124ab0a9f ("[CPUFREQ] ondemand/conservative: sanitize sampling_rate restrictions")
>>
>> b.
>> max_transition_latency removal:
>> ed4676e25463 ("cpufreq: Replace "max_transition_latency" with "dynamic_switching"")
>>
>> c.
>> dynamic_switching removal:
>> 9a2a9ebc0a75 ("cpufreq: Introduce governor flags")
>>
>> The value could be removed independently from this patch indeed, as this is not
>> related to cpufreq_policy_transition_delay_us() since b.
>
> Thanks for sending the patch.
>
>>
>>
>>>
>>>>
>>>> drivers/cpufreq/cpufreq.c
>>>>  * For platforms that can change the frequency very fast (< 10
>>>>  * us), the above formula gives a decent transition delay. But
>>>> -> the 10us value matches 10ms = 10us * LATENCY_MULTIPLIER
>>>
>>> I can't find this one.
>>
>> It's in cpufreq_policy_transition_delay_us(),
>> "the 10us value matches 10ms = 10us * LATENCY_MULTIPLIER"
>> is a sentence I wrote, the comment to modify would be:
>> """
>>  * For platforms that can change the frequency very fast (< 10
>>  * us), the above formula gives a decent transition delay. But
>> """
>
> Ah you were referring to s/10/20/. Done.
>
>>
>>>
>>>>
>>>> Documentation/admin-guide/pm/cpufreq.rst
>>>>   Typically, it is set to values of the order of 10000 (10 ms).  Its
>>>>   default value is equal to the value of ``cpuinfo_transition_latency``
>>>
>>> I am not sure about this one. It refers to cpuinfo_transition_latency not the
>>> delay and uses a formula to calculate it based on that.
>>>
>>> Seems the paragraph needs updating in general to reflect other changes?
>>
>> aa7519af450d ("cpufreq: Use transition_delay_us for legacy governors as well")
>>
>> The cpufreq_policy_transition_delay_us() was introduced there and integrates the
>> 10ms value related to this paragraph.
>>
>> ---
>>
>> I assume that if we keep the 10ms value in the code, it should be ok to leave
>> the comment as is. I'll send a patch to remove the first one as it can be
>> done independently.
>
> Thanks!
>
> --
> Qais Yousef
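[ As background to the point above about where the latency comes from, a rough
  sketch follows; max_transition_latency_ns() is a hypothetical stand-in for
  dev_pm_opp_get_max_transition_latency(), not its real implementation, and
  only illustrates that the cpufreq core ends up seeing the worst-case latency
  across OPPs, the per-OPP values being fed in part by clock-latency-ns. ]

#include <stddef.h>

/*
 * Hypothetical stand-in: the policy's transition_latency is the worst case
 * over the per-OPP latencies.
 */
static unsigned long max_transition_latency_ns(const unsigned long *opp_latency_ns,
                                               size_t nr_opps)
{
        unsigned long max = 0;
        size_t i;

        for (i = 0; i < nr_opps; i++)
                if (opp_latency_ns[i] > max)
                        max = opp_latency_ns[i];

        /* e.g. 10000000 (10 ms) if that is the largest clock-latency-ns seen */
        return max;
}

int main(void)
{
        const unsigned long latencies_ns[] = { 500000, 1000000, 10000000 };

        return max_transition_latency_ns(latencies_ns, 3) == 10000000 ? 0 : 1;
}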
On 02/23/24 10:48, Pierre Gondois wrote:
> > diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> > index 66cef33c4ec7..668263c53810 100644
> > --- a/drivers/cpufreq/cpufreq.c
> > +++ b/drivers/cpufreq/cpufreq.c
> > @@ -576,8 +576,17 @@ unsigned int cpufreq_policy_transition_delay_us(struct cpufreq_policy *policy)
> >
> >         latency = policy->cpuinfo.transition_latency / NSEC_PER_USEC;
> >         if (latency) {
> > +               unsigned int max_delay_us = 2 * MSEC_PER_SEC;
> > +
> > +               /*
> > +                * If the platform already has high transition_latency, use it
> > +                * as-is.
> > +                */
> > +               if (latency > max_delay_us)
> > +                       return latency;
> > +
> >                 /*
> > -                * For platforms that can change the frequency very fast (< 10
> > +                * For platforms that can change the frequency very fast (< 20
>
> I think it should be 10->2us as we do:
> min(latency * 1000, 2ms)

Yeah I meant 2, that was a typo.

Thanks

-->8--

diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 66cef33c4ec7..de92a9912587 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -576,8 +576,17 @@ unsigned int cpufreq_policy_transition_delay_us(struct cpufreq_policy *policy)

        latency = policy->cpuinfo.transition_latency / NSEC_PER_USEC;
        if (latency) {
+               unsigned int max_delay_us = 2 * MSEC_PER_SEC;
+
+               /*
+                * If the platform already has high transition_latency, use it
+                * as-is.
+                */
+               if (latency > max_delay_us)
+                       return latency;
+
                /*
-                * For platforms that can change the frequency very fast (< 10
+                * For platforms that can change the frequency very fast (< 2
                 * us), the above formula gives a decent transition delay. But
                 * for platforms where transition_latency is in milliseconds, it
                 * ends up giving unrealistic values.
@@ -586,7 +595,7 @@ unsigned int cpufreq_policy_transition_delay_us(struct cpufreq_policy *policy)
                 * a reasonable amount of time after which we should reevaluate
                 * the frequency.
                 */
-               return min(latency * LATENCY_MULTIPLIER, (unsigned int)(2 * MSEC_PER_SEC));
+               return min(latency * LATENCY_MULTIPLIER, max_delay_us);
        }

        return LATENCY_MULTIPLIER;
diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 44db4f59c4cc..8207f7294cb6 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -582,11 +582,11 @@ unsigned int cpufreq_policy_transition_delay_us(struct cpufreq_policy *policy)
                 * for platforms where transition_latency is in milliseconds, it
                 * ends up giving unrealistic values.
                 *
-                * Cap the default transition delay to 10 ms, which seems to be
+                * Cap the default transition delay to 2 ms, which seems to be
                 * a reasonable amount of time after which we should reevaluate
                 * the frequency.
                 */
-               return min(latency * LATENCY_MULTIPLIER, (unsigned int)10000);
+               return min(latency * LATENCY_MULTIPLIER, (unsigned int)(2 * MSEC_PER_SEC));
        }

        return LATENCY_MULTIPLIER;
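[ A quick before/after check of the cap change in the patch above, as a
  user-space sketch; delay_us() is a hypothetical helper with
  LATENCY_MULTIPLIER hard-coded to 1000 as in the kernel. With the old 10 ms
  cap, any latency of 10 us or more landed on 10 ms; with the 2 ms cap,
  anything of 2 us or more lands on 2 ms. ]

#include <stdio.h>

#define LATENCY_MULTIPLIER      1000U

/* min(latency * LATENCY_MULTIPLIER, cap), as in the patched function */
static unsigned int delay_us(unsigned int latency_us, unsigned int cap_us)
{
        unsigned int d = latency_us * LATENCY_MULTIPLIER;

        return d < cap_us ? d : cap_us;
}

int main(void)
{
        /* 1 us latency: 1 ms under either cap */
        printf("1 us : old %u us, new %u us\n",
               delay_us(1, 10000), delay_us(1, 2000));
        /* 50 us latency: used to hit the 10 ms cap, now 2 ms */
        printf("50 us: old %u us, new %u us\n",
               delay_us(50, 10000), delay_us(50, 2000));
        return 0;
}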