Message ID | 20221121205647.23343-1-palmer@rivosinc.com |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:adf:f944:0:0:0:0:0 with SMTP id q4csp1830267wrr; Mon, 21 Nov 2022 13:00:46 -0800 (PST) X-Google-Smtp-Source: AA0mqf71Kn9GcIRuMCqKE9JpnWia3xt+RL0+hgwnUS30wdjVwamJ220tWL8+Jl5+WrDicxY1Olhn X-Received: by 2002:a17:907:7650:b0:781:e568:294f with SMTP id kj16-20020a170907765000b00781e568294fmr2995458ejc.447.1669064445964; Mon, 21 Nov 2022 13:00:45 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1669064445; cv=none; d=google.com; s=arc-20160816; b=rkZGHOA6t3wbqM0O5O/qs294W4m+danpQARcGn4klSyYHlfifxahBzSqaa3d3EAsz5 hpzoGtrqgjhChAWm4+r19oCmRkBaEF8Kx8x+cFe6mTI8bqnU7uLOZUcBSqPtysqFrcwg aFFY38PDnYPTU9aTaC/GdcM9HmJrmds+WmEyHsRnoE4iMcAvwQOHvOMFEMDUTfiG8W+e RAeMvCBSbhvCA3TWyErAb8/dn78gGCsaBkI/tx1wkyDCpFvn4RP1m/0PiYkQifl13VhE bxqBBhxAZ3i7szOtHxcJW/rbEFFlM1oKsxrjXr12++Og8iqS+dccXhNiuKdTqw0abBF5 hW8g== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:to:from:cc:content-transfer-encoding :mime-version:message-id:date:subject:dkim-signature; bh=a+8s62R9vB+JzHkkOoet6Cx6GWbxqA3eYfIMwlFnB9o=; b=m1DQ7aOfhhI71Lf0mtopVUav6BpmcXdWb4hDKKDv/PNdh8yFwLyBEG+e/nPWyGUq4L iooJ3ynLEgk87TOSHoPR39b5HRgBcAK4zRwY0nHwovKxXY4HToBOiCEaNF2bBfBD2OgJ 9bNkfnh3itSXw4ommYwNSPMeYtbVrmRsvs8QwKyS963Z4dCDVGuabfTIBdjdaDGaBPKT NUEKUhH5JVMCoqhqHGPj6yCTppid7y9Iug0FHhTtJRaxNmMpF2Q+ywyxzvI9gQo+N1Fk hEoA13fa2ZlTvHST0NpE9ID3AeHxH2IhV03Lbo5s3K+53VkoATKVznaii1RJhvrJnLt1 Cscg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@rivosinc-com.20210112.gappssmtp.com header.s=20210112 header.b=fniv4c5W; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id xa11-20020a170906fd8b00b007306ac0faa0si3779793ejb.615.2022.11.21.13.00.20; Mon, 21 Nov 2022 13:00:45 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@rivosinc-com.20210112.gappssmtp.com header.s=20210112 header.b=fniv4c5W; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230202AbiKUU5X (ORCPT <rfc822;cjcooper78@gmail.com> + 99 others); Mon, 21 Nov 2022 15:57:23 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:56000 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230148AbiKUU5R (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Mon, 21 Nov 2022 15:57:17 -0500 Received: from mail-pl1-x62e.google.com (mail-pl1-x62e.google.com [IPv6:2607:f8b0:4864:20::62e]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 1ABE2D92EF for <linux-kernel@vger.kernel.org>; Mon, 21 Nov 2022 12:57:17 -0800 (PST) Received: by mail-pl1-x62e.google.com with SMTP id b21so11680163plc.9 for <linux-kernel@vger.kernel.org>; Mon, 21 Nov 2022 12:57:17 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=rivosinc-com.20210112.gappssmtp.com; s=20210112; h=to:from:cc:content-transfer-encoding:mime-version:message-id:date :subject:from:to:cc:subject:date:message-id:reply-to; bh=a+8s62R9vB+JzHkkOoet6Cx6GWbxqA3eYfIMwlFnB9o=; b=fniv4c5WvNO/fe0wN9wrXEKEfDwyhJUrFdsvKJuO4y8MLgjdPfrmtbqzVu9910adRx JhWCrZzOw5Y4TMTcRNRUMinuR1pTJc07Ab2uIsBTenDVRwIWcSufF0FviArIpiMuWU+P IPtgSPdMeOKp463RZepfWc5GcgnQ1IXwCnXW25wm8eXMbmVuzAGpXvLVDLRyyao1O19g TSpEVqACWSznUcJxTOOQQB2Hl65J4fAhUqLmy4Dld6xq7YGsBTKCXVwuBv4u7AMIz7/t DTzDtC7BTLu1NCSJNGzUlPFc5UqYceYH3rDOtzavepUkuWizF7c6H8sq7yXjv7CjjjEH Xoaw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=to:from:cc:content-transfer-encoding:mime-version:message-id:date :subject:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=a+8s62R9vB+JzHkkOoet6Cx6GWbxqA3eYfIMwlFnB9o=; b=1qP5BmzmNpkvXDRZgrOVVE59tb81icoE8hc6E+BM/JxDjAmaTznzeeah2Y58lPAwQ6 LISQB+nln0rATEMPycZiSlX9avUhRcokROkgHiBYZWMiAfc0s7H9Ei+DClVDV2EL3BkN BgQ4ctLTPBzvUNxwpBCXY44ecxGUS0xQI4QBV601YnNdealShspE2Z0qXPsPCxIM8Lyh G4Mcju0PX15nu/DFkbraXSi2BVtnVXjfQpy6t2nH8LKhlq8AJ9lzxaU4+jicLztvgFFG 1tjSFlgaIE4s/Slzn54AW/+wz/2bIUgMOb4Y7VkGMZXQvEziaQV9TN8h8WJb36CLof4U VrfQ== X-Gm-Message-State: ANoB5pnYBVCxJrFUP7atlQhek1eRPd7SolGIJKGyOWK7IsC9h+y5vU0N 4YdkVM5IrALFeaJ9gIu0tphA5Q== X-Received: by 2002:a17:90b:3c0d:b0:20d:478a:9d75 with SMTP id pb13-20020a17090b3c0d00b0020d478a9d75mr28239558pjb.149.1669064236510; Mon, 21 Nov 2022 12:57:16 -0800 (PST) Received: from localhost ([50.221.140.188]) by smtp.gmail.com with ESMTPSA id b10-20020a170902650a00b0016d773aae60sm4937091plk.19.2022.11.21.12.57.15 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 21 Nov 2022 12:57:16 -0800 (PST) Subject: [PATCH] cpuidle: riscv-sbi: Stop using non-retentive suspend Date: Mon, 21 Nov 2022 12:56:47 -0800 Message-Id: <20221121205647.23343-1-palmer@rivosinc.com> X-Mailer: git-send-email 2.38.1 MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Cc: anup@brainfault.org, rafael@kernel.org, daniel.lezcano@linaro.org, Paul Walmsley <paul.walmsley@sifive.com>, Palmer Dabbelt <palmer@dabbelt.com>, aou@eecs.berkeley.edu, linux-pm@vger.kernel.org, linux-riscv@lists.infradead.org, linux-kernel@vger.kernel.org, linux@rivosinc.com, Palmer Dabbelt <palmer@rivosinc.com> From: Palmer Dabbelt <palmer@rivosinc.com> To: Conor Dooley <conor@kernel.org> X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1750140920056635343?= X-GMAIL-MSGID: =?utf-8?q?1750140920056635343?= |
Series |
cpuidle: riscv-sbi: Stop using non-retentive suspend
|
|
Commit Message
Palmer Dabbelt
Nov. 21, 2022, 8:56 p.m. UTC
From: Palmer Dabbelt <palmer@rivosinc.com> As per [1], whether or not the core can wake up from non-retentive suspend is a platform-specific detail. We don't have any way to encode that, so just stop using them until we've sorted that out. Link: https://github.com/riscv-non-isa/riscv-sbi-doc/issues/98#issuecomment-1288564687 Fixes: 6abf32f1d9c5 ("cpuidle: Add RISC-V SBI CPU idle driver") Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> --- This should allow us to revert 232ccac1bd9b ("clocksource/drivers/riscv: Events are stopped during CPU suspend"), which fixes suspend on the D1 but breaks timers everywhere. --- drivers/cpuidle/cpuidle-riscv-sbi.c | 11 +++++++++++ 1 file changed, 11 insertions(+)
Comments
On Mon, Nov 21, 2022 at 12:56:47PM -0800, Palmer Dabbelt wrote: > From: Palmer Dabbelt <palmer@rivosinc.com> > > As per [1], whether or not the core can wake up from non-retentive > suspend is a platform-specific detail. We don't have any way to encode > that, so just stop using them until we've sorted that out. For anyone playing along at home, Anup had a proposal for encoding this information (yoinked from the GH issue below): https://lore.kernel.org/all/20220727114302.302201-1-apatel@ventanamicro.com/> > Link: https://github.com/riscv-non-isa/riscv-sbi-doc/issues/98#issuecomment-1288564687 > Fixes: 6abf32f1d9c5 ("cpuidle: Add RISC-V SBI CPU idle driver") > Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> > > --- > > This should allow us to revert 232ccac1bd9b ("clocksource/drivers/riscv: > Events are stopped during CPU suspend"), which fixes suspend on the D1 > but breaks timers everywhere. FWIW the revert is at: https://lore.kernel.org/linux-riscv/20221023185444.678573-1-conor@kernel.org/ Commit message is probably a little lacking as I didn't understand the problem when I wrote it. I'll respin the revert with a tidier message tomorrow. Acked-by: Conor Dooley <conor.dooley@microchip.com> > --- > drivers/cpuidle/cpuidle-riscv-sbi.c | 11 +++++++++++ > 1 file changed, 11 insertions(+) > > diff --git a/drivers/cpuidle/cpuidle-riscv-sbi.c b/drivers/cpuidle/cpuidle-riscv-sbi.c > index 05fe2902df9a..9d1063a54495 100644 > --- a/drivers/cpuidle/cpuidle-riscv-sbi.c > +++ b/drivers/cpuidle/cpuidle-riscv-sbi.c > @@ -214,6 +214,17 @@ static bool sbi_suspend_state_is_valid(u32 state) > if (state > SBI_HSM_SUSPEND_NON_RET_DEFAULT && > state < SBI_HSM_SUSPEND_NON_RET_PLATFORM) > return false; > + > + /* > + * Whether or not RISC-V systems deliver interrupts to harts in a > + * non-retentive suspend state is a platform-specific detail. This can > + * leave the hart unable to wake up, so just mark these states as > + * unsupported until we have a mechanism to expose these > + * platform-specific details to Linux. > + */ > + if (state & SBI_HSM_SUSP_NON_RET_BIT) > + return false; > + > return true; > } > > -- > 2.38.1 >
Hi Palmer, On 11/21/22 14:56, Palmer Dabbelt wrote: > From: Palmer Dabbelt <palmer@rivosinc.com> > > As per [1], whether or not the core can wake up from non-retentive > suspend is a platform-specific detail. We don't have any way to encode > that, so just stop using them until we've sorted that out. We do have *exactly* a way to encode that. Specifically, the existence or non-existence of a non-retentive CPU suspend state in the DT. If a hart has no way of resuming from non-retentive suspend (i.e. a complete lack of interrupt delivery in non-retentive suspend), then it makes zero sense to advertise such a suspend state in the DT. Therefore, if the state exists in the DT, you can assume there is *some* interrupt that can wake up the hart. And I would extend that to assume at least one of those wakeup-capable interrupts is a timer interrupt, although not necessarily the architectural timer interrupt. > Link: https://github.com/riscv-non-isa/riscv-sbi-doc/issues/98#issuecomment-1288564687 This comment refers specifically to the architectural timer interrupt, not interrupts more generally. > Fixes: 6abf32f1d9c5 ("cpuidle: Add RISC-V SBI CPU idle driver") > Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> > > --- > > This should allow us to revert 232ccac1bd9b ("clocksource/drivers/riscv: > Events are stopped during CPU suspend"), which fixes suspend on the D1 > but breaks timers everywhere. I understand that reverting 232ccac1bd9b is the easiest way to fix the issues you and others are seeing. I do not have any functioning RISC-V hardware with SMP, so it is hard for me to help debug the root issue in the Linux timer code. I do not know why turning on the C3STOP flag breaks RCU stall detection or clock_nanosleep(), but I agree that breakage should not happen. So while I still think 232ccac1bd9b is the right change to make from a "following the spec" standpoint, I am okay with reverting it for pragmatic reasons. Since the D1 has another timer driver that is currently used in preference to the RISC-V/SBI timer triver, reverting 232ccac1bd9b does not break non-retentive suspend for the D1. But please do not make the change below. It is unnecessarily broad, and will break something that works fine right now. If the DT advertises a CPU suspend state that cannot be woken up from at all, then that is a bug in the DT. Linux should not try to work around that. So revert 232ccac1bd9b for now, and we can figure out what to do about the DT property, but please do not merge this. Regards, Samuel > --- > drivers/cpuidle/cpuidle-riscv-sbi.c | 11 +++++++++++ > 1 file changed, 11 insertions(+) > > diff --git a/drivers/cpuidle/cpuidle-riscv-sbi.c b/drivers/cpuidle/cpuidle-riscv-sbi.c > index 05fe2902df9a..9d1063a54495 100644 > --- a/drivers/cpuidle/cpuidle-riscv-sbi.c > +++ b/drivers/cpuidle/cpuidle-riscv-sbi.c > @@ -214,6 +214,17 @@ static bool sbi_suspend_state_is_valid(u32 state) > if (state > SBI_HSM_SUSPEND_NON_RET_DEFAULT && > state < SBI_HSM_SUSPEND_NON_RET_PLATFORM) > return false; > + > + /* > + * Whether or not RISC-V systems deliver interrupts to harts in a > + * non-retentive suspend state is a platform-specific detail. This can > + * leave the hart unable to wake up, so just mark these states as > + * unsupported until we have a mechanism to expose these > + * platform-specific details to Linux. > + */ > + if (state & SBI_HSM_SUSP_NON_RET_BIT) > + return false; > + > return true; > } >
On Tue, Nov 22, 2022 at 2:27 AM Palmer Dabbelt <palmer@rivosinc.com> wrote: > > From: Palmer Dabbelt <palmer@rivosinc.com> > > As per [1], whether or not the core can wake up from non-retentive > suspend is a platform-specific detail. We don't have any way to encode > that, so just stop using them until we've sorted that out. > > Link: https://github.com/riscv-non-isa/riscv-sbi-doc/issues/98#issuecomment-1288564687 > Fixes: 6abf32f1d9c5 ("cpuidle: Add RISC-V SBI CPU idle driver") > Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> This is just unnecessary maintenance churn and it's not the right way to go. Better to fix this the right way instead of having a temporary fix. I had already sent-out a patch series 5 months back to describe this in DT: https://lore.kernel.org/lkml/20220727114302.302201-1-apatel@ventanamicro.com/ No one has commented/suggested anything (except Samuel Holland and Sudeep Holla). Please review this series. I can quickly address comments to make this available for Linux-6.2. Until this series is merged, the affected platforms can simply remove non-retentive suspend states from their DT. With all due respect, NACK to this patch from my side. Regards, Anup > > --- > > This should allow us to revert 232ccac1bd9b ("clocksource/drivers/riscv: > Events are stopped during CPU suspend"), which fixes suspend on the D1 > but breaks timers everywhere. > --- > drivers/cpuidle/cpuidle-riscv-sbi.c | 11 +++++++++++ > 1 file changed, 11 insertions(+) > > diff --git a/drivers/cpuidle/cpuidle-riscv-sbi.c b/drivers/cpuidle/cpuidle-riscv-sbi.c > index 05fe2902df9a..9d1063a54495 100644 > --- a/drivers/cpuidle/cpuidle-riscv-sbi.c > +++ b/drivers/cpuidle/cpuidle-riscv-sbi.c > @@ -214,6 +214,17 @@ static bool sbi_suspend_state_is_valid(u32 state) > if (state > SBI_HSM_SUSPEND_NON_RET_DEFAULT && > state < SBI_HSM_SUSPEND_NON_RET_PLATFORM) > return false; > + > + /* > + * Whether or not RISC-V systems deliver interrupts to harts in a > + * non-retentive suspend state is a platform-specific detail. This can > + * leave the hart unable to wake up, so just mark these states as > + * unsupported until we have a mechanism to expose these > + * platform-specific details to Linux. > + */ > + if (state & SBI_HSM_SUSP_NON_RET_BIT) > + return false; > + > return true; > } > > -- > 2.38.1 >
On Mon, 21 Nov 2022 19:45:07 PST (-0800), anup@brainfault.org wrote: > On Tue, Nov 22, 2022 at 2:27 AM Palmer Dabbelt <palmer@rivosinc.com> wrote: >> >> From: Palmer Dabbelt <palmer@rivosinc.com> >> >> As per [1], whether or not the core can wake up from non-retentive >> suspend is a platform-specific detail. We don't have any way to encode >> that, so just stop using them until we've sorted that out. >> >> Link: https://github.com/riscv-non-isa/riscv-sbi-doc/issues/98#issuecomment-1288564687 >> Fixes: 6abf32f1d9c5 ("cpuidle: Add RISC-V SBI CPU idle driver") >> Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> > > This is just unnecessary maintenance churn and it's not the > right way to go. Better to fix this the right way instead of having > a temporary fix. > > I had already sent-out a patch series 5 months back to describe > this in DT: > https://lore.kernel.org/lkml/20220727114302.302201-1-apatel@ventanamicro.com/ > > No one has commented/suggested anything (except Samuel > Holland and Sudeep Holla). I see some comments from Krzysztof here <https://lore.kernel.org/lkml/7a0477a0-9f0f-87d6-4070-30321745f4cc@linaro.org/> as well. Looks like everyone is pointing out that having our CPU nodes encode timers is a bad idea, my guess is that they're probably right. > Please review this series. I can quickly address comments to > make this available for Linux-6.2. Until this series is merged, > the affected platforms can simply remove non-retentive suspend > states from their DT. That leaves us with a dependency between kernel versions and DT bindings: kernels with the current driver will result in broken systems with the non-retentive suspend states in the DT they boot with when those states can't wake up the CPU. > With all due respect, NACK to this patch from my side. > > Regards, > Anup > >> >> --- >> >> This should allow us to revert 232ccac1bd9b ("clocksource/drivers/riscv: >> Events are stopped during CPU suspend"), which fixes suspend on the D1 >> but breaks timers everywhere. >> --- >> drivers/cpuidle/cpuidle-riscv-sbi.c | 11 +++++++++++ >> 1 file changed, 11 insertions(+) >> >> diff --git a/drivers/cpuidle/cpuidle-riscv-sbi.c b/drivers/cpuidle/cpuidle-riscv-sbi.c >> index 05fe2902df9a..9d1063a54495 100644 >> --- a/drivers/cpuidle/cpuidle-riscv-sbi.c >> +++ b/drivers/cpuidle/cpuidle-riscv-sbi.c >> @@ -214,6 +214,17 @@ static bool sbi_suspend_state_is_valid(u32 state) >> if (state > SBI_HSM_SUSPEND_NON_RET_DEFAULT && >> state < SBI_HSM_SUSPEND_NON_RET_PLATFORM) >> return false; >> + >> + /* >> + * Whether or not RISC-V systems deliver interrupts to harts in a >> + * non-retentive suspend state is a platform-specific detail. This can >> + * leave the hart unable to wake up, so just mark these states as >> + * unsupported until we have a mechanism to expose these >> + * platform-specific details to Linux. >> + */ >> + if (state & SBI_HSM_SUSP_NON_RET_BIT) >> + return false; >> + >> return true; >> } >> >> -- >> 2.38.1 >>
On Tue, Nov 22, 2022 at 10:46 AM Palmer Dabbelt <palmer@rivosinc.com> wrote: > > On Mon, 21 Nov 2022 19:45:07 PST (-0800), anup@brainfault.org wrote: > > On Tue, Nov 22, 2022 at 2:27 AM Palmer Dabbelt <palmer@rivosinc.com> wrote: > >> > >> From: Palmer Dabbelt <palmer@rivosinc.com> > >> > >> As per [1], whether or not the core can wake up from non-retentive > >> suspend is a platform-specific detail. We don't have any way to encode > >> that, so just stop using them until we've sorted that out. > >> > >> Link: https://github.com/riscv-non-isa/riscv-sbi-doc/issues/98#issuecomment-1288564687 > >> Fixes: 6abf32f1d9c5 ("cpuidle: Add RISC-V SBI CPU idle driver") > >> Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> > > > > This is just unnecessary maintenance churn and it's not the > > right way to go. Better to fix this the right way instead of having > > a temporary fix. > > > > I had already sent-out a patch series 5 months back to describe > > this in DT: > > https://lore.kernel.org/lkml/20220727114302.302201-1-apatel@ventanamicro.com/ > > > > No one has commented/suggested anything (except Samuel > > Holland and Sudeep Holla). > > I see some comments from Krzysztof here > <https://lore.kernel.org/lkml/7a0477a0-9f0f-87d6-4070-30321745f4cc@linaro.org/> > as well. Looks like everyone is pointing out that having our CPU nodes > encode timers is a bad idea, my guess is that they're probably right. Adding a separate timer DT node, creates a new set of compatibility issues for existing platforms. I am fine updating my series to have separate timer DT node but do we want to go in this direction ? Even if ARM has a separate timer DT node, the timers are still part of the CPU. It depends on how we see the DT bindings aligning with actual HW. > > > Please review this series. I can quickly address comments to > > make this available for Linux-6.2. Until this series is merged, > > the affected platforms can simply remove non-retentive suspend > > states from their DT. > > That leaves us with a dependency between kernel versions and DT > bindings: kernels with the current driver will result in broken systems > with the non-retentive suspend states in the DT they boot with when > those states can't wake up the CPU. This is not a new problem we are facing. Even in the ARM world, the DT bindings grew organically over time based on newer platform requirements. Now that we have a platform which does not want the time C3STOP feature, we need to first come-up with DT bindings to support this platform instead of temporarily disabling features which don't work on this platform. Regards, Anup > > > With all due respect, NACK to this patch from my side. > > > > Regards, > > Anup > > > >> > >> --- > >> > >> This should allow us to revert 232ccac1bd9b ("clocksource/drivers/riscv: > >> Events are stopped during CPU suspend"), which fixes suspend on the D1 > >> but breaks timers everywhere. > >> --- > >> drivers/cpuidle/cpuidle-riscv-sbi.c | 11 +++++++++++ > >> 1 file changed, 11 insertions(+) > >> > >> diff --git a/drivers/cpuidle/cpuidle-riscv-sbi.c b/drivers/cpuidle/cpuidle-riscv-sbi.c > >> index 05fe2902df9a..9d1063a54495 100644 > >> --- a/drivers/cpuidle/cpuidle-riscv-sbi.c > >> +++ b/drivers/cpuidle/cpuidle-riscv-sbi.c > >> @@ -214,6 +214,17 @@ static bool sbi_suspend_state_is_valid(u32 state) > >> if (state > SBI_HSM_SUSPEND_NON_RET_DEFAULT && > >> state < SBI_HSM_SUSPEND_NON_RET_PLATFORM) > >> return false; > >> + > >> + /* > >> + * Whether or not RISC-V systems deliver interrupts to harts in a > >> + * non-retentive suspend state is a platform-specific detail. This can > >> + * leave the hart unable to wake up, so just mark these states as > >> + * unsupported until we have a mechanism to expose these > >> + * platform-specific details to Linux. > >> + */ > >> + if (state & SBI_HSM_SUSP_NON_RET_BIT) > >> + return false; > >> + > >> return true; > >> } > >> > >> -- > >> 2.38.1 > >>
Hey Samuel, Thanks a lot for the extra context. On Mon, Nov 21, 2022 at 06:45:25PM -0600, Samuel Holland wrote: > Hi Palmer, > > On 11/21/22 14:56, Palmer Dabbelt wrote: > > From: Palmer Dabbelt <palmer@rivosinc.com> > > > > As per [1], whether or not the core can wake up from non-retentive > > suspend is a platform-specific detail. We don't have any way to encode > > that, so just stop using them until we've sorted that out. > > We do have *exactly* a way to encode that. Specifically, the existence > or non-existence of a non-retentive CPU suspend state in the DT. > > If a hart has no way of resuming from non-retentive suspend (i.e. a > complete lack of interrupt delivery in non-retentive suspend), then it > makes zero sense to advertise such a suspend state in the DT. I would have to agree with that. I think the sprawling conversation has confused us all at this point, I (prior to reading this mail) thought that suspend was borked on the D1. I don't think anyone is advertising specific states in the DT at the moment though, I had a check in the D1 patchset and didn't see anything there - unless you're adding it dynamically from the bootloader? > Therefore, > if the state exists in the DT, you can assume there is *some* interrupt > that can wake up the hart. And I would extend that to assume at least > one of those wakeup-capable interrupts is a timer interrupt, although > not necessarily the architectural timer interrupt. > > > Link: https://github.com/riscv-non-isa/riscv-sbi-doc/issues/98#issuecomment-1288564687 > > This comment refers specifically to the architectural timer interrupt, > not interrupts more generally. > > > Fixes: 6abf32f1d9c5 ("cpuidle: Add RISC-V SBI CPU idle driver") > > Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> > > > > --- > > > > This should allow us to revert 232ccac1bd9b ("clocksource/drivers/riscv: > > Events are stopped during CPU suspend"), which fixes suspend on the D1 > > but breaks timers everywhere. > > I understand that reverting 232ccac1bd9b is the easiest way to fix the > issues you and others are seeing. I am going to submit another respin of that revert, hopefully with the extra context from here and elsewhere mixed in. > I do not have any functioning RISC-V > hardware with SMP, so it is hard for me to help debug the root issue in > the Linux timer code. I do not know why turning on the C3STOP flag > breaks RCU stall detection or clock_nanosleep(), but I agree that > breakage should not happen. > > So while I still think 232ccac1bd9b is the right change to make from a > "following the spec" standpoint Right, so the spec says: Request the SBI implementation to put the calling hart in a platform specific suspend (or low power) state specified by the suspend_type parameter. The hart will automatically come out of suspended state and resume normal execution when it receives an interrupt or platform specific hardware event. That, as we have discussed a bunch of times, does not say whether a given interrupt should actually arrive during suspend. The correct behaviour, to me, is to assume that no events arrive during suspend. We've got a regular old SiFive implementation so I assume (and will go investigate at some point this week) that the other SiFive {,based} implementations also have timer events during suspend. > I am okay with reverting it for > pragmatic reasons. Since the D1 has another timer driver that is > currently used in preference to the RISC-V/SBI timer triver, Once we have got something in place to actually make the determination we can revert the revert. I'll go give some feedback on Anup's stuff, I've been meaning to but unfortunately not had the chance to do so yet. > reverting > 232ccac1bd9b does not break non-retentive suspend for the D1. Ah, I did not know this. Probably should have gone looking before I acked this patch - sorry! Since that's the case this patch seems un-needed. > But please do not make the change below. It is unnecessarily broad, and > will break something that works fine right now. If the DT advertises a > CPU suspend state that cannot be woken up from at all, then that is a > bug in the DT. Linux should not try to work around that. Thanks again Samuel :) > > --- > > drivers/cpuidle/cpuidle-riscv-sbi.c | 11 +++++++++++ > > 1 file changed, 11 insertions(+) > > > > diff --git a/drivers/cpuidle/cpuidle-riscv-sbi.c b/drivers/cpuidle/cpuidle-riscv-sbi.c > > index 05fe2902df9a..9d1063a54495 100644 > > --- a/drivers/cpuidle/cpuidle-riscv-sbi.c > > +++ b/drivers/cpuidle/cpuidle-riscv-sbi.c > > @@ -214,6 +214,17 @@ static bool sbi_suspend_state_is_valid(u32 state) > > if (state > SBI_HSM_SUSPEND_NON_RET_DEFAULT && > > state < SBI_HSM_SUSPEND_NON_RET_PLATFORM) > > return false; > > + > > + /* > > + * Whether or not RISC-V systems deliver interrupts to harts in a > > + * non-retentive suspend state is a platform-specific detail. This can > > + * leave the hart unable to wake up, so just mark these states as > > + * unsupported until we have a mechanism to expose these > > + * platform-specific details to Linux. > > + */ > > + if (state & SBI_HSM_SUSP_NON_RET_BIT) > > + return false; > > + > > return true; > > } > > > > > _______________________________________________ > linux-riscv mailing list > linux-riscv@lists.infradead.org > http://lists.infradead.org/mailman/listinfo/linux-riscv
On Tue, Nov 22, 2022 at 11:06:15AM +0530, Anup Patel wrote: > On Tue, Nov 22, 2022 at 10:46 AM Palmer Dabbelt <palmer@rivosinc.com> wrote: > > > > On Mon, 21 Nov 2022 19:45:07 PST (-0800), anup@brainfault.org wrote: > > > On Tue, Nov 22, 2022 at 2:27 AM Palmer Dabbelt <palmer@rivosinc.com> wrote: > > >> > > >> From: Palmer Dabbelt <palmer@rivosinc.com> > > >> > > >> As per [1], whether or not the core can wake up from non-retentive > > >> suspend is a platform-specific detail. We don't have any way to encode > > >> that, so just stop using them until we've sorted that out. > > >> > > >> Link: https://github.com/riscv-non-isa/riscv-sbi-doc/issues/98#issuecomment-1288564687 > > >> Fixes: 6abf32f1d9c5 ("cpuidle: Add RISC-V SBI CPU idle driver") > > >> Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> > > > > > > This is just unnecessary maintenance churn and it's not the > > > right way to go. Better to fix this the right way instead of having > > > a temporary fix. > > > > > > I had already sent-out a patch series 5 months back to describe > > > this in DT: > > > https://lore.kernel.org/lkml/20220727114302.302201-1-apatel@ventanamicro.com/ > > > > > > No one has commented/suggested anything (except Samuel > > > Holland and Sudeep Holla). > > > > I see some comments from Krzysztof here > > <https://lore.kernel.org/lkml/7a0477a0-9f0f-87d6-4070-30321745f4cc@linaro.org/> > > as well. Looks like everyone is pointing out that having our CPU nodes > > encode timers is a bad idea, my guess is that they're probably right. > > Adding a separate timer DT node, creates a new set of compatibility > issues for existing platforms. I am fine updating my series to have > separate timer DT node but do we want to go in this direction ? I don't really follow. How is there a compatibility issue created by adding a new node that is not added for a new property? Both will require changes to the device tree. (You need not reply here, I am going to review the other thread, it's been on my todo list for too long. Been caught up with non-coherent stuff & our sw release cycle..) > Even if ARM has a separate timer DT node, the timers are still part > of the CPU. It depends on how we see the DT bindings aligning with > actual HW. > > > > > > Please review this series. I can quickly address comments to > > > make this available for Linux-6.2. Until this series is merged, > > > the affected platforms can simply remove non-retentive suspend > > > states from their DT. > > > > That leaves us with a dependency between kernel versions and DT > > bindings: kernels with the current driver will result in broken systems > > with the non-retentive suspend states in the DT they boot with when > > those states can't wake up the CPU. Can someone point me at a (non D1 or virt) system that has suspend states in the DT that would need fixing? > This is not a new problem we are facing. Even in the ARM world, > the DT bindings grew organically over time based on newer platform > requirements. > > Now that we have a platform which does not want the time > C3STOP feature, we need to first come-up with DT bindings > to support this platform instead of temporarily disabling > features which don't work on this platform. It's the opposite surely? It should be "now that we have a platform that *does want* the C3STOP feature", right? > > > With all due respect, NACK to this patch from my side. As Samuel pointed out that the D1 doesn't actually use the timer in question, I think we are okay here? > > >> > > >> --- > > >> > > >> This should allow us to revert 232ccac1bd9b ("clocksource/drivers/riscv: > > >> Events are stopped during CPU suspend"), which fixes suspend on the D1 > > >> but breaks timers everywhere. > > >> --- > > >> drivers/cpuidle/cpuidle-riscv-sbi.c | 11 +++++++++++ > > >> 1 file changed, 11 insertions(+) > > >> > > >> diff --git a/drivers/cpuidle/cpuidle-riscv-sbi.c b/drivers/cpuidle/cpuidle-riscv-sbi.c > > >> index 05fe2902df9a..9d1063a54495 100644 > > >> --- a/drivers/cpuidle/cpuidle-riscv-sbi.c > > >> +++ b/drivers/cpuidle/cpuidle-riscv-sbi.c > > >> @@ -214,6 +214,17 @@ static bool sbi_suspend_state_is_valid(u32 state) > > >> if (state > SBI_HSM_SUSPEND_NON_RET_DEFAULT && > > >> state < SBI_HSM_SUSPEND_NON_RET_PLATFORM) > > >> return false; > > >> + > > >> + /* > > >> + * Whether or not RISC-V systems deliver interrupts to harts in a > > >> + * non-retentive suspend state is a platform-specific detail. This can > > >> + * leave the hart unable to wake up, so just mark these states as > > >> + * unsupported until we have a mechanism to expose these > > >> + * platform-specific details to Linux. > > >> + */ > > >> + if (state & SBI_HSM_SUSP_NON_RET_BIT) > > >> + return false; > > >> + > > >> return true; > > >> } > > >> > > >> -- > > >> 2.38.1 > > >>
On Tue, 22 Nov 2022 03:19:43 PST (-0800), conor.dooley@microchip.com wrote: > On Tue, Nov 22, 2022 at 11:06:15AM +0530, Anup Patel wrote: >> On Tue, Nov 22, 2022 at 10:46 AM Palmer Dabbelt <palmer@rivosinc.com> wrote: >> > >> > On Mon, 21 Nov 2022 19:45:07 PST (-0800), anup@brainfault.org wrote: >> > > On Tue, Nov 22, 2022 at 2:27 AM Palmer Dabbelt <palmer@rivosinc.com> wrote: >> > >> >> > >> From: Palmer Dabbelt <palmer@rivosinc.com> >> > >> >> > >> As per [1], whether or not the core can wake up from non-retentive >> > >> suspend is a platform-specific detail. We don't have any way to encode >> > >> that, so just stop using them until we've sorted that out. >> > >> >> > >> Link: https://github.com/riscv-non-isa/riscv-sbi-doc/issues/98#issuecomment-1288564687 >> > >> Fixes: 6abf32f1d9c5 ("cpuidle: Add RISC-V SBI CPU idle driver") >> > >> Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> >> > > >> > > This is just unnecessary maintenance churn and it's not the >> > > right way to go. Better to fix this the right way instead of having >> > > a temporary fix. >> > > >> > > I had already sent-out a patch series 5 months back to describe >> > > this in DT: >> > > https://lore.kernel.org/lkml/20220727114302.302201-1-apatel@ventanamicro.com/ >> > > >> > > No one has commented/suggested anything (except Samuel >> > > Holland and Sudeep Holla). >> > >> > I see some comments from Krzysztof here >> > <https://lore.kernel.org/lkml/7a0477a0-9f0f-87d6-4070-30321745f4cc@linaro.org/> >> > as well. Looks like everyone is pointing out that having our CPU nodes >> > encode timers is a bad idea, my guess is that they're probably right. >> >> Adding a separate timer DT node, creates a new set of compatibility >> issues for existing platforms. I am fine updating my series to have >> separate timer DT node but do we want to go in this direction ? > > I don't really follow. How is there a compatibility issue created by > adding a new node that is not added for a new property? Both will > require changes to the device tree. (You need not reply here, I am going > to review the other thread, it's been on my todo list for too long. Been > caught up with non-coherent stuff & our sw release cycle..) > >> Even if ARM has a separate timer DT node, the timers are still part >> of the CPU. It depends on how we see the DT bindings aligning with >> actual HW. >> >> > >> > > Please review this series. I can quickly address comments to >> > > make this available for Linux-6.2. Until this series is merged, >> > > the affected platforms can simply remove non-retentive suspend >> > > states from their DT. >> > >> > That leaves us with a dependency between kernel versions and DT >> > bindings: kernels with the current driver will result in broken systems >> > with the non-retentive suspend states in the DT they boot with when >> > those states can't wake up the CPU. > > Can someone point me at a (non D1 or virt) system that has suspend > states in the DT that would need fixing? > >> This is not a new problem we are facing. Even in the ARM world, >> the DT bindings grew organically over time based on newer platform >> requirements. >> >> Now that we have a platform which does not want the time >> C3STOP feature, we need to first come-up with DT bindings >> to support this platform instead of temporarily disabling >> features which don't work on this platform. > > It's the opposite surely? It should be "now that we have a platform that > *does want* the C3STOP feature", right? IMO we just shouldn't be turning on C3STOP at all. Sure it's making the problem here go away, but it's trying to emulate a bunch of Intel timer features we don't have on RISC-V and ending up in a bunch of fallback paths. If we need some workaround in the timer subsystem to sort out the non-retentive states then we can sort that out, but my guess is that vendors are just going to 3 >> > > With all due respect, NACK to this patch from my side. > > As Samuel pointed out that the D1 doesn't actually use the timer in > question, I think we are okay here? IIUC it just should use the sunxi timer driver, but that requires some priority changes (and presumably breaks That said, I guess I'm confused about what's actually broken here. My understanding of the problem is: The D1's firmware disables some interrupts during non-retentive suspend, which results in SBI timers failing to wake up the core. The patch to add C3STOP makes the core come back from sleep, but that results in a bunch of other timer-related issues. So IMO we should revert the C3STOP patch as it's causing regressions (ie, old workloads break in order to make new ones work). Seems like folks mostly agree on that one? I also think we should stop entering non-retentive suspend until we can sort out how reliably wake up from it, as the SBI makes that a platform-specific detail. If the answer there is "non-retentive suspend is fine on the D1 as long as we don't use the SBI timers" then that seems fine, we just need some way to describe that in Linux -- that doesn't fix other platforms and other interrupts, but at least it's a start. Sorry if I've just misunderstood something here? > >> > >> >> > >> --- >> > >> >> > >> This should allow us to revert 232ccac1bd9b ("clocksource/drivers/riscv: >> > >> Events are stopped during CPU suspend"), which fixes suspend on the D1 >> > >> but breaks timers everywhere. >> > >> --- >> > >> drivers/cpuidle/cpuidle-riscv-sbi.c | 11 +++++++++++ >> > >> 1 file changed, 11 insertions(+) >> > >> >> > >> diff --git a/drivers/cpuidle/cpuidle-riscv-sbi.c b/drivers/cpuidle/cpuidle-riscv-sbi.c >> > >> index 05fe2902df9a..9d1063a54495 100644 >> > >> --- a/drivers/cpuidle/cpuidle-riscv-sbi.c >> > >> +++ b/drivers/cpuidle/cpuidle-riscv-sbi.c >> > >> @@ -214,6 +214,17 @@ static bool sbi_suspend_state_is_valid(u32 state) >> > >> if (state > SBI_HSM_SUSPEND_NON_RET_DEFAULT && >> > >> state < SBI_HSM_SUSPEND_NON_RET_PLATFORM) >> > >> return false; >> > >> + >> > >> + /* >> > >> + * Whether or not RISC-V systems deliver interrupts to harts in a >> > >> + * non-retentive suspend state is a platform-specific detail. This can >> > >> + * leave the hart unable to wake up, so just mark these states as >> > >> + * unsupported until we have a mechanism to expose these >> > >> + * platform-specific details to Linux. >> > >> + */ >> > >> + if (state & SBI_HSM_SUSP_NON_RET_BIT) >> > >> + return false; >> > >> + >> > >> return true; >> > >> } >> > >> >> > >> -- >> > >> 2.38.1 >> > >>
Hi Conor, On 11/22/22 05:06, Conor Dooley wrote: > Hey Samuel, > > Thanks a lot for the extra context. > > On Mon, Nov 21, 2022 at 06:45:25PM -0600, Samuel Holland wrote: >> Hi Palmer, >> >> On 11/21/22 14:56, Palmer Dabbelt wrote: >>> From: Palmer Dabbelt <palmer@rivosinc.com> >>> >>> As per [1], whether or not the core can wake up from non-retentive >>> suspend is a platform-specific detail. We don't have any way to encode >>> that, so just stop using them until we've sorted that out. >> >> We do have *exactly* a way to encode that. Specifically, the existence >> or non-existence of a non-retentive CPU suspend state in the DT. >> >> If a hart has no way of resuming from non-retentive suspend (i.e. a >> complete lack of interrupt delivery in non-retentive suspend), then it >> makes zero sense to advertise such a suspend state in the DT. > > I would have to agree with that. I think the sprawling conversation has > confused us all at this point, I (prior to reading this mail) thought > that suspend was borked on the D1. I don't think anyone is advertising > specific states in the DT at the moment though, I had a check in the D1 > patchset and didn't see anything there - unless you're adding it > dynamically from the bootloader? The availability and latency properties of idle states depend on the SBI implementation, so yes, they need to be added dynamically. >> Therefore, >> if the state exists in the DT, you can assume there is *some* interrupt >> that can wake up the hart. And I would extend that to assume at least >> one of those wakeup-capable interrupts is a timer interrupt, although >> not necessarily the architectural timer interrupt. >> >>> Link: https://github.com/riscv-non-isa/riscv-sbi-doc/issues/98#issuecomment-1288564687 >> >> This comment refers specifically to the architectural timer interrupt, >> not interrupts more generally. >> >>> Fixes: 6abf32f1d9c5 ("cpuidle: Add RISC-V SBI CPU idle driver") >>> Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> >>> >>> --- >>> >>> This should allow us to revert 232ccac1bd9b ("clocksource/drivers/riscv: >>> Events are stopped during CPU suspend"), which fixes suspend on the D1 >>> but breaks timers everywhere. >> >> I understand that reverting 232ccac1bd9b is the easiest way to fix the >> issues you and others are seeing. > > I am going to submit another respin of that revert, hopefully with the > extra context from here and elsewhere mixed in. > >> I do not have any functioning RISC-V >> hardware with SMP, so it is hard for me to help debug the root issue in >> the Linux timer code. I do not know why turning on the C3STOP flag >> breaks RCU stall detection or clock_nanosleep(), but I agree that >> breakage should not happen. >> >> So while I still think 232ccac1bd9b is the right change to make from a >> "following the spec" standpoint > > Right, so the spec says: > Request the SBI implementation to put the calling hart in a platform > specific suspend (or low power) state specified by the suspend_type > parameter. The hart will automatically come out of suspended state and > resume normal execution when it receives an interrupt or platform > specific hardware event. > > That, as we have discussed a bunch of times, does not say whether a > given interrupt should actually arrive during suspend. The correct > behaviour, to me, is to assume that no events arrive during suspend. Are you suggesting that we need some property to declare the delivery of each kind of interrupt (software, timer, external, PMU)? I assumed that external interrupt delivery would be required to consider an idle state "viable", but I suppose it would be _possible_ to have a state where only timer interrupts are deliverable. > We've got a regular old SiFive implementation so I assume (and will go > investigate at some point this week) that the other SiFive {,based} > implementations also have timer events during suspend. > >> I am okay with reverting it for >> pragmatic reasons. Since the D1 has another timer driver that is >> currently used in preference to the RISC-V/SBI timer triver, > > Once we have got something in place to actually make the determination > we can revert the revert. I'll go give some feedback on Anup's stuff, > I've been meaning to but unfortunately not had the chance to do so yet. Thanks for following up on this. Regards, Samuel >> reverting >> 232ccac1bd9b does not break non-retentive suspend for the D1. > > Ah, I did not know this. Probably should have gone looking before I > acked this patch - sorry! > Since that's the case this patch seems un-needed. > >> But please do not make the change below. It is unnecessarily broad, and >> will break something that works fine right now. If the DT advertises a >> CPU suspend state that cannot be woken up from at all, then that is a >> bug in the DT. Linux should not try to work around that. > > Thanks again Samuel :) > >>> --- >>> drivers/cpuidle/cpuidle-riscv-sbi.c | 11 +++++++++++ >>> 1 file changed, 11 insertions(+) >>> >>> diff --git a/drivers/cpuidle/cpuidle-riscv-sbi.c b/drivers/cpuidle/cpuidle-riscv-sbi.c >>> index 05fe2902df9a..9d1063a54495 100644 >>> --- a/drivers/cpuidle/cpuidle-riscv-sbi.c >>> +++ b/drivers/cpuidle/cpuidle-riscv-sbi.c >>> @@ -214,6 +214,17 @@ static bool sbi_suspend_state_is_valid(u32 state) >>> if (state > SBI_HSM_SUSPEND_NON_RET_DEFAULT && >>> state < SBI_HSM_SUSPEND_NON_RET_PLATFORM) >>> return false; >>> + >>> + /* >>> + * Whether or not RISC-V systems deliver interrupts to harts in a >>> + * non-retentive suspend state is a platform-specific detail. This can >>> + * leave the hart unable to wake up, so just mark these states as >>> + * unsupported until we have a mechanism to expose these >>> + * platform-specific details to Linux. >>> + */ >>> + if (state & SBI_HSM_SUSP_NON_RET_BIT) >>> + return false; >>> + >>> return true; >>> } >>> >> >> >> _______________________________________________ >> linux-riscv mailing list >> linux-riscv@lists.infradead.org >> http://lists.infradead.org/mailman/listinfo/linux-riscv
On Tue, Nov 22, 2022 at 4:50 PM Conor Dooley <conor.dooley@microchip.com> wrote: > > On Tue, Nov 22, 2022 at 11:06:15AM +0530, Anup Patel wrote: > > On Tue, Nov 22, 2022 at 10:46 AM Palmer Dabbelt <palmer@rivosinc.com> wrote: > > > > > > On Mon, 21 Nov 2022 19:45:07 PST (-0800), anup@brainfault.org wrote: > > > > On Tue, Nov 22, 2022 at 2:27 AM Palmer Dabbelt <palmer@rivosinc.com> wrote: > > > >> > > > >> From: Palmer Dabbelt <palmer@rivosinc.com> > > > >> > > > >> As per [1], whether or not the core can wake up from non-retentive > > > >> suspend is a platform-specific detail. We don't have any way to encode > > > >> that, so just stop using them until we've sorted that out. > > > >> > > > >> Link: https://github.com/riscv-non-isa/riscv-sbi-doc/issues/98#issuecomment-1288564687 > > > >> Fixes: 6abf32f1d9c5 ("cpuidle: Add RISC-V SBI CPU idle driver") > > > >> Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> > > > > > > > > This is just unnecessary maintenance churn and it's not the > > > > right way to go. Better to fix this the right way instead of having > > > > a temporary fix. > > > > > > > > I had already sent-out a patch series 5 months back to describe > > > > this in DT: > > > > https://lore.kernel.org/lkml/20220727114302.302201-1-apatel@ventanamicro.com/ > > > > > > > > No one has commented/suggested anything (except Samuel > > > > Holland and Sudeep Holla). > > > > > > I see some comments from Krzysztof here > > > <https://lore.kernel.org/lkml/7a0477a0-9f0f-87d6-4070-30321745f4cc@linaro.org/> > > > as well. Looks like everyone is pointing out that having our CPU nodes > > > encode timers is a bad idea, my guess is that they're probably right. > > > > Adding a separate timer DT node, creates a new set of compatibility > > issues for existing platforms. I am fine updating my series to have > > separate timer DT node but do we want to go in this direction ? > > I don't really follow. How is there a compatibility issue created by > adding a new node that is not added for a new property? Both will > require changes to the device tree. (You need not reply here, I am going > to review the other thread, it's been on my todo list for too long. Been > caught up with non-coherent stuff & our sw release cycle..) Adding a new timer DT node would mean, the RISC-V timer driver will now be probed using the compatible to the new DT node whereas the RISC-V timer driver is currently probed using CPU DT nodes. > > > Even if ARM has a separate timer DT node, the timers are still part > > of the CPU. It depends on how we see the DT bindings aligning with > > actual HW. > > > > > > > > > Please review this series. I can quickly address comments to > > > > make this available for Linux-6.2. Until this series is merged, > > > > the affected platforms can simply remove non-retentive suspend > > > > states from their DT. > > > > > > That leaves us with a dependency between kernel versions and DT > > > bindings: kernels with the current driver will result in broken systems > > > with the non-retentive suspend states in the DT they boot with when > > > those states can't wake up the CPU. > > Can someone point me at a (non D1 or virt) system that has suspend > states in the DT that would need fixing? For the QEMU virt machine, the default non-retentive suspend state was tested using a temporary DTB provided separately via QEMU command line. The QEMU virt machine does not have its own HART suspend states so OpenSBI will functionally emulate default retentive/non-retentive suspend states. > > > This is not a new problem we are facing. Even in the ARM world, > > the DT bindings grew organically over time based on newer platform > > requirements. > > > > Now that we have a platform which does not want the time > > C3STOP feature, we need to first come-up with DT bindings > > to support this platform instead of temporarily disabling > > features which don't work on this platform. > > It's the opposite surely? It should be "now that we have a platform that > *does want* the C3STOP feature", right? Yes, we can think this way as well. > > > > > With all due respect, NACK to this patch from my side. > > As Samuel pointed out that the D1 doesn't actually use the timer in > question, I think we are okay here? Yes, that's why D1 needs the C3STOP flag. > > > > >> > > > >> --- > > > >> > > > >> This should allow us to revert 232ccac1bd9b ("clocksource/drivers/riscv: > > > >> Events are stopped during CPU suspend"), which fixes suspend on the D1 > > > >> but breaks timers everywhere. > > > >> --- > > > >> drivers/cpuidle/cpuidle-riscv-sbi.c | 11 +++++++++++ > > > >> 1 file changed, 11 insertions(+) > > > >> > > > >> diff --git a/drivers/cpuidle/cpuidle-riscv-sbi.c b/drivers/cpuidle/cpuidle-riscv-sbi.c > > > >> index 05fe2902df9a..9d1063a54495 100644 > > > >> --- a/drivers/cpuidle/cpuidle-riscv-sbi.c > > > >> +++ b/drivers/cpuidle/cpuidle-riscv-sbi.c > > > >> @@ -214,6 +214,17 @@ static bool sbi_suspend_state_is_valid(u32 state) > > > >> if (state > SBI_HSM_SUSPEND_NON_RET_DEFAULT && > > > >> state < SBI_HSM_SUSPEND_NON_RET_PLATFORM) > > > >> return false; > > > >> + > > > >> + /* > > > >> + * Whether or not RISC-V systems deliver interrupts to harts in a > > > >> + * non-retentive suspend state is a platform-specific detail. This can > > > >> + * leave the hart unable to wake up, so just mark these states as > > > >> + * unsupported until we have a mechanism to expose these > > > >> + * platform-specific details to Linux. > > > >> + */ > > > >> + if (state & SBI_HSM_SUSP_NON_RET_BIT) > > > >> + return false; > > > >> + > > > >> return true; > > > >> } > > > >> > > > >> -- > > > >> 2.38.1 > > > >> Regards, Anup
On Tue, Nov 22, 2022 at 8:58 PM Palmer Dabbelt <palmer@rivosinc.com> wrote: > > On Tue, 22 Nov 2022 03:19:43 PST (-0800), conor.dooley@microchip.com wrote: > > On Tue, Nov 22, 2022 at 11:06:15AM +0530, Anup Patel wrote: > >> On Tue, Nov 22, 2022 at 10:46 AM Palmer Dabbelt <palmer@rivosinc.com> wrote: > >> > > >> > On Mon, 21 Nov 2022 19:45:07 PST (-0800), anup@brainfault.org wrote: > >> > > On Tue, Nov 22, 2022 at 2:27 AM Palmer Dabbelt <palmer@rivosinc.com> wrote: > >> > >> > >> > >> From: Palmer Dabbelt <palmer@rivosinc.com> > >> > >> > >> > >> As per [1], whether or not the core can wake up from non-retentive > >> > >> suspend is a platform-specific detail. We don't have any way to encode > >> > >> that, so just stop using them until we've sorted that out. > >> > >> > >> > >> Link: https://github.com/riscv-non-isa/riscv-sbi-doc/issues/98#issuecomment-1288564687 > >> > >> Fixes: 6abf32f1d9c5 ("cpuidle: Add RISC-V SBI CPU idle driver") > >> > >> Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> > >> > > > >> > > This is just unnecessary maintenance churn and it's not the > >> > > right way to go. Better to fix this the right way instead of having > >> > > a temporary fix. > >> > > > >> > > I had already sent-out a patch series 5 months back to describe > >> > > this in DT: > >> > > https://lore.kernel.org/lkml/20220727114302.302201-1-apatel@ventanamicro.com/ > >> > > > >> > > No one has commented/suggested anything (except Samuel > >> > > Holland and Sudeep Holla). > >> > > >> > I see some comments from Krzysztof here > >> > <https://lore.kernel.org/lkml/7a0477a0-9f0f-87d6-4070-30321745f4cc@linaro.org/> > >> > as well. Looks like everyone is pointing out that having our CPU nodes > >> > encode timers is a bad idea, my guess is that they're probably right. > >> > >> Adding a separate timer DT node, creates a new set of compatibility > >> issues for existing platforms. I am fine updating my series to have > >> separate timer DT node but do we want to go in this direction ? > > > > I don't really follow. How is there a compatibility issue created by > > adding a new node that is not added for a new property? Both will > > require changes to the device tree. (You need not reply here, I am going > > to review the other thread, it's been on my todo list for too long. Been > > caught up with non-coherent stuff & our sw release cycle..) > > > >> Even if ARM has a separate timer DT node, the timers are still part > >> of the CPU. It depends on how we see the DT bindings aligning with > >> actual HW. > >> > >> > > >> > > Please review this series. I can quickly address comments to > >> > > make this available for Linux-6.2. Until this series is merged, > >> > > the affected platforms can simply remove non-retentive suspend > >> > > states from their DT. > >> > > >> > That leaves us with a dependency between kernel versions and DT > >> > bindings: kernels with the current driver will result in broken systems > >> > with the non-retentive suspend states in the DT they boot with when > >> > those states can't wake up the CPU. > > > > Can someone point me at a (non D1 or virt) system that has suspend > > states in the DT that would need fixing? > > > >> This is not a new problem we are facing. Even in the ARM world, > >> the DT bindings grew organically over time based on newer platform > >> requirements. > >> > >> Now that we have a platform which does not want the time > >> C3STOP feature, we need to first come-up with DT bindings > >> to support this platform instead of temporarily disabling > >> features which don't work on this platform. > > > > It's the opposite surely? It should be "now that we have a platform that > > *does want* the C3STOP feature", right? > > IMO we just shouldn't be turning on C3STOP at all. Sure it's making the > problem here go away, but it's trying to emulate a bunch of Intel timer > features we don't have on RISC-V and ending up in a bunch of fallback > paths. > > If we need some workaround in the timer subsystem to sort out the > non-retentive states then we can sort that out, but my guess is that > vendors are just going to 3 Actually, it will be better if we set C3STOP in the RISC-V timer driver only for D1 (and future platforms that might need it). We can easily do this based on the D1 compatible string from the root DT node. > > >> > > With all due respect, NACK to this patch from my side. > > > > As Samuel pointed out that the D1 doesn't actually use the timer in > > question, I think we are okay here? > > IIUC it just should use the sunxi timer driver, but that requires some > priority changes (and presumably breaks > > That said, I guess I'm confused about what's actually broken here. My > understanding of the problem is: The D1's firmware disables some > interrupts during non-retentive suspend, which results in SBI timers > failing to wake up the core. The patch to add C3STOP makes the core > come back from sleep, but that results in a bunch of other timer-related > issues. > > So IMO we should revert the C3STOP patch as it's causing regressions > (ie, old workloads break in order to make new ones work). Seems like > folks mostly agree on that one? Yes, I agree. We can go a step further and add quirks in the RISC-V timer driver which only sets C3STOP for certain platforms (e.g. D1). > > I also think we should stop entering non-retentive suspend until we can > sort out how reliably wake up from it, as the SBI makes that a > platform-specific detail. If the answer there is "non-retentive suspend > is fine on the D1 as long as we don't use the SBI timers" then that > seems fine, we just need some way to describe that in Linux -- that > doesn't fix other platforms and other interrupts, but at least it's a > start. > > Sorry if I've just misunderstood something here? Rather than "stop entering non-retentive suspend", we should improve the relevant drivers. > > > > >> > >> > >> > >> --- > >> > >> > >> > >> This should allow us to revert 232ccac1bd9b ("clocksource/drivers/riscv: > >> > >> Events are stopped during CPU suspend"), which fixes suspend on the D1 > >> > >> but breaks timers everywhere. > >> > >> --- > >> > >> drivers/cpuidle/cpuidle-riscv-sbi.c | 11 +++++++++++ > >> > >> 1 file changed, 11 insertions(+) > >> > >> > >> > >> diff --git a/drivers/cpuidle/cpuidle-riscv-sbi.c b/drivers/cpuidle/cpuidle-riscv-sbi.c > >> > >> index 05fe2902df9a..9d1063a54495 100644 > >> > >> --- a/drivers/cpuidle/cpuidle-riscv-sbi.c > >> > >> +++ b/drivers/cpuidle/cpuidle-riscv-sbi.c > >> > >> @@ -214,6 +214,17 @@ static bool sbi_suspend_state_is_valid(u32 state) > >> > >> if (state > SBI_HSM_SUSPEND_NON_RET_DEFAULT && > >> > >> state < SBI_HSM_SUSPEND_NON_RET_PLATFORM) > >> > >> return false; > >> > >> + > >> > >> + /* > >> > >> + * Whether or not RISC-V systems deliver interrupts to harts in a > >> > >> + * non-retentive suspend state is a platform-specific detail. This can > >> > >> + * leave the hart unable to wake up, so just mark these states as > >> > >> + * unsupported until we have a mechanism to expose these > >> > >> + * platform-specific details to Linux. > >> > >> + */ > >> > >> + if (state & SBI_HSM_SUSP_NON_RET_BIT) > >> > >> + return false; > >> > >> + > >> > >> return true; > >> > >> } > >> > >> > >> > >> -- > >> > >> 2.38.1 > >> > >> Regards, Anup
Hi Palmer, On 11/22/22 09:28, Palmer Dabbelt wrote: > On Tue, 22 Nov 2022 03:19:43 PST (-0800), conor.dooley@microchip.com wrote: >> On Tue, Nov 22, 2022 at 11:06:15AM +0530, Anup Patel wrote: >>> On Tue, Nov 22, 2022 at 10:46 AM Palmer Dabbelt <palmer@rivosinc.com> >>> wrote: >>> > >>> > On Mon, 21 Nov 2022 19:45:07 PST (-0800), anup@brainfault.org wrote: >>> > > On Tue, Nov 22, 2022 at 2:27 AM Palmer Dabbelt >>> <palmer@rivosinc.com> wrote: >>> > >> >>> > >> From: Palmer Dabbelt <palmer@rivosinc.com> >>> > >> >>> > >> As per [1], whether or not the core can wake up from non-retentive >>> > >> suspend is a platform-specific detail. We don't have any way to >>> encode >>> > >> that, so just stop using them until we've sorted that out. >>> > >> >>> > >> Link: >>> https://github.com/riscv-non-isa/riscv-sbi-doc/issues/98#issuecomment-1288564687 >>> > >> Fixes: 6abf32f1d9c5 ("cpuidle: Add RISC-V SBI CPU idle driver") >>> > >> Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> >>> > > >>> > > This is just unnecessary maintenance churn and it's not the >>> > > right way to go. Better to fix this the right way instead of having >>> > > a temporary fix. >>> > > >>> > > I had already sent-out a patch series 5 months back to describe >>> > > this in DT: >>> > > >>> https://lore.kernel.org/lkml/20220727114302.302201-1-apatel@ventanamicro.com/ >>> > > >>> > > No one has commented/suggested anything (except Samuel >>> > > Holland and Sudeep Holla). >>> > >>> > I see some comments from Krzysztof here >>> > >>> <https://lore.kernel.org/lkml/7a0477a0-9f0f-87d6-4070-30321745f4cc@linaro.org/> >>> > as well. Looks like everyone is pointing out that having our CPU >>> nodes >>> > encode timers is a bad idea, my guess is that they're probably right. >>> >>> Adding a separate timer DT node, creates a new set of compatibility >>> issues for existing platforms. I am fine updating my series to have >>> separate timer DT node but do we want to go in this direction ? >> >> I don't really follow. How is there a compatibility issue created by >> adding a new node that is not added for a new property? Both will >> require changes to the device tree. (You need not reply here, I am going >> to review the other thread, it's been on my todo list for too long. Been >> caught up with non-coherent stuff & our sw release cycle..) >> >>> Even if ARM has a separate timer DT node, the timers are still part >>> of the CPU. It depends on how we see the DT bindings aligning with >>> actual HW. >>> >>> > >>> > > Please review this series. I can quickly address comments to >>> > > make this available for Linux-6.2. Until this series is merged, >>> > > the affected platforms can simply remove non-retentive suspend >>> > > states from their DT. >>> > >>> > That leaves us with a dependency between kernel versions and DT >>> > bindings: kernels with the current driver will result in broken >>> systems >>> > with the non-retentive suspend states in the DT they boot with when >>> > those states can't wake up the CPU. >> >> Can someone point me at a (non D1 or virt) system that has suspend >> states in the DT that would need fixing? >> >>> This is not a new problem we are facing. Even in the ARM world, >>> the DT bindings grew organically over time based on newer platform >>> requirements. >>> >>> Now that we have a platform which does not want the time >>> C3STOP feature, we need to first come-up with DT bindings >>> to support this platform instead of temporarily disabling >>> features which don't work on this platform. >> >> It's the opposite surely? It should be "now that we have a platform that >> *does want* the C3STOP feature", right? > > IMO we just shouldn't be turning on C3STOP at all. Sure it's making the > problem here go away, but it's trying to emulate a bunch of Intel timer > features we don't have on RISC-V and ending up in a bunch of fallback > paths. While the comment in include/linux/clockchips.h and the name of the flag imply that C3STOP is Intel-specific, it really isn't. The flag is used on ARM, MIPS, and PowerPC as well. However we do it, the end goal here is making tick_broadcast_enter() return nonzero (when called from inside cpuidle_enter_state()), thus inhibiting Linux from using idle states with the "local-timer-stop" property set in the DT. Looking down inside tick_broadcast_oneshot_control(), it appears CLOCK_EVT_FEAT_C3STOP really is required to make this happen. What are you referring to by "fallback paths"? We could add another CLOCK_EVT_FEAT_??? flag, but how should it differ from CLOCK_EVT_FEAT_C3STOP? > If we need some workaround in the timer subsystem to sort out the > non-retentive states then we can sort that out, but my guess is that > vendors are just going to 3 (your message got cut off here) >>> > > With all due respect, NACK to this patch from my side. >> >> As Samuel pointed out that the D1 doesn't actually use the timer in >> question, I think we are okay here? > > IIUC it just should use the sunxi timer driver, but that requires some > priority changes (and presumably breaks (and here too) D1 currently uses sun4i_tick (rating 350) over riscv_timer_clockevent (rating 100). The ratings are fine as is. > That said, I guess I'm confused about what's actually broken here. My > understanding of the problem is: The D1's firmware disables some > interrupts during non-retentive suspend, which results in SBI timers > failing to wake up the core. The D1's hardware cannot deliver the RISC-V architectural timer interrupt while the C906 core is powered off. It cannot deliver the RISC-V architectural external interrupt either, but the SoC provides a side channel for all of the PLIC inputs, so they _can_ wake up the CPU. So the net result is that the CLINT is the only peripheral unable to wake the CPU. > The patch to add C3STOP makes the core > come back from sleep, but that results in a bunch of other timer-related > issues. No, C3STOP _inhibits_ Linux from entering idle states that both: 1) rely on that clockevent device to wake the local CPU, and 2) cause that clockevent device to stop working, as signified by the CPUIDLE_FLAG_TIMER_STOP flag, which is set by the local-timer-stop property in the idle state DT node. That means entering the idle state is allowed if Linux can solve that first condition by finding another clockevent device on some other CPU to wake the local CPU with an IPI. That is the purpose of the broadcast timer mechanism. In the case of the D1, since it is single core, there is no other CPU to broadcast a timer event. So if riscv_timer_clockevent is the only clockevent device, then tick_broadcast_enter() should return nonzero, and find_deepest_state() will pick a retentive idle state instead. This is the already-existing mechanism for only entering idle states we can reliably wake up from. :) > So IMO we should revert the C3STOP patch as it's causing regressions I agree that clearly something is going wrong in the Linux code to cause problems on SMP systems like PolarFire. I do not really understand the broadcast timer internals, but what I do know is: 1) By setting CLOCK_EVT_FEAT_C3STOP, we inhibit the RISC-V timer from being used as the broadcast timer (tick_check_broadcast_device()), and IIUC fall back to kernel/time/tick-broadcast-hrtimer.c. Maybe something goes wrong there? 2) The broadcast timer wakes up CPUs via IPIs. Maybe something goes wrong with IPI delivery? (This raises the question of if we need another DT property for receiving IPIs in non-retentive suspend.) But neither of these should affect behavior when not using broadcast timers. > (ie, old workloads break in order to make new ones work). Seems like > folks mostly agree on that one? Well, for the D1 specifically, there is no new workload that the C3STOP patch makes work. Non-retentive suspend worked there already, as I have explained. The patch was always about being compliant to the spec. > I also think we should stop entering non-retentive suspend until we can > sort out how reliably wake up from it, as the SBI makes that a > platform-specific detail. If the answer there is "non-retentive suspend > is fine on the D1 as long as we don't use the SBI timers" then that > seems fine, we just need some way to describe that in Linux -- that > doesn't fix other platforms and other interrupts, but at least it's a > start. We need some way to describe the situation from the SBI implementation to Linux. Non-retentive suspend is fine on the D1 as long as either one of these conditions is met: 1) we don't use the SBI timers, or 2) the SBI timer implementation does not use the CLINT And it is up to the SBI implementation which timer hardware it uses, so the SBI implementation needs to patch this information in to the DT at runtime. Regards, Samuel
On Wed, Nov 23, 2022 at 10:41 AM Samuel Holland <samuel@sholland.org> wrote: > > Hi Palmer, > > On 11/22/22 09:28, Palmer Dabbelt wrote: > > On Tue, 22 Nov 2022 03:19:43 PST (-0800), conor.dooley@microchip.com wrote: > >> On Tue, Nov 22, 2022 at 11:06:15AM +0530, Anup Patel wrote: > >>> On Tue, Nov 22, 2022 at 10:46 AM Palmer Dabbelt <palmer@rivosinc.com> > >>> wrote: > >>> > > >>> > On Mon, 21 Nov 2022 19:45:07 PST (-0800), anup@brainfault.org wrote: > >>> > > On Tue, Nov 22, 2022 at 2:27 AM Palmer Dabbelt > >>> <palmer@rivosinc.com> wrote: > >>> > >> > >>> > >> From: Palmer Dabbelt <palmer@rivosinc.com> > >>> > >> > >>> > >> As per [1], whether or not the core can wake up from non-retentive > >>> > >> suspend is a platform-specific detail. We don't have any way to > >>> encode > >>> > >> that, so just stop using them until we've sorted that out. > >>> > >> > >>> > >> Link: > >>> https://github.com/riscv-non-isa/riscv-sbi-doc/issues/98#issuecomment-1288564687 > >>> > >> Fixes: 6abf32f1d9c5 ("cpuidle: Add RISC-V SBI CPU idle driver") > >>> > >> Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> > >>> > > > >>> > > This is just unnecessary maintenance churn and it's not the > >>> > > right way to go. Better to fix this the right way instead of having > >>> > > a temporary fix. > >>> > > > >>> > > I had already sent-out a patch series 5 months back to describe > >>> > > this in DT: > >>> > > > >>> https://lore.kernel.org/lkml/20220727114302.302201-1-apatel@ventanamicro.com/ > >>> > > > >>> > > No one has commented/suggested anything (except Samuel > >>> > > Holland and Sudeep Holla). > >>> > > >>> > I see some comments from Krzysztof here > >>> > > >>> <https://lore.kernel.org/lkml/7a0477a0-9f0f-87d6-4070-30321745f4cc@linaro.org/> > >>> > as well. Looks like everyone is pointing out that having our CPU > >>> nodes > >>> > encode timers is a bad idea, my guess is that they're probably right. > >>> > >>> Adding a separate timer DT node, creates a new set of compatibility > >>> issues for existing platforms. I am fine updating my series to have > >>> separate timer DT node but do we want to go in this direction ? > >> > >> I don't really follow. How is there a compatibility issue created by > >> adding a new node that is not added for a new property? Both will > >> require changes to the device tree. (You need not reply here, I am going > >> to review the other thread, it's been on my todo list for too long. Been > >> caught up with non-coherent stuff & our sw release cycle..) > >> > >>> Even if ARM has a separate timer DT node, the timers are still part > >>> of the CPU. It depends on how we see the DT bindings aligning with > >>> actual HW. > >>> > >>> > > >>> > > Please review this series. I can quickly address comments to > >>> > > make this available for Linux-6.2. Until this series is merged, > >>> > > the affected platforms can simply remove non-retentive suspend > >>> > > states from their DT. > >>> > > >>> > That leaves us with a dependency between kernel versions and DT > >>> > bindings: kernels with the current driver will result in broken > >>> systems > >>> > with the non-retentive suspend states in the DT they boot with when > >>> > those states can't wake up the CPU. > >> > >> Can someone point me at a (non D1 or virt) system that has suspend > >> states in the DT that would need fixing? > >> > >>> This is not a new problem we are facing. Even in the ARM world, > >>> the DT bindings grew organically over time based on newer platform > >>> requirements. > >>> > >>> Now that we have a platform which does not want the time > >>> C3STOP feature, we need to first come-up with DT bindings > >>> to support this platform instead of temporarily disabling > >>> features which don't work on this platform. > >> > >> It's the opposite surely? It should be "now that we have a platform that > >> *does want* the C3STOP feature", right? > > > > IMO we just shouldn't be turning on C3STOP at all. Sure it's making the > > problem here go away, but it's trying to emulate a bunch of Intel timer > > features we don't have on RISC-V and ending up in a bunch of fallback > > paths. > > While the comment in include/linux/clockchips.h and the name of the flag > imply that C3STOP is Intel-specific, it really isn't. The flag is used > on ARM, MIPS, and PowerPC as well. > > However we do it, the end goal here is making tick_broadcast_enter() > return nonzero (when called from inside cpuidle_enter_state()), thus > inhibiting Linux from using idle states with the "local-timer-stop" > property set in the DT. > > Looking down inside tick_broadcast_oneshot_control(), it appears > CLOCK_EVT_FEAT_C3STOP really is required to make this happen. > > What are you referring to by "fallback paths"? > > We could add another CLOCK_EVT_FEAT_??? flag, but how should it differ > from CLOCK_EVT_FEAT_C3STOP? > > > If we need some workaround in the timer subsystem to sort out the > > non-retentive states then we can sort that out, but my guess is that > > vendors are just going to 3 > > (your message got cut off here) > > >>> > > With all due respect, NACK to this patch from my side. > >> > >> As Samuel pointed out that the D1 doesn't actually use the timer in > >> question, I think we are okay here? > > > > IIUC it just should use the sunxi timer driver, but that requires some > > priority changes (and presumably breaks > > (and here too) > > D1 currently uses sun4i_tick (rating 350) over riscv_timer_clockevent > (rating 100). The ratings are fine as is. > > > That said, I guess I'm confused about what's actually broken here. My > > understanding of the problem is: The D1's firmware disables some > > interrupts during non-retentive suspend, which results in SBI timers > > failing to wake up the core. > > The D1's hardware cannot deliver the RISC-V architectural timer > interrupt while the C906 core is powered off. It cannot deliver the > RISC-V architectural external interrupt either, but the SoC provides a > side channel for all of the PLIC inputs, so they _can_ wake up the CPU. > So the net result is that the CLINT is the only peripheral unable to > wake the CPU. > > > The patch to add C3STOP makes the core > > come back from sleep, but that results in a bunch of other timer-related > > issues. > > No, C3STOP _inhibits_ Linux from entering idle states that both: > 1) rely on that clockevent device to wake the local CPU, and > 2) cause that clockevent device to stop working, as signified by the > CPUIDLE_FLAG_TIMER_STOP flag, which is set by the local-timer-stop > property in the idle state DT node. > > That means entering the idle state is allowed if Linux can solve that > first condition by finding another clockevent device on some other CPU > to wake the local CPU with an IPI. That is the purpose of the broadcast > timer mechanism. > > In the case of the D1, since it is single core, there is no other CPU to > broadcast a timer event. So if riscv_timer_clockevent is the only > clockevent device, then tick_broadcast_enter() should return nonzero, > and find_deepest_state() will pick a retentive idle state instead. > > This is the already-existing mechanism for only entering idle states we > can reliably wake up from. :) > > > So IMO we should revert the C3STOP patch as it's causing regressions > > I agree that clearly something is going wrong in the Linux code to cause > problems on SMP systems like PolarFire. I do not really understand the > broadcast timer internals, but what I do know is: > 1) By setting CLOCK_EVT_FEAT_C3STOP, we inhibit the RISC-V timer from > being used as the broadcast timer (tick_check_broadcast_device()), > and IIUC fall back to kernel/time/tick-broadcast-hrtimer.c. Maybe > something goes wrong there? > 2) The broadcast timer wakes up CPUs via IPIs. Maybe something goes > wrong with IPI delivery? (This raises the question of if we need > another DT property for receiving IPIs in non-retentive suspend.) > > But neither of these should affect behavior when not using broadcast timers. > > > (ie, old workloads break in order to make new ones work). Seems like > > folks mostly agree on that one? > > Well, for the D1 specifically, there is no new workload that the C3STOP > patch makes work. Non-retentive suspend worked there already, as I have > explained. The patch was always about being compliant to the spec. > > > I also think we should stop entering non-retentive suspend until we can > > sort out how reliably wake up from it, as the SBI makes that a > > platform-specific detail. If the answer there is "non-retentive suspend > > is fine on the D1 as long as we don't use the SBI timers" then that > > seems fine, we just need some way to describe that in Linux -- that > > doesn't fix other platforms and other interrupts, but at least it's a > > start. > > We need some way to describe the situation from the SBI implementation > to Linux. > > Non-retentive suspend is fine on the D1 as long as either one of these > conditions is met: > 1) we don't use the SBI timers, or > 2) the SBI timer implementation does not use the CLINT > > And it is up to the SBI implementation which timer hardware it uses, so > the SBI implementation needs to patch this information in to the DT at > runtime. Rather than SBI implementation patching information in DT, it is much simpler to add a quirk in RISC-V timer driver for D1 platform (i.e. based on D1 compatible string in root node). Regards, Anup
Hi Anup, On 11/22/22 23:35, Anup Patel wrote: > On Wed, Nov 23, 2022 at 10:41 AM Samuel Holland <samuel@sholland.org> wrote: >> On 11/22/22 09:28, Palmer Dabbelt wrote: >>> I also think we should stop entering non-retentive suspend until we can >>> sort out how reliably wake up from it, as the SBI makes that a >>> platform-specific detail. If the answer there is "non-retentive suspend >>> is fine on the D1 as long as we don't use the SBI timers" then that >>> seems fine, we just need some way to describe that in Linux -- that >>> doesn't fix other platforms and other interrupts, but at least it's a >>> start. >> >> We need some way to describe the situation from the SBI implementation >> to Linux. >> >> Non-retentive suspend is fine on the D1 as long as either one of these >> conditions is met: >> 1) we don't use the SBI timers, or >> 2) the SBI timer implementation does not use the CLINT >> >> And it is up to the SBI implementation which timer hardware it uses, so >> the SBI implementation needs to patch this information in to the DT at >> runtime. > > Rather than SBI implementation patching information in DT, it is much > simpler to add a quirk in RISC-V timer driver for D1 platform (i.e. based > on D1 compatible string in root node). It would be simpler, but it would be wrong, as I just explained. Only the SBI implementation knows if the SBI timer extension can wake any given CPU from any given non-retentive suspend state. Regards, Samuel
On Wed, Nov 23, 2022 at 11:08 AM Samuel Holland <samuel@sholland.org> wrote: > > Hi Anup, > > On 11/22/22 23:35, Anup Patel wrote: > > On Wed, Nov 23, 2022 at 10:41 AM Samuel Holland <samuel@sholland.org> wrote: > >> On 11/22/22 09:28, Palmer Dabbelt wrote: > >>> I also think we should stop entering non-retentive suspend until we can > >>> sort out how reliably wake up from it, as the SBI makes that a > >>> platform-specific detail. If the answer there is "non-retentive suspend > >>> is fine on the D1 as long as we don't use the SBI timers" then that > >>> seems fine, we just need some way to describe that in Linux -- that > >>> doesn't fix other platforms and other interrupts, but at least it's a > >>> start. > >> > >> We need some way to describe the situation from the SBI implementation > >> to Linux. > >> > >> Non-retentive suspend is fine on the D1 as long as either one of these > >> conditions is met: > >> 1) we don't use the SBI timers, or > >> 2) the SBI timer implementation does not use the CLINT > >> > >> And it is up to the SBI implementation which timer hardware it uses, so > >> the SBI implementation needs to patch this information in to the DT at > >> runtime. > > > > Rather than SBI implementation patching information in DT, it is much > > simpler to add a quirk in RISC-V timer driver for D1 platform (i.e. based > > on D1 compatible string in root node). > > It would be simpler, but it would be wrong, as I just explained. > > Only the SBI implementation knows if the SBI timer extension can wake > any given CPU from any given non-retentive suspend state. The SBI implementation would derive this information from platform compatible string which is already available to the Linux kernel so why does SBI implementation have to patch the DTB and put the same information in a different way ? Regards, Anup
On 11/23/22 00:10, Anup Patel wrote: > On Wed, Nov 23, 2022 at 11:08 AM Samuel Holland <samuel@sholland.org> wrote: >> >> Hi Anup, >> >> On 11/22/22 23:35, Anup Patel wrote: >>> On Wed, Nov 23, 2022 at 10:41 AM Samuel Holland <samuel@sholland.org> wrote: >>>> On 11/22/22 09:28, Palmer Dabbelt wrote: >>>>> I also think we should stop entering non-retentive suspend until we can >>>>> sort out how reliably wake up from it, as the SBI makes that a >>>>> platform-specific detail. If the answer there is "non-retentive suspend >>>>> is fine on the D1 as long as we don't use the SBI timers" then that >>>>> seems fine, we just need some way to describe that in Linux -- that >>>>> doesn't fix other platforms and other interrupts, but at least it's a >>>>> start. >>>> >>>> We need some way to describe the situation from the SBI implementation >>>> to Linux. >>>> >>>> Non-retentive suspend is fine on the D1 as long as either one of these >>>> conditions is met: >>>> 1) we don't use the SBI timers, or >>>> 2) the SBI timer implementation does not use the CLINT >>>> >>>> And it is up to the SBI implementation which timer hardware it uses, so >>>> the SBI implementation needs to patch this information in to the DT at >>>> runtime. >>> >>> Rather than SBI implementation patching information in DT, it is much >>> simpler to add a quirk in RISC-V timer driver for D1 platform (i.e. based >>> on D1 compatible string in root node). >> >> It would be simpler, but it would be wrong, as I just explained. >> >> Only the SBI implementation knows if the SBI timer extension can wake >> any given CPU from any given non-retentive suspend state. > > The SBI implementation would derive this information from platform > compatible string which is already available to the Linux kernel so > why does SBI implementation have to patch the DTB and put the > same information in a different way ? It is not the same information. The SBI implementation also chooses, at runtime, which timer hardware (CLINT, platform-specific MMIO timer, etc.) is used to implement the SBI timer extension. The value of the sbi-timer-can-wake-cpu property depends on this choice. Using D1 as an example, there are two MMIO timer peripherals ("sun4i" TIMER and "sun5i" HSTIMER) where the sbi-timer-can-wake-cpu property should be set. But the property should not be set if the CLINT is used by SBI. It would be perfectly reasonable for the SBI implementation to claim one of the wakeup-capable MMIO timers for itself, mark it as "reserved" in the DT passed to Linux, and thus force Linux to use the SBI timer or a native CLINT driver (C906 CLINT has S-mode extensions). Then the SBI timer _would_ be capable of waking the CPU from non-retentive suspend. Regards, Samuel
On Wed, Nov 23, 2022 at 11:57 AM Samuel Holland <samuel@sholland.org> wrote: > > On 11/23/22 00:10, Anup Patel wrote: > > On Wed, Nov 23, 2022 at 11:08 AM Samuel Holland <samuel@sholland.org> wrote: > >> > >> Hi Anup, > >> > >> On 11/22/22 23:35, Anup Patel wrote: > >>> On Wed, Nov 23, 2022 at 10:41 AM Samuel Holland <samuel@sholland.org> wrote: > >>>> On 11/22/22 09:28, Palmer Dabbelt wrote: > >>>>> I also think we should stop entering non-retentive suspend until we can > >>>>> sort out how reliably wake up from it, as the SBI makes that a > >>>>> platform-specific detail. If the answer there is "non-retentive suspend > >>>>> is fine on the D1 as long as we don't use the SBI timers" then that > >>>>> seems fine, we just need some way to describe that in Linux -- that > >>>>> doesn't fix other platforms and other interrupts, but at least it's a > >>>>> start. > >>>> > >>>> We need some way to describe the situation from the SBI implementation > >>>> to Linux. > >>>> > >>>> Non-retentive suspend is fine on the D1 as long as either one of these > >>>> conditions is met: > >>>> 1) we don't use the SBI timers, or > >>>> 2) the SBI timer implementation does not use the CLINT > >>>> > >>>> And it is up to the SBI implementation which timer hardware it uses, so > >>>> the SBI implementation needs to patch this information in to the DT at > >>>> runtime. > >>> > >>> Rather than SBI implementation patching information in DT, it is much > >>> simpler to add a quirk in RISC-V timer driver for D1 platform (i.e. based > >>> on D1 compatible string in root node). > >> > >> It would be simpler, but it would be wrong, as I just explained. > >> > >> Only the SBI implementation knows if the SBI timer extension can wake > >> any given CPU from any given non-retentive suspend state. > > > > The SBI implementation would derive this information from platform > > compatible string which is already available to the Linux kernel so > > why does SBI implementation have to patch the DTB and put the > > same information in a different way ? > > It is not the same information. The SBI implementation also chooses, at > runtime, which timer hardware (CLINT, platform-specific MMIO timer, > etc.) is used to implement the SBI timer extension. The value of the > sbi-timer-can-wake-cpu property depends on this choice. > > Using D1 as an example, there are two MMIO timer peripherals ("sun4i" > TIMER and "sun5i" HSTIMER) where the sbi-timer-can-wake-cpu property > should be set. But the property should not be set if the CLINT is used > by SBI. > > It would be perfectly reasonable for the SBI implementation to claim one > of the wakeup-capable MMIO timers for itself, mark it as "reserved" in > the DT passed to Linux, and thus force Linux to use the SBI timer or a > native CLINT driver (C906 CLINT has S-mode extensions). Then the SBI > timer _would_ be capable of waking the CPU from non-retentive suspend. Fair enough but the DT property should not be SBI specific because same situation can happen with Sstc as well where a particular non-retentive state does not preserve the state of stimecmp CSRs in the HARTs. Better to keep the DT property name as "riscv,timer-can-wake-cpu". Regards, Anup
On 11/23/22 00:41, Anup Patel wrote: > On Wed, Nov 23, 2022 at 11:57 AM Samuel Holland <samuel@sholland.org> wrote: >> >> On 11/23/22 00:10, Anup Patel wrote: >>> On Wed, Nov 23, 2022 at 11:08 AM Samuel Holland <samuel@sholland.org> wrote: >>>> >>>> Hi Anup, >>>> >>>> On 11/22/22 23:35, Anup Patel wrote: >>>>> On Wed, Nov 23, 2022 at 10:41 AM Samuel Holland <samuel@sholland.org> wrote: >>>>>> On 11/22/22 09:28, Palmer Dabbelt wrote: >>>>>>> I also think we should stop entering non-retentive suspend until we can >>>>>>> sort out how reliably wake up from it, as the SBI makes that a >>>>>>> platform-specific detail. If the answer there is "non-retentive suspend >>>>>>> is fine on the D1 as long as we don't use the SBI timers" then that >>>>>>> seems fine, we just need some way to describe that in Linux -- that >>>>>>> doesn't fix other platforms and other interrupts, but at least it's a >>>>>>> start. >>>>>> >>>>>> We need some way to describe the situation from the SBI implementation >>>>>> to Linux. >>>>>> >>>>>> Non-retentive suspend is fine on the D1 as long as either one of these >>>>>> conditions is met: >>>>>> 1) we don't use the SBI timers, or >>>>>> 2) the SBI timer implementation does not use the CLINT >>>>>> >>>>>> And it is up to the SBI implementation which timer hardware it uses, so >>>>>> the SBI implementation needs to patch this information in to the DT at >>>>>> runtime. >>>>> >>>>> Rather than SBI implementation patching information in DT, it is much >>>>> simpler to add a quirk in RISC-V timer driver for D1 platform (i.e. based >>>>> on D1 compatible string in root node). >>>> >>>> It would be simpler, but it would be wrong, as I just explained. >>>> >>>> Only the SBI implementation knows if the SBI timer extension can wake >>>> any given CPU from any given non-retentive suspend state. >>> >>> The SBI implementation would derive this information from platform >>> compatible string which is already available to the Linux kernel so >>> why does SBI implementation have to patch the DTB and put the >>> same information in a different way ? >> >> It is not the same information. The SBI implementation also chooses, at >> runtime, which timer hardware (CLINT, platform-specific MMIO timer, >> etc.) is used to implement the SBI timer extension. The value of the >> sbi-timer-can-wake-cpu property depends on this choice. >> >> Using D1 as an example, there are two MMIO timer peripherals ("sun4i" >> TIMER and "sun5i" HSTIMER) where the sbi-timer-can-wake-cpu property >> should be set. But the property should not be set if the CLINT is used >> by SBI. >> >> It would be perfectly reasonable for the SBI implementation to claim one >> of the wakeup-capable MMIO timers for itself, mark it as "reserved" in >> the DT passed to Linux, and thus force Linux to use the SBI timer or a >> native CLINT driver (C906 CLINT has S-mode extensions). Then the SBI >> timer _would_ be capable of waking the CPU from non-retentive suspend. > > Fair enough but the DT property should not be SBI specific because same > situation can happen with Sstc as well where a particular non-retentive state > does not preserve the state of stimecmp CSRs in the HARTs. > > Better to keep the DT property name as "riscv,timer-can-wake-cpu". Consider a platform where the Sstc-based timer cannot wake the CPU, but the SBI timer can, because it uses different timer hardware or a different interrupt delivery method. It seems like we need two different properties, one for Sstc and the other for the SBI timer. If both are supported, firmware cannot know which one S-mode software will use. Regards, Samuel
On Wed, Nov 23, 2022 at 12:20 PM Samuel Holland <samuel@sholland.org> wrote: > > On 11/23/22 00:41, Anup Patel wrote: > > On Wed, Nov 23, 2022 at 11:57 AM Samuel Holland <samuel@sholland.org> wrote: > >> > >> On 11/23/22 00:10, Anup Patel wrote: > >>> On Wed, Nov 23, 2022 at 11:08 AM Samuel Holland <samuel@sholland.org> wrote: > >>>> > >>>> Hi Anup, > >>>> > >>>> On 11/22/22 23:35, Anup Patel wrote: > >>>>> On Wed, Nov 23, 2022 at 10:41 AM Samuel Holland <samuel@sholland.org> wrote: > >>>>>> On 11/22/22 09:28, Palmer Dabbelt wrote: > >>>>>>> I also think we should stop entering non-retentive suspend until we can > >>>>>>> sort out how reliably wake up from it, as the SBI makes that a > >>>>>>> platform-specific detail. If the answer there is "non-retentive suspend > >>>>>>> is fine on the D1 as long as we don't use the SBI timers" then that > >>>>>>> seems fine, we just need some way to describe that in Linux -- that > >>>>>>> doesn't fix other platforms and other interrupts, but at least it's a > >>>>>>> start. > >>>>>> > >>>>>> We need some way to describe the situation from the SBI implementation > >>>>>> to Linux. > >>>>>> > >>>>>> Non-retentive suspend is fine on the D1 as long as either one of these > >>>>>> conditions is met: > >>>>>> 1) we don't use the SBI timers, or > >>>>>> 2) the SBI timer implementation does not use the CLINT > >>>>>> > >>>>>> And it is up to the SBI implementation which timer hardware it uses, so > >>>>>> the SBI implementation needs to patch this information in to the DT at > >>>>>> runtime. > >>>>> > >>>>> Rather than SBI implementation patching information in DT, it is much > >>>>> simpler to add a quirk in RISC-V timer driver for D1 platform (i.e. based > >>>>> on D1 compatible string in root node). > >>>> > >>>> It would be simpler, but it would be wrong, as I just explained. > >>>> > >>>> Only the SBI implementation knows if the SBI timer extension can wake > >>>> any given CPU from any given non-retentive suspend state. > >>> > >>> The SBI implementation would derive this information from platform > >>> compatible string which is already available to the Linux kernel so > >>> why does SBI implementation have to patch the DTB and put the > >>> same information in a different way ? > >> > >> It is not the same information. The SBI implementation also chooses, at > >> runtime, which timer hardware (CLINT, platform-specific MMIO timer, > >> etc.) is used to implement the SBI timer extension. The value of the > >> sbi-timer-can-wake-cpu property depends on this choice. > >> > >> Using D1 as an example, there are two MMIO timer peripherals ("sun4i" > >> TIMER and "sun5i" HSTIMER) where the sbi-timer-can-wake-cpu property > >> should be set. But the property should not be set if the CLINT is used > >> by SBI. > >> > >> It would be perfectly reasonable for the SBI implementation to claim one > >> of the wakeup-capable MMIO timers for itself, mark it as "reserved" in > >> the DT passed to Linux, and thus force Linux to use the SBI timer or a > >> native CLINT driver (C906 CLINT has S-mode extensions). Then the SBI > >> timer _would_ be capable of waking the CPU from non-retentive suspend. > > > > Fair enough but the DT property should not be SBI specific because same > > situation can happen with Sstc as well where a particular non-retentive state > > does not preserve the state of stimecmp CSRs in the HARTs. > > > > Better to keep the DT property name as "riscv,timer-can-wake-cpu". > > Consider a platform where the Sstc-based timer cannot wake the CPU, but > the SBI timer can, because it uses different timer hardware or a > different interrupt delivery method. It seems like we need two different > properties, one for Sstc and the other for the SBI timer. If both are > supported, firmware cannot know which one S-mode software will use. On a platform with Sstc, the SBI set_timer() call will internally update stimecmp CSR. In fact, this is what OpenSBI already does. Here's the text from Sstc specification: "In systems in which a supervisor execution environment (SEE) provides timer facilities via an SBI function call, this SBI call will continue to support requests to schedule a timer interrupt. The SEE will simply make use of stimecmp, changing its value as appropriate. This ensures compatibility with existing S-mode software that uses this SEE facility, while new S-mode software takes advantage of stimecmp directly.)" Based on the above, we don't need separate DT property for SBI timer and Sstc. Regards, Anup
Hey Anup, On Wed, Nov 23, 2022 at 09:56:31AM +0530, Anup Patel wrote: > On Tue, Nov 22, 2022 at 4:50 PM Conor Dooley <conor.dooley@microchip.com> wrote: > > > > On Tue, Nov 22, 2022 at 11:06:15AM +0530, Anup Patel wrote: > > > On Tue, Nov 22, 2022 at 10:46 AM Palmer Dabbelt <palmer@rivosinc.com> wrote: > > > > > > > > On Mon, 21 Nov 2022 19:45:07 PST (-0800), anup@brainfault.org wrote: > > > > > On Tue, Nov 22, 2022 at 2:27 AM Palmer Dabbelt <palmer@rivosinc.com> wrote: > > > > >> > > > > >> From: Palmer Dabbelt <palmer@rivosinc.com> > > > > >> > > > > >> As per [1], whether or not the core can wake up from non-retentive > > > > >> suspend is a platform-specific detail. We don't have any way to encode > > > > >> that, so just stop using them until we've sorted that out. > > > > >> > > > > >> Link: https://github.com/riscv-non-isa/riscv-sbi-doc/issues/98#issuecomment-1288564687 > > > > >> Fixes: 6abf32f1d9c5 ("cpuidle: Add RISC-V SBI CPU idle driver") > > > > >> Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> > > > > > > > > > > This is just unnecessary maintenance churn and it's not the > > > > > right way to go. Better to fix this the right way instead of having > > > > > a temporary fix. > > > > > > > > > > I had already sent-out a patch series 5 months back to describe > > > > > this in DT: > > > > > https://lore.kernel.org/lkml/20220727114302.302201-1-apatel@ventanamicro.com/ > > > > > > > > > > No one has commented/suggested anything (except Samuel > > > > > Holland and Sudeep Holla). > > > > > > > > I see some comments from Krzysztof here > > > > <https://lore.kernel.org/lkml/7a0477a0-9f0f-87d6-4070-30321745f4cc@linaro.org/> > > > > as well. Looks like everyone is pointing out that having our CPU nodes > > > > encode timers is a bad idea, my guess is that they're probably right. > > > > > > Adding a separate timer DT node, creates a new set of compatibility > > > issues for existing platforms. I am fine updating my series to have > > > separate timer DT node but do we want to go in this direction ? > > > > I don't really follow. How is there a compatibility issue created by > > adding a new node that is not added for a new property? Both will > > require changes to the device tree. (You need not reply here, I am going > > to review the other thread, it's been on my todo list for too long. Been > > caught up with non-coherent stuff & our sw release cycle..) > > Adding a new timer DT node would mean, the RISC-V timer driver > will now be probed using the compatible to the new DT node whereas > the RISC-V timer driver is currently probed using CPU DT nodes. Ahh, that is what I have missed. I'll continue my thoughts on this in the dt-binding thread. > > > Even if ARM has a separate timer DT node, the timers are still part > > > of the CPU. It depends on how we see the DT bindings aligning with > > > actual HW. > > > > > > > > > > > > Please review this series. I can quickly address comments to > > > > > make this available for Linux-6.2. Until this series is merged, > > > > > the affected platforms can simply remove non-retentive suspend > > > > > states from their DT. > > > > > > > > That leaves us with a dependency between kernel versions and DT > > > > bindings: kernels with the current driver will result in broken systems > > > > with the non-retentive suspend states in the DT they boot with when > > > > those states can't wake up the CPU. > > > > Can someone point me at a (non D1 or virt) system that has suspend > > states in the DT that would need fixing? > > For the QEMU virt machine, the default non-retentive suspend state was > tested using a temporary DTB provided separately via QEMU command > line. The QEMU virt machine does not have its own HART suspend > states so OpenSBI will functionally emulate default retentive/non-retentive > suspend states. So since I asked for non D1 or virt systems, that's a no & no DTs actually needs to be fixed :) > > > This is not a new problem we are facing. Even in the ARM world, > > > the DT bindings grew organically over time based on newer platform > > > requirements. > > > > > > Now that we have a platform which does not want the time > > > C3STOP feature, we need to first come-up with DT bindings > > > to support this platform instead of temporarily disabling > > > features which don't work on this platform. > > > > It's the opposite surely? It should be "now that we have a platform that > > *does want* the C3STOP feature", right? > > Yes, we can think this way as well. No, there's no "thinking" involved here from what I can tell. Pre-D1 systems do not seem to need the flag and the D1 does want that flag for its riscv,timer. We have to operate with respect to hardware timelines & the corresponding software implementations, not specs in this context. If it was the case that you proposed, there would be no chance for regressions if someone updates their kernel but not their DT. > > > > > With all due respect, NACK to this patch from my side. > > > > As Samuel pointed out that the D1 doesn't actually use the timer in > > question, I think we are okay here? > > Yes, that's why D1 needs the C3STOP flag. I don't understand what you mean here, you don't appear to be replying to what I said. I was saying that the current D1 configuration does not actually use the timer-riscv driver as there's another one that has a higher rating & therefore we are okay to not apply this patch as my revert will not cause it to be put into sleep states that it cannot return from. Your reply makes no sense to me in that context. Thanks, Conor.
On Tue, Nov 22, 2022 at 11:11:23PM -0600, Samuel Holland wrote: [...] > While the comment in include/linux/clockchips.h and the name of the flag > imply that C3STOP is Intel-specific, it really isn't. The flag is used > on ARM, MIPS, and PowerPC as well. > +1. I agree the name is confusing but it used to just indicate that the timer is not always on and could stop in deeper CPU idle states. It probably was introduced to be used on x86 for C3 state but it is used for other purposes as well. May be one should have or even can now rename it to something more appropriate, but I am sure it might trigger some bikeshedding discussions 😉.
On Wed, Nov 23, 2022 at 3:40 PM Conor Dooley <conor.dooley@microchip.com> wrote: > > Hey Anup, > > On Wed, Nov 23, 2022 at 09:56:31AM +0530, Anup Patel wrote: > > On Tue, Nov 22, 2022 at 4:50 PM Conor Dooley <conor.dooley@microchip.com> wrote: > > > > > > On Tue, Nov 22, 2022 at 11:06:15AM +0530, Anup Patel wrote: > > > > On Tue, Nov 22, 2022 at 10:46 AM Palmer Dabbelt <palmer@rivosinc.com> wrote: > > > > > > > > > > On Mon, 21 Nov 2022 19:45:07 PST (-0800), anup@brainfault.org wrote: > > > > > > On Tue, Nov 22, 2022 at 2:27 AM Palmer Dabbelt <palmer@rivosinc.com> wrote: > > > > > >> > > > > > >> From: Palmer Dabbelt <palmer@rivosinc.com> > > > > > >> > > > > > >> As per [1], whether or not the core can wake up from non-retentive > > > > > >> suspend is a platform-specific detail. We don't have any way to encode > > > > > >> that, so just stop using them until we've sorted that out. > > > > > >> > > > > > >> Link: https://github.com/riscv-non-isa/riscv-sbi-doc/issues/98#issuecomment-1288564687 > > > > > >> Fixes: 6abf32f1d9c5 ("cpuidle: Add RISC-V SBI CPU idle driver") > > > > > >> Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> > > > > > > > > > > > > This is just unnecessary maintenance churn and it's not the > > > > > > right way to go. Better to fix this the right way instead of having > > > > > > a temporary fix. > > > > > > > > > > > > I had already sent-out a patch series 5 months back to describe > > > > > > this in DT: > > > > > > https://lore.kernel.org/lkml/20220727114302.302201-1-apatel@ventanamicro.com/ > > > > > > > > > > > > No one has commented/suggested anything (except Samuel > > > > > > Holland and Sudeep Holla). > > > > > > > > > > I see some comments from Krzysztof here > > > > > <https://lore.kernel.org/lkml/7a0477a0-9f0f-87d6-4070-30321745f4cc@linaro.org/> > > > > > as well. Looks like everyone is pointing out that having our CPU nodes > > > > > encode timers is a bad idea, my guess is that they're probably right. > > > > > > > > Adding a separate timer DT node, creates a new set of compatibility > > > > issues for existing platforms. I am fine updating my series to have > > > > separate timer DT node but do we want to go in this direction ? > > > > > > I don't really follow. How is there a compatibility issue created by > > > adding a new node that is not added for a new property? Both will > > > require changes to the device tree. (You need not reply here, I am going > > > to review the other thread, it's been on my todo list for too long. Been > > > caught up with non-coherent stuff & our sw release cycle..) > > > > Adding a new timer DT node would mean, the RISC-V timer driver > > will now be probed using the compatible to the new DT node whereas > > the RISC-V timer driver is currently probed using CPU DT nodes. > > Ahh, that is what I have missed. I'll continue my thoughts on this in > the dt-binding thread. > > > > > Even if ARM has a separate timer DT node, the timers are still part > > > > of the CPU. It depends on how we see the DT bindings aligning with > > > > actual HW. > > > > > > > > > > > > > > > Please review this series. I can quickly address comments to > > > > > > make this available for Linux-6.2. Until this series is merged, > > > > > > the affected platforms can simply remove non-retentive suspend > > > > > > states from their DT. > > > > > > > > > > That leaves us with a dependency between kernel versions and DT > > > > > bindings: kernels with the current driver will result in broken systems > > > > > with the non-retentive suspend states in the DT they boot with when > > > > > those states can't wake up the CPU. > > > > > > Can someone point me at a (non D1 or virt) system that has suspend > > > states in the DT that would need fixing? > > > > For the QEMU virt machine, the default non-retentive suspend state was > > tested using a temporary DTB provided separately via QEMU command > > line. The QEMU virt machine does not have its own HART suspend > > states so OpenSBI will functionally emulate default retentive/non-retentive > > suspend states. > > So since I asked for non D1 or virt systems, that's a no & no DTs > actually needs to be fixed :) > > > > > This is not a new problem we are facing. Even in the ARM world, > > > > the DT bindings grew organically over time based on newer platform > > > > requirements. > > > > > > > > Now that we have a platform which does not want the time > > > > C3STOP feature, we need to first come-up with DT bindings > > > > to support this platform instead of temporarily disabling > > > > features which don't work on this platform. > > > > > > It's the opposite surely? It should be "now that we have a platform that > > > *does want* the C3STOP feature", right? > > > > Yes, we can think this way as well. > > No, there's no "thinking" involved here from what I can tell. Pre-D1 > systems do not seem to need the flag and the D1 does want that flag for > its riscv,timer. We have to operate with respect to hardware timelines > & the corresponding software implementations, not specs in this context. > > If it was the case that you proposed, there would be no chance for > regressions if someone updates their kernel but not their DT. > > > > > > > With all due respect, NACK to this patch from my side. > > > > > > As Samuel pointed out that the D1 doesn't actually use the timer in > > > question, I think we are okay here? > > > > Yes, that's why D1 needs the C3STOP flag. > > I don't understand what you mean here, you don't appear to be replying > to what I said. > > I was saying that the current D1 configuration does not actually use > the timer-riscv driver as there's another one that has a higher rating > & therefore we are okay to not apply this patch as my revert will not > cause it to be put into sleep states that it cannot return from. > > Your reply makes no sense to me in that context. Sorry for the confusion, I should have written a more complete sentence. D1 does not use the RISC-V timer but it still needs to set the C3STOP flag to inform the timer subsystem that the RISC-V timer will not work in suspend state. Regards, Anup
Hey Samuel, >On Tue, Nov 22, 2022 at 09:42:24PM -0600, Samuel Holland wrote: > On 11/22/22 05:06, Conor Dooley wrote: > > On Mon, Nov 21, 2022 at 06:45:25PM -0600, Samuel Holland wrote: > > > On 11/21/22 14:56, Palmer Dabbelt wrote: > > > > As per [1], whether or not the core can wake up from non-retentive > > > > suspend is a platform-specific detail. We don't have any way to encode > > > > that, so just stop using them until we've sorted that out. > > > > > > We do have *exactly* a way to encode that. Specifically, the existence > > > or non-existence of a non-retentive CPU suspend state in the DT. > > > > > > If a hart has no way of resuming from non-retentive suspend (i.e. a > > > complete lack of interrupt delivery in non-retentive suspend), then it > > > makes zero sense to advertise such a suspend state in the DT. > > > > I would have to agree with that. I think the sprawling conversation has > > confused us all at this point, I (prior to reading this mail) thought > > that suspend was borked on the D1. I don't think anyone is advertising > > specific states in the DT at the moment though, I had a check in the D1 > > patchset and didn't see anything there - unless you're adding it > > dynamically from the bootloader? > > The availability and latency properties of idle states depend on the SBI > implementation, so yes, they need to be added dynamically. Right, thanks for clarifying. > > > I do not have any functioning RISC-V > > > hardware with SMP, so it is hard for me to help debug the root issue in > > > the Linux timer code. I do not know why turning on the C3STOP flag > > > breaks RCU stall detection or clock_nanosleep(), but I agree that > > > breakage should not happen. > > > > > > So while I still think 232ccac1bd9b is the right change to make from a > > > "following the spec" standpoint > > > > Right, so the spec says: > > Request the SBI implementation to put the calling hart in a platform > > specific suspend (or low power) state specified by the suspend_type > > parameter. The hart will automatically come out of suspended state and > > resume normal execution when it receives an interrupt or platform > > specific hardware event. > > > > That, as we have discussed a bunch of times, does not say whether a > > given interrupt should actually arrive during suspend. The correct > > behaviour, to me, is to assume that no events arrive during suspend. > > Are you suggesting that we need some property to declare the delivery of > each kind of interrupt (software, timer, external, PMU)? I'm possibly taking things to an extreme, since if we're having a discussion that covers what the spec does and does not allow I see no harm in going down the rabbit hole! Obviously, some sort of event must get the CPU out of suspend - what I meant was more like "The correct (software) behaviour, to me, is to assume that, when looking at an individual source, its events may not arrive during suspend." I've not looked at the relevant specs to see if they specify whether their interrupts *must* arrive, just the SBI one that the issue was created against. > I assumed that > external interrupt delivery would be required to consider an idle state > "viable", but I suppose it would be _possible_ to have a state where > only timer interrupts are deliverable. Who knows what some hardware folks will come up with! Maybe I am being pretty <whatever the modern version of black & white> is here, but I fear for a repeat whenever someone does something "creative". I know I've not answered your question about other kinds of properties but I am well outside my comfort zone here. Thanks, Conor.
diff --git a/drivers/cpuidle/cpuidle-riscv-sbi.c b/drivers/cpuidle/cpuidle-riscv-sbi.c index 05fe2902df9a..9d1063a54495 100644 --- a/drivers/cpuidle/cpuidle-riscv-sbi.c +++ b/drivers/cpuidle/cpuidle-riscv-sbi.c @@ -214,6 +214,17 @@ static bool sbi_suspend_state_is_valid(u32 state) if (state > SBI_HSM_SUSPEND_NON_RET_DEFAULT && state < SBI_HSM_SUSPEND_NON_RET_PLATFORM) return false; + + /* + * Whether or not RISC-V systems deliver interrupts to harts in a + * non-retentive suspend state is a platform-specific detail. This can + * leave the hart unable to wake up, so just mark these states as + * unsupported until we have a mechanism to expose these + * platform-specific details to Linux. + */ + if (state & SBI_HSM_SUSP_NON_RET_BIT) + return false; + return true; }