Message ID | 20230602161246.1855448-1-amit.pundir@linaro.org |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:994d:0:b0:3d9:f83d:47d9 with SMTP id k13csp1145680vqr; Fri, 2 Jun 2023 09:20:34 -0700 (PDT) X-Google-Smtp-Source: ACHHUZ7Ek8XAd5R9dtv0sohmGKarYdtiTI8DMnRgcI5G/v6onk+9+TOCNnLTGDijmQopCd32/MZa X-Received: by 2002:a05:6a00:240b:b0:650:154:8bd with SMTP id z11-20020a056a00240b00b00650015408bdmr13681510pfh.17.1685722833693; Fri, 02 Jun 2023 09:20:33 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1685722833; cv=none; d=google.com; s=arc-20160816; b=hn25o9Prx8Zg0VEMfrvTpDFVmSnFPsEJ4XzffjGGL3g9T1Lca0r90W7iHcQf1ErIoR MbD5z7YJw1uT+n0WFgLaaS5hDPPZ7k6qAHEturT6ToTSa4Ai+BV2wFo/3oP38atmf/NX h2E5g2Xmy/wV9XVNegtTCDsb1BSfX0oA0diwjh6cqigteS+31anzItQbKChwqHQLleuc bI5iIDUMvYhruSXLgpay+uPmRXtBlhF+zVFVkJ8eSFSWOzqLyqyPsj8tbdCFXkMBwSdW o00UWtuG1KpvXNUHgdnP9ijyReJEylwlp8Eb+fTiMFrJbo0zAhPlk/CDFpzqRjq6OyYE klbQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:author:mime-version :message-id:date:subject:cc:to:from:dkim-signature; bh=xhmHRyToIEmAAHnN/IwEZChyqNOsrqiKFOCxgESTvfk=; b=Z1UYD2QuU/u5R1Eo5uztZELRnIECcdOqS6zdFHqlVxexItu4Z3sPiBL7KdHetaKp79 XGo7kTMlYdbZXoMiz6N9WtNLoNQQ2YUTAsnziRlbMf+7y0hl+yCZ+18xGrP9mHDDoGFI SDerQPTS9ZWXqadqeSbMYG/dZaWQ9h0NLFq0YrgO5oZLjUhS0WlpUvKKKzVBXlx7Ut4E Y6SdR4lhLv3QFMj7R2Ls7sJBgcCio/EXVZxunNi3AHAcwO23pzRqt0BXQjhOWL1BK5ai i8t9bSCEF5A5MqqQVZTPHN49RvosmaO2rrEOCp1LT5nqlTJdQJeVVxbEQQ5/KrOuvOvZ +1HQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=QNqMgXaW; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id v14-20020aa799ce000000b00635ebd8854esi982669pfi.173.2023.06.02.09.20.21; Fri, 02 Jun 2023 09:20:33 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=QNqMgXaW; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S236500AbjFBQOS (ORCPT <rfc822;limurcpp@gmail.com> + 99 others); Fri, 2 Jun 2023 12:14:18 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:39120 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S236778AbjFBQOL (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Fri, 2 Jun 2023 12:14:11 -0400 Received: from mail-pf1-x431.google.com (mail-pf1-x431.google.com [IPv6:2607:f8b0:4864:20::431]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 1B96410DE for <linux-kernel@vger.kernel.org>; Fri, 2 Jun 2023 09:13:52 -0700 (PDT) Received: by mail-pf1-x431.google.com with SMTP id d2e1a72fcca58-652dd220d67so1043665b3a.3 for <linux-kernel@vger.kernel.org>; Fri, 02 Jun 2023 09:13:52 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; t=1685722373; x=1688314373; h=content-transfer-encoding:author:mime-version:message-id:date :subject:cc:to:from:from:to:cc:subject:date:message-id:reply-to; bh=xhmHRyToIEmAAHnN/IwEZChyqNOsrqiKFOCxgESTvfk=; b=QNqMgXaW84FsS8Ck/A01cCxyIWV84qC3tILHwN+hr/GzQjLA0wd4t3wXIVaxYnnxel QsT4pQmmRs/82EGyBEjvl94cgJTi3sOXjQB2iLGWt1WQs1TBs9j4LwGPQXzSjhslTuvP XQro5oj1YkzyXdWiiMjhuiDZXpvOAW/kQZZQMF40e/RV3z0hfv5SviuOyehzWw1v9ukb 7UhpStmtLpjLFSg0Qm8wyYbYAhRqsipubfh6crqw4nMyRS4VGrnugWgxzJJM7EvaWFm0 x2GSwpX8E/GVTQSbsWMMeE8g5CnvJ8IwbvzumguJQMI7Ut2qSuoWkWa27rtR5myWANlB fWCQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1685722373; x=1688314373; h=content-transfer-encoding:author:mime-version:message-id:date :subject:cc:to:from:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=xhmHRyToIEmAAHnN/IwEZChyqNOsrqiKFOCxgESTvfk=; b=F1n3Yv/1ZfhGlzM20A/KFoNPDqM/O1vSKmzWaSfi8j67hJ893wYY5iLgZz8DwcHkwa agrcpyUkDAcNwLslNBnmhkGpQa3XEbC3pVHSZoMm3QcL5sw9Crib6GgUvwuPltK6qVk+ d5Jtr0F3zPg5t3gXPhszEkS8wChpJ0Buobm9eWFhEt6G1s5uMEzQgKvGeEaIT34wo3RC g++ezE7/435/o7/NWQGdjR+Yx5ylrK5qeuIH1G1uJt1oN2lNY0vnaD90BdPCmf6otku5 Q4Gk5cycRjiOBa3q8QqRYfnE9fyGAkQc6vdHNpXEbMnp9XJtBAgii16L32betKsRzOk9 ENXA== X-Gm-Message-State: AC+VfDzeCl9+gq+K5lV15N39CREhjBmIUGDOJ/XMkIA/5B+GeDwIE1ZT pM9qPfzEzk07XMl8DyXwubECAg== X-Received: by 2002:a05:6a00:15c7:b0:646:6cc3:4a52 with SMTP id o7-20020a056a0015c700b006466cc34a52mr16570169pfu.3.1685722373382; Fri, 02 Jun 2023 09:12:53 -0700 (PDT) Received: from localhost.localdomain ([49.207.50.231]) by smtp.gmail.com with ESMTPSA id 9-20020aa79149000000b00652a72b89d1sm1197014pfi.170.2023.06.02.09.12.48 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 02 Jun 2023 09:12:52 -0700 (PDT) From: Amit Pundir <amit.pundir@linaro.org> To: Mark Brown <broonie@kernel.org>, Doug Anderson <dianders@chromium.org>, Bjorn Andersson <andersson@kernel.org>, Andy Gross <agross@kernel.org>, Rob Herring <robh+dt@kernel.org>, Konrad Dybcio <konrad.dybcio@linaro.org>, Krzysztof Kozlowski <krzysztof.kozlowski+dt@linaro.org>, Caleb Connolly <caleb.connolly@linaro.org>, Conor Dooley <conor+dt@kernel.org> Cc: regressions <regressions@lists.linux.dev>, linux-arm-msm <linux-arm-msm@vger.kernel.org>, dt <devicetree@vger.kernel.org>, lkml <linux-kernel@vger.kernel.org> Subject: [PATCH] arm64: dts: qcom: sdm845-db845c: Move LVS regulator nodes up Date: Fri, 2 Jun 2023 21:42:46 +0530 Message-Id: <20230602161246.1855448-1-amit.pundir@linaro.org> X-Mailer: git-send-email 2.25.1 MIME-Version: 1.0 Author: Amit Pundir <amit.pundir@linaro.org> Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE,URIBL_BLOCKED autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1767608506206052569?= X-GMAIL-MSGID: =?utf-8?q?1767608506206052569?= |
Series |
arm64: dts: qcom: sdm845-db845c: Move LVS regulator nodes up
|
|
Commit Message
Amit Pundir
June 2, 2023, 4:12 p.m. UTC
Move lvs1 and lvs2 regulator nodes up in the rpmh-regulators
list to workaround a boot regression uncovered by the upstream
commit ad44ac082fdf ("regulator: qcom-rpmh: Revert "regulator:
qcom-rpmh: Use PROBE_FORCE_SYNCHRONOUS"").
Without this fix DB845c fail to boot at times because one of the
lvs1 or lvs2 regulators fail to turn ON in time.
Link: https://lore.kernel.org/all/CAMi1Hd1avQDcDQf137m2auz2znov4XL8YGrLZsw5edb-NtRJRw@mail.gmail.com/
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
---
arch/arm64/boot/dts/qcom/sdm845-db845c.dts | 24 +++++++++++-----------
1 file changed, 12 insertions(+), 12 deletions(-)
Comments
Hi, On Fri, Jun 2, 2023 at 9:12 AM Amit Pundir <amit.pundir@linaro.org> wrote: > > Move lvs1 and lvs2 regulator nodes up in the rpmh-regulators > list to workaround a boot regression uncovered by the upstream > commit ad44ac082fdf ("regulator: qcom-rpmh: Revert "regulator: > qcom-rpmh: Use PROBE_FORCE_SYNCHRONOUS""). > > Without this fix DB845c fail to boot at times because one of the > lvs1 or lvs2 regulators fail to turn ON in time. > > Link: https://lore.kernel.org/all/CAMi1Hd1avQDcDQf137m2auz2znov4XL8YGrLZsw5edb-NtRJRw@mail.gmail.com/ > Signed-off-by: Amit Pundir <amit.pundir@linaro.org> > --- > arch/arm64/boot/dts/qcom/sdm845-db845c.dts | 24 +++++++++++----------- > 1 file changed, 12 insertions(+), 12 deletions(-) > > diff --git a/arch/arm64/boot/dts/qcom/sdm845-db845c.dts b/arch/arm64/boot/dts/qcom/sdm845-db845c.dts > index e14fe9bbb386..df2fde9063dc 100644 > --- a/arch/arm64/boot/dts/qcom/sdm845-db845c.dts > +++ b/arch/arm64/boot/dts/qcom/sdm845-db845c.dts > @@ -301,6 +301,18 @@ regulators-0 { > vdd-l26-supply = <&vreg_s3a_1p35>; > vin-lvs-1-2-supply = <&vreg_s4a_1p8>; > > + vreg_lvs1a_1p8: lvs1 { > + regulator-min-microvolt = <1800000>; > + regulator-max-microvolt = <1800000>; > + regulator-always-on; > + }; > + > + vreg_lvs2a_1p8: lvs2 { > + regulator-min-microvolt = <1800000>; > + regulator-max-microvolt = <1800000>; > + regulator-always-on; > + }; > + > vreg_s3a_1p35: smps3 { > regulator-min-microvolt = <1352000>; > regulator-max-microvolt = <1352000>; > @@ -381,18 +393,6 @@ vreg_l26a_1p2: ldo26 { > regulator-max-microvolt = <1200000>; > regulator-initial-mode = <RPMH_REGULATOR_MODE_HPM>; > }; > - > - vreg_lvs1a_1p8: lvs1 { > - regulator-min-microvolt = <1800000>; > - regulator-max-microvolt = <1800000>; > - regulator-always-on; > - }; > - > - vreg_lvs2a_1p8: lvs2 { > - regulator-min-microvolt = <1800000>; > - regulator-max-microvolt = <1800000>; > - regulator-always-on; > - }; This is a hack, but it at least feels less bad than reverting the async probe patch. I'll leave it to Bjorn to decide if he's OK with it. Personally, it feels like this would deserve a comment in the dts to document that these regulators need to be listed first. Ideally, we could still work towards a root cause. I added a few more ideas to help with root causing in reply to the original thread about this. https://lore.kernel.org/r/CAD=FV=UKyjRNZG-ED2meUAR9aXdco+AbUTHiKixTzjCkaJbjTg@mail.gmail.com/ -Doug
On 02/06/2023 18:12, Amit Pundir wrote: > Move lvs1 and lvs2 regulator nodes up in the rpmh-regulators > list to workaround a boot regression uncovered by the upstream > commit ad44ac082fdf ("regulator: qcom-rpmh: Revert "regulator: > qcom-rpmh: Use PROBE_FORCE_SYNCHRONOUS""). > > Without this fix DB845c fail to boot at times because one of the > lvs1 or lvs2 regulators fail to turn ON in time. Why regulator would fail to turn on time? If it has ramp-up delay, add it to DT. Otherwise how is it possible? You have a consumer, right? This is not the correct solution and it is hiding real issue. Best regards, Krzysztof
On 07/06/2023 01:34, Doug Anderson wrote: > Hi, > > On Fri, Jun 2, 2023 at 9:12 AM Amit Pundir <amit.pundir@linaro.org> wrote: >> >> Move lvs1 and lvs2 regulator nodes up in the rpmh-regulators >> list to workaround a boot regression uncovered by the upstream >> commit ad44ac082fdf ("regulator: qcom-rpmh: Revert "regulator: >> qcom-rpmh: Use PROBE_FORCE_SYNCHRONOUS""). >> >> Without this fix DB845c fail to boot at times because one of the >> lvs1 or lvs2 regulators fail to turn ON in time. >> >> Link: https://lore.kernel.org/all/CAMi1Hd1avQDcDQf137m2auz2znov4XL8YGrLZsw5edb-NtRJRw@mail.gmail.com/ >> Signed-off-by: Amit Pundir <amit.pundir@linaro.org> >> --- >> arch/arm64/boot/dts/qcom/sdm845-db845c.dts | 24 +++++++++++----------- >> 1 file changed, 12 insertions(+), 12 deletions(-) >> >> diff --git a/arch/arm64/boot/dts/qcom/sdm845-db845c.dts b/arch/arm64/boot/dts/qcom/sdm845-db845c.dts >> index e14fe9bbb386..df2fde9063dc 100644 >> --- a/arch/arm64/boot/dts/qcom/sdm845-db845c.dts >> +++ b/arch/arm64/boot/dts/qcom/sdm845-db845c.dts >> @@ -301,6 +301,18 @@ regulators-0 { >> vdd-l26-supply = <&vreg_s3a_1p35>; >> vin-lvs-1-2-supply = <&vreg_s4a_1p8>; >> >> + vreg_lvs1a_1p8: lvs1 { >> + regulator-min-microvolt = <1800000>; >> + regulator-max-microvolt = <1800000>; >> + regulator-always-on; >> + }; >> + >> + vreg_lvs2a_1p8: lvs2 { >> + regulator-min-microvolt = <1800000>; >> + regulator-max-microvolt = <1800000>; >> + regulator-always-on; >> + }; >> + >> vreg_s3a_1p35: smps3 { >> regulator-min-microvolt = <1352000>; >> regulator-max-microvolt = <1352000>; >> @@ -381,18 +393,6 @@ vreg_l26a_1p2: ldo26 { >> regulator-max-microvolt = <1200000>; >> regulator-initial-mode = <RPMH_REGULATOR_MODE_HPM>; >> }; >> - >> - vreg_lvs1a_1p8: lvs1 { >> - regulator-min-microvolt = <1800000>; >> - regulator-max-microvolt = <1800000>; >> - regulator-always-on; >> - }; >> - >> - vreg_lvs2a_1p8: lvs2 { >> - regulator-min-microvolt = <1800000>; >> - regulator-max-microvolt = <1800000>; >> - regulator-always-on; >> - }; > > This is a hack, but it at least feels less bad than reverting the > async probe patch. I'll leave it to Bjorn to decide if he's OK with > it. Personally, it feels like this would deserve a comment in the dts > to document that these regulators need to be listed first. > > Ideally, we could still work towards a root cause. I added a few more > ideas to help with root causing in reply to the original thread about > this. > > https://lore.kernel.org/r/CAD=FV=UKyjRNZG-ED2meUAR9aXdco+AbUTHiKixTzjCkaJbjTg@mail.gmail.com/ We do not shape DTS based on given OS behavior. AOSP needs this, BSD needs that and Linux needs something else. Next time someone will move these regulators down because on his system probing is from end of list, not beginning and he has the same problem. No, really, are we going to reshuffle nodes because AOSP needs it? Best regards, Krzysztof
On Wed, 7 Jun 2023 at 13:19, Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> wrote: > > On 07/06/2023 01:34, Doug Anderson wrote: > > Hi, > > > > On Fri, Jun 2, 2023 at 9:12 AM Amit Pundir <amit.pundir@linaro.org> wrote: > >> > >> Move lvs1 and lvs2 regulator nodes up in the rpmh-regulators > >> list to workaround a boot regression uncovered by the upstream > >> commit ad44ac082fdf ("regulator: qcom-rpmh: Revert "regulator: > >> qcom-rpmh: Use PROBE_FORCE_SYNCHRONOUS""). > >> > >> Without this fix DB845c fail to boot at times because one of the > >> lvs1 or lvs2 regulators fail to turn ON in time. > >> > >> Link: https://lore.kernel.org/all/CAMi1Hd1avQDcDQf137m2auz2znov4XL8YGrLZsw5edb-NtRJRw@mail.gmail.com/ > >> Signed-off-by: Amit Pundir <amit.pundir@linaro.org> > >> --- > >> arch/arm64/boot/dts/qcom/sdm845-db845c.dts | 24 +++++++++++----------- > >> 1 file changed, 12 insertions(+), 12 deletions(-) > >> > >> diff --git a/arch/arm64/boot/dts/qcom/sdm845-db845c.dts b/arch/arm64/boot/dts/qcom/sdm845-db845c.dts > >> index e14fe9bbb386..df2fde9063dc 100644 > >> --- a/arch/arm64/boot/dts/qcom/sdm845-db845c.dts > >> +++ b/arch/arm64/boot/dts/qcom/sdm845-db845c.dts > >> @@ -301,6 +301,18 @@ regulators-0 { > >> vdd-l26-supply = <&vreg_s3a_1p35>; > >> vin-lvs-1-2-supply = <&vreg_s4a_1p8>; > >> > >> + vreg_lvs1a_1p8: lvs1 { > >> + regulator-min-microvolt = <1800000>; > >> + regulator-max-microvolt = <1800000>; > >> + regulator-always-on; > >> + }; > >> + > >> + vreg_lvs2a_1p8: lvs2 { > >> + regulator-min-microvolt = <1800000>; > >> + regulator-max-microvolt = <1800000>; > >> + regulator-always-on; > >> + }; > >> + > >> vreg_s3a_1p35: smps3 { > >> regulator-min-microvolt = <1352000>; > >> regulator-max-microvolt = <1352000>; > >> @@ -381,18 +393,6 @@ vreg_l26a_1p2: ldo26 { > >> regulator-max-microvolt = <1200000>; > >> regulator-initial-mode = <RPMH_REGULATOR_MODE_HPM>; > >> }; > >> - > >> - vreg_lvs1a_1p8: lvs1 { > >> - regulator-min-microvolt = <1800000>; > >> - regulator-max-microvolt = <1800000>; > >> - regulator-always-on; > >> - }; > >> - > >> - vreg_lvs2a_1p8: lvs2 { > >> - regulator-min-microvolt = <1800000>; > >> - regulator-max-microvolt = <1800000>; > >> - regulator-always-on; > >> - }; > > > > This is a hack, but it at least feels less bad than reverting the > > async probe patch. I'll leave it to Bjorn to decide if he's OK with > > it. Personally, it feels like this would deserve a comment in the dts > > to document that these regulators need to be listed first. > > > > Ideally, we could still work towards a root cause. I added a few more > > ideas to help with root causing in reply to the original thread about > > this. > > > > https://lore.kernel.org/r/CAD=FV=UKyjRNZG-ED2meUAR9aXdco+AbUTHiKixTzjCkaJbjTg@mail.gmail.com/ > > We do not shape DTS based on given OS behavior. AOSP needs this, BSD > needs that and Linux needs something else. Next time someone will move > these regulators down because on his system probing is from end of list, > not beginning and he has the same problem. > > No, really, are we going to reshuffle nodes because AOSP needs it? Hi, other than the fact that I reproduced it on AOSP, there is nothing AOSP specific in this patch. I'm sure there may be another platforms/OS (which load kernel modules from a ramdisk) that may trip on this bug. But I can try reproducing it on an OS of your choice if it helps. Regards, Amit Pundir > > Best regards, > Krzysztof >
On 07/06/2023 11:17, Amit Pundir wrote: > On Wed, 7 Jun 2023 at 13:19, Krzysztof Kozlowski > <krzysztof.kozlowski@linaro.org> wrote: >> >> On 07/06/2023 01:34, Doug Anderson wrote: >>> Hi, >>> >>> On Fri, Jun 2, 2023 at 9:12 AM Amit Pundir <amit.pundir@linaro.org> wrote: >>>> >>>> Move lvs1 and lvs2 regulator nodes up in the rpmh-regulators >>>> list to workaround a boot regression uncovered by the upstream >>>> commit ad44ac082fdf ("regulator: qcom-rpmh: Revert "regulator: >>>> qcom-rpmh: Use PROBE_FORCE_SYNCHRONOUS""). >>>> >>>> Without this fix DB845c fail to boot at times because one of the >>>> lvs1 or lvs2 regulators fail to turn ON in time. >>>> >>>> Link: https://lore.kernel.org/all/CAMi1Hd1avQDcDQf137m2auz2znov4XL8YGrLZsw5edb-NtRJRw@mail.gmail.com/ >>>> Signed-off-by: Amit Pundir <amit.pundir@linaro.org> >>>> --- >>>> arch/arm64/boot/dts/qcom/sdm845-db845c.dts | 24 +++++++++++----------- >>>> 1 file changed, 12 insertions(+), 12 deletions(-) >>>> >>>> diff --git a/arch/arm64/boot/dts/qcom/sdm845-db845c.dts b/arch/arm64/boot/dts/qcom/sdm845-db845c.dts >>>> index e14fe9bbb386..df2fde9063dc 100644 >>>> --- a/arch/arm64/boot/dts/qcom/sdm845-db845c.dts >>>> +++ b/arch/arm64/boot/dts/qcom/sdm845-db845c.dts >>>> @@ -301,6 +301,18 @@ regulators-0 { >>>> vdd-l26-supply = <&vreg_s3a_1p35>; >>>> vin-lvs-1-2-supply = <&vreg_s4a_1p8>; >>>> >>>> + vreg_lvs1a_1p8: lvs1 { >>>> + regulator-min-microvolt = <1800000>; >>>> + regulator-max-microvolt = <1800000>; >>>> + regulator-always-on; >>>> + }; >>>> + >>>> + vreg_lvs2a_1p8: lvs2 { >>>> + regulator-min-microvolt = <1800000>; >>>> + regulator-max-microvolt = <1800000>; >>>> + regulator-always-on; >>>> + }; >>>> + >>>> vreg_s3a_1p35: smps3 { >>>> regulator-min-microvolt = <1352000>; >>>> regulator-max-microvolt = <1352000>; >>>> @@ -381,18 +393,6 @@ vreg_l26a_1p2: ldo26 { >>>> regulator-max-microvolt = <1200000>; >>>> regulator-initial-mode = <RPMH_REGULATOR_MODE_HPM>; >>>> }; >>>> - >>>> - vreg_lvs1a_1p8: lvs1 { >>>> - regulator-min-microvolt = <1800000>; >>>> - regulator-max-microvolt = <1800000>; >>>> - regulator-always-on; >>>> - }; >>>> - >>>> - vreg_lvs2a_1p8: lvs2 { >>>> - regulator-min-microvolt = <1800000>; >>>> - regulator-max-microvolt = <1800000>; >>>> - regulator-always-on; >>>> - }; >>> >>> This is a hack, but it at least feels less bad than reverting the >>> async probe patch. I'll leave it to Bjorn to decide if he's OK with >>> it. Personally, it feels like this would deserve a comment in the dts >>> to document that these regulators need to be listed first. >>> >>> Ideally, we could still work towards a root cause. I added a few more >>> ideas to help with root causing in reply to the original thread about >>> this. >>> >>> https://lore.kernel.org/r/CAD=FV=UKyjRNZG-ED2meUAR9aXdco+AbUTHiKixTzjCkaJbjTg@mail.gmail.com/ >> >> We do not shape DTS based on given OS behavior. AOSP needs this, BSD >> needs that and Linux needs something else. Next time someone will move >> these regulators down because on his system probing is from end of list, >> not beginning and he has the same problem. >> >> No, really, are we going to reshuffle nodes because AOSP needs it? > > Hi, other than the fact that I reproduced it on AOSP, there is nothing > AOSP specific in this patch. I'm sure there may be another > platforms/OS (which load kernel modules from a ramdisk) that may trip > on this bug. But I can try reproducing it on an OS of your choice if > it helps. I wrote earlier imaginary system where RPM driver loads the regulators from the end. It would require re-shuffling to previous order of the nodes. Feel free to change the RPM drivers to simulate it and you should see that your patch stops helping. The problem looks like in missing consumers, missing probe dependencies or something in the driver how it handles these. DTS should not be used for solving OS related problems. Best regards, Krzysztof
On Wed, 7 Jun 2023 at 15:46, Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> wrote: > > > The problem looks like in missing consumers, missing probe dependencies > or something in the driver how it handles these. Missing consumers seem to be the case here, if I'm reading the $debugfs/regulator/regulator_summary correctly(?) https://www.irccloud.com/pastebin/raw/2jwn0EQw. lvs1 and lvs2 sysfs entries in /sys/class/regulator/ do not list any consumers explicitly either. Regards, Amit Pundir > > DTS should not be used for solving OS related problems. > > Best regards, > Krzysztof >
Hi, On Thu, Jun 8, 2023 at 10:27 AM Amit Pundir <amit.pundir@linaro.org> wrote: > > On Wed, 7 Jun 2023 at 15:46, Krzysztof Kozlowski > <krzysztof.kozlowski@linaro.org> wrote: > > > > > > The problem looks like in missing consumers, missing probe dependencies > > or something in the driver how it handles these. > > Missing consumers seem to be the case here, if I'm reading the > $debugfs/regulator/regulator_summary correctly(?) > https://www.irccloud.com/pastebin/raw/2jwn0EQw. > lvs1 and lvs2 sysfs entries in /sys/class/regulator/ do not list any > consumers explicitly either. They are marked as always-on regulators, though. The lack of an explicit consumer in device tree shouldn't really matter. I don't think this is the source of your problem. -Doug
Linux regression tracking (Thorsten Leemhuis)
June 14, 2023, 6:18 p.m. UTC |
#8
Addressed
Unaddressed
On 02.06.23 18:12, Amit Pundir wrote: > Move lvs1 and lvs2 regulator nodes up in the rpmh-regulators > list to workaround a boot regression uncovered by the upstream > commit ad44ac082fdf ("regulator: qcom-rpmh: Revert "regulator: > qcom-rpmh: Use PROBE_FORCE_SYNCHRONOUS""). > > Without this fix DB845c fail to boot at times because one of the > lvs1 or lvs2 regulators fail to turn ON in time. /me waves friendly FWIW, as it's not obvious: this... > Link: https://lore.kernel.org/all/CAMi1Hd1avQDcDQf137m2auz2znov4XL8YGrLZsw5edb-NtRJRw@mail.gmail.com/ ...is a report about a regression. One that we could still solve before 6.4 is out. One I'll likely will point Linus to, unless a fix comes into sight. When I noticed the reluctant replies to this patch I earlier today asked in the thread with the report what the plan forward was: https://lore.kernel.org/all/CAD%3DFV%3DV-h4EUKHCM9UivsFHRsJPY5sAiwXV3a1hUX9DUMkkxdg@mail.gmail.com/ Dough there replied: ``` Of the two proposals made (the revert vs. the reordering of the dts), the reordering of the dts seems better. It only affects the one buggy board (rather than preventing us to move to async probe for everyone) and it also has a chance of actually fixing something (changing the order that regulators probe in rpmh-regulator might legitimately work around the problem). That being said, just like the revert the dts reordering is still just papering over the problem and is fragile / not guaranteed to work forever. ``` Papering over obviously is not good, but has anyone a better idea to fix this? Or is "not fixing" for some reason an viable option here? Ciao, Thorsten > Signed-off-by: Amit Pundir <amit.pundir@linaro.org> > --- > arch/arm64/boot/dts/qcom/sdm845-db845c.dts | 24 +++++++++++----------- > 1 file changed, 12 insertions(+), 12 deletions(-) > > diff --git a/arch/arm64/boot/dts/qcom/sdm845-db845c.dts b/arch/arm64/boot/dts/qcom/sdm845-db845c.dts > index e14fe9bbb386..df2fde9063dc 100644 > --- a/arch/arm64/boot/dts/qcom/sdm845-db845c.dts > +++ b/arch/arm64/boot/dts/qcom/sdm845-db845c.dts > @@ -301,6 +301,18 @@ regulators-0 { > vdd-l26-supply = <&vreg_s3a_1p35>; > vin-lvs-1-2-supply = <&vreg_s4a_1p8>; > > + vreg_lvs1a_1p8: lvs1 { > + regulator-min-microvolt = <1800000>; > + regulator-max-microvolt = <1800000>; > + regulator-always-on; > + }; > + > + vreg_lvs2a_1p8: lvs2 { > + regulator-min-microvolt = <1800000>; > + regulator-max-microvolt = <1800000>; > + regulator-always-on; > + }; > + > vreg_s3a_1p35: smps3 { > regulator-min-microvolt = <1352000>; > regulator-max-microvolt = <1352000>; > @@ -381,18 +393,6 @@ vreg_l26a_1p2: ldo26 { > regulator-max-microvolt = <1200000>; > regulator-initial-mode = <RPMH_REGULATOR_MODE_HPM>; > }; > - > - vreg_lvs1a_1p8: lvs1 { > - regulator-min-microvolt = <1800000>; > - regulator-max-microvolt = <1800000>; > - regulator-always-on; > - }; > - > - vreg_lvs2a_1p8: lvs2 { > - regulator-min-microvolt = <1800000>; > - regulator-max-microvolt = <1800000>; > - regulator-always-on; > - }; > }; > > regulators-1 {
On 14/06/2023 20:18, Linux regression tracking (Thorsten Leemhuis) wrote: > On 02.06.23 18:12, Amit Pundir wrote: >> Move lvs1 and lvs2 regulator nodes up in the rpmh-regulators >> list to workaround a boot regression uncovered by the upstream >> commit ad44ac082fdf ("regulator: qcom-rpmh: Revert "regulator: >> qcom-rpmh: Use PROBE_FORCE_SYNCHRONOUS""). >> >> Without this fix DB845c fail to boot at times because one of the >> lvs1 or lvs2 regulators fail to turn ON in time. > > /me waves friendly > > FWIW, as it's not obvious: this... > >> Link: https://lore.kernel.org/all/CAMi1Hd1avQDcDQf137m2auz2znov4XL8YGrLZsw5edb-NtRJRw@mail.gmail.com/ > > ...is a report about a regression. One that we could still solve before > 6.4 is out. One I'll likely will point Linus to, unless a fix comes into > sight. > > When I noticed the reluctant replies to this patch I earlier today asked > in the thread with the report what the plan forward was: > https://lore.kernel.org/all/CAD%3DFV%3DV-h4EUKHCM9UivsFHRsJPY5sAiwXV3a1hUX9DUMkkxdg@mail.gmail.com/ > > Dough there replied: > > ``` > Of the two proposals made (the revert vs. the reordering of the dts), > the reordering of the dts seems better. It only affects the one buggy > board (rather than preventing us to move to async probe for everyone) > and it also has a chance of actually fixing something (changing the > order that regulators probe in rpmh-regulator might legitimately work > around the problem). That being said, just like the revert the dts > reordering is still just papering over the problem and is fragile / > not guaranteed to work forever. > ``` > > Papering over obviously is not good, but has anyone a better idea to fix > this? Or is "not fixing" for some reason an viable option here? > I understand there is a regression, although kernel is not mainline (hash df7443a96851 is unknown) and the only solutions were papering the problem. Reverting commit is a temporary workaround. Moving nodes in DTS is not acceptable because it hides actual problem and only solves this one particular observed problem, while actual issue is still there. It would be nice to be able to reproduce it on real mainline with normal operating system (not AOSP) - with ramdiks/without/whatever. So far no one did it, right? Best regards, Krzysztof
On Thu, 15 Jun 2023 at 00:17, Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> wrote: > > On 14/06/2023 20:18, Linux regression tracking (Thorsten Leemhuis) wrote: > > On 02.06.23 18:12, Amit Pundir wrote: > >> Move lvs1 and lvs2 regulator nodes up in the rpmh-regulators > >> list to workaround a boot regression uncovered by the upstream > >> commit ad44ac082fdf ("regulator: qcom-rpmh: Revert "regulator: > >> qcom-rpmh: Use PROBE_FORCE_SYNCHRONOUS""). > >> > >> Without this fix DB845c fail to boot at times because one of the > >> lvs1 or lvs2 regulators fail to turn ON in time. > > > > /me waves friendly > > > > FWIW, as it's not obvious: this... > > > >> Link: https://lore.kernel.org/all/CAMi1Hd1avQDcDQf137m2auz2znov4XL8YGrLZsw5edb-NtRJRw@mail.gmail.com/ > > > > ...is a report about a regression. One that we could still solve before > > 6.4 is out. One I'll likely will point Linus to, unless a fix comes into > > sight. > > > > When I noticed the reluctant replies to this patch I earlier today asked > > in the thread with the report what the plan forward was: > > https://lore.kernel.org/all/CAD%3DFV%3DV-h4EUKHCM9UivsFHRsJPY5sAiwXV3a1hUX9DUMkkxdg@mail.gmail.com/ > > > > Dough there replied: > > > > ``` > > Of the two proposals made (the revert vs. the reordering of the dts), > > the reordering of the dts seems better. It only affects the one buggy > > board (rather than preventing us to move to async probe for everyone) > > and it also has a chance of actually fixing something (changing the > > order that regulators probe in rpmh-regulator might legitimately work > > around the problem). That being said, just like the revert the dts > > reordering is still just papering over the problem and is fragile / > > not guaranteed to work forever. > > ``` > > > > Papering over obviously is not good, but has anyone a better idea to fix > > this? Or is "not fixing" for some reason an viable option here? > > > > I understand there is a regression, although kernel is not mainline > (hash df7443a96851 is unknown) and the only solutions were papering the > problem. Reverting commit is a temporary workaround. Moving nodes in DTS > is not acceptable because it hides actual problem and only solves this > one particular observed problem, while actual issue is still there. It > would be nice to be able to reproduce it on real mainline with normal > operating system (not AOSP) - with ramdiks/without/whatever. So far no > one did it, right? No, I did not try non-AOSP system yet. I'll try it tomorrow, if that helps. With mainline hash. Regards, Amit Pundir
Hi, On Wed, Jun 14, 2023 at 11:47 AM Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> wrote: > > On 14/06/2023 20:18, Linux regression tracking (Thorsten Leemhuis) wrote: > > On 02.06.23 18:12, Amit Pundir wrote: > >> Move lvs1 and lvs2 regulator nodes up in the rpmh-regulators > >> list to workaround a boot regression uncovered by the upstream > >> commit ad44ac082fdf ("regulator: qcom-rpmh: Revert "regulator: > >> qcom-rpmh: Use PROBE_FORCE_SYNCHRONOUS""). > >> > >> Without this fix DB845c fail to boot at times because one of the > >> lvs1 or lvs2 regulators fail to turn ON in time. > > > > /me waves friendly > > > > FWIW, as it's not obvious: this... > > > >> Link: https://lore.kernel.org/all/CAMi1Hd1avQDcDQf137m2auz2znov4XL8YGrLZsw5edb-NtRJRw@mail.gmail.com/ > > > > ...is a report about a regression. One that we could still solve before > > 6.4 is out. One I'll likely will point Linus to, unless a fix comes into > > sight. > > > > When I noticed the reluctant replies to this patch I earlier today asked > > in the thread with the report what the plan forward was: > > https://lore.kernel.org/all/CAD%3DFV%3DV-h4EUKHCM9UivsFHRsJPY5sAiwXV3a1hUX9DUMkkxdg@mail.gmail.com/ > > > > Dough there replied: > > > > ``` > > Of the two proposals made (the revert vs. the reordering of the dts), > > the reordering of the dts seems better. It only affects the one buggy > > board (rather than preventing us to move to async probe for everyone) > > and it also has a chance of actually fixing something (changing the > > order that regulators probe in rpmh-regulator might legitimately work > > around the problem). That being said, just like the revert the dts > > reordering is still just papering over the problem and is fragile / > > not guaranteed to work forever. > > ``` > > > > Papering over obviously is not good, but has anyone a better idea to fix > > this? Or is "not fixing" for some reason an viable option here? > > > > I understand there is a regression, although kernel is not mainline > (hash df7443a96851 is unknown) and the only solutions were papering the > problem. Reverting commit is a temporary workaround. Moving nodes in DTS > is not acceptable because it hides actual problem and only solves this > one particular observed problem, while actual issue is still there. It > would be nice to be able to reproduce it on real mainline with normal > operating system (not AOSP) - with ramdiks/without/whatever. So far no > one did it, right? The worry I have about the revert here is that it will never be able to be undone and that doesn't seem great long term. I'm all for a temporary revert to fix a problem while the root cause is understood, but in this case I have a hard time believing that we'll make more progress towards a root cause once the revert lands. All the investigation we've done so far seems to indicate that the revert only fixes the problem by luck... I completely agree that moving the nodes in the DTS is a hack and just hides the problem. However, it also at least limits the workaround to the one board showing the problem and doesn't mean we're stuck with synchronous probe for rpmh-regulator for all eternity because nobody can understand this timing issue on db845c. -Doug
On Thu, 15 Jun 2023 at 00:38, Amit Pundir <amit.pundir@linaro.org> wrote: > > On Thu, 15 Jun 2023 at 00:17, Krzysztof Kozlowski > <krzysztof.kozlowski@linaro.org> wrote: > > > > On 14/06/2023 20:18, Linux regression tracking (Thorsten Leemhuis) wrote: > > > On 02.06.23 18:12, Amit Pundir wrote: > > >> Move lvs1 and lvs2 regulator nodes up in the rpmh-regulators > > >> list to workaround a boot regression uncovered by the upstream > > >> commit ad44ac082fdf ("regulator: qcom-rpmh: Revert "regulator: > > >> qcom-rpmh: Use PROBE_FORCE_SYNCHRONOUS""). > > >> > > >> Without this fix DB845c fail to boot at times because one of the > > >> lvs1 or lvs2 regulators fail to turn ON in time. > > > > > > /me waves friendly > > > > > > FWIW, as it's not obvious: this... > > > > > >> Link: https://lore.kernel.org/all/CAMi1Hd1avQDcDQf137m2auz2znov4XL8YGrLZsw5edb-NtRJRw@mail.gmail.com/ > > > > > > ...is a report about a regression. One that we could still solve before > > > 6.4 is out. One I'll likely will point Linus to, unless a fix comes into > > > sight. > > > > > > When I noticed the reluctant replies to this patch I earlier today asked > > > in the thread with the report what the plan forward was: > > > https://lore.kernel.org/all/CAD%3DFV%3DV-h4EUKHCM9UivsFHRsJPY5sAiwXV3a1hUX9DUMkkxdg@mail.gmail.com/ > > > > > > Dough there replied: > > > > > > ``` > > > Of the two proposals made (the revert vs. the reordering of the dts), > > > the reordering of the dts seems better. It only affects the one buggy > > > board (rather than preventing us to move to async probe for everyone) > > > and it also has a chance of actually fixing something (changing the > > > order that regulators probe in rpmh-regulator might legitimately work > > > around the problem). That being said, just like the revert the dts > > > reordering is still just papering over the problem and is fragile / > > > not guaranteed to work forever. > > > ``` > > > > > > Papering over obviously is not good, but has anyone a better idea to fix > > > this? Or is "not fixing" for some reason an viable option here? > > > > > > > I understand there is a regression, although kernel is not mainline > > (hash df7443a96851 is unknown) and the only solutions were papering the > > problem. Reverting commit is a temporary workaround. Moving nodes in DTS > > is not acceptable because it hides actual problem and only solves this > > one particular observed problem, while actual issue is still there. It > > would be nice to be able to reproduce it on real mainline with normal > > operating system (not AOSP) - with ramdiks/without/whatever. So far no > > one did it, right? > > No, I did not try non-AOSP system yet. I'll try it tomorrow, if that > helps. With mainline hash. Hi, here is the crash report on db845c running vanilla v6.4-rc6 with a debian build https://bugs.linaro.org/attachment.cgi?id=1142 And fwiw here is the db845c crash log with AOSP running vanilla v6.4-rc6 https://bugs.linaro.org/attachment.cgi?id=1141 Regards, Amit Pundir PS: rootfs in this bug report doesn't matter much because I'm loading all the kernel modules from a ramdisk and in the case of a crash the UFS doesn't probe anyway.
On 15/06/2023 15:47, Amit Pundir wrote: > On Thu, 15 Jun 2023 at 00:38, Amit Pundir <amit.pundir@linaro.org> wrote: >> >> On Thu, 15 Jun 2023 at 00:17, Krzysztof Kozlowski >> <krzysztof.kozlowski@linaro.org> wrote: >>> >>> On 14/06/2023 20:18, Linux regression tracking (Thorsten Leemhuis) wrote: >>>> On 02.06.23 18:12, Amit Pundir wrote: >>>>> Move lvs1 and lvs2 regulator nodes up in the rpmh-regulators >>>>> list to workaround a boot regression uncovered by the upstream >>>>> commit ad44ac082fdf ("regulator: qcom-rpmh: Revert "regulator: >>>>> qcom-rpmh: Use PROBE_FORCE_SYNCHRONOUS""). >>>>> >>>>> Without this fix DB845c fail to boot at times because one of the >>>>> lvs1 or lvs2 regulators fail to turn ON in time. >>>> >>>> /me waves friendly >>>> >>>> FWIW, as it's not obvious: this... >>>> >>>>> Link: https://lore.kernel.org/all/CAMi1Hd1avQDcDQf137m2auz2znov4XL8YGrLZsw5edb-NtRJRw@mail.gmail.com/ >>>> >>>> ...is a report about a regression. One that we could still solve before >>>> 6.4 is out. One I'll likely will point Linus to, unless a fix comes into >>>> sight. >>>> >>>> When I noticed the reluctant replies to this patch I earlier today asked >>>> in the thread with the report what the plan forward was: >>>> https://lore.kernel.org/all/CAD%3DFV%3DV-h4EUKHCM9UivsFHRsJPY5sAiwXV3a1hUX9DUMkkxdg@mail.gmail.com/ >>>> >>>> Dough there replied: >>>> >>>> ``` >>>> Of the two proposals made (the revert vs. the reordering of the dts), >>>> the reordering of the dts seems better. It only affects the one buggy >>>> board (rather than preventing us to move to async probe for everyone) >>>> and it also has a chance of actually fixing something (changing the >>>> order that regulators probe in rpmh-regulator might legitimately work >>>> around the problem). That being said, just like the revert the dts >>>> reordering is still just papering over the problem and is fragile / >>>> not guaranteed to work forever. >>>> ``` >>>> >>>> Papering over obviously is not good, but has anyone a better idea to fix >>>> this? Or is "not fixing" for some reason an viable option here? >>>> >>> >>> I understand there is a regression, although kernel is not mainline >>> (hash df7443a96851 is unknown) and the only solutions were papering the >>> problem. Reverting commit is a temporary workaround. Moving nodes in DTS >>> is not acceptable because it hides actual problem and only solves this >>> one particular observed problem, while actual issue is still there. It >>> would be nice to be able to reproduce it on real mainline with normal >>> operating system (not AOSP) - with ramdiks/without/whatever. So far no >>> one did it, right? >> >> No, I did not try non-AOSP system yet. I'll try it tomorrow, if that >> helps. With mainline hash. > > Hi, here is the crash report on db845c running vanilla v6.4-rc6 with a > debian build https://bugs.linaro.org/attachment.cgi?id=1142 > > And fwiw here is the db845c crash log with AOSP running vanilla > v6.4-rc6 https://bugs.linaro.org/attachment.cgi?id=1141 > > Regards, > Amit Pundir > > PS: rootfs in this bug report doesn't matter much because I'm loading > all the kernel modules from a ramdisk and in the case of a crash the > UFS doesn't probe anyway. I just tried current next with defconfig (I could not find your config, neither here, nor in your previous mail thread nor in bugzilla). Also with REGULATOR_QCOM_RPMH as module. I tried also v6.4-rc6 - also defconfig with default and module REGULATOR_QCOM_RPMH. All the cases work on my RB3 - no warnings reported. If you do not use defconfig, then in all reports please mention the differences (the best) or at least attach it. Best regards, Krzysztof
On Thu, 15 Jun 2023 at 20:33, Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> wrote: > > On 15/06/2023 15:47, Amit Pundir wrote: > > On Thu, 15 Jun 2023 at 00:38, Amit Pundir <amit.pundir@linaro.org> wrote: > >> > >> On Thu, 15 Jun 2023 at 00:17, Krzysztof Kozlowski > >> <krzysztof.kozlowski@linaro.org> wrote: > >>> > >>> On 14/06/2023 20:18, Linux regression tracking (Thorsten Leemhuis) wrote: > >>>> On 02.06.23 18:12, Amit Pundir wrote: > >>>>> Move lvs1 and lvs2 regulator nodes up in the rpmh-regulators > >>>>> list to workaround a boot regression uncovered by the upstream > >>>>> commit ad44ac082fdf ("regulator: qcom-rpmh: Revert "regulator: > >>>>> qcom-rpmh: Use PROBE_FORCE_SYNCHRONOUS""). > >>>>> > >>>>> Without this fix DB845c fail to boot at times because one of the > >>>>> lvs1 or lvs2 regulators fail to turn ON in time. > >>>> > >>>> /me waves friendly > >>>> > >>>> FWIW, as it's not obvious: this... > >>>> > >>>>> Link: https://lore.kernel.org/all/CAMi1Hd1avQDcDQf137m2auz2znov4XL8YGrLZsw5edb-NtRJRw@mail.gmail.com/ > >>>> > >>>> ...is a report about a regression. One that we could still solve before > >>>> 6.4 is out. One I'll likely will point Linus to, unless a fix comes into > >>>> sight. > >>>> > >>>> When I noticed the reluctant replies to this patch I earlier today asked > >>>> in the thread with the report what the plan forward was: > >>>> https://lore.kernel.org/all/CAD%3DFV%3DV-h4EUKHCM9UivsFHRsJPY5sAiwXV3a1hUX9DUMkkxdg@mail.gmail.com/ > >>>> > >>>> Dough there replied: > >>>> > >>>> ``` > >>>> Of the two proposals made (the revert vs. the reordering of the dts), > >>>> the reordering of the dts seems better. It only affects the one buggy > >>>> board (rather than preventing us to move to async probe for everyone) > >>>> and it also has a chance of actually fixing something (changing the > >>>> order that regulators probe in rpmh-regulator might legitimately work > >>>> around the problem). That being said, just like the revert the dts > >>>> reordering is still just papering over the problem and is fragile / > >>>> not guaranteed to work forever. > >>>> ``` > >>>> > >>>> Papering over obviously is not good, but has anyone a better idea to fix > >>>> this? Or is "not fixing" for some reason an viable option here? > >>>> > >>> > >>> I understand there is a regression, although kernel is not mainline > >>> (hash df7443a96851 is unknown) and the only solutions were papering the > >>> problem. Reverting commit is a temporary workaround. Moving nodes in DTS > >>> is not acceptable because it hides actual problem and only solves this > >>> one particular observed problem, while actual issue is still there. It > >>> would be nice to be able to reproduce it on real mainline with normal > >>> operating system (not AOSP) - with ramdiks/without/whatever. So far no > >>> one did it, right? > >> > >> No, I did not try non-AOSP system yet. I'll try it tomorrow, if that > >> helps. With mainline hash. > > > > Hi, here is the crash report on db845c running vanilla v6.4-rc6 with a > > debian build https://bugs.linaro.org/attachment.cgi?id=1142 > > > > And fwiw here is the db845c crash log with AOSP running vanilla > > v6.4-rc6 https://bugs.linaro.org/attachment.cgi?id=1141 > > > > Regards, > > Amit Pundir > > > > PS: rootfs in this bug report doesn't matter much because I'm loading > > all the kernel modules from a ramdisk and in the case of a crash the > > UFS doesn't probe anyway. > > I just tried current next with defconfig (I could not find your config, > neither here, nor in your previous mail thread nor in bugzilla). Also > with REGULATOR_QCOM_RPMH as module. > > I tried also v6.4-rc6 - also defconfig with default and module > REGULATOR_QCOM_RPMH. > > All the cases work on my RB3 - no warnings reported. > > If you do not use defconfig, then in all reports please mention the > differences (the best) or at least attach it. Argh.. Sorry about that. Big mistake from my side. I did want to upload my defconfig but forgot. Defconfig plays a key role because, as I mentioned in one of my previous email, it is a timing/race bug and if I do any much changes in my defconfig (i.e. enable ftrace for example or as little as add printk in qcom_rpmh_regulator code) then I can't reproduce this bug. So needless to say that I can't reproduce this bug with default arm64 defconfig. Please find my custom (but upstream) defconfig here https://bugs.linaro.org/attachment.cgi?id=1143 and prebuilt binaries here https://people.linaro.org/~amit.pundir/db845c-userdebug/rpmh_bug/. "fastboot flash boot ./boot.img-6.4-rc6 reboot" and/or a few (<5) reboots should be enough to trigger the crash. I have downloaded the initrd from here https://snapshots.linaro.org/96boards/dragonboard845c/linaro/debian/569/initrd.img-5.15.0-qcomlt-arm64 but edited ramdisk/init to run "load_module" function early in the boot and ramdisk/conf/initramfs.conf has "MODULES=list" instead of "MODULES=most", where all the kernel modules are listed at /etc/initramfs-tools/modules. Regards, Amit Pundir
On Thu, 15 Jun 2023 at 21:39, Amit Pundir <amit.pundir@linaro.org> wrote: > > On Thu, 15 Jun 2023 at 20:33, Krzysztof Kozlowski > <krzysztof.kozlowski@linaro.org> wrote: > > > > On 15/06/2023 15:47, Amit Pundir wrote: > > > On Thu, 15 Jun 2023 at 00:38, Amit Pundir <amit.pundir@linaro.org> wrote: > > >> > > >> On Thu, 15 Jun 2023 at 00:17, Krzysztof Kozlowski > > >> <krzysztof.kozlowski@linaro.org> wrote: > > >>> > > >>> On 14/06/2023 20:18, Linux regression tracking (Thorsten Leemhuis) wrote: > > >>>> On 02.06.23 18:12, Amit Pundir wrote: > > >>>>> Move lvs1 and lvs2 regulator nodes up in the rpmh-regulators > > >>>>> list to workaround a boot regression uncovered by the upstream > > >>>>> commit ad44ac082fdf ("regulator: qcom-rpmh: Revert "regulator: > > >>>>> qcom-rpmh: Use PROBE_FORCE_SYNCHRONOUS""). > > >>>>> > > >>>>> Without this fix DB845c fail to boot at times because one of the > > >>>>> lvs1 or lvs2 regulators fail to turn ON in time. > > >>>> > > >>>> /me waves friendly > > >>>> > > >>>> FWIW, as it's not obvious: this... > > >>>> > > >>>>> Link: https://lore.kernel.org/all/CAMi1Hd1avQDcDQf137m2auz2znov4XL8YGrLZsw5edb-NtRJRw@mail.gmail.com/ > > >>>> > > >>>> ...is a report about a regression. One that we could still solve before > > >>>> 6.4 is out. One I'll likely will point Linus to, unless a fix comes into > > >>>> sight. > > >>>> > > >>>> When I noticed the reluctant replies to this patch I earlier today asked > > >>>> in the thread with the report what the plan forward was: > > >>>> https://lore.kernel.org/all/CAD%3DFV%3DV-h4EUKHCM9UivsFHRsJPY5sAiwXV3a1hUX9DUMkkxdg@mail.gmail.com/ > > >>>> > > >>>> Dough there replied: > > >>>> > > >>>> ``` > > >>>> Of the two proposals made (the revert vs. the reordering of the dts), > > >>>> the reordering of the dts seems better. It only affects the one buggy > > >>>> board (rather than preventing us to move to async probe for everyone) > > >>>> and it also has a chance of actually fixing something (changing the > > >>>> order that regulators probe in rpmh-regulator might legitimately work > > >>>> around the problem). That being said, just like the revert the dts > > >>>> reordering is still just papering over the problem and is fragile / > > >>>> not guaranteed to work forever. > > >>>> ``` > > >>>> > > >>>> Papering over obviously is not good, but has anyone a better idea to fix > > >>>> this? Or is "not fixing" for some reason an viable option here? > > >>>> > > >>> > > >>> I understand there is a regression, although kernel is not mainline > > >>> (hash df7443a96851 is unknown) and the only solutions were papering the > > >>> problem. Reverting commit is a temporary workaround. Moving nodes in DTS > > >>> is not acceptable because it hides actual problem and only solves this > > >>> one particular observed problem, while actual issue is still there. It > > >>> would be nice to be able to reproduce it on real mainline with normal > > >>> operating system (not AOSP) - with ramdiks/without/whatever. So far no > > >>> one did it, right? > > >> > > >> No, I did not try non-AOSP system yet. I'll try it tomorrow, if that > > >> helps. With mainline hash. > > > > > > Hi, here is the crash report on db845c running vanilla v6.4-rc6 with a > > > debian build https://bugs.linaro.org/attachment.cgi?id=1142 > > > > > > And fwiw here is the db845c crash log with AOSP running vanilla > > > v6.4-rc6 https://bugs.linaro.org/attachment.cgi?id=1141 > > > > > > Regards, > > > Amit Pundir > > > > > > PS: rootfs in this bug report doesn't matter much because I'm loading > > > all the kernel modules from a ramdisk and in the case of a crash the > > > UFS doesn't probe anyway. > > > > I just tried current next with defconfig (I could not find your config, > > neither here, nor in your previous mail thread nor in bugzilla). Also > > with REGULATOR_QCOM_RPMH as module. > > > > I tried also v6.4-rc6 - also defconfig with default and module > > REGULATOR_QCOM_RPMH. > > > > All the cases work on my RB3 - no warnings reported. > > > > If you do not use defconfig, then in all reports please mention the > > differences (the best) or at least attach it. > > Argh.. Sorry about that. Big mistake from my side. I did want to > upload my defconfig but forgot. Defconfig plays a key role because, as > I mentioned in one of my previous email, it is a timing/race bug and > if I do any much changes in my defconfig (i.e. enable ftrace for > example or as little as add printk in qcom_rpmh_regulator code) then I > can't reproduce this bug. So needless to say that I can't reproduce > this bug with default arm64 defconfig. > > Please find my custom (but upstream) defconfig here > https://bugs.linaro.org/attachment.cgi?id=1143 and prebuilt binaries > here https://people.linaro.org/~amit.pundir/db845c-userdebug/rpmh_bug/. > "fastboot flash boot ./boot.img-6.4-rc6 reboot" and/or a few (<5) > reboots should be enough to trigger the crash. > > I have downloaded the initrd from here > https://snapshots.linaro.org/96boards/dragonboard845c/linaro/debian/569/initrd.img-5.15.0-qcomlt-arm64 > but edited ramdisk/init to run "load_module" function early in the > boot and ramdisk/conf/initramfs.conf has "MODULES=list" instead of > "MODULES=most", where all the kernel modules are listed at > /etc/initramfs-tools/modules. Sorry it is ramdisk/conf/modules not ramdisk/etc/initramfs-tools/modules. > > Regards, > Amit Pundir
On 15/06/2023 18:09, Amit Pundir wrote: > On Thu, 15 Jun 2023 at 20:33, Krzysztof Kozlowski > <krzysztof.kozlowski@linaro.org> wrote: >> >> On 15/06/2023 15:47, Amit Pundir wrote: >>> On Thu, 15 Jun 2023 at 00:38, Amit Pundir <amit.pundir@linaro.org> wrote: >>>> >>>> On Thu, 15 Jun 2023 at 00:17, Krzysztof Kozlowski >>>> <krzysztof.kozlowski@linaro.org> wrote: >>>>> >>>>> On 14/06/2023 20:18, Linux regression tracking (Thorsten Leemhuis) wrote: >>>>>> On 02.06.23 18:12, Amit Pundir wrote: >>>>>>> Move lvs1 and lvs2 regulator nodes up in the rpmh-regulators >>>>>>> list to workaround a boot regression uncovered by the upstream >>>>>>> commit ad44ac082fdf ("regulator: qcom-rpmh: Revert "regulator: >>>>>>> qcom-rpmh: Use PROBE_FORCE_SYNCHRONOUS""). >>>>>>> >>>>>>> Without this fix DB845c fail to boot at times because one of the >>>>>>> lvs1 or lvs2 regulators fail to turn ON in time. >>>>>> >>>>>> /me waves friendly >>>>>> >>>>>> FWIW, as it's not obvious: this... >>>>>> >>>>>>> Link: https://lore.kernel.org/all/CAMi1Hd1avQDcDQf137m2auz2znov4XL8YGrLZsw5edb-NtRJRw@mail.gmail.com/ >>>>>> >>>>>> ...is a report about a regression. One that we could still solve before >>>>>> 6.4 is out. One I'll likely will point Linus to, unless a fix comes into >>>>>> sight. >>>>>> >>>>>> When I noticed the reluctant replies to this patch I earlier today asked >>>>>> in the thread with the report what the plan forward was: >>>>>> https://lore.kernel.org/all/CAD%3DFV%3DV-h4EUKHCM9UivsFHRsJPY5sAiwXV3a1hUX9DUMkkxdg@mail.gmail.com/ >>>>>> >>>>>> Dough there replied: >>>>>> >>>>>> ``` >>>>>> Of the two proposals made (the revert vs. the reordering of the dts), >>>>>> the reordering of the dts seems better. It only affects the one buggy >>>>>> board (rather than preventing us to move to async probe for everyone) >>>>>> and it also has a chance of actually fixing something (changing the >>>>>> order that regulators probe in rpmh-regulator might legitimately work >>>>>> around the problem). That being said, just like the revert the dts >>>>>> reordering is still just papering over the problem and is fragile / >>>>>> not guaranteed to work forever. >>>>>> ``` >>>>>> >>>>>> Papering over obviously is not good, but has anyone a better idea to fix >>>>>> this? Or is "not fixing" for some reason an viable option here? >>>>>> >>>>> >>>>> I understand there is a regression, although kernel is not mainline >>>>> (hash df7443a96851 is unknown) and the only solutions were papering the >>>>> problem. Reverting commit is a temporary workaround. Moving nodes in DTS >>>>> is not acceptable because it hides actual problem and only solves this >>>>> one particular observed problem, while actual issue is still there. It >>>>> would be nice to be able to reproduce it on real mainline with normal >>>>> operating system (not AOSP) - with ramdiks/without/whatever. So far no >>>>> one did it, right? >>>> >>>> No, I did not try non-AOSP system yet. I'll try it tomorrow, if that >>>> helps. With mainline hash. >>> >>> Hi, here is the crash report on db845c running vanilla v6.4-rc6 with a >>> debian build https://bugs.linaro.org/attachment.cgi?id=1142 >>> >>> And fwiw here is the db845c crash log with AOSP running vanilla >>> v6.4-rc6 https://bugs.linaro.org/attachment.cgi?id=1141 >>> >>> Regards, >>> Amit Pundir >>> >>> PS: rootfs in this bug report doesn't matter much because I'm loading >>> all the kernel modules from a ramdisk and in the case of a crash the >>> UFS doesn't probe anyway. >> >> I just tried current next with defconfig (I could not find your config, >> neither here, nor in your previous mail thread nor in bugzilla). Also >> with REGULATOR_QCOM_RPMH as module. >> >> I tried also v6.4-rc6 - also defconfig with default and module >> REGULATOR_QCOM_RPMH. >> >> All the cases work on my RB3 - no warnings reported. >> >> If you do not use defconfig, then in all reports please mention the >> differences (the best) or at least attach it. > > Argh.. Sorry about that. Big mistake from my side. I did want to > upload my defconfig but forgot. Defconfig plays a key role because, as > I mentioned in one of my previous email, it is a timing/race bug and > if I do any much changes in my defconfig (i.e. enable ftrace for > example or as little as add printk in qcom_rpmh_regulator code) then I > can't reproduce this bug. So needless to say that I can't reproduce > this bug with default arm64 defconfig. > > Please find my custom (but upstream) defconfig here > https://bugs.linaro.org/attachment.cgi?id=1143 and prebuilt binaries > here https://people.linaro.org/~amit.pundir/db845c-userdebug/rpmh_bug/. > "fastboot flash boot ./boot.img-6.4-rc6 reboot" and/or a few (<5) > reboots should be enough to trigger the crash. > > I have downloaded the initrd from here > https://snapshots.linaro.org/96boards/dragonboard845c/linaro/debian/569/initrd.img-5.15.0-qcomlt-arm64 > but edited ramdisk/init to run "load_module" function early in the > boot and ramdisk/conf/initramfs.conf has "MODULES=list" instead of > "MODULES=most", where all the kernel modules are listed at > /etc/initramfs-tools/modules. So you have interconnect as module - this is not a supported setup. It might work with if all the modules are loaded very early or might not. Pinctrl is another driver which should be built-in. With your defconfig I see regular issue - console and system dies because of lack of interconnects, most likely. I don't see your WARNs - I just see usual hang. See: https://lore.kernel.org/all/20221021032702.1340963-1-krzysztof.kozlowski@linaro.org/ If you want them to really be modules, then you need to fix all the dependencies (SOFTDEP?), probe ordering glitches. It's not a problem of DTS. Just because something can be built as module, does not mean it will work. We don't test it, we don't work with them as modules. It's kind of the same as here: https://lore.kernel.org/all/ac328b6a-a8e2-873d-4015-814cb4f5588e@canonical.com/ I understand that we might have here regression, if these were working as modules, but I don't think we ever really committed to it. We can as well make it non-module to solve the regression. Best regards, Krzysztof
Hi, On Fri, 16 Jun 2023 at 13:57, Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> wrote: > > > So you have interconnect as module - this is not a supported setup. It > might work with if all the modules are loaded very early or might not. > Pinctrl is another driver which should be built-in. > > With your defconfig I see regular issue - console and system dies > because of lack of interconnects, most likely. I don't see your WARNs - > I just see usual hang. > > See: > https://lore.kernel.org/all/20221021032702.1340963-1-krzysztof.kozlowski@linaro.org/ > > If you want them to really be modules, then you need to fix all the > dependencies (SOFTDEP?), probe ordering glitches. It's not a problem of > DTS. Just because something can be built as module, does not mean it > will work. We don't test it, we don't work with them as modules. I do somewhat agree with most of your arguments but not this one. If a driver doesn't work as a module then it shouldn't be allowed to build as a module. I took a quick look at the history of the interconnect driver and it is tristate from the beginning. And not converted to a modular build later-on like some of the other drivers to support GKI. > > It's kind of the same as here: > https://lore.kernel.org/all/ac328b6a-a8e2-873d-4015-814cb4f5588e@canonical.com/ > > I understand that we might have here regression, if these were working > as modules, but I don't think we ever really committed to it. We can as > well make it non-module to solve the regression. Sure. But since v6.4 is around the corner, can we merge this workaround for now, while a proper fix is being worked upon. Regards, Amit Pundir
On 16/06/2023 19:09, Amit Pundir wrote: > Hi, > > On Fri, 16 Jun 2023 at 13:57, Krzysztof Kozlowski > <krzysztof.kozlowski@linaro.org> wrote: >> >> >> So you have interconnect as module - this is not a supported setup. It >> might work with if all the modules are loaded very early or might not. >> Pinctrl is another driver which should be built-in. >> >> With your defconfig I see regular issue - console and system dies >> because of lack of interconnects, most likely. I don't see your WARNs - >> I just see usual hang. >> >> See: >> https://lore.kernel.org/all/20221021032702.1340963-1-krzysztof.kozlowski@linaro.org/ >> >> If you want them to really be modules, then you need to fix all the >> dependencies (SOFTDEP?), probe ordering glitches. It's not a problem of >> DTS. Just because something can be built as module, does not mean it >> will work. We don't test it, we don't work with them as modules. > > I do somewhat agree with most of your arguments but not this one. If a > driver doesn't work as a module then it shouldn't be allowed to build > as a module. Of course you are right. That's why I am pushing against blindly adding "tristate" by everyone working on GKI. Because such folks like to make them tristate, but not actually test it or work on issues later. That's exactly the case from Google and Samsung patches here: https://lore.kernel.org/all/ac328b6a-a8e2-873d-4015-814cb4f5588e@canonical.com/ and in previous submissions. > I took a quick look at the history of the interconnect > driver and it is tristate from the beginning. And not converted to a > modular build later-on like some of the other drivers to support GKI. OK, maybe it was never actually tested. Or maybe some versions were working on boards where debug serial does not have interconnect, but new chips just followed the pattern without testing? > >> >> It's kind of the same as here: >> https://lore.kernel.org/all/ac328b6a-a8e2-873d-4015-814cb4f5588e@canonical.com/ >> >> I understand that we might have here regression, if these were working >> as modules, but I don't think we ever really committed to it. We can as >> well make it non-module to solve the regression. > > Sure. But since v6.4 is around the corner, can we merge this > workaround for now, while a proper fix is being worked upon. DTS workaround? No. I don't agree. Once it is merged it will not be fixed. I am perfectly fine though with making the interconnect or even rpmh regulator bool instead of tristate. Best regards, Krzysztof
On Sat, 17 Jun 2023 at 12:51, Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> wrote: > > On 16/06/2023 19:09, Amit Pundir wrote: > > Hi, > > > > On Fri, 16 Jun 2023 at 13:57, Krzysztof Kozlowski > > <krzysztof.kozlowski@linaro.org> wrote: > >> > >> > >> So you have interconnect as module - this is not a supported setup. It > >> might work with if all the modules are loaded very early or might not. > >> Pinctrl is another driver which should be built-in. > >> > >> With your defconfig I see regular issue - console and system dies > >> because of lack of interconnects, most likely. I don't see your WARNs - > >> I just see usual hang. > >> > >> See: > >> https://lore.kernel.org/all/20221021032702.1340963-1-krzysztof.kozlowski@linaro.org/ > >> > >> If you want them to really be modules, then you need to fix all the > >> dependencies (SOFTDEP?), probe ordering glitches. It's not a problem of > >> DTS. Just because something can be built as module, does not mean it > >> will work. We don't test it, we don't work with them as modules. > > > > I do somewhat agree with most of your arguments but not this one. If a > > driver doesn't work as a module then it shouldn't be allowed to build > > as a module. > > Of course you are right. That's why I am pushing against blindly adding > "tristate" by everyone working on GKI. Because such folks like to make > them tristate, but not actually test it or work on issues later. > > That's exactly the case from Google and Samsung patches here: > https://lore.kernel.org/all/ac328b6a-a8e2-873d-4015-814cb4f5588e@canonical.com/ > and in previous submissions. > > > I took a quick look at the history of the interconnect > > driver and it is tristate from the beginning. And not converted to a > > modular build later-on like some of the other drivers to support GKI. > > OK, maybe it was never actually tested. Or maybe some versions were > working on boards where debug serial does not have interconnect, but new > chips just followed the pattern without testing? > > > >> > >> It's kind of the same as here: > >> https://lore.kernel.org/all/ac328b6a-a8e2-873d-4015-814cb4f5588e@canonical.com/ > >> > >> I understand that we might have here regression, if these were working > >> as modules, but I don't think we ever really committed to it. We can as > >> well make it non-module to solve the regression. > > > > Sure. But since v6.4 is around the corner, can we merge this > > workaround for now, while a proper fix is being worked upon. > > DTS workaround? No. I don't agree. Once it is merged it will not be fixed. > > I am perfectly fine though with making the interconnect or even rpmh > regulator bool instead of tristate. As Doug also mentioned in one of his earlier emails, this workaround is only limited to one particular board. If I try to change the common interconnect and/or rpmh driver then it will need ack from other stake holders as well and I'll most likely get more pushback from that side. Regards, Amit Pundir
On Wed, Jun 14, 2023 at 12:44:15PM -0700, Doug Anderson wrote: > Hi, > > On Wed, Jun 14, 2023 at 11:47 AM Krzysztof Kozlowski > <krzysztof.kozlowski@linaro.org> wrote: > > > > On 14/06/2023 20:18, Linux regression tracking (Thorsten Leemhuis) wrote: > > > On 02.06.23 18:12, Amit Pundir wrote: > > >> Move lvs1 and lvs2 regulator nodes up in the rpmh-regulators > > >> list to workaround a boot regression uncovered by the upstream > > >> commit ad44ac082fdf ("regulator: qcom-rpmh: Revert "regulator: > > >> qcom-rpmh: Use PROBE_FORCE_SYNCHRONOUS""). > > >> > > >> Without this fix DB845c fail to boot at times because one of the > > >> lvs1 or lvs2 regulators fail to turn ON in time. > > > > > > /me waves friendly > > > > > > FWIW, as it's not obvious: this... > > > > > >> Link: https://lore.kernel.org/all/CAMi1Hd1avQDcDQf137m2auz2znov4XL8YGrLZsw5edb-NtRJRw@mail.gmail.com/ > > > > > > ...is a report about a regression. One that we could still solve before > > > 6.4 is out. One I'll likely will point Linus to, unless a fix comes into > > > sight. > > > > > > When I noticed the reluctant replies to this patch I earlier today asked > > > in the thread with the report what the plan forward was: > > > https://lore.kernel.org/all/CAD%3DFV%3DV-h4EUKHCM9UivsFHRsJPY5sAiwXV3a1hUX9DUMkkxdg@mail.gmail.com/ > > > > > > Dough there replied: > > > > > > ``` > > > Of the two proposals made (the revert vs. the reordering of the dts), > > > the reordering of the dts seems better. It only affects the one buggy > > > board (rather than preventing us to move to async probe for everyone) > > > and it also has a chance of actually fixing something (changing the > > > order that regulators probe in rpmh-regulator might legitimately work > > > around the problem). That being said, just like the revert the dts > > > reordering is still just papering over the problem and is fragile / > > > not guaranteed to work forever. > > > ``` > > > > > > Papering over obviously is not good, but has anyone a better idea to fix > > > this? Or is "not fixing" for some reason an viable option here? > > > > > > > I understand there is a regression, although kernel is not mainline > > (hash df7443a96851 is unknown) and the only solutions were papering the > > problem. Reverting commit is a temporary workaround. Moving nodes in DTS > > is not acceptable because it hides actual problem and only solves this > > one particular observed problem, while actual issue is still there. It > > would be nice to be able to reproduce it on real mainline with normal > > operating system (not AOSP) - with ramdiks/without/whatever. So far no > > one did it, right? > > The worry I have about the revert here is that it will never be able > to be undone and that doesn't seem great long term. I'm all for a > temporary revert to fix a problem while the root cause is understood, > but in this case I have a hard time believing that we'll make more > progress towards a root cause once the revert lands. All the > investigation we've done so far seems to indicate that the revert only > fixes the problem by luck... > > I completely agree that moving the nodes in the DTS is a hack and just > hides the problem. However, it also at least limits the workaround to > the one board showing the problem and doesn't mean we're stuck with > synchronous probe for rpmh-regulator for all eternity because nobody > can understand this timing issue on db845c. > I agree that we shouldn't hide this by reverting the regulator change. And as has been stated a few times already, the symptom indicates that we have a misconfigured system. Before accepting a patch just shuffling the bricks, I'd like to see some more analysis of what happens wrt the rpmh right before the timeout. Perhaps the landing team can assist here? Regards, Bjorn
Linux regression tracking (Thorsten Leemhuis)
June 22, 2023, 7:47 a.m. UTC |
#21
Addressed
Unaddressed
Hi, Thorsten here, the Linux kernel's regression tracker. Top-posting for once, to make this easily accessible to everyone. As Linus will likely release 6.4 on this or the following Sunday a quick status inquiry so I can brief him appropriately: is there any hope the regression this patch tried to fix will be resolved any time soon? Doesn't look like it from below message and this thread, but maybe I missed something. Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat) -- Everything you wanna know about Linux kernel regression tracking: https://linux-regtracking.leemhuis.info/about/#tldr If I did something stupid, please tell me, as explained on that page. #regzbot poke On 20.06.23 17:59, Bjorn Andersson wrote: > On Wed, Jun 14, 2023 at 12:44:15PM -0700, Doug Anderson wrote: >> Hi, >> >> On Wed, Jun 14, 2023 at 11:47 AM Krzysztof Kozlowski >> <krzysztof.kozlowski@linaro.org> wrote: >>> >>> On 14/06/2023 20:18, Linux regression tracking (Thorsten Leemhuis) wrote: >>>> On 02.06.23 18:12, Amit Pundir wrote: >>>>> Move lvs1 and lvs2 regulator nodes up in the rpmh-regulators >>>>> list to workaround a boot regression uncovered by the upstream >>>>> commit ad44ac082fdf ("regulator: qcom-rpmh: Revert "regulator: >>>>> qcom-rpmh: Use PROBE_FORCE_SYNCHRONOUS""). >>>>> >>>>> Without this fix DB845c fail to boot at times because one of the >>>>> lvs1 or lvs2 regulators fail to turn ON in time. >>>> >>>> /me waves friendly >>>> >>>> FWIW, as it's not obvious: this... >>>> >>>>> Link: https://lore.kernel.org/all/CAMi1Hd1avQDcDQf137m2auz2znov4XL8YGrLZsw5edb-NtRJRw@mail.gmail.com/ >>>> >>>> ...is a report about a regression. One that we could still solve before >>>> 6.4 is out. One I'll likely will point Linus to, unless a fix comes into >>>> sight. >>>> >>>> When I noticed the reluctant replies to this patch I earlier today asked >>>> in the thread with the report what the plan forward was: >>>> https://lore.kernel.org/all/CAD%3DFV%3DV-h4EUKHCM9UivsFHRsJPY5sAiwXV3a1hUX9DUMkkxdg@mail.gmail.com/ >>>> >>>> Dough there replied: >>>> >>>> ``` >>>> Of the two proposals made (the revert vs. the reordering of the dts), >>>> the reordering of the dts seems better. It only affects the one buggy >>>> board (rather than preventing us to move to async probe for everyone) >>>> and it also has a chance of actually fixing something (changing the >>>> order that regulators probe in rpmh-regulator might legitimately work >>>> around the problem). That being said, just like the revert the dts >>>> reordering is still just papering over the problem and is fragile / >>>> not guaranteed to work forever. >>>> ``` >>>> >>>> Papering over obviously is not good, but has anyone a better idea to fix >>>> this? Or is "not fixing" for some reason an viable option here? >>>> >>> >>> I understand there is a regression, although kernel is not mainline >>> (hash df7443a96851 is unknown) and the only solutions were papering the >>> problem. Reverting commit is a temporary workaround. Moving nodes in DTS >>> is not acceptable because it hides actual problem and only solves this >>> one particular observed problem, while actual issue is still there. It >>> would be nice to be able to reproduce it on real mainline with normal >>> operating system (not AOSP) - with ramdiks/without/whatever. So far no >>> one did it, right? >> >> The worry I have about the revert here is that it will never be able >> to be undone and that doesn't seem great long term. I'm all for a >> temporary revert to fix a problem while the root cause is understood, >> but in this case I have a hard time believing that we'll make more >> progress towards a root cause once the revert lands. All the >> investigation we've done so far seems to indicate that the revert only >> fixes the problem by luck... >> >> I completely agree that moving the nodes in the DTS is a hack and just >> hides the problem. However, it also at least limits the workaround to >> the one board showing the problem and doesn't mean we're stuck with >> synchronous probe for rpmh-regulator for all eternity because nobody >> can understand this timing issue on db845c. >> > > I agree that we shouldn't hide this by reverting the regulator change. > > > And as has been stated a few times already, the symptom indicates that > we have a misconfigured system. > > Before accepting a patch just shuffling the bricks, I'd like to see some > more analysis of what happens wrt the rpmh right before the timeout. > Perhaps the landing team can assist here? > > Regards, > Bjorn > >
On Thu, 22 Jun 2023 at 13:17, Linux regression tracking (Thorsten Leemhuis) <regressions@leemhuis.info> wrote: > > Hi, Thorsten here, the Linux kernel's regression tracker. Top-posting > for once, to make this easily accessible to everyone. > > As Linus will likely release 6.4 on this or the following Sunday a quick > status inquiry so I can brief him appropriately: is there any hope the > regression this patch tried to fix will be resolved any time soon? We are most likely to miss v6.4. I'm trying to reproduce the crash with tracing enabled, to share some more debug information. Regards, Amit Pundir > Doesn't look like it from below message and this thread, but maybe I > missed something. > > Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat) > -- > Everything you wanna know about Linux kernel regression tracking: > https://linux-regtracking.leemhuis.info/about/#tldr > If I did something stupid, please tell me, as explained on that page. > > #regzbot poke > > On 20.06.23 17:59, Bjorn Andersson wrote: > > On Wed, Jun 14, 2023 at 12:44:15PM -0700, Doug Anderson wrote: > >> Hi, > >> > >> On Wed, Jun 14, 2023 at 11:47 AM Krzysztof Kozlowski > >> <krzysztof.kozlowski@linaro.org> wrote: > >>> > >>> On 14/06/2023 20:18, Linux regression tracking (Thorsten Leemhuis) wrote: > >>>> On 02.06.23 18:12, Amit Pundir wrote: > >>>>> Move lvs1 and lvs2 regulator nodes up in the rpmh-regulators > >>>>> list to workaround a boot regression uncovered by the upstream > >>>>> commit ad44ac082fdf ("regulator: qcom-rpmh: Revert "regulator: > >>>>> qcom-rpmh: Use PROBE_FORCE_SYNCHRONOUS""). > >>>>> > >>>>> Without this fix DB845c fail to boot at times because one of the > >>>>> lvs1 or lvs2 regulators fail to turn ON in time. > >>>> > >>>> /me waves friendly > >>>> > >>>> FWIW, as it's not obvious: this... > >>>> > >>>>> Link: https://lore.kernel.org/all/CAMi1Hd1avQDcDQf137m2auz2znov4XL8YGrLZsw5edb-NtRJRw@mail.gmail.com/ > >>>> > >>>> ...is a report about a regression. One that we could still solve before > >>>> 6.4 is out. One I'll likely will point Linus to, unless a fix comes into > >>>> sight. > >>>> > >>>> When I noticed the reluctant replies to this patch I earlier today asked > >>>> in the thread with the report what the plan forward was: > >>>> https://lore.kernel.org/all/CAD%3DFV%3DV-h4EUKHCM9UivsFHRsJPY5sAiwXV3a1hUX9DUMkkxdg@mail.gmail.com/ > >>>> > >>>> Dough there replied: > >>>> > >>>> ``` > >>>> Of the two proposals made (the revert vs. the reordering of the dts), > >>>> the reordering of the dts seems better. It only affects the one buggy > >>>> board (rather than preventing us to move to async probe for everyone) > >>>> and it also has a chance of actually fixing something (changing the > >>>> order that regulators probe in rpmh-regulator might legitimately work > >>>> around the problem). That being said, just like the revert the dts > >>>> reordering is still just papering over the problem and is fragile / > >>>> not guaranteed to work forever. > >>>> ``` > >>>> > >>>> Papering over obviously is not good, but has anyone a better idea to fix > >>>> this? Or is "not fixing" for some reason an viable option here? > >>>> > >>> > >>> I understand there is a regression, although kernel is not mainline > >>> (hash df7443a96851 is unknown) and the only solutions were papering the > >>> problem. Reverting commit is a temporary workaround. Moving nodes in DTS > >>> is not acceptable because it hides actual problem and only solves this > >>> one particular observed problem, while actual issue is still there. It > >>> would be nice to be able to reproduce it on real mainline with normal > >>> operating system (not AOSP) - with ramdiks/without/whatever. So far no > >>> one did it, right? > >> > >> The worry I have about the revert here is that it will never be able > >> to be undone and that doesn't seem great long term. I'm all for a > >> temporary revert to fix a problem while the root cause is understood, > >> but in this case I have a hard time believing that we'll make more > >> progress towards a root cause once the revert lands. All the > >> investigation we've done so far seems to indicate that the revert only > >> fixes the problem by luck... > >> > >> I completely agree that moving the nodes in the DTS is a hack and just > >> hides the problem. However, it also at least limits the workaround to > >> the one board showing the problem and doesn't mean we're stuck with > >> synchronous probe for rpmh-regulator for all eternity because nobody > >> can understand this timing issue on db845c. > >> > > > > I agree that we shouldn't hide this by reverting the regulator change. > > > > > > And as has been stated a few times already, the symptom indicates that > > we have a misconfigured system. > > > > Before accepting a patch just shuffling the bricks, I'd like to see some > > more analysis of what happens wrt the rpmh right before the timeout. > > Perhaps the landing team can assist here? > > > > Regards, > > Bjorn > > > >
On Thu, 22 Jun 2023 at 17:18, Amit Pundir <amit.pundir@linaro.org> wrote: > > On Thu, 22 Jun 2023 at 13:17, Linux regression tracking (Thorsten > Leemhuis) <regressions@leemhuis.info> wrote: > > > > Hi, Thorsten here, the Linux kernel's regression tracker. Top-posting > > for once, to make this easily accessible to everyone. > > > > As Linus will likely release 6.4 on this or the following Sunday a quick > > status inquiry so I can brief him appropriately: is there any hope the > > regression this patch tried to fix will be resolved any time soon? > > We are most likely to miss v6.4. I'm trying to reproduce the crash > with tracing enabled, to share some more debug information. FWIW, I couldn't reproduce this bug on v6.5 merge window commit d528014517f2 (Revert ".gitignore: ignore *.cover and *.mbx") on 100+ boot tests last night. For the time being I'm keeping an eye on it and will trigger the boot tests occasionally in the v6.5 development cycle. > > Regards, > Amit Pundir >
Linux regression tracking (Thorsten Leemhuis)
July 14, 2023, 11:04 a.m. UTC |
#24
Addressed
Unaddressed
On 07.07.23 07:08, Amit Pundir wrote: > On Thu, 22 Jun 2023 at 17:18, Amit Pundir <amit.pundir@linaro.org> wrote: >> On Thu, 22 Jun 2023 at 13:17, Linux regression tracking (Thorsten >> Leemhuis) <regressions@leemhuis.info> wrote: >>> >>> As Linus will likely release 6.4 on this or the following Sunday a quick >>> status inquiry so I can brief him appropriately: is there any hope the >>> regression this patch tried to fix will be resolved any time soon? >> >> We are most likely to miss v6.4. I'm trying to reproduce the crash >> with tracing enabled, to share some more debug information. > > FWIW, I couldn't reproduce this bug on v6.5 merge window commit > d528014517f2 (Revert ".gitignore: ignore *.cover and *.mbx") > on 100+ boot tests last night. > For the time being I'm keeping an eye on it and will trigger the boot > tests occasionally in the v6.5 development cycle. No update since then, so I assume this remains fixed. I'll thus remove this from the tracking; please holler if you think it might make sense to continue tracking this. #regzbot resolved: apparently solved with during the 6.5 merge window #regzbot ignore-activity Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat) -- Everything you wanna know about Linux kernel regression tracking: https://linux-regtracking.leemhuis.info/about/#tldr That page also explains what to do if mails like this annoy you.
diff --git a/arch/arm64/boot/dts/qcom/sdm845-db845c.dts b/arch/arm64/boot/dts/qcom/sdm845-db845c.dts index e14fe9bbb386..df2fde9063dc 100644 --- a/arch/arm64/boot/dts/qcom/sdm845-db845c.dts +++ b/arch/arm64/boot/dts/qcom/sdm845-db845c.dts @@ -301,6 +301,18 @@ regulators-0 { vdd-l26-supply = <&vreg_s3a_1p35>; vin-lvs-1-2-supply = <&vreg_s4a_1p8>; + vreg_lvs1a_1p8: lvs1 { + regulator-min-microvolt = <1800000>; + regulator-max-microvolt = <1800000>; + regulator-always-on; + }; + + vreg_lvs2a_1p8: lvs2 { + regulator-min-microvolt = <1800000>; + regulator-max-microvolt = <1800000>; + regulator-always-on; + }; + vreg_s3a_1p35: smps3 { regulator-min-microvolt = <1352000>; regulator-max-microvolt = <1352000>; @@ -381,18 +393,6 @@ vreg_l26a_1p2: ldo26 { regulator-max-microvolt = <1200000>; regulator-initial-mode = <RPMH_REGULATOR_MODE_HPM>; }; - - vreg_lvs1a_1p8: lvs1 { - regulator-min-microvolt = <1800000>; - regulator-max-microvolt = <1800000>; - regulator-always-on; - }; - - vreg_lvs2a_1p8: lvs2 { - regulator-min-microvolt = <1800000>; - regulator-max-microvolt = <1800000>; - regulator-always-on; - }; }; regulators-1 {